ceaksan

Why Scroll Depth Tracking Gives You False Confidence

Scroll depth is the most popular engagement metric that tells you almost nothing. Academic research shows that behavioral signals like velocity, dwell time, and direction changes are far more reliable. Here is why threshold-based tracking fails and what to use instead.

Mar 28, 2026 · 8 min read
TL;DR

Scroll depth percentages (25/50/75/100) tell you where users stopped but not how they got there. Two users at 75% can represent completely different behaviors: one read carefully, the other scrolled past in 2 seconds. Academic studies report that scroll-derived features alone predict text readability with an f-score of 0.77 and user interest with 92.4% accuracy. Threshold-based tools like Hotjar and Clarity cannot capture this.

Scroll depth is the most popular engagement metric that tells you almost nothing.

Every analytics setup includes some form of scroll tracking. GA4's enhanced measurement fires a scroll event at 90% depth. GTM lets you set custom thresholds. Hotjar and Clarity paint colorful heatmaps. And teams make content decisions based on these numbers every day.

The problem: two users at 75% scroll depth can represent completely opposite behaviors. One read every word. The other held the spacebar for 2 seconds. Both register identically in every threshold-based tool.

This is not a minor gap. It is a structural flaw in how content engagement is measured. And academic research has known this for years.

The Threshold Illusion

The standard approach to scroll tracking works like this: define percentage thresholds (25%, 50%, 75%, 100%), fire an event when the user crosses each one, and aggregate the data. The resulting funnel looks clean and actionable. “56% of users reach the bottom of the page.”

But this funnel hides more than it reveals:

What it tells you: how far users scrolled.
What it cannot tell you: whether they read, skimmed, or skipped past the content.
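To make the gap concrete, here is a minimal TypeScript sketch of threshold firing. It assumes scroll position is recorded as `{ t, y }` samples (timestamp in milliseconds, viewport bottom in pixels) on a page of fixed height; `ScrollSample` and `crossedThresholds` are hypothetical names, not any tool's API. A three-minute read and a two-second skim of the same page produce identical output.

```typescript
// A scroll sample: t = timestamp in ms, y = scrollY + viewport height in px.
interface ScrollSample {
  t: number;
  y: number;
}

// Returns the percentage thresholds a session crossed, in order.
// This is all a threshold-based tracker records: which lines were passed,
// never how long it took or how the user moved between them.
function crossedThresholds(
  samples: ScrollSample[],
  pageHeight: number,
  thresholds: number[] = [25, 50, 75, 100],
): number[] {
  const fired = new Set<number>();
  for (const s of samples) {
    const pct = (s.y / pageHeight) * 100;
    for (const th of thresholds) {
      if (pct >= th) fired.add(th);
    }
  }
  return thresholds.filter((th) => fired.has(th));
}

// Reader: reaches 75% of a 4000 px page over three minutes, with pauses.
const reader: ScrollSample[] = [
  { t: 0, y: 900 },
  { t: 60_000, y: 1600 },
  { t: 120_000, y: 2200 },
  { t: 180_000, y: 3000 },
];

// Skimmer: reaches the same 75% in two seconds.
const skimmer: ScrollSample[] = [
  { t: 0, y: 900 },
  { t: 2_000, y: 3000 },
];

console.log(crossedThresholds(reader, 4000));  // [ 25, 50, 75 ]
console.log(crossedThresholds(skimmer, 4000)); // [ 25, 50, 75 ]
```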

Consider a blog post with four sections. Your scroll funnel shows strong retention through 75% but a sharp drop at 100%. The standard interpretation: “The fourth section needs improvement.” But without behavioral data, you cannot know if:

  • Users engaged deeply with sections 1-3 and lost interest at section 4
  • Users found what they needed in section 1, scrolled past sections 2 and 3 without reading them, and stopped before section 4
  • Users scrolled to 75% in 3 seconds looking for a specific keyword and bounced

Each scenario demands a completely different response. Threshold tracking gives you one number for all three.

What Academic Research Actually Shows

This is not a theoretical concern. Controlled studies have measured exactly how much information scroll behavior contains.

Scroll Velocity Predicts Reading Behavior

A joint study by Google Research, Cambridge, and MIT recorded scroll interactions from 518 participants reading 60 articles at varying difficulty levels [1]. Using only scroll-derived features (velocity, acceleration, pause frequency, maximum reading speed), they achieved:

  • f-score 0.77 for predicting text readability from scrolling alone
  • f-score 0.96 when combined with vocabulary features

The most predictive features were total read time and maximum reading speeds. Not scroll depth. Not percentage thresholds. The speed and rhythm of scrolling.
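As a rough illustration of what "scroll-derived features" means in practice, the TypeScript sketch below computes total time, maximum and mean scroll speed, and a pause count from a list of `{ t, y }` samples. It is not the study's feature pipeline: the `ScrollSample` shape, the `sessionFeatures` helper, and the 1-second pause gap are assumptions chosen for illustration.

```typescript
interface ScrollSample {
  t: number; // timestamp, ms
  y: number; // scroll position, px
}

interface SessionFeatures {
  totalTimeMs: number;      // first to last scroll event
  maxSpeedPxPerMs: number;  // fastest single step
  meanSpeedPxPerMs: number; // total distance / total time
  pauseCount: number;       // gaps of at least pauseGapMs between events
}

// pauseGapMs is an arbitrary placeholder, not a value from the study.
function sessionFeatures(samples: ScrollSample[], pauseGapMs = 1_000): SessionFeatures {
  let distance = 0;
  let maxSpeed = 0;
  let pauses = 0;
  for (let i = 1; i < samples.length; i++) {
    const dt = samples[i].t - samples[i - 1].t;
    const dy = Math.abs(samples[i].y - samples[i - 1].y);
    distance += dy;
    if (dt >= pauseGapMs) pauses++; // no scroll events for a while: the user stopped
    if (dt > 0) maxSpeed = Math.max(maxSpeed, dy / dt);
  }
  const totalTimeMs = samples.length > 1 ? samples[samples.length - 1].t - samples[0].t : 0;
  return {
    totalTimeMs,
    maxSpeedPxPerMs: maxSpeed,
    meanSpeedPxPerMs: totalTimeMs > 0 ? distance / totalTimeMs : 0,
    pauseCount: pauses,
  };
}

const features = sessionFeatures([
  { t: 0, y: 0 },
  { t: 100, y: 50 },
  { t: 2_100, y: 60 }, // two seconds without a scroll event
  { t: 2_200, y: 150 },
]);
console.log(features.pauseCount, features.maxSpeedPxPerMs); // 1 0.9
```

Note that none of these features depends on how deep the user got: two sessions ending at the same percentage can differ on every one of them.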

Velocity Stability Encodes Interest

The I3 (Intelligent Interest Inference) system analyzed scroll velocity stability and velocity sequences to predict user interest [2]. Using a combination of Naive Bayes for coarse classification and deep learning for fine-grained rating:

  • 92.4% accuracy in interest inference from scroll interactions alone

This means: the consistency of your scroll speed, whether you speed up or slow down, and how your velocity changes over time contain enough information to predict whether you are interested in the content. No eye tracking required. No surveys. Just scroll events.

Dwell Time and Scroll Depth Are Independent

A critical finding from Toyohashi University of Technology: reaching the bottom of a page does not mean the content was read [3]. Dwell time and scroll depth measure different things and must be treated as independent signals. A user who scrolls to 100% in 5 seconds has a very different engagement pattern than one who takes 3 minutes to reach the same point.
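Treating dwell time as its own signal means attributing time to positions, not just recording the deepest point. The TypeScript sketch below assumes equal-height zones and `{ t, y }` samples where `y` is the viewport bottom in pixels; it assigns the time between consecutive samples to the zone the viewport was in at the start of that interval. `zoneDwell` is a hypothetical helper, and both example sessions reach 100% depth.

```typescript
interface ScrollSample {
  t: number; // timestamp, ms
  y: number; // scroll depth in px (bottom edge of the viewport)
}

// Splits the page into zoneCount equal-height zones and attributes the time
// between consecutive samples to the zone the viewport was in at the start of
// that interval. Returns milliseconds of dwell per zone.
function zoneDwell(samples: ScrollSample[], pageHeight: number, zoneCount = 4): number[] {
  const dwell = new Array<number>(zoneCount).fill(0);
  for (let i = 0; i + 1 < samples.length; i++) {
    // Clamp so that y === pageHeight falls into the last zone.
    const zone = Math.min(
      zoneCount - 1,
      Math.max(0, Math.floor((samples[i].y / pageHeight) * zoneCount)),
    );
    dwell[zone] += samples[i + 1].t - samples[i].t;
  }
  return dwell;
}

// About three minutes to the bottom of a 4000 px page.
const careful = zoneDwell(
  [{ t: 0, y: 500 }, { t: 60_000, y: 1500 }, { t: 120_000, y: 2500 }, { t: 180_000, y: 3500 }, { t: 180_500, y: 4000 }],
  4000,
);
// Five seconds to the same point.
const rushed = zoneDwell(
  [{ t: 0, y: 500 }, { t: 1_000, y: 1500 }, { t: 2_000, y: 2500 }, { t: 3_000, y: 3500 }, { t: 5_000, y: 4000 }],
  4000,
);
console.log(careful); // [ 60000, 60000, 60000, 500 ]
console.log(rushed);  // [ 1000, 1000, 1000, 2000 ]
```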

High Dwell Time Is Ambiguous

Mobile app engagement research showed that high dwell time does not always indicate engagement [4]. Users who are confused, lost, or distracted also produce high dwell times. Without additional signals (velocity patterns, direction changes, interaction events), dwell time alone cannot distinguish productive engagement from idle confusion.

Background Tabs Corrupt Everything

Passive browsing research demonstrated that background tabs significantly inflate dwell time and engagement metrics [5]. If your scroll tracker does not pause when the tab loses focus, every metric it produces includes noise from users who switched tabs and came back minutes later.
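A minimal sketch of a tab-aware timer in TypeScript, assuming the Page Visibility API as the signal source: the class only accumulates time while the page is visible. The `VisibleTimer` name and its methods are hypothetical. The clock is injected so the logic runs outside a browser; in a page you would pass `() => performance.now()` and call `setHidden(document.hidden)` from a `visibilitychange` listener.

```typescript
// Accumulates elapsed time only while the page is visible.
class VisibleTimer {
  private accumulatedMs = 0;
  private visibleSince: number | null;

  constructor(private readonly now: () => number, startHidden = false) {
    this.visibleSince = startHidden ? null : now();
  }

  // Call with true when the tab is hidden and false when it becomes visible.
  // Repeated calls with the same value are ignored, so nothing is double counted.
  setHidden(hidden: boolean): void {
    const t = this.now();
    if (hidden && this.visibleSince !== null) {
      this.accumulatedMs += t - this.visibleSince;
      this.visibleSince = null;
    } else if (!hidden && this.visibleSince === null) {
      this.visibleSince = t;
    }
  }

  // Visible time so far, including the currently running interval.
  elapsedMs(): number {
    const running = this.visibleSince !== null ? this.now() - this.visibleSince : 0;
    return this.accumulatedMs + running;
  }
}

// Simulated session with a fake clock (ms).
let fakeTime = 0;
const timer = new VisibleTimer(() => fakeTime);
fakeTime = 10_000;  // 10 s of reading
timer.setHidden(true);
fakeTime = 310_000; // 5 minutes in another tab
timer.setHidden(false);
fakeTime = 315_000; // 5 more seconds of reading
console.log(timer.elapsedMs()); // 15000
```

Wall-clock time for this session is 315 seconds; only 15 of them were spent on the page.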

Hotjar and Clarity: Structural Limits

Hotjar and Microsoft Clarity are the most widely used scroll visualization tools. They serve a purpose: quick visual overview of how far users scroll. But they have structural limitations that no feature update can fix.

What They Do

Both tools capture scroll depth at predefined thresholds and render the data as a color-gradient heatmap. The output is a visual representation showing what percentage of users reached each vertical position on the page.

What They Cannot Do

| Capability | Hotjar/Clarity | Behavioral approach |
|---|---|---|
| Scroll velocity analysis | No | Per-zone velocity tracking |
| Acceleration patterns | No | Velocity variance, smoothness |
| Zone-specific dwell time | No | Millisecond precision per threshold |
| Engagement classification | No | engaged / scanned / skipped |
| Tab visibility handling | No | Pause/resume on visibility change |
| Dynamic content adaptation | Breaks with lazy load | Height versioning system |
| Structured data output | Visual heatmap only | dataLayer events for any pipeline |
| Bot vs. human distinction | No behavioral signals | Velocity variance, micro-movements [6] |

The core issue is not that these tools are bad at what they do. They are good at showing where people scrolled. The issue is that knowing where is not enough: how people scrolled contains far more actionable information.
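The "structured data output" row is what makes behavioral data usable beyond a heatmap. The TypeScript sketch below shows what one per-zone event could look like before it is pushed to GTM's `dataLayer`. The event name `scroll_zone`, the parameter names, and the `buildZoneEvent` helper are hypothetical, chosen for illustration rather than taken from ScrollTracker or GA4.

```typescript
type Engagement = "engaged" | "scanned" | "skipped";

// Hypothetical per-zone event shape; not a GA4 or ScrollTracker schema.
interface ZoneEvent {
  event: string;
  zone_index: number;
  engagement: Engagement;
  dwell_ms: number;
  avg_velocity_px_ms: number;
  direction_changes: number;
}

function buildZoneEvent(
  zoneIndex: number,
  engagement: Engagement,
  dwellMs: number,
  avgVelocity: number,
  directionChanges: number,
): ZoneEvent {
  return {
    event: "scroll_zone",
    zone_index: zoneIndex,
    engagement,
    dwell_ms: Math.round(dwellMs),                       // whole milliseconds
    avg_velocity_px_ms: Number(avgVelocity.toFixed(2)), // two decimals
    direction_changes: directionChanges,
  };
}

// In a page this would be window.dataLayer (created by the GTM snippet);
// a plain array stands in here so the sketch runs outside a browser.
const dataLayer: ZoneEvent[] = [];
dataLayer.push(buildZoneEvent(2, "skipped", 162, 72.89, 1));
```

In GTM, each field would then be read by a Data Layer Variable and forwarded as an event parameter, so the same events can feed GA4, BigQuery, or any other destination.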

Screenshot-Based Rendering

Both tools rely on screenshot-based DOM rendering. This means:

  • Canvas and WebGL content is invisible
  • overflow-x: hidden on html and body breaks detection
  • Shadow DOM elements are missed
  • Parallax scrolling produces inaccurate positioning
  • Dynamic content (lazy load, accordion, infinite scroll) shifts the heatmap

These are not edge cases. Modern web pages routinely use these patterns.

What Behavioral Signals Reveal

When you track scroll behavior beyond thresholds, patterns emerge that percentages cannot show.

Six Signals from Scroll Movement

Each scroll event produces measurable properties:

  1. Velocity variance: Standard deviation of scroll speed. Humans vary naturally. Automated scrolling is uniform. High variance often indicates active reading with pauses.

  2. Event frequency: Scroll events per second. It varies significantly by input device: trackpads and touch screens produce high-frequency, continuous events, while mouse wheels produce lower-frequency, discrete events. Because of this, the signal also helps identify the input device.

  3. Micro-movement ratio: Proportion of scroll movements under 5 pixels. Reading produces many tiny adjustments. Scanning produces larger, consistent movements.

  4. Direction changes: How often the user reverses scroll direction. Re-reading a paragraph, checking a heading, or comparing sections all produce direction changes. Fast scanning does not.

  5. Acceleration smoothness: How gradually velocity changes. Trackpads produce smooth acceleration curves. Mouse wheels produce sharp steps. Within the same device, engaged reading shows gradual deceleration near interesting content.

  6. Scroll uniformity: Coefficient of variation in scroll distances. Uniform distances suggest mechanical scrolling (or boredom). Variable distances suggest active engagement with different content sections.
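All six signals can be computed from the same `{ t, y }` samples. The TypeScript sketch below is one straightforward interpretation, not ScrollTracker's implementation: it treats each scroll event as a step, uses the standard deviation of step speed for signal 1, moving events per second for signal 2, the 5 px cut-off for signal 3, sign flips for signal 4, the mean absolute change in speed between consecutive steps as a simple proxy for acceleration smoothness (signal 5), and the coefficient of variation of step distances for signal 6. All names are hypothetical.

```typescript
interface ScrollSample {
  t: number; // timestamp, ms
  y: number; // scroll position, px
}

interface ScrollSignals {
  velocityStdDev: number;     // 1. spread of step speeds, px/ms
  eventsPerSecond: number;    // 2. moving scroll events per second
  microMovementRatio: number; // 3. share of steps shorter than microPx
  directionChanges: number;   // 4. number of scroll reversals
  meanSpeedChange: number;    // 5. smoothness proxy, px/ms (lower = smoother)
  distanceCv: number;         // 6. coefficient of variation of step distances
}

function mean(xs: number[]): number {
  return xs.length > 0 ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
}

function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

function scrollSignals(samples: ScrollSample[], microPx = 5): ScrollSignals {
  const steps: number[] = [];  // signed distance of each scroll step, px
  const speeds: number[] = []; // unsigned speed of each step, px/ms
  for (let i = 1; i < samples.length; i++) {
    const dy = samples[i].y - samples[i - 1].y;
    const dt = samples[i].t - samples[i - 1].t;
    if (dy === 0 || dt <= 0) continue; // skip duplicate or out-of-order events
    steps.push(dy);
    speeds.push(Math.abs(dy) / dt);
  }

  // A reversal is a step whose sign differs from the previous step's sign.
  let directionChanges = 0;
  for (let i = 1; i < steps.length; i++) {
    if (Math.sign(steps[i]) !== Math.sign(steps[i - 1])) directionChanges++;
  }

  const distances = steps.map((d) => Math.abs(d));
  const speedChanges = speeds.slice(1).map((v, i) => Math.abs(v - speeds[i]));
  const durationS =
    samples.length > 1 ? (samples[samples.length - 1].t - samples[0].t) / 1000 : 0;
  const meanDistance = mean(distances);

  return {
    velocityStdDev: stdDev(speeds),
    eventsPerSecond: durationS > 0 ? steps.length / durationS : 0,
    microMovementRatio:
      distances.length > 0 ? distances.filter((d) => d < microPx).length / distances.length : 0,
    directionChanges,
    meanSpeedChange: mean(speedChanges),
    distanceCv: meanDistance > 0 ? stdDev(distances) / meanDistance : 0,
  };
}

// A 100 px scroll, a 3 px nudge down and back up, then another 100 px.
const signals = scrollSignals([
  { t: 0, y: 0 },
  { t: 100, y: 100 },
  { t: 200, y: 103 },
  { t: 300, y: 100 },
  { t: 400, y: 200 },
]);
console.log(signals.directionChanges, signals.microMovementRatio); // 2 0.5
```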

Temporal Patterns

Aggregate signals miss an important dimension: how behavior changes over time. Research on predicting engagement from interaction data found that temporal metrics (time between events, pause durations) are the strongest predictors of engagement levels [7]. A user who scrolls fast initially then slows down has been “caught” by the content. A user who starts slow and accelerates is losing interest. Furthermore, the sequence of implicit actions encodes more intent information than aggregated statistics alone [8]. These temporal patterns (decelerating, accelerating, steady, erratic) add a layer of intent classification that no single metric captures.
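One simple way to turn a speed sequence into the four temporal labels is to check first whether the speeds vary too much to have a trend, then compare the average speed of the first and second halves. This split-half comparison is a swapped-in heuristic, not the method of the cited studies, and the 30% band and erratic cut-off below are uncalibrated placeholders; `temporalPattern` is a hypothetical helper.

```typescript
type TemporalPattern = "decelerating" | "accelerating" | "steady" | "erratic";

// speeds: per-step scroll speeds (px/ms) in chronological order.
// ratioBand and erraticCv are uncalibrated placeholders.
function temporalPattern(speeds: number[], ratioBand = 0.3, erraticCv = 1.0): TemporalPattern {
  if (speeds.length < 4) return "steady"; // too few steps to call a trend
  const avg = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

  // Erratic: speeds swing so widely that "faster" or "slower" is meaningless.
  const m = avg(speeds);
  const sd = Math.sqrt(avg(speeds.map((v) => (v - m) ** 2)));
  if (m > 0 && sd / m > erraticCv) return "erratic";

  // Otherwise compare the mean speed of the first and last halves.
  const half = Math.floor(speeds.length / 2);
  const early = avg(speeds.slice(0, half));
  const late = avg(speeds.slice(speeds.length - half));
  if (late < early * (1 - ratioBand)) return "decelerating"; // slowed down: content caught the reader
  if (late > early * (1 + ratioBand)) return "accelerating"; // sped up: interest fading
  return "steady";
}

console.log(temporalPattern([3, 3, 2.5, 0.4, 0.3, 0.3])); // decelerating
```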

The Classification That Matters

When velocity, dwell time, and behavioral signals are combined, three engagement types become distinguishable:

  • Engaged: Low velocity, high dwell time, micro-movements present, direction changes. The user is reading.
  • Scanned: Moderate velocity, moderate dwell. The user is skimming headings and highlighted content.
  • Skipped: High velocity, low dwell. The user passed through without stopping.

A fourth pattern emerges from the research on dwell time ambiguity: Confused. High dwell time but no active scrolling behavior (low velocity variance, few direction changes, few micro-movements). The user's viewport stays on the zone, but the user is not engaging with the content.
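A rule-based version of this classification fits in a few lines. The TypeScript sketch below is illustrative only: the `ZoneStats` shape, `classifyZone`, and every cut-off in `LIMITS` are assumptions, not ScrollTracker's rules or values from the cited research. A real deployment would calibrate them against recorded sessions, or replace the rules with a trained model as the I3 study does.

```typescript
type ZoneEngagement = "engaged" | "scanned" | "skipped" | "confused";

interface ZoneStats {
  dwellMs: number;            // visible time the zone spent in the viewport
  avgVelocity: number;        // mean scroll speed while in the zone, px/ms
  directionChanges: number;   // reversals while in the zone
  microMovementRatio: number; // share of steps under 5 px, 0..1
}

// Hypothetical cut-offs for illustration only; calibrate them on your own data.
const LIMITS = {
  skipVelocity: 10,     // px/ms: at or above this, the zone was passed through
  skipDwellMs: 1_000,   // below this, the user never stopped in the zone
  engageVelocity: 2,    // px/ms: at or below this, scrolling is reading pace
  engageDwellMs: 10_000,
  confusedDwellMs: 30_000,
  activeMicroRatio: 0.1,
};

function classifyZone(z: ZoneStats): ZoneEngagement {
  // Either signal alone is treated as enough to call a zone skipped.
  if (z.avgVelocity >= LIMITS.skipVelocity || z.dwellMs < LIMITS.skipDwellMs) return "skipped";

  const active = z.directionChanges > 0 || z.microMovementRatio >= LIMITS.activeMicroRatio;
  // Long dwell with no reading-like movement: the "confused" pattern.
  if (z.dwellMs >= LIMITS.confusedDwellMs && !active) return "confused";
  if (z.avgVelocity <= LIMITS.engageVelocity && z.dwellMs >= LIMITS.engageDwellMs && active) {
    return "engaged";
  }
  return "scanned"; // anything matching neither the engaged nor the confused profile
}

console.log(classifyZone({ dwellMs: 162, avgVelocity: 72.89, directionChanges: 1, microMovementRatio: 0 })); // skipped
```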

The Market Gap

Here is what is striking: academic research has shown since at least 2019 that scroll behavior alone carries strong engagement signals (an f-score of 0.77 for readability, 92.4% accuracy for interest inference). Yet no commercial tool performs real-time behavioral scroll classification.

  • Hotjar/Clarity: Threshold heatmaps
  • Adelaide: Ad-tech attention scoring (impression level, not content level)
  • Amplitude: Connects scroll to funnels (no behavioral classification)
  • Behavioral biometrics (TypingDNA, etc.): Use scroll for identity, not engagement

The tools either visualize thresholds or score attention at the advertising impression level. Content-level behavioral classification, the thing that would tell you whether section 3 of your blog post is being read or skipped, does not exist as a product.

Practical Impact for CRO

Why does this matter beyond academic interest? Because the optimization response is completely different depending on the engagement type.

Zone is being skipped: The content is not relevant to what users are looking for. Rewrite the section, move it, or remove it. More traffic will not help.

Zone is being scanned: Users see the content but do not find it worth reading deeply. Improve formatting: better headings, bullet points, visual breaks. The information may be there but not accessible.

Zone is being engaged: This is working. Do not change it. Use it as a reference for what good content looks like on your site.

Zone shows confusion: High dwell but no activity. The content may be unclear, the layout confusing, or the user may be lost. Simplify.

Threshold tracking gives you one response: “more people should scroll further.” Behavioral classification gives you four different, specific responses based on what is actually happening.

Real Data: First 72 Hours

To test the theory, I deployed ScrollTracker on the ceaksan.com production site, loaded as a GTM Custom HTML tag that sends events to GA4, with the data exported to BigQuery. The first 72 hours (March 27-29, 2026) supported the main claims of this article:

  • 31% of users who reached 100% were classified as skipped. They scrolled to the bottom but did not read the content. Threshold tracking would count this as “full engagement.”
  • Skipped users scrolled 68x faster than engaged users (72.89 vs 1.06 px/ms).
  • Dwell time difference was 195x: engaged users spent an average of 31.6 seconds per zone while skipped users spent 162 milliseconds.
  • Direction changes differed 3x (4.5 vs 1.45): reading users go back and re-read, skipping users pass through without stopping.
  • The homepage heatmap showed a clear pattern: top half (0-50%) was 67-80% engaged, bottom half (50-100%) was 60% skipped.
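The ratios in the list follow directly from the reported averages, with 31.6 seconds converted to 31,600 ms:

```latex
\frac{72.89\ \text{px/ms}}{1.06\ \text{px/ms}} \approx 68.8, \qquad
\frac{31{,}600\ \text{ms}}{162\ \text{ms}} \approx 195, \qquad
\frac{4.5}{1.45} \approx 3.1
```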

For the full analysis with charts and page-level heatmaps, see the case study section.

Where to Go from Here

If you want to implement behavioral scroll tracking, I built ScrollTracker as an open-source library that implements content-scoped measurement with dwell time, velocity classification, and the engaged/scanned/skipped distinction. For the foundational concepts of scroll depth tracking in GTM, see Google Tag Manager Scroll Depth Actions.

The code is a starting point. The real value comes from collecting data, analyzing the signal distributions for your specific content types, and calibrating thresholds against actual conversion data.

Footnotes

  1. Predicting Text Readability from Scrolling Interactions. arXiv:2105.06354, 2021.
  2. I3: Intelligent Interest Inference. ACM IMWUT, 2019.
  3. Analysis of User Dwell Time on Non-News Pages. arXiv:1903.00213, 2019.
  4. What and How long: Prediction of Mobile App Engagement. arXiv:2106.01490, 2021.
  5. Analysing Parallel and Passive Web Browsing Behavior. arXiv:1402.05255, 2014.
  6. BeCAPTCHA-Mouse: Synthetic Mouse Trajectories and Improved Bot Detection. arXiv:2005.00890, 2021.
  7. Using Interaction Data to Predict Engagement with Interactive Media. arXiv:2108.01949, 2021.
  8. From Implicit to Explicit Feedback: A Deep Neural Network for Modeling Sequential Behaviours. arXiv:2107.12325, 2021.
Key Takeaways
  • 01 The same scroll depth percentage can represent completely opposite user behaviors: careful reading vs. fast skipping
  • 02 Academic research (Google Research, I3) reports an f-score of 0.77 for readability and 92.4% accuracy for interest inference using scroll interaction features alone
  • 03 Hotjar and Clarity are structurally limited to threshold-based heatmaps with no velocity, acceleration, or dwell analysis
  • 04 Behavioral scroll signals (velocity variance, direction changes, micro-movements) encode reading patterns that percentages cannot
  • 05 No commercial tool currently performs real-time behavioral scroll classification, despite proven academic foundations
  • 06 Content zones classified as skipped need rewriting, not more impressions. This distinction is impossible with threshold tracking
Frequently Asked Questions (FAQ)
What is wrong with tracking scroll depth at 25%, 50%, 75%, and 100%?

These thresholds tell you that a user reached a point on the page, but not how they got there. A user who carefully read to 75% and a user who scrolled past 75% in 2 seconds both register the same metric. Without velocity and dwell time data, you cannot distinguish engagement quality.

Can Hotjar or Clarity detect if a user actually read the content?

No. Both tools use threshold-based heatmaps that show where users scrolled to, not how they scrolled. Neither captures scroll velocity, acceleration patterns, or zone-specific dwell time. They show visual summaries without analytical parameters for behavioral classification.

Is there academic evidence that scroll behavior predicts engagement?

Yes. Google Research demonstrated that scroll features alone predict reading difficulty with f-score 0.77. The I3 study achieved 92.4% accuracy in interest inference from scroll velocity patterns. Multiple studies confirm that velocity variance, pause frequency, and direction changes encode reading behavior.

What are behavioral scroll signals?

Measurable properties of scroll movement beyond simple depth: velocity variance (how consistent the speed is), event frequency (scroll events per second), micro-movement ratio (small adjustments indicating reading), direction changes (re-reading patterns), and acceleration smoothness (input device characteristics).

How does behavioral classification help CRO?

When you know a content zone is being skipped rather than scanned or engaged, you can take specific action: rewrite that section, restructure the page, or move critical information. Threshold tracking only tells you people reached that point, not whether the content worked.