Most Tracking Data Loss Starts at the URL
Direct traffic is inflated, GA4 and Meta Ads Manager refuse to reconcile, and the Shopify Custom Pixel keeps reporting a sandbox URL instead of the real product page. These symptoms share one root cause: the URL is not being read correctly for tracking.
A URL is not just a string in the browser address bar; it is a structure of scheme, host, path, query and fragment, and each part produces different tracking outcomes. Which part reaches the server, which parameter the browser strips, whether a CNAME subdomain actually guarantees first-party context, how the sandbox iframe hides the real URL: this article goes through each in turn, with field evidence.
URL Anatomy
https://shop.example.com/products/widget?utm_source=newsletter&gclid=abc#size-l
- scheme: https
- host: shop.example.com
- path: /products/widget
- query: utm_source=newsletter&gclid=abc
- fragment: size-l
| Part | Role in Tracking |
|---|---|
| scheme | https mandatory (cookie Secure flag, ITP compliance, HTTP/2 multiplexing) |
| host | Determines eTLD+1, the basis of cookie scope and first-party context |
| path | Gateway endpoint identity, routing decision |
| query | The attribution backbone: UTMs, click IDs, custom parameters |
| fragment | Never reaches the server. Hash-only SPA navigation loses every server-side pageview |
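Origin: the scheme + host + port triple defines the “origin” that anchors browser security. Same-origin policy, CORS, postMessage targetOrigin checks and Content Security Policy rules all work off this triple. Cookies are the notable exception: they are scoped by domain and path rather than by the full origin, so a different port does not isolate them.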
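The five parts and the origin can be inspected with the WHATWG URL API, which is built into browsers and Node.js. The sketch below is illustrative. urlParts and sameOrigin are hypothetical helper names and do not come from any library.

```javascript
// Split a URL into the five anatomy parts listed above, plus its origin,
// using the built-in WHATWG URL class (browsers and Node.js).
function urlParts(href) {
  const u = new URL(href);
  return {
    scheme: u.protocol.replace(/:$/, ""), // "https:" -> "https"
    host: u.host,                         // hostname, plus port if non-default
    path: u.pathname,
    query: u.search.replace(/^\?/, ""),   // raw query string, without "?"
    fragment: u.hash.replace(/^#/, ""),   // stays in the browser
    origin: u.origin,                     // scheme + host + port
  };
}

// Same-origin means scheme, host and port all match; the path is irrelevant.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const parts = urlParts(
  "https://shop.example.com/products/widget?utm_source=newsletter&gclid=abc#size-l"
);
console.log(parts.host, parts.fragment); // shop.example.com size-l
console.log(sameOrigin("https://shop.example.com/a", "https://shop.example.com/b"));      // true
console.log(sameOrigin("https://shop.example.com/a", "https://shop.example.com:8443/a")); // false
console.log(sameOrigin("https://shop.example.com/a", "https://gw.example.com/a"));        // false
```

gw.example.com and shop.example.com are different origins even though they share an eTLD+1. CORS therefore treats requests between them as cross-origin, while a cookie set on the parent domain still covers both.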
Query String: The Attribution Backbone and the Leakage Surface
The query string is the richest tracking surface and the most fragile one. Every platform expects a different parameter set, every browser strips a different list, and personal data leaks usually happen through the query string.
Standard Parameters
- UTM standard: utm_source, utm_medium, utm_campaign, utm_content, utm_term. GA4 source/medium resolution depends entirely on these.
- Platform click IDs: gclid (Google Ads), wbraid/gbraid (Google iOS app, transitional parameters[1]), fbclid (Meta), ttclid (TikTok), msclkid (Microsoft), dclid (DoubleClick), yclid (Yahoo).
- Consent Mode v2 signals: values like gcs=G100 and gcs=G111 carry the consent state to the server, and the server-side container uses them to control tag firing. If consent never reaches the server you risk legal exposure and platform-side penalties (Google: targeting degradation; Meta CAPI: deduplication failures).
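Here is a minimal sketch that pulls the UTMs and click IDs out of a landing URL, to make the parameter sets concrete. The key lists mirror the bullets above and extractAttribution is a hypothetical helper. Consent signals such as gcs are left out for brevity.

```javascript
// Illustrative parameter lists copied from the bullets above; extend or trim
// them to the platforms you actually run.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term"];
const CLICK_ID_KEYS = ["gclid", "wbraid", "gbraid", "fbclid", "ttclid", "msclkid", "dclid", "yclid"];

// Pull the attribution-relevant parameters out of a landing URL.
function extractAttribution(href) {
  const params = new URL(href).searchParams; // decodes %xx and "+" for us
  const pick = (keys) => {
    const out = {};
    for (const key of keys) {
      const value = params.get(key); // first occurrence; null when absent
      if (value) out[key] = value;   // skip absent and empty values
    }
    return out;
  };
  return { utm: pick(UTM_KEYS), clickIds: pick(CLICK_ID_KEYS) };
}

const attribution = extractAttribution(
  "https://shop.example.com/p?utm_source=newsletter&utm_medium=email&gclid=abc&color=red"
);
// attribution.utm holds utm_source and utm_medium; attribution.clickIds holds gclid.
// "color" is ignored because it is on neither list.
console.log(attribution);
```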
Encoding Pitfalls
Different platforms encode non-ASCII characters differently. GA4 expects UTF-8 raw, some conversion API endpoints expect percent-encoded UTF-8, email platforms can double-encode when their templates pass through several systems. Practical rule: keep UTM values ASCII and lowercase, and put the human-readable campaign title in a separate field.
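%20 and + are both space encodings, but they follow different rules. + means a space only in form-style parsing (application/x-www-form-urlencoded, which URLSearchParams follows). A plain percent-decoder such as decodeURIComponent keeps it as a literal plus. An & or = inside a campaign name will break the query string parser unless encoded. Double encoding (%2520) is usually a copy-paste artifact, and it is the most common breakage in hand-built link templates.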
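The encoding rules above can be checked in a few lines. This sketch assumes a naming convention where accents are folded and any separator becomes a dash. normalizeUtmValue and looksDoubleEncoded are illustrative names, and the regexes are heuristics, not a complete validator.

```javascript
// Normalize a campaign value to "ASCII, lowercase". Assumption: accents are
// folded and any other separator becomes "-"; adjust to your naming rules.
function normalizeUtmValue(raw) {
  return raw
    .normalize("NFKD")               // "É" -> "E" + combining accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accents
    .toLowerCase()
    .replace(/[^a-z0-9._-]+/g, "-")  // any other run of characters -> "-"
    .replace(/^-+|-+$/g, "");        // trim dashes at both ends
}

// "%25" followed by two hex digits is an encoded "%", the usual mark of double encoding.
function looksDoubleEncoded(rawQuery) {
  return /%25[0-9a-f]{2}/i.test(rawQuery);
}

console.log(normalizeUtmValue("Printemps Été / Soldes"));              // printemps-ete-soldes
console.log(looksDoubleEncoded("utm_campaign=spring%2520sale"));       // true
console.log(new URLSearchParams("c=spring+sale").get("c"));            // spring sale
console.log(decodeURIComponent("spring+sale"));                        // spring+sale
console.log(new URLSearchParams({ utm_campaign: "a&b=c" }).toString()); // utm_campaign=a%26b%3Dc
```

The last line shows the safe way to build a link. Letting URLSearchParams serialize the value encodes & and = for you, so a hand-written template never has to.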
PII Leakage
When ?email=user@example.com, ?phone=..., ?order_id=... end up in URLs, they can leak into:
- Browser history
- Your origin server's access logs
- CDN edge logs
- Error tracking services (Sentry and others)
- The GA4 page_location dimension
- Third-party scripts running on the page (heatmaps, embedded ads)
Modern browsers default to Referrer-Policy: strict-origin-when-cross-origin, so path and query no longer leak in cross-origin Referer headers. But Referrer-Policy: unsafe-url or any same-origin flow preserves the full URL. Putting personal data in a URL is a direct leak under GDPR and similar regimes. Carry personal data in the POST body or as a hashed identifier, never in the query string. At the gateway, filter query parameters through an allow-list before ingestion.
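Here is a minimal sketch of that gateway allow-list. It assumes the gateway receives the page URL as a string, for example as page_location. The parameter list and the sanitizeUrl name are illustrative. Extend the list to the keys you actually need.

```javascript
// Illustrative allow-list: only these query keys survive ingestion.
const ALLOWED_PARAMS = new Set([
  "utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term",
  "gclid", "wbraid", "gbraid", "fbclid", "ttclid", "msclkid",
]);

// Drop every query key not on the list (email, phone, order_id, ...) before
// the URL is logged or forwarded.
function sanitizeUrl(href, allowed = ALLOWED_PARAMS) {
  const url = new URL(href);
  // Copy the keys first: deleting while iterating searchParams skips entries.
  for (const key of [...url.searchParams.keys()]) {
    if (!allowed.has(key.toLowerCase())) url.searchParams.delete(key);
  }
  url.hash = ""; // the fragment never reaches a server anyway; keep logs consistent
  return url.toString();
}

console.log(sanitizeUrl(
  "https://shop.example.com/thanks?order_id=1001&email=user%40example.com&utm_source=newsletter"
));
// https://shop.example.com/thanks?utm_source=newsletter
```

An allow-list is a safer default than a block-list. A PII key nobody anticipated (?mail=, ?tel=) is dropped without anyone having to list it.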
How Browsers Filter URLs
Privacy-first browsers strip known tracking parameters automatically, but the scope varies a lot, and it has changed enough over the years that most panic headlines are wrong. The actual 2026 picture:
| Browser | Default Normal Browsing | Stripping Active When | Parameters Stripped |
|---|---|---|---|
| Safari | No stripping | Private Browsing, links opened from Mail/Messages, manual ATFP enabled[2][3] | gclid, fbclid, msclkid, ttclid, dclid, yclid (UTMs preserved) |
| Firefox | No stripping | ETP Strict (must be enabled manually) | fbclid, mc_eid, oly_enc_id, __s, vero_id, _hsenc, mkt_tok (gclid not on the list) |
| Brave | Default on | Default Shields | Click IDs and some mailer parameters |
| DuckDuckGo | Default on | Browser app and extension | Click IDs and a tracking parameter set |
| Chrome | No stripping | No automatic stripping | - |
Safari 26 Advanced Fingerprinting Protection (AFP): Safari 26, released in September 2025, turns AFP on by default in every browsing mode[4]. AFP does three things: (1) restricts tracker-classified scripts from reaching high-entropy device APIs, (2) prevents those scripts from writing long-lived storage, (3) blocks tracker scripts from reading the page’s URL query string and document.referrer through the browser APIs.
The critical nuance: AFP restricts in-page scripts, but the full URL still reaches the server via the HTTP request[5]. Any server or edge layer that handles the page request itself (origin server, CDN worker, a gateway in the request path) receives the pageview URL intact, including gclid and UTMs. AFP is a browser-level decision that directly increases the value of server-side tracking: the client-side world keeps narrowing while the server-side channel stays open.
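The sketch below illustrates why the server-side channel stays open. It rebuilds the full page URL from an incoming request, the way an origin server or edge worker sees it. It assumes TLS is terminated in front of the code, hence the hard-coded scheme, and that the Host header has already been validated. attributionFromRequest is a hypothetical name.

```javascript
// Rebuild the page URL from an incoming HTTP request. Assumptions: TLS is
// terminated upstream (hence "https://"), and req.headers.host has been
// validated against your own domains.
function attributionFromRequest(req) {
  // req.url is the request target: path + query. The fragment is never sent.
  const url = new URL(req.url, `https://${req.headers.host}`);
  return {
    pageLocation: url.href,
    gclid: url.searchParams.get("gclid"),
    utmSource: url.searchParams.get("utm_source"),
  };
}

// The argument shape matches Node's http.IncomingMessage, so the function can
// be called from a handler such as:
//   http.createServer((req, res) => { attributionFromRequest(req); res.end(); })
const fakeRequest = {
  url: "/products/widget?utm_source=newsletter&gclid=abc",
  headers: { host: "shop.example.com" },
};
console.log(attributionFromRequest(fakeRequest));
```

No browser API is involved, so AFP's script-level restrictions never come into play here.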
About the iOS 26 Panic Headlines
When the iOS 26 release notes shipped in September 2025[6], much of the marketing press ran with “Apple kills UTMs.” Side-by-side iOS 18 vs iOS 26 tests by Triple Whale and Northbeam[7] showed the panic was misplaced:
- Standard UTMs (utm_source, utm_medium, utm_campaign, utm_content, utm_term) are not stripped
- Default Safari normal browsing does not strip any parameter
- The original ad click itself (Facebook app → product page) keeps its click ID
- Stripping still applies in Private Browsing and links opened from Mail and Messages (same behavior as iOS 18)
Apple is testing extending full Link Tracking Protection to every browsing mode in Safari Technology Preview[8]. It could ship in a near-term release; the way to avoid panic when it does is to prepare the gateway infrastructure today.
Custom Parameter Transformation Pattern
Most analytics platforms, GA4 included, hardcode attribution to UTM names. Stape’s recommended pattern to work around this[9]:
- Send ad traffic with your own prefix instead of UTMs: ?st_src=newsletter&st_mdm=email&st_cmp=spring26
- In the server-side container, read the incoming page_location
- Transform st_* parameters back to utm_* (Stape’s Query Replacer template implements this off the shelf[10])
- The augmented event sent to the GA4 tag now carries proper UTMs
Upside: custom names are not on any browser’s stripping list, so they survive. Trade-off: more per-platform configuration, and you still need disciplined UTM conventions on top.
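Outside sGTM, the same rewrite can be sketched in plain JavaScript. This is not Stape’s template, only an illustration of the transformation it performs. The CUSTOM_TO_UTM mapping and the restoreUtms name are hypothetical and should match whatever prefix you choose.

```javascript
// Hypothetical mapping from the custom prefix to the UTM names GA4 expects.
const CUSTOM_TO_UTM = {
  st_src: "utm_source",
  st_mdm: "utm_medium",
  st_cmp: "utm_campaign",
};

// Rewrite custom parameters in page_location back to utm_* names.
function restoreUtms(pageLocation, mapping = CUSTOM_TO_UTM) {
  const url = new URL(pageLocation);
  for (const [customKey, utmKey] of Object.entries(mapping)) {
    const value = url.searchParams.get(customKey);
    if (value === null) continue;
    // Never overwrite a real UTM that is already present.
    if (!url.searchParams.has(utmKey)) url.searchParams.set(utmKey, value);
    url.searchParams.delete(customKey);
  }
  return url.toString();
}

console.log(restoreUtms("https://shop.example.com/?st_src=newsletter&st_mdm=email&st_cmp=spring26"));
// https://shop.example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring26
```

The no-overwrite guard matters when a link carries both naming schemes by accident. In that case the real UTM wins and the custom copy is discarded.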
First-Party Context and Custom Domains
As ITP, ETP and adblock lists keep dismantling third-party context, first-party context is the only durable foundation for tracking. The key concept here is eTLD+1.
eTLD+1 and Cookie Scope
eTLD+1 (effective Top-Level Domain plus one label) is the boundary the browser uses to decide cookie ownership. gw.dnomia.app (subdomain) and dnomia.app (apex) share the same eTLD+1, so a cookie set with Domain=dnomia.app is shared between them. A host-only cookie (set without a Domain attribute) stays on the exact host that set it. dnomia-tracking.com is a different eTLD+1, so it is not first-party, and Safari, Firefox and Brave restrict its cookies in third-party contexts by default.
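Computing eTLD+1 requires the Public Suffix List. The sketch below hard-codes a tiny illustrative subset of that list to show the rule. A real implementation should load the full, maintained list, which also has wildcard and exception rules that this version ignores. etldPlusOne and sameEtldPlusOne are hypothetical names.

```javascript
// Tiny, illustrative subset of the Public Suffix List (PSL).
// In production, load the full, maintained PSL instead.
const PUBLIC_SUFFIXES = new Set(["com", "app", "uk", "co.uk", "github.io"]);

function etldPlusOne(hostname) {
  const labels = hostname.toLowerCase().split(".");
  // Walk from the longest candidate suffix to the shortest; the first hit is
  // the longest matching public suffix. Keep one extra label in front of it.
  for (let i = 0; i < labels.length; i++) {
    if (PUBLIC_SUFFIXES.has(labels.slice(i).join("."))) {
      return i === 0 ? null : labels.slice(i - 1).join("."); // a bare suffix has no eTLD+1
    }
  }
  return null; // suffix not in our subset
}

function sameEtldPlusOne(hostA, hostB) {
  const a = etldPlusOne(hostA);
  return a !== null && a === etldPlusOne(hostB);
}

console.log(etldPlusOne("gw.dnomia.app"));                         // dnomia.app
console.log(sameEtldPlusOne("gw.dnomia.app", "dnomia.app"));       // true
console.log(sameEtldPlusOne("dnomia.app", "dnomia-tracking.com")); // false
console.log(sameEtldPlusOne("a.github.io", "b.github.io"));        // false
```

The github.io case shows why a naive "last two labels" rule fails. Both hosts end in github.io, yet each one is its own eTLD+1.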
The 2026 picture for third-party cookies:
| Browser | Third-Party Cookies | Note |
|---|---|---|
| Safari | Default block (2020+) | ITP |
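| Firefox | Known trackers blocked by default (2019+); other third-party cookies partitioned (Total Cookie Protection, 2022+) | ETP Standard; Strict blocks more |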
| Brave | Default block | Shields |
| Chrome | Still supported | Google walked back forced deprecation in July 2024 and dropped the user-choice prompt in April 2025; Privacy Sandbox APIs run as a parallel track |
CNAME Subdomain vs Path-Based Proxy
| Property | CNAME Subdomain (gw.example.com → vendor) | Path-Based Proxy (example.com/_track → vendor) |
|---|---|---|
| First-party cookie scope | Full (eTLD+1 matches) | Full (same domain) |
| SSL management | Vendor handles it | Your own CDN/worker layer required |
| CORS | Cross-origin, preflight required | Same-origin, no preflight |
| CNAME cloaking risk | Yes (Safari detects third-party CNAME cloaking and applies the 7-day cap) | None |
| Operational load | Low (vendor runs the infrastructure) | High (you run your own worker/edge) |
| Vendor lock-in | Lower | Switching vendors means rewriting the proxy |
CNAME cloaking caveat: since Safari 14 (2020), ITP checks whether a first-party subdomain resolves through a CNAME to a third-party host. If it does, cookies set in that subdomain’s HTTP responses are capped at 7 days, and that includes server-side Set-Cookie headers. CNAME first-party is not immunity, it is relative resilience.
Storage Cap and CHIPS
Since Safari 13.1 (March 2020), ITP applies a 7-day cap to all script-writable storage: localStorage, IndexedDB, service worker registrations. The data is deleted after seven days of Safari use without user interaction on the site. This is why falling back from cookies to localStorage does not get around ITP.
CHIPS (Cookies Having Independent Partitioned State): Chrome’s Partitioned cookie attribute isolates third-party cookies per top-level site inside embed/iframe contexts. Relevant for stacks that rely on cross-origin embeds rather than subdomain CNAMEs.
Picking a Gateway Domain
Be careful with the subdomain name. Adblock filter lists block analytics., tracking., pixel. and similar prefixes, so use a generic one such as gw., c., t. or m. On the DNS side, keep the CNAME chain under three hops: every extra hop adds resolution latency, and that latency hurts pageview measurement.
Gateway Design: Path vs Query, Limits, CORS
The dominant implementations are GTM Server-Side Container (Google’s official server container, sGTM), Cloudflare Zaraz (path-based, edge), and Stape (CNAME-based managed sGTM hosting).
Splitting Responsibility Between Path and Query
- Path-based gateway: /cdn-cgi/zaraz/track?event=purchase, easy CDN-level routing
- Subdomain-based gateway: gw.example.com/track, independent worker/proxy
Keep event-specific data in the query and endpoint identity in the path. Mixing them breaks the cache key and makes log analysis harder.
URL Length
The “safe legacy limit” is 2 KB. That keeps a margin under Internet Explorer’s old 2,083-character cap and keeps links manageable for SEO and sharing. Modern browser ceilings are much higher: Chrome 32 KB–2 MB, Firefox ~64 KB, Safari ~80 KB. The real constraint is server/CDN/proxy configuration: Cloudflare default header 8 KB, Nginx default 4–8 KB.
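For large payloads use fetch(url, { method: 'POST', keepalive: true }). navigator.sendBeacon is still standardized (W3C Beacon specification), but keepalive is preferred. sendBeacon can only send POST and cannot set custom headers, while fetch with keepalive also supports methods such as PUT and custom headers.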
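The size limit and the keepalive advice can be combined into one transport decision. This sketch assumes a 2,000-character threshold and a hypothetical /_track endpoint. buildTrackingRequest, also a hypothetical name, only builds the fetch arguments, so the actual network call stays in the browser.

```javascript
// Threshold is an assumption based on the "safe legacy limit", not a browser constant.
const MAX_GET_URL_LENGTH = 2000;

// Return { url, init } for fetch(): GET while the URL stays short, POST otherwise.
function buildTrackingRequest(endpoint, payload) {
  const getUrl = new URL(endpoint);
  for (const [key, value] of Object.entries(payload)) {
    getUrl.searchParams.set(key, String(value));
  }
  if (getUrl.href.length <= MAX_GET_URL_LENGTH) {
    return { url: getUrl.href, init: { method: "GET", keepalive: true } };
  }
  // Too long for a URL: move the data into the body. text/plain is a
  // CORS-safelisted content type, so a cross-origin POST needs no preflight.
  // Keepalive bodies are capped by the Fetch spec (64 KiB in flight in total).
  return {
    url: endpoint,
    init: {
      method: "POST",
      keepalive: true, // lets the request outlive page unload
      headers: { "Content-Type": "text/plain;charset=UTF-8" },
      body: JSON.stringify(payload),
    },
  };
}

// In the browser (hypothetical endpoint):
//   const { url, init } = buildTrackingRequest("https://shop.example.com/_track", payload);
//   fetch(url, init);
```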
CORS
For a cross-origin gateway endpoint:
- Access-Control-Allow-Origin must be set correctly
- Preflight (OPTIONS) must be answered
- For credentialled requests, Access-Control-Allow-Credentials: true plus a specific origin (wildcards are rejected)
A path-based proxy avoids all of this: same origin, no preflight. For solo setups this is one less operational headache.
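For the cross-origin case, the three requirements come down to a small header decision. This sketch assumes a hypothetical allow-list of page origins. corsResponse is an illustrative name: it returns a status and headers that you would copy onto whatever response object your worker or server uses.

```javascript
// Hypothetical list of page origins allowed to call a cross-origin gateway.
const ALLOWED_ORIGINS = new Set(["https://shop.example.com", "https://www.example.com"]);

// Decide status and CORS headers from the HTTP method and the Origin header.
function corsResponse(method, origin) {
  if (!ALLOWED_ORIGINS.has(origin)) {
    // Unknown or missing origin: no CORS headers, so browsers block the response.
    return { status: 403, headers: {} };
  }
  const headers = {
    // Echo the exact origin: "*" is rejected when credentials are included.
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Credentials": "true",
    Vary: "Origin", // the response differs per origin, so caches must key on it
  };
  if (method === "OPTIONS") {
    // Preflight answer: what the real request may use, cached for 10 minutes.
    headers["Access-Control-Allow-Methods"] = "POST";
    headers["Access-Control-Allow-Headers"] = "Content-Type";
    headers["Access-Control-Max-Age"] = "600";
    return { status: 204, headers };
  }
  return { status: 200, headers };
}

console.log(corsResponse("OPTIONS", "https://shop.example.com").status); // 204
```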
Fragment and SPA: Data That Never Reaches the Server
The browser never sends anything after # to the server. Server-side logs, CDN analytics and traditional pageview tracking see every SPA route as the same pageview. Use pushState and path-based routing, not hash-based.
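Attribution scenario: ?utm_source=x#campaign-detail keeps the query server-side and drops only the fragment. The order matters. In #campaign-detail?utm_source=x everything after # is fragment, so the UTMs never reach the server. When a campaign link needs an anchor or hashtag, put it after the query so the link does not break.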
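The difference between the two link orders is easy to verify with the URL API. This is a minimal sketch. requestTarget is a hypothetical helper that returns the path and query a browser would put in the HTTP request line.

```javascript
// What the server receives for a link: the request target is path + query;
// everything from the first "#" onward stays in the browser.
function requestTarget(href) {
  const u = new URL(href);
  return u.pathname + u.search;
}

console.log(requestTarget("https://shop.example.com/p?utm_source=x#campaign-detail")); // /p?utm_source=x
console.log(requestTarget("https://shop.example.com/p#campaign-detail?utm_source=x")); // /p
console.log(requestTarget("https://shop.example.com/#/page1"));                        // /

// In an SPA, give every view a real path instead of a hash route, e.g.
//   history.pushState({}, "", "/products/widget?utm_source=x");
// so deep links and reloads request a path the server can actually see.
```

The third line is the hash-routing problem in miniature: every "#/..." route collapses to the same request target.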
Mini Case: Shopify Custom Pixel Sandbox
The Shopify Custom Pixel (Web Pixels API) runs inside a sandboxed iframe. URL access is not what you would expect.
| URL Access Method | Returns | Useful for Tracking? |
|---|---|---|
| window.location.href | Sandbox iframe URL | No (fake) |
| event.context.document.location.href | Contains the sandbox URL | No (fake) |
| event.context.window.location.href (in page_viewed) | Real storefront URL | Yes (the only reliable source) |
Cookie and storage isolation: window.localStorage and document.cookie work inside the sandbox iframe but do not touch the top frame. To access storefront cookies use browser.cookie and browser.localStorage (async APIs).
URL implication: when the sandbox URL replaces the real URL, every GA4 pageview goes to the fake URL. Turn off GA4 auto page_view and send pageviews manually using event.context.window.location.href.
Primary domain trap: if Settings → Domains has the primary domain set to a host that redirects from another domain, the page navigates before the pixel finishes and sandbox tracking breaks. Domain alignment is mandatory: Shopify primary = GA4 Web Stream = GTM container domain.
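Combining the URL table with the manual page_view advice, a Custom Pixel handler could look like the sketch below. It assumes GA4’s automatic page_view is already disabled and posts to a hypothetical /_track gateway. analytics.subscribe, event.context and browser.cookie are the Web Pixels sandbox globals discussed above. buildPageView is an illustrative name.

```javascript
// Build a page_view payload from a page_viewed event. Hypothetical payload
// shape; map it to whatever your gateway expects.
function buildPageView(event) {
  return {
    event_name: "page_view",
    // The real storefront URL; window.location.href here would be the sandbox URL.
    page_location: event.context.window.location.href,
    event_id: event.id, // could serve as a deduplication key downstream
  };
}

// `analytics` and `browser` exist only inside the Custom Pixel sandbox; the
// guard lets this file load elsewhere (e.g. to test buildPageView).
if (typeof analytics !== "undefined") {
  analytics.subscribe("page_viewed", async (event) => {
    const payload = buildPageView(event);
    // Storefront cookies must be read through the async browser.cookie API.
    payload.fbp = await browser.cookie.get("_fbp");
    fetch("https://shop.example.com/_track", {
      method: "POST",
      keepalive: true,
      body: JSON.stringify(payload),
    });
  });
}
```

Capture page_location once here and reuse it for later events in the same page context, since none of the other URL sources in the sandbox are trustworthy.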
End-to-End Scenario: Shopify + Cloudflare + GA4 + Meta CAPI
A Shopify store sitting behind Cloudflare, sending data through the Custom Pixel to GA4 and Meta CAPI.
- The customer lands on https://shop.example.com/products/widget?utm_source=newsletter&fbclid=....
- Inside page_viewed, the Custom Pixel sandbox reads event.context.window.location.href and gets the real URL.
- UTMs and fbclid are filtered through an allow-list (no personal data). fbclid is wrapped into the format Meta expects: fb.1.<unix_timestamp_ms>.<fbclid>. A raw fbclid is not enough.
- The pixel posts to the path-based gateway: fetch('https://shop.example.com/cdn-cgi/zaraz/track', { method: 'POST', keepalive: true, body: JSON.stringify(payload) }).
- Same origin, so no CORS preflight. Cookies are fully first-party (eTLD+1 matches).
- The Cloudflare Worker duplicates the payload and fans out to GA4 Measurement Protocol and the Meta CAPI endpoint. The gcs consent flag rides along on the GA4 request.
- The fbc cookie joins _fbp and lands in the user_data block of the CAPI event.
Every step depends on reading the URL anatomy correctly: wrong host means losing first-party context; PII in the query becomes a log leak; UTMs in the fragment disappear; a sandbox URL produces fake pageviews; a raw fbclid breaks Meta deduplication.
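The fbclid wrapping and the user_data step can be sketched as two small helpers. The fbc value follows the fb.1.<unix_timestamp_ms>.<fbclid> shape described above. buildFbc and buildUserData are hypothetical names. Only the user_data field names (fbc, fbp, client_ip_address, client_user_agent) come from Meta’s CAPI.

```javascript
// Build Meta's fbc value from a landing URL: fb.1.<unix_timestamp_ms>.<fbclid>.
// The fbclid is kept exactly as received.
function buildFbc(landingHref, nowMs = Date.now()) {
  const fbclid = new URL(landingHref).searchParams.get("fbclid");
  return fbclid ? `fb.1.${nowMs}.${fbclid}` : null;
}

// Assemble a CAPI user_data block; the helper itself is illustrative.
function buildUserData({ fbc, fbp, clientIp, userAgent }) {
  const userData = { client_ip_address: clientIp, client_user_agent: userAgent };
  if (fbc) userData.fbc = fbc; // only include identifiers that actually exist
  if (fbp) userData.fbp = fbp;
  return userData;
}

const landingFbc = buildFbc(
  "https://shop.example.com/products/widget?utm_source=newsletter&fbclid=AbC123",
  1767225600000
);
console.log(landingFbc); // fb.1.1767225600000.AbC123
```

Take the timestamp when the fbclid is first seen and store the resulting fbc value, for example as a first-party cookie. Later events then reuse the same value instead of minting a new one per request.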
URI, URL, URN: Short Reference
A quick note for readers who came here looking for the old article:
- URI: the broadest umbrella; any string that identifies a resource.
- URL: the “locator” variant of URI; tells you how to reach a resource (https://..., ftp://...).
- URN: the “name” variant of URI; a persistent identifier independent of location (urn:isbn:..., urn:uuid:...).
In practice, “URL” is enough on the web. The URI/URN distinction shows up in RFC 3986[11] and W3C standards, and matters when you design resource management or metadata systems. Day-to-day tracking work is entirely about reading URL parts; URNs rarely come up. The canonical reference for URI schemes is the IANA registry[12].
Summary: Which URL Part Affects What
| URL Part | Tracking Effect |
|---|---|
| scheme (https) | Cookie Secure flag, HTTP/2, ITP compliance |
| host | eTLD+1, first-party scope, CNAME cloaking risk |
| path | Gateway endpoint identity, CORS behavior |
| query | Attribution backbone, browser stripping surface, PII leakage vector |
| fragment (#) | Never reaches the server, source of SPA pageview loss |
Gateway design starts with a deliberate decision on each of these parts. Reading the URL as a five-part data structure rather than as a single browser-bar string is the foundation of clean tracking data.
Footnotes
1. Update to iOS 14 campaign measurement. Google Ads Help. wbraid and gbraid were introduced in March 2021 for iOS app campaign measurement; they carry campaign-level data without binding to an individual user.
2. Private Browsing 2.0. WebKit Blog. On Apple’s Link Tracking Protection scope: “campaign parameter that’s only used for campaign attribution, as opposed to click or user-level tracking. Safari allows such parameters to pass through.”
3. iOS 26 UTM tracking tests. Triple Whale. Side-by-side iOS 18 vs iOS 26 testing: “Safari still preserves common tracking tags (like utm_source, utm_medium, and utm_campaign).”
4. WebKit Features in Safari 26.0. WebKit Blog. Safari 26 turns Advanced Fingerprinting Protection on by default.
5. Safari 26 tracking changes explained. Taggrs. January 2026 field validation: AFP blocks tracker scripts from reading URL and referrer, but the full URL still reaches the server.
6. iOS & iPadOS 26 Release Notes. Apple Developer. Apple’s official iOS 26 release notes.
7. iOS 26 Won’t Kill Your UTMs. Northbeam. Controlled iOS 18 vs iOS 26 test results; click IDs are stripped in Messages/Mail/Private mode, UTMs survive across all modes.
8. Safari vs UTM: Do We Really Need to Panic? Alex Ignatenko. Safari Technology Preview tests indicating full LTP could expand to all modes in a future release.
9. How to avoid UTM parameters removal. Stape. The custom parameter transformation pattern: send ads with st_src/st_mdm/st_cmp, rewrite to utm_* inside the server-side container.
10. Query Replacer Variable. Stape on GitHub. Stape’s open-source sGTM Query Replacer template.
11. RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. IETF. The canonical specification of URI syntax.
12. Uniform Resource Identifier (URI) Schemes. IANA. IANA’s URI scheme registry.
Key Takeaways
1. Fragments (#) never reach the server; SPAs that use hash routing make pageviews invisible in server-side logs.
2. Safari Link Tracking Protection only strips click IDs (gclid, fbclid, msclkid, ttclid, dclid, yclid); UTM parameters survive in default normal browsing.
3. Safari 26 AFP blocks tracker-classified scripts from reading the URL query and document.referrer at the browser API layer, but the full URL still reaches the server.
4. Inside the Custom Pixel sandbox, window.location.href returns a fake URL; the real storefront URL is event.context.window.location.href.
5. CNAME first-party subdomains are not bulletproof; when cloaking is detected, even server-side cookies fall to a 7-day cap.
FAQ
Does Safari strip UTM parameters?
No. Safari Link Tracking Protection only strips click IDs (gclid, fbclid, msclkid, ttclid, dclid, yclid). UTM parameters survive in default normal browsing. iOS 26 behaves the same as iOS 18 here, despite the panic headlines. Stripping only kicks in for Private Browsing, links opened from Mail and Messages, or when the user manually enables Advanced Tracking and Fingerprinting Protection.
Why is the URL fragment (#) a problem for server-side tracking?
Browsers never send anything after # in the HTTP request. If a SPA uses hash routing (#/page1, #/page2), server-side logs, CDN analytics and classic pageview tracking see every route as the same pageview. Use pushState with path-based routing instead. Attribution parameters belong in the query, not the fragment.
Should my server-side gateway be path-based or subdomain-based?
Path-based proxies (e.g. example.com/_track) live on the same origin: no CORS preflight, no CNAME cloaking detection risk. Subdomain-based gateways (e.g. gw.example.com) push operational load onto the vendor but are exposed to CNAME cloaking classifiers. Cloudflare Zaraz is path-based, Stape is CNAME-based, GTM Server-Side supports both. For solo teams the path-based route usually costs less to run.
How do I read the real URL inside a Shopify Custom Pixel?
Because the Custom Pixel runs in a sandbox iframe, window.location.href and event.context.document.location.href both return a fake sandbox URL. The only reliable source for the real storefront URL is event.context.window.location.href inside the page_viewed event payload. Capture it there and propagate it to other events as a custom dimension. Turn off GA4 auto page_view and send pageviews manually with the captured URL.
What is the custom parameter transformation pattern?
Most analytics platforms, GA4 included, hardcode attribution to the UTM names. To avoid browser stripping you send traffic with your own prefix instead of UTMs (e.g. st_src, st_mdm, st_cmp). You then transform them back to utm_* inside the server-side container by reading page_location and rewriting it. Custom names are not on any browser's stripping list, so they survive. Stape's Query Replacer template implements this off the shelf.