Reading URLs for Tracking: Gateway Design, ITP and Pixel Sandbox

TL;DR

A large share of tracking data loss comes from misreading the URL. Fragments never reach the server, query parameters get stripped differently by each browser, sandbox iframes return fake URLs from window.location.href, and once Safari detects CNAME cloaking, even cookies set by the server are capped at 7 days. Developers who know URL anatomy design gateway endpoints correctly, retain click IDs, and read the right context inside the Shopify Custom Pixel.

Most Tracking Data Loss Starts at the URL

Direct traffic is inflated, GA4 and Meta Ads Manager refuse to reconcile, the Shopify Custom Pixel keeps reporting a sandbox URL instead of the real product page. These symptoms share one root cause: the URL is not being read correctly for tracking.

A URL is not just a string in the browser address bar; it is a structure of scheme, host, path, query and fragment, and each part produces different tracking outcomes. Which part reaches the server, which parameter the browser strips, whether a CNAME subdomain actually guarantees first-party context, how the sandbox iframe hides the real URL: this article goes through each in turn, with field evidence.

URL Anatomy

https://shop.example.com/products/widget?utm_source=newsletter&gclid=abc#size-l
└─┬─┘   └──────┬───────┘└──────┬───────┘└──────────────┬───────────────┘└──┬──┘
  │            │               │                       │                   │
scheme       host            path                    query              fragment

Part	Role in Tracking
`scheme`	`https` mandatory (cookie `Secure` flag, ITP compliance, HTTP/2 multiplexing)
`host`	Determines eTLD+1, the basis of cookie scope and first-party context
`path`	Gateway endpoint identity, routing decision
`query`	The attribution backbone: UTMs, click IDs, custom parameters
`fragment`	Never reaches the server. Hash-only SPA navigation loses every server-side pageview

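Origin: the scheme + host + port triple defines the “origin” that anchors browser security. Same-origin policy, CORS, postMessage targetOrigin checks and Content Security Policy source matching all work off this triple. Cookies are the exception: they are scoped by domain and path, and SameSite decisions use the site (eTLD+1), so gw.example.com and example.com are different origins but the same site.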
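The WHATWG URL API, available in browsers, Cloudflare Workers and Node.js, exposes these parts and the origin directly. A minimal sketch using the example URL above; urlParts and sameOrigin are illustrative helper names, not an existing library:

```javascript
// Split a URL into the five tracking-relevant parts (WHATWG URL API).
function urlParts(href) {
  const url = new URL(href);
  return {
    scheme: url.protocol,   // e.g. "https:" (the API keeps the trailing colon)
    host: url.hostname,     // basis for eTLD+1 and cookie scope
    path: url.pathname,     // gateway endpoint identity
    query: url.search,      // attribution parameters, including the leading "?"
    fragment: url.hash,     // client-only: never sent in the HTTP request
  };
}

// Same-origin check: scheme, host and port must all match.
// url.origin omits default ports, so ":443" on https equals no explicit port.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}
```

For example, sameOrigin('https://example.com/_track', 'https://example.com:443/products') returns true, while comparing the http:// and https:// versions of the same host returns false.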

Query String: The Attribution Backbone and the Leakage Surface

The query string is the richest tracking surface and the most fragile one. Every platform expects a different parameter set, every browser strips a different list, and personal data usually leaks through the query string.

Standard Parameters

UTM standard: utm_source, utm_medium, utm_campaign, utm_content, utm_term. GA4 source/medium resolution depends entirely on these.
Platform click IDs: gclid (Google Ads), wbraid/gbraid (Google iOS app, transitional parameters¹), fbclid (Meta), ttclid (TikTok), msclkid (Microsoft), dclid (DoubleClick), yclid (Yandex; Yahoo! JAPAN Ads uses the same name).
Consent Mode v2 signals: values like gcs=G100 and gcs=G111 carry the consent state to the server, and the server-side container uses them to control tag firing. If consent never reaches the server you risk legal exposure and platform-side penalties (Google: targeting degradation; Meta CAPI: deduplication failures).
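As commonly documented for GA4 hits, gcs is G1 followed by one digit for ad_storage and one for analytics_storage (1 granted, 0 denied). A minimal decoder a server-side container could use to gate tags; the function names, the 'unknown' fallback and the conservative gate are assumptions:

```javascript
// Hypothetical decoder for the gcs consent string, e.g. "G100" or "G111".
// Character 3 = ad_storage, character 4 = analytics_storage.
function parseGcs(gcs) {
  const match = /^G1(.)(.)$/.exec(gcs ?? '');
  if (!match) return null; // missing or malformed: no usable consent signal

  const state = (digit) =>
    digit === '1' ? 'granted' : digit === '0' ? 'denied' : 'unknown';

  return { ad_storage: state(match[1]), analytics_storage: state(match[2]) };
}

// Conservative gate: forward analytics tags only on an explicit grant.
function mayFireAnalytics(gcs) {
  return parseGcs(gcs)?.analytics_storage === 'granted';
}
```

Treating a missing or malformed value as "no consent" is the safe default; a stack that relies on cookieless pings under denied consent would gate differently.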

Encoding Pitfalls

Different platforms encode non-ASCII characters differently. GA4 expects UTF-8 raw, some conversion API endpoints expect percent-encoded UTF-8, email platforms can double-encode when their templates pass through several systems. Practical rule: keep UTM values ASCII and lowercase, and put the human-readable campaign title in a separate field.

%20 and + are two different space encodings: + means a space only in form-encoded query strings, while %20 is valid anywhere in a URL. & or = inside a campaign name will break the query string parser unless encoded. Double encoding (%2520) is usually a copy-paste artifact, and it is the most common breakage in hand-built link templates.
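A short sketch of these pitfalls using the standard URL and URLSearchParams APIs. The helper names are hypothetical; the point is that URLSearchParams form-encodes values (spaces become +, & and = become %26 and %3D) and decodes + back to a space, while decodeURIComponent does not:

```javascript
// Build a tagged link. searchParams.set() form-encodes each value, so
// "&" or "=" inside a campaign name cannot break the query structure.
function buildTaggedUrl(base, params) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// "%25" is an encoded "%". "%25" followed by two hex digits (e.g. "%2520")
// almost always means a value was percent-encoded twice.
function looksDoubleEncoded(rawQuery) {
  return /%25[0-9A-Fa-f]{2}/.test(rawQuery);
}

// URLSearchParams decodes "+" to a space; decodeURIComponent leaves it as "+".
function readParam(rawQuery, name) {
  return new URLSearchParams(rawQuery).get(name);
}
```

buildTaggedUrl('https://shop.example.com/', { utm_campaign: 'spring sale & more' }) yields https://shop.example.com/?utm_campaign=spring+sale+%26+more, which parses back to the original value on any form-aware server.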

PII Leakage

When ?email=user@example.com, ?phone=..., ?order_id=... end up in URLs, they can leak into:

Browser history
Same-origin server access logs
CDN edge logs
Error tracking services (Sentry and others)
The GA4 page_location dimension
Third-party scripts running on the page (heatmaps, embedded ads)

Modern browsers default to Referrer-Policy: strict-origin-when-cross-origin, so path and query no longer leak in cross-origin Referer headers. But Referrer-Policy: unsafe-url or any same-origin flow preserves the full URL. Putting personal data in a URL is a direct leak under GDPR and similar regimes. Carry personal data in the POST body or as a hashed identifier, never in the query string. At the gateway, filter query parameters through an allow-list before ingestion.
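A gateway-side allow-list can be a few lines. A minimal sketch, assuming page_location arrives as a full URL string; the parameter list is an example, not a recommendation for every stack:

```javascript
// Example allow-list (assumption): attribution parameters only, no PII.
const ALLOWED_PARAMS = new Set([
  'utm_source', 'utm_medium', 'utm_campaign', 'utm_content', 'utm_term',
  'gclid', 'wbraid', 'gbraid', 'fbclid', 'ttclid', 'msclkid',
]);

// Drop every query parameter not on the allow-list, plus the fragment,
// before the URL is stored or forwarded.
function sanitizePageLocation(href, allowed = ALLOWED_PARAMS) {
  const url = new URL(href);
  const kept = new URLSearchParams();
  for (const [key, value] of url.searchParams) {
    if (allowed.has(key.toLowerCase())) kept.append(key, value);
  }
  // Re-serializing may normalize encoding (e.g. %20 becomes +).
  url.search = kept.toString(); // empty string removes the "?" entirely
  url.hash = '';
  return url.toString();
}
```

An allow-list fails closed: a new, unexpected parameter such as ?email=... is dropped by default instead of leaking until someone adds it to a block-list.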

How Browsers Filter URLs

Privacy-first browsers strip known tracking parameters automatically, but the scope varies a lot, and it has changed enough over the years that most panic headlines are wrong. The actual 2026 picture:

Browser	Default Normal Browsing	Stripping Active When	Parameters Stripped
Safari	No stripping	Private Browsing, links opened from Mail/Messages, manual ATFP enabled² ³	`gclid`, `fbclid`, `msclkid`, `ttclid`, `dclid`, `yclid` (UTMs preserved)
Firefox	No stripping	ETP Strict (must be enabled manually)	`fbclid`, `mc_eid`, `oly_enc_id`, `__s`, `vero_id`, `_hsenc`, `mkt_tok` (`gclid` not on the list)
Brave	Default on	Default Shields	Click IDs and some mailer parameters
DuckDuckGo	Default on	Browser app and extension	Click IDs and a tracking parameter set
Chrome	No stripping	No automatic stripping	-

Safari 26 Advanced Fingerprinting Protection (AFP): Safari 26, released in September 2025, turns AFP on by default in every browsing mode⁴; earlier versions enabled it by default only in Private Browsing. AFP does three things: (1) restricts tracker-classified scripts from reaching high-entropy device APIs, (2) prevents those scripts from writing long-lived storage, (3) blocks tracker scripts from reading the page’s URL query string and document.referrer through browser APIs.

The critical nuance: AFP restricts in-page scripts; the full URL still reaches the server via the HTTP request⁵. A server-side gateway receives the pageview URL intact, including gclid and UTMs. AFP is a browser-level decision that directly increases the value of server-side tracking: the client-side world keeps narrowing while the server-side channel stays open.

About the iOS 26 Panic Headlines

When the iOS 26 release notes shipped in September 2025⁶, much of the marketing press ran with “Apple kills UTMs.” Side-by-side iOS 18 vs iOS 26 tests by Triple Whale and Northbeam⁷ showed the panic was misplaced:

Standard UTMs (utm_source, utm_medium, utm_campaign, utm_content, utm_term) are not stripped
Default Safari normal browsing does not strip any parameter
The original ad click itself (Facebook app → product page) keeps its click ID
Stripping still applies in Private Browsing and links opened from Mail and Messages (same behavior as iOS 18)

Apple is testing extending full Link Tracking Protection to every browsing mode in Safari Technology Preview⁸. It could ship in a near-term release; the way to avoid panic when it does is to prepare the gateway infrastructure today.

Custom Parameter Transformation Pattern

Most analytics platforms, GA4 included, hardcode attribution to UTM names. Stape’s recommended pattern to work around this:

Send ad traffic with your own prefix instead of UTMs: ?st_src=newsletter&st_mdm=email&st_cmp=spring26⁹
In the server-side container, read the incoming page_location
Transform st_* parameters back to utm_* (Stape’s Query Replacer template implements this off the shelf¹⁰)
The augmented event sent to the GA4 tag now carries proper UTMs
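The same transformation in plain JavaScript, as a sketch of the logic rather than Stape's actual template (sGTM templates run in Google's sandboxed JavaScript with their own APIs). Only the three prefixes shown above are mapped; the rule of keeping a real utm_* over its custom copy is an assumption:

```javascript
// st_* -> utm_* mapping from the example link above (extend as needed).
const PREFIX_MAP = new Map([
  ['st_src', 'utm_source'],
  ['st_mdm', 'utm_medium'],
  ['st_cmp', 'utm_campaign'],
]);

// Rewrite custom parameters in page_location back to UTM names.
function restoreUtms(pageLocation, map = PREFIX_MAP) {
  const url = new URL(pageLocation);
  const out = new URLSearchParams();
  for (const [key, value] of url.searchParams) {
    const target = map.get(key);
    if (target === undefined) {
      out.append(key, value); // not a custom parameter: keep as-is
    } else if (!url.searchParams.has(target)) {
      out.append(target, value); // rename st_* to its utm_* counterpart
    } // else a real utm_* already exists, so the custom copy is dropped
  }
  url.search = out.toString();
  return url.toString();
}
```

A Map is used instead of a plain object so that parameter names like toString cannot collide with Object.prototype properties.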

Upside: custom names are not on any browser’s stripping list, so they survive. Trade-off: more per-platform configuration, and you still need disciplined UTM conventions on top.

First-Party Context and Custom Domains

As ITP, ETP and adblock lists keep dismantling third-party context, first-party context is the only durable foundation for tracking. The key concept here is eTLD+1.

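eTLD+1 (effective Top-Level Domain plus one label) is the boundary the browser uses to decide whether a cookie is first-party. gw.dnomia.app (subdomain) and dnomia.app (apex) share the same eTLD+1, so a cookie set with Domain=dnomia.app is visible to both; a host-only cookie without a Domain attribute stays on the host that set it. dnomia-tracking.com is a different eTLD+1, so it is not first-party, and Safari, Firefox and Brave restrict its cookies in a third-party context by default (blocking or partitioning them).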
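Computing eTLD+1 requires the Public Suffix List, because suffixes like co.uk span more than one label. A toy sketch with a hard-coded, hypothetical subset of the list; production code should load the full list or use a maintained library:

```javascript
// Toy subset of the Public Suffix List (hypothetical; real code needs the full list).
const PUBLIC_SUFFIXES = new Set(['com', 'app', 'uk', 'co.uk']);

// eTLD+1 = the longest matching public suffix plus one more label.
function etldPlusOne(hostname) {
  const labels = hostname.toLowerCase().split('.');
  for (let i = 0; i < labels.length; i++) {
    if (PUBLIC_SUFFIXES.has(labels.slice(i).join('.'))) {
      // i === 0 means the hostname is itself a public suffix: no eTLD+1.
      return i === 0 ? null : labels.slice(i - 1).join('.');
    }
  }
  return null; // suffix not in the toy list
}

// Host-level same-site check (ignores scheme, which browsers also compare).
function sameSite(hostA, hostB) {
  const a = etldPlusOne(hostA);
  return a !== null && a === etldPlusOne(hostB);
}
```

With this list, sameSite('gw.dnomia.app', 'dnomia.app') is true and sameSite('dnomia.app', 'dnomia-tracking.com') is false, matching the cookie ownership described above.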

The 2026 picture for third-party cookies:

Browser	Third-Party Cookies	Note
Safari	Default block (2020+)	ITP
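Firefox	Default block for known trackers (2019+)	ETP Standard; Total Cookie Protection partitions all other third-party cookies by default since 2022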
Brave	Default block	Shields
Chrome	Still supported	Google walked back forced deprecation in July 2024 and dropped the user-choice prompt in April 2025; Privacy Sandbox APIs run as a parallel track

CNAME Subdomain vs Path-Based Proxy

Property	CNAME Subdomain (gw.example.com → vendor)	Path-Based Proxy (example.com/_track → vendor)
First-party cookie scope	Full (eTLD+1 matches)	Full (same domain)
SSL management	Vendor handles it	Your own CDN/worker layer required
CORS	Cross-origin; preflight for non-simple requests (e.g. Content-Type: application/json)	Same-origin, no preflight
CNAME cloaking risk	Yes (Safari 14+ can detect it, 7-day cap kicks in)	None
Operational load	Low (vendor runs the infrastructure)	High (you run your own worker/edge)
Vendor lock-in	Lower	Switching vendors means rewriting the proxy

CNAME cloaking caveat: since Safari 14 (2020), when WebKit detects that a first-party subdomain resolves through a CNAME to a third-party host, cookies set in that subdomain’s HTTP responses (server-side Set-Cookie headers included) are capped at 7 days. CNAME first-party is not immunity, it is relative resilience.

Storage Cap and CHIPS

Safari ITP 2.3 applies a 7-day cap to all script-writable storage: localStorage, IndexedDB, service worker registrations. Trying to bypass ITP by falling back from cookies to localStorage does not work for this reason.

CHIPS (Cookies Having Independent Partitioned State): Chrome’s Partitioned cookie attribute isolates third-party cookies per top-level site inside embed/iframe contexts. Relevant for stacks that rely on cross-origin embeds rather than subdomain CNAMEs.
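Both the first-party scope and CHIPS come down to Set-Cookie attributes. A sketch of two header values with hypothetical cookie names: a server-set first-party cookie whose Domain attribute makes it visible on the apex and every subdomain (HttpOnly keeps scripts from reading it), and a partitioned cookie for a cross-site embed:

```javascript
// Server-set first-party cookie shared by example.com and gw.example.com.
function firstPartyCookie(name, value, maxAgeDays, domain) {
  return [
    `${name}=${encodeURIComponent(value)}`,
    `Domain=${domain}`,                 // widen scope from the host to the whole eTLD+1
    'Path=/',
    `Max-Age=${maxAgeDays * 24 * 60 * 60}`,
    'Secure',
    'HttpOnly',                         // not readable or writable from page scripts
    'SameSite=Lax',
  ].join('; ');
}

// CHIPS cookie for an embed: partitioned per top-level site.
// Partitioned requires Secure; the __Host- prefix also requires Path=/ and no Domain.
function partitionedCookie(name, value) {
  return `__Host-${name}=${encodeURIComponent(value)}; Path=/; Secure; SameSite=None; Partitioned`;
}
```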

Picking a Gateway Domain

Be careful with the subdomain name. Adblock filter lists block analytics., tracking., pixel. and similar prefixes. Use a generic prefix such as gw., c., t. or m. On the DNS side, keep the CNAME chain under three hops; every extra lookup adds resolution latency before the first hit and hurts pageview measurement.

Gateway Design: Path vs Query, Limits, CORS

The dominant implementations are GTM Server-Side Container (Google’s official server container, sGTM), Cloudflare Zaraz (path-based, edge), and Stape (CNAME-based managed sGTM hosting).

Splitting Responsibility Between Path and Query

Path-based gateway: /cdn-cgi/zaraz/track?event=purchase — easy CDN-level routing
Subdomain-based gateway: gw.example.com/track — independent worker/proxy

Keep event-specific data in the query and endpoint identity in the path. Mixing them breaks the cache key and makes log analysis harder.
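A sketch of that split at the gateway: the path selects the endpoint and the query carries the event data. The endpoint paths are hypothetical:

```javascript
// Hypothetical endpoints: the path identifies which gateway endpoint is called.
const ENDPOINTS = new Set(['/_track/collect', '/_track/identify']);

function routeTrackingRequest(href) {
  const url = new URL(href);
  if (!ENDPOINTS.has(url.pathname)) return null; // unknown endpoint: reject

  return {
    endpoint: url.pathname,                       // stable: safe as a log/cache key
    event: url.searchParams.get('event'),         // per-request data lives in the query
    params: Object.fromEntries(url.searchParams), // remaining event fields
  };
}
```

Because the endpoint is derived from the path alone, logs can be grouped per endpoint without per-event query values fragmenting the key.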

URL Length

The “safe legacy limit” is about 2 KB: Internet Explorer capped URLs at 2,083 characters, and staying a little below that also keeps links safe for SEO tools and sharing. Modern browser ceilings are much higher: Chrome 32 KB–2 MB, Firefox ~64 KB, Safari ~80 KB. The real constraint is server/CDN/proxy configuration: Cloudflare default header 8 KB, Nginx default 4–8 KB.

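For payloads that no longer fit in a URL, move them into a POST body with fetch(url, { method: 'POST', keepalive: true }). sendBeacon is still standard (W3C Beacon specification), but fetch with keepalive is preferred because it supports other methods such as PUT and custom headers, while sendBeacon only sends POST. Both share one limit: in-flight keepalive bodies are capped at 64 KiB, so truly large payloads need a regular fetch.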
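One way to apply these limits on the client is to build the request first and pick the transport from its size. A sketch under stated assumptions: the 2,048-character URL budget is the conservative legacy limit above, the payload is a flat object of strings, and planBeaconRequest is a hypothetical helper that only returns a URL and fetch options:

```javascript
const MAX_URL_LENGTH = 2048;            // conservative legacy budget from the text
const KEEPALIVE_BODY_LIMIT = 64 * 1024; // Fetch spec quota for in-flight keepalive bodies

// Hypothetical helper: returns { url, init } for fetch(), choosing GET or POST.
function planBeaconRequest(endpoint, payload) {
  const getUrl = `${endpoint}?${new URLSearchParams(payload)}`;
  if (getUrl.length <= MAX_URL_LENGTH) {
    return { url: getUrl, init: { method: 'GET', keepalive: true } };
  }

  const body = JSON.stringify(payload);
  const bodyBytes = new TextEncoder().encode(body).length;
  return {
    url: endpoint,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body,
      // The quota is shared by all in-flight keepalive requests, so this
      // per-request check is an approximation. Above it, send a normal POST
      // instead of letting the keepalive fetch be rejected.
      keepalive: bodyBytes <= KEEPALIVE_BODY_LIMIT,
    },
  };
}
```

The caller then runs fetch(plan.url, plan.init). Note that the JSON Content-Type makes a cross-origin POST non-simple, so it triggers a CORS preflight.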

CORS

For a cross-origin gateway endpoint:

Access-Control-Allow-Origin must be set correctly
Preflight (OPTIONS) must be answered for non-simple requests (for example a JSON Content-Type or custom headers)
For credentialled requests, Access-Control-Allow-Credentials: true plus a specific origin (wildcards are rejected)

A path-based proxy avoids all of this: same origin, no preflight. For solo setups this is one less operational headache.
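For the cross-origin case, the three requirements above map onto a small handler. A minimal sketch in the Fetch API style used by Cloudflare Workers (it also runs on Node.js 18+); the allowed origin, the onEvent callback and the choice to let requests without an Origin header through are assumptions:

```javascript
// Origins allowed to call the gateway cross-origin (assumption).
const ALLOWED_ORIGINS = new Set(['https://shop.example.com']);

function corsHeaders(origin) {
  return {
    'Access-Control-Allow-Origin': origin,      // specific origin: "*" is rejected with credentials
    'Access-Control-Allow-Credentials': 'true',
    'Access-Control-Allow-Methods': 'POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '86400',          // browsers may cache the preflight for a day
    Vary: 'Origin',                             // responses differ per Origin
  };
}

async function handleTrackingRequest(request, onEvent = () => {}) {
  const origin = request.headers.get('Origin');
  const allowed = origin !== null && ALLOWED_ORIGINS.has(origin);

  // Preflight: answer OPTIONS before the browser sends the real request.
  if (request.method === 'OPTIONS') {
    return allowed
      ? new Response(null, { status: 204, headers: corsHeaders(origin) })
      : new Response(null, { status: 403 });
  }
  if (request.method !== 'POST') {
    return new Response(null, { status: 405 });
  }
  // Reject foreign origins; requests without Origin (server-to-server) pass.
  if (origin !== null && !allowed) {
    return new Response(null, { status: 403 });
  }

  onEvent(await request.json()); // hand the payload to the fan-out step
  return new Response(null, { status: 204, headers: allowed ? corsHeaders(origin) : {} });
}
```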

Fragment and SPA: Data That Never Reaches the Server

The browser never sends anything after # to the server. Server-side logs, CDN analytics and traditional pageview tracking see every SPA route as the same pageview. Use pushState and path-based routing, not hash-based.

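Attribution scenario: in ?utm_source=x#campaign-detail the query reaches the server and only the fragment is lost. Order matters: if a campaign link needs a fragment (an on-page anchor or a hashtag), it must come after the query. In /landing#campaign-detail?utm_source=x everything after # is fragment, so the UTMs never reach the server.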
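The URL API makes the rule easy to check: the path plus the query is what the browser puts in the HTTP request line. A sketch with a hypothetical helper name:

```javascript
// What the browser actually sends: the request target is path + query.
// The fragment (url.hash) stays in the browser.
function serverVisibleTarget(href) {
  const url = new URL(href);
  return url.pathname + url.search;
}
```

serverVisibleTarget('https://shop.example.com/landing?utm_source=x#campaign-detail') returns '/landing?utm_source=x', while the reversed order returns only '/landing', because the whole tail, UTMs included, is parsed as the fragment.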

Mini Case: Shopify Custom Pixel Sandbox

The Shopify Custom Pixel (Web Pixels API) runs inside a sandboxed iframe. URL access is not what you would expect.

URL Access Method	Returns	Useful for Tracking?
`window.location.href`	Sandbox iframe URL	No (fake)
`event.context.document.location.href`	Contains the sandbox URL	No (fake)
`event.context.window.location.href` (in `page_viewed`)	Real storefront URL	Yes (the only reliable source)

Cookie and storage isolation: window.localStorage and document.cookie work inside the sandbox iframe but do not touch the top frame. To access storefront cookies use browser.cookie and browser.localStorage (async APIs).

URL implication: when the sandbox URL replaces the real URL, every GA4 pageview goes to the fake URL. Turn off GA4 auto page_view and send pageviews manually using event.context.window.location.href.
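A minimal sketch of that pattern in a Custom Pixel. In the pixel editor analytics is a global; here it is passed in, together with a hypothetical send function (for example a fetch to your gateway), so the sketch can run outside Shopify. The payload field names are assumptions:

```javascript
// Sketch of a Custom Pixel body. `analytics` is a global inside Shopify's
// pixel editor; it is a parameter here so the logic can be tested elsewhere.
// `send` is a hypothetical transport, e.g. a fetch() to your gateway.
function registerPixel(analytics, send) {
  let realPageLocation = null; // storefront URL captured from page_viewed

  analytics.subscribe('page_viewed', (event) => {
    // The reliable source of the real storefront URL (see table above).
    realPageLocation = event.context.window.location.href;
    send({ event_name: 'page_view', event_id: event.id, page_location: realPageLocation });
  });

  analytics.subscribe('product_added_to_cart', (event) => {
    // Propagate the captured URL instead of reading window.location.href,
    // which would return the sandbox iframe URL.
    send({ event_name: 'add_to_cart', event_id: event.id, page_location: realPageLocation });
  });
}
```

Inside the pixel this would be called as registerPixel(analytics, (payload) => fetch(...)). If gtag.js is loaded inside the pixel, gtag('config', 'G-XXXX', { send_page_view: false }) disables the automatic page_view, so only the manual one carrying the captured URL is sent.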

Primary domain trap: if visitors land on a domain that Shopify redirects to the primary domain set in Settings → Domains, the page navigates away before the pixel finishes and sandbox tracking breaks. Domain alignment is mandatory: Shopify primary = GA4 Web Stream = GTM container domain.

End-to-End Scenario: Shopify + Cloudflare + GA4 + Meta CAPI

A Shopify store sitting behind Cloudflare, sending data through the Custom Pixel to GA4 and Meta CAPI.

The customer lands on https://shop.example.com/products/widget?utm_source=newsletter&fbclid=....
Inside page_viewed, the Custom Pixel sandbox reads event.context.window.location.href and gets the real URL.
UTMs and fbclid are filtered through an allow-list (no personal data). fbclid is wrapped into the format Meta expects: fb.1.<unix_timestamp_ms>.<fbclid>. A raw fbclid is not enough.
The pixel posts to the path-based gateway: fetch('https://shop.example.com/cdn-cgi/zaraz/track', { method: 'POST', keepalive: true, body: JSON.stringify(payload) }).
Same origin, so no CORS preflight. Cookies are fully first-party (eTLD+1 matches).
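The Cloudflare Worker duplicates the payload and fans out to GA4 Measurement Protocol and the Meta CAPI endpoint. Consent state rides along: on the Measurement Protocol it goes in the consent object (ad_user_data, ad_personalization), since the browser-side gcs string is not a Measurement Protocol field.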
The _fbc value (built from fbclid) joins _fbp and lands in the user_data block of the CAPI event as fbc and fbp.
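The fbc wrapping and the Worker's fan-out from this scenario can be sketched as plain functions that build the two outbound requests without sending them. Field names follow the GA4 Measurement Protocol and Meta Conversions API; the event shape, the cfg object, the Graph API version string and all IDs are placeholders:

```javascript
// fbc format from the text: fb.<subdomainIndex>.<clickTimeMs>.<fbclid>
function buildFbc(fbclid, clickTimeMs, subdomainIndex = 1) {
  return `fb.${subdomainIndex}.${clickTimeMs}.${fbclid}`;
}

// Build the GA4 and Meta requests for one pageview (cfg values are placeholders).
function buildPageViewFanout(evt, cfg) {
  const ga4Url = 'https://www.google-analytics.com/mp/collect?' +
    new URLSearchParams({ measurement_id: cfg.ga4MeasurementId, api_secret: cfg.ga4ApiSecret });
  const ga4Body = {
    client_id: evt.clientId,
    consent: { ad_user_data: evt.adUserData, ad_personalization: evt.adPersonalization },
    events: [{ name: 'page_view', params: { page_location: evt.pageLocation } }],
  };

  const metaUrl = `https://graph.facebook.com/${cfg.metaApiVersion}/${cfg.metaPixelId}/events?` +
    new URLSearchParams({ access_token: cfg.metaAccessToken });
  const metaBody = {
    data: [{
      event_name: 'PageView',
      event_time: Math.floor(evt.timestampMs / 1000), // CAPI expects Unix seconds
      event_id: evt.eventId,                          // shared with the browser pixel for deduplication
      action_source: 'website',
      event_source_url: evt.pageLocation,
      user_data: {
        fbc: evt.fbc,
        fbp: evt.fbp,
        client_ip_address: evt.ip,
        client_user_agent: evt.userAgent,
      },
    }],
  };

  const post = (body) => ({
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return [
    { url: ga4Url, init: post(ga4Body) },
    { url: metaUrl, init: post(metaBody) },
  ];
}
```

The Worker would then send both with Promise.all(requests.map((r) => fetch(r.url, r.init))), so a slow endpoint does not delay the other.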

Every step depends on reading the URL anatomy correctly: the wrong host loses first-party context; PII in the query becomes a log leak; UTMs in the fragment disappear; a sandbox URL produces fake pageviews; a raw fbclid in place of a properly formatted fbc value weakens Meta’s click attribution and match quality.

URI, URL, URN: Short Reference

A quick note for readers who came here looking for the old article:

URI: the broadest umbrella; any string that identifies a resource.
URL: the “locator” variant of URI; tells you how to reach a resource (https://..., ftp://...).
URN: the “name” variant of URI; a persistent identifier independent of location (urn:isbn:..., urn:uuid:...).

In practice, “URL” is enough on the web. The URI/URN distinction shows up in RFC 3986¹¹ and W3C standards, and matters when you design resource management or metadata systems. Day-to-day tracking work is entirely about reading URL parts; URNs rarely come up. The canonical reference for URI schemes is the IANA registry¹².

Summary: Which URL Part Affects What

URL Part	Tracking Effect
`scheme` (https)	Cookie Secure flag, HTTP/2, ITP compliance
`host`	eTLD+1, first-party scope, CNAME cloaking risk
`path`	Gateway endpoint identity, CORS behavior
`query`	Attribution backbone, browser stripping surface, PII leakage vector
`fragment` (`#`)	Never reaches the server, source of SPA pageview loss

Gateway design starts with a deliberate decision on each of these parts. Reading the URL as a five-part data structure rather than as a single browser-bar string is the foundation of clean tracking data.

Footnotes

¹ Update to iOS 14 campaign measurement. Google Ads Help. wbraid and gbraid were introduced in March 2021 for iOS app campaign measurement; they carry campaign-level data without binding to an individual user.
² Private Browsing 2.0. WebKit Blog. On Apple’s Link Tracking Protection scope: “campaign parameter that’s only used for campaign attribution, as opposed to click or user-level tracking. Safari allows such parameters to pass through.”
³ iOS 26 UTM tracking tests. Triple Whale. Side-by-side iOS 18 vs iOS 26 testing: “Safari still preserves common tracking tags (like utm_source, utm_medium, and utm_campaign).”
⁴ WebKit Features in Safari 26.0. WebKit Blog. Safari 26 turns Advanced Fingerprinting Protection on by default.
⁵ Safari 26 tracking changes explained. Taggrs. January 2026 field validation: AFP blocks tracker scripts from reading URL and referrer, but the full URL still reaches the server.
⁶ iOS & iPadOS 26 Release Notes. Apple Developer. Apple’s official iOS 26 release notes.
⁷ iOS 26 Won’t Kill Your UTMs. Northbeam. Controlled iOS 18 vs iOS 26 test results; click IDs are stripped in Messages/Mail/Private mode, UTMs survive across all modes.
⁸ Safari vs UTM: Do We Really Need to Panic? Alex Ignatenko. Safari Technology Preview tests indicating full LTP could expand to all modes in a future release.
⁹ How to avoid UTM parameters removal. Stape. The custom parameter transformation pattern: send ads with st_src/st_mdm/st_cmp, rewrite to utm_* inside the server-side container.
¹⁰ Query Replacer Variable. Stape on GitHub. Stape’s open-source sGTM Query Replacer template.
¹¹ RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. IETF. The canonical specification of URI syntax.
¹² Uniform Resource Identifier (URI) Schemes. IANA. IANA’s URI scheme registry.

Key Takeaways

01 Fragments (#) never reach the server; SPAs that use hash routing make pageviews invisible in server-side logs
02 Safari Link Tracking Protection only strips click IDs (gclid, fbclid, msclkid, ttclid, dclid, yclid); UTM parameters survive in default normal browsing
03 Safari 26 AFP blocks tracker-classified scripts from reading URL query and document.referrer at the browser API layer, but the full URL still reaches the server
04 Inside the Custom Pixel sandbox window.location.href returns a fake URL; the real storefront URL is event.context.window.location.href
05 CNAME first-party subdomains are not bulletproof; when cloaking is detected, even server-side cookies fall to a 7-day cap

Frequently Asked Questions (FAQ)

Does Safari strip UTM parameters?

No. Safari Link Tracking Protection only strips click IDs (gclid, fbclid, msclkid, ttclid, dclid, yclid). UTM parameters survive in default normal browsing. iOS 26 behaves the same as iOS 18 here, despite the panic headlines. Stripping only kicks in for Private Browsing, links opened from Mail and Messages, or when the user manually enables Advanced Tracking and Fingerprinting Protection.

Why is the URL fragment (#) a problem for server-side tracking?

Browsers never send anything after # in the HTTP request. If a SPA uses hash routing (#/page1, #/page2), server-side logs, CDN analytics and classic pageview tracking see every route as the same pageview. Use pushState with path-based routing instead. Attribution parameters belong in the query, not the fragment.

Should my server-side gateway be path-based or subdomain-based?

Path-based proxies (e.g. example.com/_track) live on the same origin: no CORS preflight, no CNAME cloaking detection risk. Subdomain-based gateways (e.g. gw.example.com) push operational load onto the vendor but are exposed to CNAME cloaking classifiers. Cloudflare Zaraz is path-based, Stape is CNAME-based, GTM Server-Side supports both. For solo teams the path-based route usually costs less to run.

How do I read the real URL inside a Shopify Custom Pixel?

Because the Custom Pixel runs in a sandbox iframe, window.location.href and event.context.document.location.href both return a fake sandbox URL. The only reliable source for the real storefront URL is event.context.window.location.href inside the page_viewed event payload. Capture it there and propagate it to other events as a custom dimension. Turn off GA4 auto page_view and send pageviews manually with the captured URL.

What is the custom parameter transformation pattern?

Most analytics platforms, GA4 included, hardcode attribution to the UTM names. To avoid browser stripping you send traffic with your own prefix instead of UTMs (e.g. st_src, st_mdm, st_cmp), then transform them back to utm_* inside the server-side container by reading page_location and rewriting it. Custom names are not on any browser's stripping list, so they survive. Stape's Query Replacer template implements this off the shelf.
