ceaksan

Reading URLs for Tracking: Gateway Design, ITP and Pixel Sandbox

How URL parts (scheme, host, path, query, fragment) impact tracking. Server-side gateway design, first-party context, Safari LTP and Safari 26 AFP, Shopify Custom Pixel sandbox pitfalls.

Jun 11, 2019 10 min read Updated: May 15, 2026
TL;DR

A large share of tracking data loss comes from misreading the URL. Fragments never reach the server, each browser strips a different set of query parameters, window.location.href inside a sandbox iframe returns the sandbox URL instead of the page URL, and once Safari detects a CNAME-cloaked tracking subdomain, even server-set cookies are capped at 7 days. Developers who know URL anatomy design gateway endpoints correctly, retain click IDs, and read the right context inside the Shopify Custom Pixel.

Most Tracking Data Loss Starts at the URL

Direct traffic is inflated, GA4 and Meta Ads Manager refuse to reconcile, and the Shopify Custom Pixel keeps reporting a sandbox URL instead of the real product page. These symptoms share one root cause: the URL is not being read correctly for tracking.

A URL is not just a string in the browser address bar; it is a structure of scheme, host, path, query and fragment, and each part produces different tracking outcomes. Which part reaches the server, which parameter the browser strips, whether a CNAME subdomain actually guarantees first-party context, how the sandbox iframe hides the real URL: this article goes through each in turn, with field evidence.

URL Anatomy

https://shop.example.com/products/widget?utm_source=newsletter&gclid=abc#size-l
└─┬─┘   └──────┬───────┘└──────┬───────┘└───────────────┬──────────────┘└──┬──┘
  │            │               │                        │                  │
scheme        host            path                    query            fragment
| Part | Role in Tracking |
| --- | --- |
| scheme | https is mandatory (cookie Secure flag, ITP compliance, HTTP/2 multiplexing) |
| host | Determines eTLD+1, the basis of cookie scope and first-party context |
| path | Gateway endpoint identity, routing decision |
| query | The attribution backbone: UTMs, click IDs, custom parameters |
| fragment | Never reaches the server. Hash-only SPA navigation loses every server-side pageview |

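Origin: the scheme + host + port triple defines the “origin” that anchors browser security. Same-origin policy, CORS, postMessage targetOrigin checks and Content Security Policy rules all work off this triple. Cookies are the exception: they are scoped by domain and path, ignore the port, and their first-party status is decided at the eTLD+1 level covered below.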
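The WHATWG URL class (available in browsers, Node.js and edge workers) exposes exactly these parts. A minimal sketch that splits the example URL from the diagram; the function and field names are illustrative, not a standard API:

```typescript
// Split a URL into the five tracking-relevant parts plus its origin,
// using only the standard WHATWG URL class.
interface UrlParts {
  scheme: string;   // "https" (URL.protocol without the trailing ":")
  host: string;     // hostname only, without the port
  path: string;
  query: string;    // raw query string without the leading "?"
  fragment: string; // raw fragment without the leading "#"
  origin: string;   // scheme + host + port, the browser security boundary
}

function splitUrl(raw: string): UrlParts {
  const u = new URL(raw);
  return {
    scheme: u.protocol.replace(/:$/, ""),
    host: u.hostname,
    path: u.pathname,
    query: u.search.replace(/^\?/, ""),
    fragment: u.hash.replace(/^#/, ""),
    origin: u.origin,
  };
}

const parts = splitUrl(
  "https://shop.example.com/products/widget?utm_source=newsletter&gclid=abc#size-l",
);
console.log(parts.origin); // https://shop.example.com
```

A non-default port such as :8443 yields a different origin, while an explicit :443 on an https URL is normalized away.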

Query String: The Attribution Backbone and the Leakage Surface

The query string is both the richest tracking surface and the most fragile one: every platform expects a different parameter set, every browser strips a different list, and most personal data leaks happen through it.

Standard Parameters

  • UTM standard: utm_source, utm_medium, utm_campaign, utm_content, utm_term. GA4 source/medium resolution depends entirely on these.
  • Platform click IDs: gclid (Google Ads), wbraid/gbraid (Google iOS app campaigns, transitional parameters [1]), fbclid (Meta), ttclid (TikTok), msclkid (Microsoft), dclid (DoubleClick), yclid (Yahoo).
  • Consent Mode v2 signals: values like gcs=G100 and gcs=G111 carry the consent state to the server, and the server-side container uses them to control tag firing. If consent never reaches the server you risk legal exposure and platform-side penalties (Google: targeting degradation; Meta CAPI: deduplication failures).
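As an illustration, a small helper (hypothetical name) that reads these parameters out of a landing URL with URLSearchParams. The key lists simply mirror the names above and are not exhaustive:

```typescript
// Key lists mirror the parameters named in this article; extend as needed.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term"];
const CLICK_ID_KEYS = ["gclid", "wbraid", "gbraid", "fbclid", "ttclid", "msclkid", "dclid", "yclid"];

interface Attribution {
  utm: Record<string, string>;
  clickIds: Record<string, string>;
}

function extractAttribution(pageUrl: string): Attribution {
  const params = new URL(pageUrl).searchParams;
  // Copy only the requested keys that are present and non-empty.
  const pick = (keys: string[]): Record<string, string> => {
    const out: Record<string, string> = {};
    for (const key of keys) {
      const value = params.get(key);
      if (value) out[key] = value;
    }
    return out;
  };
  return { utm: pick(UTM_KEYS), clickIds: pick(CLICK_ID_KEYS) };
}
```

Unknown parameters (foo=1 in a URL) are simply ignored, which is also the first step toward the allow-list approach described later.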

Encoding Pitfalls

Different platforms encode non-ASCII characters differently. GA4 expects raw UTF-8, some conversion API endpoints expect percent-encoded UTF-8, and email platforms can double-encode values when their templates pass through several systems. Practical rule: keep UTM values ASCII and lowercase, and put the human-readable campaign title in a separate field.

%20 and + are two different encodings of a space, and + is only decoded as a space by form-style query parsers. An unencoded & or = inside a campaign name breaks the query string parser. Double encoding (%2520) is usually a copy-paste artifact, and it is the most common breakage in hand-built link templates.
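These pitfalls can be reproduced with standard browser and Node.js APIs. A sketch; normalizeUtmValue and its underscore convention are an assumed house style, not a platform requirement:

```typescript
// 1. Two encodings of a space: URLSearchParams writes "+", encodeURIComponent writes "%20".
const viaSearchParams = new URLSearchParams({ utm_campaign: "spring sale" }).toString(); // "utm_campaign=spring+sale"
const viaEncodeComponent = "utm_campaign=" + encodeURIComponent("spring sale"); // "utm_campaign=spring%20sale"
// "+" is only a space for form-style query parsers; decodeURIComponent keeps it literally.
const plusLiteral = decodeURIComponent("spring+sale"); // "spring+sale"

// 2. An unencoded "&" or "=" inside a value splits it into separate parameters.
const broken = new URLSearchParams("utm_campaign=buy&save=20").get("utm_campaign"); // "buy"
const safe = new URLSearchParams("utm_campaign=" + encodeURIComponent("buy&save=20")).get("utm_campaign"); // "buy&save=20"

// 3. Double encoding: "%" re-encoded as "%25", so "%20" becomes "%2520".
function looksDoubleEncoded(value: string): boolean {
  return /%25[0-9A-Fa-f]{2}/.test(value);
}

// 4. Practical rule: ASCII, lowercase UTM values. The underscore convention is an assumption.
function normalizeUtmValue(value: string): string {
  return value
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "_") // any other run of characters becomes one underscore
    .replace(/^_+|_+$/g, ""); // no leading or trailing underscores
}
```

For example, normalizeUtmValue("Café Crème") yields cafe_creme, which survives every encoder and parser in the chain unchanged.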

PII Leakage

When ?email=user@example.com, ?phone=..., ?order_id=... end up in URLs, they can leak into:

  • Browser history
  • Your own (same-origin) server access logs
  • CDN edge logs
  • Error tracking services (Sentry and others)
  • The GA4 page_location dimension
  • Third-party scripts running on the page (heatmaps, embedded ads)

Modern browsers default to Referrer-Policy: strict-origin-when-cross-origin, so path and query no longer leak in cross-origin Referer headers. But Referrer-Policy: unsafe-url or any same-origin flow preserves the full URL. Putting personal data in a URL is a direct leak under GDPR and similar regimes. Carry personal data in the POST body or as a hashed identifier, never in the query string. At the gateway, filter query parameters through an allow-list before ingestion.
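A gateway-side allow-list might look like the sketch below. The parameter set and the function name are examples only; the point is that anything not explicitly allowed (email, phone, order_id) is dropped before the URL is logged or forwarded as page_location:

```typescript
// Example allow-list: attribution parameters only. Everything else is dropped.
const ALLOWED_PARAMS = new Set<string>([
  "utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term",
  "gclid", "wbraid", "gbraid", "fbclid", "ttclid", "msclkid", "dclid", "yclid",
]);

function sanitizePageLocation(raw: string): string {
  const u = new URL(raw);
  // Rebuild the query from allowed keys only (values are re-serialized,
  // so e.g. "%20" may come back as "+").
  const kept = new URLSearchParams();
  u.searchParams.forEach((value, key) => {
    if (ALLOWED_PARAMS.has(key.toLowerCase())) kept.append(key, value);
  });
  u.search = kept.toString(); // an empty string removes the "?" entirely
  u.hash = ""; // client-reported URLs may still carry a fragment; keep it out of logs
  return u.toString();
}
```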

How Browsers Filter URLs

Privacy-first browsers strip known tracking parameters automatically, but the scope varies a lot, and it has changed enough over the years that most panic headlines are wrong. The actual 2026 picture:

| Browser | Default (Normal Browsing) | Stripping Active When | Parameters Stripped |
| --- | --- | --- | --- |
| Safari | No stripping | Private Browsing, links opened from Mail/Messages, or Advanced Tracking and Fingerprinting Protection (ATFP) enabled manually [2][3] | gclid, fbclid, msclkid, ttclid, dclid, yclid (UTMs preserved) |
| Firefox | No stripping | ETP Strict (must be enabled manually) | fbclid, mc_eid, oly_enc_id, __s, vero_id, _hsenc, mkt_tok (gclid not on the list) |
| Brave | On by default | Default Shields | Click IDs and some mailer parameters |
| DuckDuckGo | On by default | Browser app and extension | Click IDs and a tracking parameter set |
| Chrome | No stripping | No automatic stripping | - |

Safari 26 Advanced Fingerprinting Protection (AFP): introduced in September 2025, Safari 26 turns AFP on by default in every browsing mode [4]. AFP does three things: (1) restricts tracker-classified scripts from reaching high-entropy device APIs, (2) prevents those scripts from writing long-lived storage, and (3) blocks tracker scripts from reading the page’s URL query string and document.referrer through the browser APIs.

The critical nuance: AFP restricts in-page scripts; the full URL still reaches the server via the HTTP request [5]. A server-side gateway receives the pageview URL intact, including gclid and UTMs. AFP is a browser-level decision that directly increases the value of server-side tracking: the client-side world keeps narrowing while the server-side channel stays open.

About the iOS 26 Panic Headlines

When the iOS 26 release notes shipped in September 2025 [6], much of the marketing press ran with “Apple kills UTMs.” Side-by-side iOS 18 vs iOS 26 tests by Triple Whale [3] and Northbeam [7] showed the panic was misplaced:

  • Standard UTMs (utm_source, utm_medium, utm_campaign, utm_content, utm_term) are not stripped
  • Default Safari normal browsing does not strip any parameter
  • The original ad click itself (Facebook app → product page) keeps its click ID
  • Stripping still applies in Private Browsing and links opened from Mail and Messages (same behavior as iOS 18)

Apple is testing extending full Link Tracking Protection to every browsing mode in Safari Technology Preview [8]. It could ship in a near-term release; the way to avoid panic when it does is to prepare the gateway infrastructure today.

Custom Parameter Transformation Pattern

Most analytics platforms, GA4 included, hardcode attribution to UTM names. Stape’s recommended pattern to work around this:

  1. Send ad traffic with your own prefix instead of UTMs: ?st_src=newsletter&st_mdm=email&st_cmp=spring26 [9]
  2. In the server-side container, read the incoming page_location
  3. Transform st_* parameters back to utm_* (Stape’s Query Replacer template implements this off the shelf [10])
  4. The augmented event sent to the GA4 tag now carries proper UTMs

Upside: custom names are not on any browser’s stripping list, so they survive. Trade-off: more per-platform configuration, and you still need disciplined UTM conventions on top.
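The rewrite step itself is small. A hypothetical re-implementation of the idea (not Stape's Query Replacer code), mapping the example prefixes st_src, st_mdm and st_cmp back to UTMs:

```typescript
// Custom prefix -> UTM name. Only the three example parameters are mapped here.
const PARAM_MAP = new Map<string, string>([
  ["st_src", "utm_source"],
  ["st_mdm", "utm_medium"],
  ["st_cmp", "utm_campaign"],
]);

function restoreUtms(pageLocation: string): string {
  const u = new URL(pageLocation);
  const out = new URLSearchParams();
  u.searchParams.forEach((value, key) => {
    const mapped = PARAM_MAP.get(key);
    if (mapped === undefined) {
      out.append(key, value); // unrelated parameter: keep as-is
    } else if (!u.searchParams.has(mapped)) {
      out.append(mapped, value); // rename st_* to utm_*
    } // else: a genuine utm_* already exists, so the st_* copy is dropped
  });
  u.search = out.toString();
  return u.toString();
}
```

Letting an existing utm_* win over its st_* twin is a design choice of this sketch; pick whichever precedence matches your tagging conventions.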

First-Party Context and Custom Domains

As ITP, ETP and adblock lists keep dismantling third-party context, first-party context is the only durable foundation for tracking. The key concept here is eTLD+1.

eTLD+1 (effective Top-Level Domain plus one label) is the boundary the browser uses to decide cookie ownership. gw.dnomia.app (subdomain) and dnomia.app (apex) share the same eTLD+1, so a cookie set with Domain=dnomia.app is visible to both. dnomia-tracking.com is a different eTLD+1: on dnomia.app it is a third party, and Safari, Firefox and Brave block or partition its cookies by default.
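To make the boundary concrete, here is a deliberately naive eTLD+1 lookup. The four-entry suffix list is an assumption for illustration only; real code must use the full Public Suffix List, which also has wildcard and exception rules:

```typescript
// Tiny stand-in for the Public Suffix List (illustration only).
const PUBLIC_SUFFIXES = new Set<string>(["com", "app", "uk", "co.uk"]);

// Return the registrable domain (eTLD+1), or null if the host is itself a suffix.
function etldPlusOne(hostname: string): string | null {
  const labels = hostname.toLowerCase().split(".");
  // Scan from the longest candidate to the shortest; the first hit is the longest suffix.
  for (let i = 0; i < labels.length; i++) {
    if (PUBLIC_SUFFIXES.has(labels.slice(i).join("."))) {
      return i === 0 ? null : labels.slice(i - 1).join(".");
    }
  }
  return null; // unknown suffix: refuse to guess
}

// Same eTLD+1 means the hosts can share Domain= cookies (scheme is ignored here).
function sharesEtldPlusOne(a: string, b: string): boolean {
  const site = etldPlusOne(a);
  return site !== null && site === etldPlusOne(b);
}
```

With this list, shop.example.co.uk resolves to example.co.uk rather than co.uk, which is exactly the case a naive "last two labels" rule gets wrong.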

The 2026 picture for third-party cookies:

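| Browser | Third-Party Cookies | Note |
| --- | --- | --- |
| Safari | Blocked by default (2020+) | ITP |
| Firefox | Tracker cookies blocked by default (2019+) | ETP Standard; Total Cookie Protection partitions other third-party cookies by default since 2022 |
| Brave | Blocked by default | Shields |
| Chrome | Still supported | Google walked back forced deprecation in July 2024 and dropped the user-choice prompt in April 2025; Privacy Sandbox APIs run as a parallel track |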

CNAME Subdomain vs Path-Based Proxy

| Property | CNAME Subdomain (gw.example.com → vendor) | Path-Based Proxy (example.com/_track → vendor) |
| --- | --- | --- |
| First-party cookie scope | Full (eTLD+1 matches) | Full (same domain) |
| SSL management | Vendor handles it | Your own CDN/worker layer required |
| CORS | Cross-origin; preflight required for non-simple requests (e.g. JSON POST) | Same-origin, no preflight |
| CNAME cloaking risk | Yes (Safari 14+ ITP can detect it, 7-day cookie cap kicks in) | None |
| Operational load | Low (vendor runs the infrastructure) | High (you run your own worker/edge) |
| Vendor lock-in | Lower | Switching vendors means rewriting the proxy |

CNAME cloaking caveat: since Safari 14 (2020), ITP detects first-party subdomains that CNAME to a third-party domain and caps cookies set in those responses, including server-side Set-Cookie headers, at 7 days. CNAME first-party is not immunity, it is relative resilience.
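For comparison, the routing decision at the heart of a path-based proxy fits in a few lines. A sketch in which /_track and collector.vendor.example are hypothetical placeholders; a real edge worker would then forward the original request to the returned URL and relay the response:

```typescript
// Hypothetical placeholders: the proxied path prefix and the vendor collector.
const PROXY_PREFIX = "/_track";
const UPSTREAM_ORIGIN = "https://collector.vendor.example";

// Map a first-party request URL to the vendor URL, or null if it is not tracking traffic.
function toUpstreamUrl(requestUrl: string): string | null {
  const u = new URL(requestUrl);
  const isTracking = u.pathname === PROXY_PREFIX || u.pathname.startsWith(PROXY_PREFIX + "/");
  if (!isTracking) return null; // let the origin serve everything else
  const upstream = new URL(UPSTREAM_ORIGIN);
  upstream.pathname = u.pathname.slice(PROXY_PREFIX.length) || "/";
  upstream.search = u.search; // forward the query untouched
  return upstream.toString();
}
```

Because the browser only ever talks to example.com, cookies stay first-party and requests stay same-origin; the trade-off is that you now own this code, its TLS setup and its uptime.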

Storage Cap and CHIPS

Since Safari 13.1 (March 2020), ITP caps all script-writable storage at 7 days of Safari use without user interaction on the site: cookies created via document.cookie, localStorage, IndexedDB, service worker registrations. That is why falling back from cookies to localStorage does not get around ITP.

CHIPS (Cookies Having Independent Partitioned State): Chrome’s Partitioned cookie attribute isolates third-party cookies per top-level site inside embed/iframe contexts. Relevant for stacks that rely on cross-origin embeds rather than subdomain CNAMEs.

Picking a Gateway Domain

Be careful with the subdomain name. Adblock filter lists block analytics., tracking., pixel. and similar prefixes, so use a generic one such as gw., c., t. or m. instead. On the DNS side, keep the CNAME chain under three hops; every extra lookup delays the first tracking request and hurts pageview measurement.

Gateway Design: Path vs Query, Limits, CORS

The dominant implementations are GTM Server-Side Container (Google’s official server container, sGTM), Cloudflare Zaraz (path-based, edge), and Stape (CNAME-based managed sGTM hosting).

Splitting Responsibility Between Path and Query

  • Path-based gateway: /cdn-cgi/zaraz/track?event=purchase — easy CDN-level routing
  • Subdomain-based gateway: gw.example.com/track — independent worker/proxy

Keep event-specific data in the query and endpoint identity in the path. Mixing them breaks the cache key and makes log analysis harder.
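A sketch of that split (the gateway origin, the /track path and the parameter names are placeholders): the path names the endpoint and the query carries the event, so logs and cache rules can key on the path alone instead of on variants like /track/purchase/49.90:

```typescript
// Endpoint identity in the path, per-event data in the query.
function buildEventUrl(gatewayOrigin: string, endpoint: string, event: Record<string, string>): string {
  const u = new URL(endpoint, gatewayOrigin);
  for (const [key, value] of Object.entries(event)) u.searchParams.set(key, value);
  return u.toString();
}

// Log grouping / cache key: the endpoint is recoverable from the path alone.
function endpointKey(url: string): string {
  return new URL(url).pathname;
}

const purchaseUrl = buildEventUrl("https://gw.example.com", "/track", { event: "purchase", value: "49.90" });
console.log(purchaseUrl); // https://gw.example.com/track?event=purchase&value=49.90
```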

URL Length

The “safe legacy limit” is 2 KB (inherited from Internet Explorer’s 2,083-character URL limit, plus a margin for SEO and sharing). Modern browser ceilings are much higher: Chrome 32 KB–2 MB, Firefox ~64 KB, Safari ~80 KB. The real constraint is server/CDN/proxy configuration: Cloudflare default header 8 KB, Nginx default 4–8 KB.

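For payloads too large for a URL, use fetch(url, { method: 'POST', keepalive: true }). navigator.sendBeacon is still a standard (the W3C Beacon specification), but keepalive is preferred because it lets you choose the method (POST, PUT) and set custom headers. Keepalive is not meant for bulk data either: the Fetch spec caps the bodies of in-flight keepalive requests at 64 KiB in total.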
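One way to apply the rule: a hypothetical planRequest helper that keeps a GET while the URL fits the 2 KB budget and otherwise switches to a keepalive POST. The 2 KB threshold is the article's conservative figure, not a browser constant; the helper only builds the arguments, which can be passed straight to fetch:

```typescript
// Conservative URL budget from the "safe legacy limit" (not a browser constant).
const SAFE_URL_LENGTH = 2048;
// Fetch spec: in-flight keepalive request bodies are capped at 64 KiB in total.
const KEEPALIVE_BODY_LIMIT = 64 * 1024;

interface PlannedRequest {
  url: string;
  init: {
    method: "GET" | "POST";
    keepalive: boolean;
    body?: string;
    headers?: Record<string, string>;
  };
}

function planRequest(endpoint: string, payload: Record<string, string>): PlannedRequest {
  const getUrl = new URL(endpoint);
  for (const [key, value] of Object.entries(payload)) getUrl.searchParams.set(key, value);
  const href = getUrl.toString();
  if (href.length <= SAFE_URL_LENGTH) {
    return { url: href, init: { method: "GET", keepalive: true } };
  }
  const body = JSON.stringify(payload);
  return {
    url: endpoint,
    init: {
      method: "POST",
      // Checks this request only; other in-flight keepalive requests share the quota.
      keepalive: new TextEncoder().encode(body).byteLength <= KEEPALIVE_BODY_LIMIT,
      body,
      // application/json makes a cross-origin POST non-simple (CORS preflight);
      // a same-origin gateway path avoids that.
      headers: { "Content-Type": "application/json" },
    },
  };
}

// In the browser: const { url, init } = planRequest(gatewayUrl, payload); fetch(url, init);
```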

CORS

For a cross-origin gateway endpoint:

  • Access-Control-Allow-Origin must be set correctly
  • Preflight (OPTIONS) must be answered
  • For credentialled requests, Access-Control-Allow-Credentials: true plus a specific origin (wildcards are rejected)

A path-based proxy avoids all of this: same origin, no preflight. For solo setups this is one less operational headache.
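For the cross-origin case, the header logic can be sketched as a pure function (hypothetical name; the allowed origin is an example). It echoes the exact origin because wildcards are rejected for credentialed requests, and it answers the preflight:

```typescript
// Example allow-list of page origins permitted to call the gateway.
const ALLOWED_ORIGINS = new Set<string>(["https://shop.example.com"]);

// Returns the CORS response headers for a request, or null to send none.
// Without them the browser rejects the preflight (so a non-simple request is
// never sent) or hides the response of a simple request.
function corsHeaders(requestOrigin: string | null, method: string): Record<string, string> | null {
  if (requestOrigin === null || !ALLOWED_ORIGINS.has(requestOrigin)) return null;
  const headers: Record<string, string> = {
    "Access-Control-Allow-Origin": requestOrigin, // exact echo: "*" is rejected with credentials
    "Access-Control-Allow-Credentials": "true",
    Vary: "Origin", // the value depends on the request's Origin header
  };
  if (method.toUpperCase() === "OPTIONS") {
    // Preflight answer: what the real request may use, cached for 10 minutes.
    headers["Access-Control-Allow-Methods"] = "POST, OPTIONS";
    headers["Access-Control-Allow-Headers"] = "Content-Type";
    headers["Access-Control-Max-Age"] = "600";
  }
  return headers;
}
```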

Fragment and SPA: Data That Never Reaches the Server

The browser never sends anything after # to the server. With hash-based routing, server-side logs, CDN analytics and traditional pageview tracking see every SPA route as the same pageview. Use the History API (pushState) with path-based routes instead.

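Attribution scenario: ?utm_source=x#campaign-detail keeps the query server-side and loses only the fragment. The reverse order is the trap: in /#campaign-detail?utm_source=x everything after #, UTMs included, is part of the fragment and never reaches the server. When a campaign link needs a fragment, put it after the query.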
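The difference between the two orders is easy to check with the URL class: the HTTP request target is only the path plus the query. A sketch with a hypothetical helper name:

```typescript
// What the server receives: the request target is path + query only.
// The fragment stays in the browser.
function serverVisibleTarget(link: string): string {
  const u = new URL(link);
  return u.pathname + u.search;
}

// Query before fragment: the UTMs reach the server.
const good = serverVisibleTarget("https://shop.example.com/?utm_source=x#campaign-detail"); // "/?utm_source=x"
// Fragment before query: "?utm_source=x" is parsed as part of the fragment.
const bad = serverVisibleTarget("https://shop.example.com/#campaign-detail?utm_source=x"); // "/"
```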

Mini Case: Shopify Custom Pixel Sandbox

The Shopify Custom Pixel (Web Pixels API) runs inside a sandboxed iframe. URL access is not what you would expect.

| URL Access Method | Returns | Useful for Tracking? |
| --- | --- | --- |
| window.location.href | Sandbox iframe URL | No (fake) |
| event.context.document.location.href | Contains the sandbox URL | No (fake) |
| event.context.window.location.href (in page_viewed) | Real storefront URL | Yes (the only reliable source) |

Cookie and storage isolation: window.localStorage and document.cookie work inside the sandbox iframe but do not touch the top frame. To access storefront cookies use browser.cookie and browser.localStorage (async APIs).

URL implication: when the sandbox URL replaces the real URL, every GA4 pageview goes to the fake URL. Turn off GA4 auto page_view and send pageviews manually using event.context.window.location.href.
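A minimal sketch of that approach inside a Custom Pixel. The event interface below types only the fields this article mentions and is an assumption, not Shopify's full type; analytics is the global the Custom Pixel runtime provides, declared here so the file also runs outside Shopify, and toPageView is a hypothetical helper:

```typescript
// Assumed subset of a Web Pixels page_viewed event: only the fields used here.
interface PixelEvent {
  name: string;
  context: {
    window: { location: { href: string } };   // real storefront URL
    document: { location: { href: string } }; // sandbox URL, do not use
  };
}

// Global injected by the Custom Pixel runtime; declared so this file also runs elsewhere.
declare const analytics:
  | { subscribe(name: string, callback: (event: PixelEvent) => void): void }
  | undefined;

// Hypothetical helper: manual page_view built from the real storefront URL
// (assumes GA4's automatic page_view is disabled).
function toPageView(event: PixelEvent): { name: "page_view"; params: { page_location: string } } {
  return { name: "page_view", params: { page_location: event.context.window.location.href } };
}

if (typeof analytics !== "undefined") {
  analytics.subscribe("page_viewed", (event) => {
    const pageView = toPageView(event);
    // Forward pageView to your gateway here (for example with fetch and keepalive).
    console.log(pageView.params.page_location);
  });
}
```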

Primary domain trap: if Settings → Domains has the primary domain set to a host that redirects from another domain, the page navigates before the pixel finishes and sandbox tracking breaks. Domain alignment is mandatory: Shopify primary = GA4 Web Stream = GTM container domain.

End-to-End Scenario: Shopify + Cloudflare + GA4 + Meta CAPI

A Shopify store sitting behind Cloudflare, sending data through the Custom Pixel to GA4 and Meta CAPI.

  1. The customer lands on https://shop.example.com/products/widget?utm_source=newsletter&fbclid=....
  2. Inside page_viewed, the Custom Pixel sandbox reads event.context.window.location.href and gets the real URL.
  3. UTMs and fbclid are filtered through an allow-list (no personal data). fbclid is wrapped into the format Meta expects: fb.1.<unix_timestamp_ms>.<fbclid>. A raw fbclid is not enough.
  4. The pixel posts to the path-based gateway: fetch('https://shop.example.com/cdn-cgi/zaraz/track', { method: 'POST', keepalive: true, body: JSON.stringify(payload) }).
  5. Same origin, so no CORS preflight. Cookies are fully first-party (eTLD+1 matches).
  6. The Cloudflare Worker duplicates the payload and fans out to GA4 Measurement Protocol and the Meta CAPI endpoint. The gcs consent flag rides along on the GA4 request.
  7. The fbc cookie joins _fbp and lands in the user_data block of the CAPI event.
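The worker side of steps 6 and 7 can be sketched as a pure function that turns one gateway payload into two outgoing requests. The payload shape, config names and the Graph API version are assumptions; fbc is expected to be already formatted, consent fields are omitted, and a production CAPI website event also needs client_user_agent (and usually client_ip_address) taken from the incoming request:

```typescript
// Illustrative payload posted by the pixel to the gateway (field names assumed).
interface GatewayPayload {
  ga4EventName: string;  // e.g. "page_view"
  metaEventName: string; // e.g. "PageView"
  eventId: string;       // shared ID; Meta deduplicates pixel and CAPI events on it
  pageLocation: string;  // sanitized storefront URL
  clientId: string;      // GA4 client_id
  fbc?: string;          // already in fb.1.<ms>.<fbclid> form
  fbp?: string;          // _fbp cookie value
}

interface FanOutConfig {
  ga4MeasurementId: string;
  ga4ApiSecret: string;
  metaPixelId: string;
  metaAccessToken: string;
  metaApiVersion: string; // e.g. "v21.0"; use a version Meta currently supports
}

interface OutgoingRequest {
  url: string;
  body: string; // JSON, to be POSTed with Content-Type: application/json
}

function fanOut(p: GatewayPayload, cfg: FanOutConfig, nowMs: number = Date.now()): OutgoingRequest[] {
  // GA4 Measurement Protocol
  const ga4Url = new URL("https://www.google-analytics.com/mp/collect");
  ga4Url.searchParams.set("measurement_id", cfg.ga4MeasurementId);
  ga4Url.searchParams.set("api_secret", cfg.ga4ApiSecret);
  const ga4Body = {
    client_id: p.clientId,
    events: [{ name: p.ga4EventName, params: { page_location: p.pageLocation } }],
  };

  // Meta Conversions API
  const metaUrl = new URL(`https://graph.facebook.com/${cfg.metaApiVersion}/${cfg.metaPixelId}/events`);
  metaUrl.searchParams.set("access_token", cfg.metaAccessToken);
  const userData: Record<string, string> = {};
  if (p.fbc) userData.fbc = p.fbc;
  if (p.fbp) userData.fbp = p.fbp;
  const metaBody = {
    data: [
      {
        event_name: p.metaEventName,
        event_time: Math.floor(nowMs / 1000), // CAPI expects Unix seconds
        event_id: p.eventId,
        action_source: "website",
        event_source_url: p.pageLocation,
        user_data: userData,
      },
    ],
  };

  return [
    { url: ga4Url.toString(), body: JSON.stringify(ga4Body) },
    { url: metaUrl.toString(), body: JSON.stringify(metaBody) },
  ];
}
```

In a Worker, the two requests could then be sent in parallel, for example with Promise.all over fetch(r.url, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: r.body }).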

Every step depends on reading the URL anatomy correctly: the wrong host means losing first-party context; PII in the query becomes a log leak; UTMs in the fragment disappear; a sandbox URL produces fake pageviews; a raw fbclid instead of a formatted fbc value degrades Meta's click matching.
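The fbc wrapping from step 3 as a sketch (helper names are hypothetical). It follows the fb.1.<unix_timestamp_ms>.<fbclid> format quoted in the scenario; the leading 1 is the subdomain index, which Meta's documentation suggests for server-generated values, and the fbclid is kept exactly as received, without lowercasing or hashing:

```typescript
// fbc = fb.<subdomainIndex>.<creationTimeMs>.<fbclid>
function buildFbc(fbclid: string, clickTimeMs: number = Date.now()): string {
  if (fbclid === "") throw new Error("fbclid is empty");
  // Keep fbclid exactly as received: no lowercasing, trimming or hashing.
  return `fb.1.${Math.trunc(clickTimeMs)}.${fbclid}`;
}

// Read fbclid from the landing URL; returns null when the visit has no click ID.
function fbcFromLandingUrl(pageUrl: string, clickTimeMs?: number): string | null {
  const fbclid = new URL(pageUrl).searchParams.get("fbclid");
  return fbclid ? buildFbc(fbclid, clickTimeMs) : null;
}
```

The timestamp should reflect the first landing with that fbclid, so in practice the value is usually persisted (for example as the _fbc cookie) and reused by later events rather than rebuilt on every pageview.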

URI, URL, URN: Short Reference

A quick note for readers who came here looking for the old article:

  • URI: the broadest umbrella; any string that identifies a resource.
  • URL: the “locator” variant of URI; tells you how to reach a resource (https://..., ftp://...).
  • URN: the “name” variant of URI; a persistent identifier independent of location (urn:isbn:..., urn:uuid:...).

In practice, “URL” is enough on the web. The URI/URN distinction shows up in RFC 3986 [11] and W3C standards, and matters when you design resource management or metadata systems. Day-to-day tracking work is entirely about reading URL parts; URNs rarely come up. The canonical reference for URI schemes is the IANA registry [12].

Summary: Which URL Part Affects What

| URL Part | Tracking Effect |
| --- | --- |
| scheme (https) | Cookie Secure flag, HTTP/2, ITP compliance |
| host | eTLD+1, first-party scope, CNAME cloaking risk |
| path | Gateway endpoint identity, CORS behavior |
| query | Attribution backbone, browser stripping surface, PII leakage vector |
| fragment (#) | Never reaches the server, source of SPA pageview loss |

Gateway design starts with a deliberate decision on each of these parts. Reading the URL as a five-part data structure rather than as a single browser-bar string is the foundation of clean tracking data.

Footnotes

  1. Update to iOS 14 campaign measurement. Google Ads Help. wbraid and gbraid were introduced in March 2021 for iOS app campaign measurement; they carry campaign-level data without binding to an individual user.
  2. Private Browsing 2.0. WebKit Blog. On Apple’s Link Tracking Protection scope: “campaign parameter that’s only used for campaign attribution, as opposed to click or user-level tracking. Safari allows such parameters to pass through.”
  3. iOS 26 UTM tracking tests. Triple Whale. Side-by-side iOS 18 vs iOS 26 testing: “Safari still preserves common tracking tags (like utm_source, utm_medium, and utm_campaign).”
  4. WebKit Features in Safari 26.0. WebKit Blog. Safari 26 turns Advanced Fingerprinting Protection on by default.
  5. Safari 26 tracking changes explained. Taggrs. January 2026 field validation: AFP blocks tracker scripts from reading URL and referrer, but the full URL still reaches the server.
  6. iOS & iPadOS 26 Release Notes. Apple Developer. Apple’s official iOS 26 release notes.
  7. iOS 26 Won’t Kill Your UTMs. Northbeam. Controlled iOS 18 vs iOS 26 test results; click IDs are stripped in Messages/Mail/Private mode, UTMs survive across all modes.
  8. Safari vs UTM: Do We Really Need to Panic? Alex Ignatenko. Safari Technology Preview tests indicating full LTP could expand to all modes in a future release.
  9. How to avoid UTM parameters removal. Stape. The custom parameter transformation pattern: send ads with st_src/st_mdm/st_cmp, rewrite to utm_* inside the server-side container.
  10. Query Replacer Variable. Stape on GitHub. Stape’s open-source sGTM Query Replacer template.
  11. RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. IETF. The canonical specification of URI syntax.
  12. Uniform Resource Identifier (URI) Schemes. IANA. IANA’s URI scheme registry.

Key Takeaways
  • Fragments (#) never reach the server; SPAs that use hash routing make pageviews invisible in server-side logs.
  • Safari Link Tracking Protection only strips click IDs (gclid, fbclid, msclkid, ttclid, dclid, yclid); UTM parameters survive in default normal browsing.
  • Safari 26 AFP blocks tracker-classified scripts from reading the URL query and document.referrer at the browser API layer, but the full URL still reaches the server.
  • Inside the Custom Pixel sandbox, window.location.href returns a fake URL; the real storefront URL is event.context.window.location.href.
  • CNAME first-party subdomains are not bulletproof; when cloaking is detected, even server-side cookies fall to a 7-day cap.

Frequently Asked Questions (FAQ)
Does Safari strip UTM parameters?

No. Safari Link Tracking Protection only strips click IDs (gclid, fbclid, msclkid, ttclid, dclid, yclid). UTM parameters survive in default normal browsing. iOS 26 behaves the same as iOS 18 here, despite the panic headlines. Stripping only kicks in for Private Browsing, links opened from Mail and Messages, or when the user manually enables Advanced Tracking and Fingerprinting Protection.

Why is the URL fragment (#) a problem for server-side tracking?

Browsers never send anything after # in the HTTP request. If a SPA uses hash routing (#/page1, #/page2), server-side logs, CDN analytics and classic pageview tracking see every route as the same pageview. Use pushState with path-based routing instead. Attribution parameters belong in the query, not the fragment.

Should my server-side gateway be path-based or subdomain-based?

Path-based proxies (e.g. example.com/_track) live on the same origin: no CORS preflight, no CNAME cloaking detection risk. Subdomain-based gateways (e.g. gw.example.com) push operational load onto the vendor but are exposed to CNAME cloaking classifiers. Cloudflare Zaraz is path-based, Stape is CNAME-based, GTM Server-Side supports both. For solo teams the path-based route usually costs less to run.

How do I read the real URL inside a Shopify Custom Pixel?

Because the Custom Pixel runs in a sandbox iframe, window.location.href and event.context.document.location.href both return a fake sandbox URL. The only reliable source for the real storefront URL is event.context.window.location.href inside the page_viewed event payload. Capture it there and propagate it to other events as a custom dimension. Turn off GA4 auto page_view and send pageviews manually with the captured URL.

What is the custom parameter transformation pattern?

Most analytics platforms, GA4 included, hardcode attribution to the UTM names. To avoid browser stripping you send traffic with your own prefix instead of UTMs (e.g. st_src, st_mdm, st_cmp), then transform them back to utm_* inside the server-side container by reading page_location and rewriting it. Custom names are not on any browser's stripping list, so they survive. Stape's Query Replacer template implements this off the shelf.