Skip to content
ceaksan

Pivot Tables Guide: From Excel to Polars, Automated with AI

Pivot tables in Excel, Google Sheets, Pandas/Polars and AI. Polars 5-10x speed, LLM hallucination pitfalls, privacy-friendly local LLM examples in a practical guide.

May 25, 2020 18 min read Updated: Apr 27, 2026
TL;DR

Pivot table is no longer just a row-and-column transposition tool but a data engine at the center of the data analytics ecosystem. Pivot tables can now be created through natural language via Excel Copilot and Google Gemini, Polars delivers 5-10x faster pivot operations than pandas. LLM limitations with tabular data (transposition accuracy at 50%, numerical hallucination at 59-82%) mean computation should be delegated to deterministic tools, with LLMs positioned only for insight generation. Local LLMs (Ollama + Qwen2.5-Coder) enable KVKK-compliant, zero-cost e-commerce data analysis; hybrid architecture achieves 85-95% cost reduction.

The first version of this article was published in 2020 and covered basic pivot table creation using Excel and Google Sheets. Since then, pivot table has evolved from a “row and column transposition” tool to a data engine at the center of the data analytics ecosystem. Especially with the proliferation of AI agents, pivot table is no longer the final result but serves as an intermediate layer that artificial intelligence uses to make sense of data.

Pivot Table Fundamentals

A Pivot Table (Summary Table) is a tool used to analyze, summarize, and report records in a data table. Its fundamental function is to designate a column as a grouping key, transform rows into columns, and apply aggregation functions (sum, average, count).

Through this process, you can:

  • Summarize large data tables with grouping, filtering, and conditional formatting
  • Calculate totals and subtotals of numerical data at category and subcategory levels
  • Adjust detail scope by expanding or collapsing operations
  • Make comparisons using different calculation functions (numerical, text, date)

Key rules when creating pivot tables: columns must have headers, each column should contain a single data type, and data entry consistency must be maintained (e.g., “Email” should not appear as “E-mail” in another row).

Modern Excel: Power Pivot, DAX and Copilot

Power Pivot and Data Model

In 2020, Excel’s row limit (1,048,576) was one of the biggest constraints when working with pivot tables. Power Pivot’s Data Model eliminates this limit: the xVelocity (VertiPaq) in-memory columnar engine supports approximately 2 billion rows per table1. Data is stored in the backend Data Model rather than the Excel worksheet and reported via pivot.

DAX (Data Analysis Expressions)

DAX, which comes with Power Pivot, takes pivot table calculations beyond standard Excel formulas2:

DAX FunctionDescriptionUse Case
CALCULATEDynamic aggregation by changing filter contextConditional sales totals
SUMXRow-by-row iteration and summationSUMX(Sales, Sales[Qty] * Sales[Margin])
RELATEDFetch data from related tablesMulti-table pivot
SAMEPERIODLASTYEARSame period last year comparisonYoY analysis
ALL / ALLEXCEPTRemove filters for grand totalsPercentage calculations

Copilot Agent Mode

Agent Mode, generally available since December 2025, brings pivot table creation to natural language in Excel3:

  • PivotTables can be automatically created with commands like “Create a quarterly sales pivot by region”
  • User control is provided at planning, creation, and validation stages
  • OpenAI GPT 5.2 or Anthropic Claude Opus 4.5 models can be used in the backend
  • The =COPILOT() formula enables cell-level forecasting, summarization, and aggregation

Excel’s former “Analyze Data” (App Skills) feature is being removed by late February 2026; Agent Mode and Analyst are taking its place.

Google Sheets: Connected Sheets and Gemini

Connected Sheets + BigQuery

Google Connected Sheets brings BigQuery’s scale to the Sheets interface. With the October 2025 update4:

  • Pivot table row capacity increased from 100,000 to 200,000
  • Drill-down feature added by double-clicking pivot cells
  • CSV files from Drive can be imported directly to BigQuery and analyzed via Connected Sheets

This enables billions of rows of e-commerce data to be analyzed as pivot tables in the Sheets interface.

Gemini Integration

With the May and October 2025 updates, Gemini in Google Sheets enables5:

  • Creating pivot tables with natural language (“Create a pivot table showing sum of sales by region”)
  • Multi-step execution from a single prompt (data cleaning + formatting + pivot creation)
  • Cross-pivot analysis across multiple tables
  • Focused analysis on selected ranges

Google Sheets Limits

LimitValue
Cells10 million
Columns18,278
Tabs200
Characters per cell50,000

Python: pandas, Polars and DuckDB

pandas pivot_table()

pandas 3.0.0 (January 2026) introduces a significant behavioral change to pivot_table(): the observed parameter now defaults to True; only observed categories are shown. Copy-on-Write has also become the default and only mode6.

import pandas as pd

df = pd.DataFrame({
    'Date': ['2026-01', '2026-01', '2026-02', '2026-02'],
    'Category': ['Electronics', 'Fashion', 'Electronics', 'Fashion'],
    'Sales': [1200, 450, 1500, 300],
    'Returns': [1, 2, 0, 5]
})

pivot = df.pivot_table(
    index='Category',
    columns='Date',
    values='Sales',
    aggfunc='sum'
)

Polars: The Performance Revolution

Polars 1.38.1 (February 2026) is a DataFrame library written in Rust, positioned as an alternative to pandas. It runs 1.6 to 8.6 times faster for pivot operations depending on data size7:

Data SizepandasPolarsDifference
10K rowsBaseline~1.6x fasterSmall difference
100M rowsBaseline~8.6x fasterSignificant difference
1 GB CSV loadingBaseline~5x faster, 87% less memoryCritical difference
import polars as pl

df = pl.DataFrame({
    'Category': ['Electronics', 'Fashion', 'Electronics', 'Fashion'],
    'Month': ['January', 'January', 'February', 'February'],
    'Sales': [1200, 450, 1500, 300]
})

pivot = df.pivot(
    on='Month',
    index='Category',
    values='Sales',
    aggregate_function='sum'
)

Polars’ streaming engine introduced in 2025 provides an additional 3-7x improvement over the in-memory engine. For scenarios where e-commerce data reaches millions of rows, Polars is the biggest best practice of 2026.

Real-World Benchmark: 1 Million Row E-Commerce Simulation

To make the “Polars is faster than pandas” claim concrete, I ran a benchmark on a 1 million row synthetic e-commerce dataset (167K customers; order_id, date, region, category, quantity, price with realistic distributions). The notebook is open source and readers can verify the results on their own machines.

Three core operations were benchmarked (each run 3 times, taking the best wall-clock time and RSS memory delta):

OperationPandasPolarsSpeedupMemory (Pandas / Polars)
group_by (region x category revenue)75 ms14 ms5.4x7.2x less
pivot_table (matrix view)73 ms28 ms2.6x1.4x less
RFM segmentation (167K groups)47 ms17 ms2.8x2.4x less

Polars LazyFrame advantage: Running the same RFM logic with LazyFrame and a “last 90 days” filter, the Polars query optimizer (via filter and projection pushdown) avoided scanning the full table and completed in 4 milliseconds, using only 1.4MB extra memory. Across the entire benchmark, Polars used 2 to 7 times less RAM than pandas.

The gap widens with data size: official Polars benchmarks report up to 8.6x speedup on 100M row datasets7. That alone is not a reason to switch from pandas to Polars, but in large-data plus high-cardinality scenarios (RFM, cohort analysis, per-user aggregation), Polars’ parallel hash groupby and Rust SIMD provide a clear advantage.

I email the Pivot Benchmark Notebook to subscribers

Subscribers get the Colab link plus a privacy-friendly Ollama setup script. Polars vs Pandas timing, RFM, local LLM pipeline in one drop.

No spam, unsubscribe anytime.

What's inside
  • One-click Colab run (no install)
  • Pandas vs Polars: group_by, pivot_table, RFM benchmark
  • Polars LazyFrame + filter pushdown example
  • Privacy-friendly PII masking + Ollama Qwen2.5 pipeline

DuckDB: Pivot with SQL

DuckDB 1.4.4 (January 2026) offers first-class PIVOT/UNPIVOT SQL syntax8:

-- Simple pivot
PIVOT sales ON month USING sum(amount);

-- SQL standard syntax
SELECT * FROM sales
PIVOT (sum(amount) AS total FOR month IN ('January', 'February', 'March'));

DuckDB can run SQL directly on pandas DataFrames and provides advantages in memory management for large datasets.

R: tidyverse and data.table

pivot_wider() and pivot_longer()

Introduced in tidyr 1.0.0 (September 2019), pivot_wider() and pivot_longer() replaced the old spread() and gather() functions9. The naming is more intuitive: pivot_longer for long format, pivot_wider for wide format.

library(tidyr)

# Convert long format to wide format (pivot)
df %>%
  pivot_wider(
    names_from = Month,
    values_from = Sales,
    values_fn = sum
  )

data.table Alternative

data.table 1.18.0 (December 2025) offers reshaping operations with dcast() (wide) and melt() (long). Thanks to internal fast radix sorting and binary search, it runs approximately 5 times faster than tidyr10.

library(data.table)

dcast(dt, Category ~ Month, value.var = "Sales", fun.aggregate = sum)

JavaScript: Browser-Based Pivot

Current State

ToolStatusCapacityLicense
PivotTable.jsUnmaintained (~7 years since last update)Small dataMIT
WebDataRocksActive~1 MBFree
FlexmonsterActive~1 GB, millions of rowsCommercial
AG Grid (Enterprise)ActiveServer-side, unlimitedCommercial
ArqueroActive1M+ rows (in browser)BSD

Arquero, developed by the UW Interactive Data Lab, is a columnar JavaScript data transformation library. It can process over 1 million rows in the browser and offers pivot/reshape operations11.

AG Grid Enterprise offers the most comprehensive pivot features with server-side row models. React, Angular, and Vue integrations are available.

AI Agents and Pivot Tables

With the proliferation of AI agents, the role of pivot tables is changing: no longer just a report for humans to read, but a method for artificial intelligence to break data into digestible chunks.

Creating Pivots with Natural Language

As of 2026 there are four distinct AI approaches to pivot creation, each suited to a different use case:

CapabilityExcel Copilot Agent ModeClaude for Excel (sidebar)Gemini in SheetsClaude / ChatGPT (web + Code Interpreter)
Natural-language pivot”Quarterly sales pivot by region”Reads multi-tab, preserves formula depsMulti-table pivot (March 2026)Upload CSV, generate code
Cell-level citation=COPILOT() formulaCell-level citation: “source C12:G45”Description textNone
Cross-app contextWithin Office 365Excel ↔ PowerPoint shared contextWithin WorkspaceSkill invocation
Data privacyMicrosoft CloudAnthropic APIGoogle CloudCloud via API
GDPR / private dataIn cloudIn cloudIn cloudIn cloud (alt: see Local LLM section below)
Typical userExcel power user, financeMulti-tab financial modelsSheets + BigQuery analystDeveloper, ad-hoc analysis

Excel Copilot Agent Mode (December 2025): Natural-language pivot creation inside Microsoft 365. The =COPILOT() formula enables cell-level AI usage. Backed by OpenAI GPT 5.2 or Anthropic Claude Opus 4.5.

Claude for Excel (February 2026 add-in): Runs in an Excel sidebar. The only tool that reads multi-tab workbooks, preserves formula dependencies on update, and provides cell-level citations. Anthropic Skills 2.0 lets you run workflow packages like pivot rebuilds or financial models. As of March 2026 it supports shared context between Excel and PowerPoint.

Gemini in Sheets (multi-table pivot, March-April 2026): Builds a single pivot summarizing multiple tables. Multi-step tasks like P&L dashboards, scorecards, and bar charts can run from one prompt. Connected Sheets reaches billions of rows in BigQuery.

Claude Code Interpreter / ChatGPT Advanced Data Analysis: Upload CSV/TSV, generate Python code for pivot, visualization, and Excel output12. Bypasses numerical hallucination by running deterministic code rather than predicting numbers.

Decision tree:

  • Data lives in Excel and formula dependencies matter? -> Claude for Excel
  • Data lives in Excel and you are deep in the Microsoft 365 ecosystem? -> Excel Copilot Agent Mode
  • Data lives in Sheets and spans multiple tables? -> Gemini Sheets multi-table
  • Data contains PII and cannot leave the perimeter? -> Local LLM pipeline (Polars + Ollama Qwen2.5)
  • Ad-hoc, one-off CSV? -> Claude Code Interpreter or ChatGPT

AI-Assisted Pivot Recommendation

Microsoft Research has developed a system that uses BERT embeddings to understand attribute significance in pivot tables and recommend meaningful pivot configurations to users13. In this approach, the agent evaluates not just numbers but also the semantic meaning of column names. For example, in the command “Pivot return rates during discount periods,” it logically matches which column represents discounts and which represents returns.

Token Optimization: Pre-Pivot Strategy

Sending pre-pivoted summaries instead of raw data to AI agents reduces token costs by 60-80%14:

import pandas as pd
import json

# Raw data: 10,000 rows of e-commerce data
df = pd.read_csv('sales.csv')

# Pre-pivot summary: For AI agent
pivot = df.pivot_table(
    index='Category',
    values=['Sales', 'Returns'],
    aggfunc={'Sales': 'sum', 'Returns': 'mean'}
).reset_index()

# Add flags: Points for agent to focus on
pivot['Status'] = pivot.apply(
    lambda x: 'Critical' if x['Returns'] > 3 else 'Stable', axis=1
)

# Send to agent in JSON format
context = pivot.to_json(orient='records', force_ascii=False)

This approach provides two critical advantages:

  1. Token savings: 10-20 row pivot summary instead of 10,000 row raw log
  2. Improved accuracy: Flag columns (Status, Trend) answer the agent’s “where should I look?” question and reduce hallucination risk

Browser-Based Agents

Browser-based AI agents navigating an e-commerce panel (Shopify, WooCommerce) can instantly transform complex on-screen tables into pivot structures and perform anomaly detection: generating alerts like “Sales are increasing while profit is declining.”

E-Commerce Pivot Analysis Practices

Common Pivot Patterns

PatternRow (Index)ColumnValuePurpose
Category performanceProduct categoryMonth/QuarterSales totalSeasonal trends
Cohort analysisAcquisition monthSubsequent monthsRetention rateCustomer loyalty
Channel x RegionMarketing channelGeographic regionRevenueChannel effectiveness
Product matrixSKUMetrics (revenue, returns, margin)ValuesProduct health
Customer LTVCustomer segmentPeriodLifetime valueSegment comparison

Pivot in the Modern Data Stack

In the modern data stack architecture (dbt, Looker, BigQuery), pivot operations typically occur in the transformation layer15:

  1. Data ingestion: Raw e-commerce data brought to warehouse via Fivetran/Stitch
  2. Transformation (dbt): dbt_utils.pivot() macro for SQL pivot operations, dynamic column creation without hardcoding
  3. BI layer: Looker/Tableau consumes dbt-transformed data
  4. Reverse ETL: Census/Hightouch pushes pivoted results back to CRM and marketing tools

ShopifyQL and Pivot

ShopifyQL, introduced by Shopify in June 2022, significantly simplifies querying e-commerce data compared to standard SQL16. Period comparison with COMPARE TO previous_year (an operation requiring self-joins in SQL), row/column totals with WITH TOTALS and GROUP_TOTALS, and percentage change calculation with WITH PERCENT_CHANGE can all be done in a single line:

FROM sales
  SHOW net_sales
  GROUP BY product_type
  SINCE -12m
  COMPARE TO previous_year
  WITH TOTALS, PERCENT_CHANGE
  VISUALIZE

However, ShopifyQL does not yet support true pivot/crosstab operations (transforming rows into columns). Subquery and CTE support is also absent. The Sidekick AI integration introduced in December 2025 enables generating ShopifyQL queries through natural language, and Sidekick Pulse provides proactive anomaly detection17.

For complex pivot scenarios, the recommended approach is: extract a GROUP BY summary via ShopifyQL, transfer to BigQuery via Fivetran or Airbyte, and apply true pivot operations there using DuckDB PIVOT or dbt_utils.pivot(). Analyzing BigQuery data directly as a Sheets pivot table via Google Connected Sheets also provides an effective alternative.

Web Scraping and Competitive Intelligence

In e-commerce, analyzing competitor pricing and market data through web scraping combined with pivot tables is a common practice.

Tool Landscape (2025-2026)

ToolTypeUse CaseStatus
Scrapy 2.12+FrameworkLarge-scale structured crawlingActive
Playwright 1.49+Browser automationJavaScript-heavy sites, SPAsActive
ScrapeGraphAIAI-poweredSemantic extraction, low maintenanceNew
Crawl4AIOpen sourceLLM-ready output, freeNew

Anti-Bot Challenges

As of 2025-2026, anti-bot protections have hardened significantly. Cloudflare Turnstile uses TLS/HTTP2 fingerprinting and JavaScript challenges; DataDome performs ML-based real-time bot detection. Puppeteer-stealth was discontinued in February 2025; active alternatives are Nodriver, SeleniumBase UC Mode, and Camoufox18.

ScrapeGraphAI’s AI approach requires 70% less maintenance than CSS selector-based scrapers. LLMs can understand page semantics regardless of HTML structure changes.

Scraping to Pivot Pipeline

from playwright.async_api import async_playwright
import duckdb

# Store data in DuckDB
con = duckdb.connect("competitor_prices.duckdb")

# Competitor price comparison: PIVOT
con.execute("""
    PIVOT price_history
    ON competitor
    USING AVG(price) AS avg_price, MIN(price) AS min_price
    GROUP BY product_sku
    ORDER BY product_sku
""")

From a legal perspective: respecting robots.txt, maintaining minimum 1-2 second delays between requests, and checking ToS are recommended. Pricing data is generally not considered personal data, though KVKK applies when scraping involves personal data.

WooCommerce Data Analytics

WooCommerce REST API v3 enables programmatic extraction of order, product, customer, and coupon data. Unlike Shopify’s 40 API requests per minute limit, WooCommerce offers unlimited access through direct database access19. For order filtering operations, see WooCommerce REST API - Filtering Orders.

API to Pivot

from woocommerce import API
import pandas as pd

wcapi = API(
    url="https://yourstore.com",
    consumer_key="ck_xxx",
    consumer_secret="cs_xxx",
    version="wc/v3"
)

# Fetch all orders with pagination
all_orders = []
page = 1
while True:
    response = wcapi.get("orders", params={"per_page": 100, "page": page})
    orders = response.json()
    if not orders:
        break
    all_orders.extend(orders)
    page += 1

df = pd.json_normalize(all_orders, sep='_')

RFM Analysis

import numpy as np

snapshot = df["order_date"].max() + pd.Timedelta(days=1)

rfm = df.groupby("customer_id").agg(
    recency=("order_date", lambda x: (snapshot - x.max()).days),
    frequency=("order_id", "nunique"),
    monetary=("line_total", "sum")
).reset_index()

# Score 1-5
for col in ["recency", "frequency", "monetary"]:
    labels = range(5, 0, -1) if col == "recency" else range(1, 6)
    rfm[f"{col}_score"] = pd.qcut(
        rfm[col], q=5, labels=labels, duplicates="drop"
    ).astype(int)

# Segment pivot
rfm_pivot = rfm.pivot_table(
    values="monetary",
    index="segment",
    aggfunc=["count", "mean", "sum"]
).round(2)

WooCommerce MCP Integration

WooCommerce 10.3 (2025) introduced the first official MCP (Model Context Protocol) integration. This enables AI agents to directly search, add, and update products.

BigQuery Integration Paths

ToolMethodCost
AirbyteOpen-source ETLFree (self-hosted)
n8nWorkflow automationFree (self-hosted)
Coupler.ioNo-code schedulerFrom $49/mo
Direct DBServer-side tracking -> BigQueryInfrastructure cost only

BigQuery E-commerce Pivot Operations

BigQuery’s native PIVOT operator enables powerful analysis on GA4 e-commerce data20. If you are familiar with the data warehouse concept, BigQuery can be thought of as its cloud-native implementation.

GA4 Data PIVOT

SELECT *
FROM (
    SELECT
        items.item_category AS category,
        event_name,
        items.price * items.quantity AS revenue
    FROM `project.analytics_XXXXXX.events_*`,
    UNNEST(items) AS items
    WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name IN ('purchase', 'add_to_cart', 'view_item')
)
PIVOT (
    SUM(revenue) AS total
    FOR event_name IN ('purchase', 'add_to_cart', 'view_item')
);

BigQuery ML

BigQuery ML enables model creation directly via SQL for e-commerce predictions:

  • ARIMA_PLUS: Daily revenue forecasts, Turkish holiday calendar support (holiday_region = 'TR'), up to 100 million time series simultaneously
  • K-means: Customer segmentation from RFM scores, automatic scaling with standardize_features = TRUE

Cost Optimization

TechniqueSavingsImplementation
PartitioningUp to 90% scan reductionPARTITION BY DATE(event_timestamp)
Clustering30-60% additional reductionCLUSTER BY event_name, user_id
Materialized ViewsAuto-refreshed pre-aggregationsCREATE MATERIALIZED VIEW
Column selectionVariableNever use SELECT *

BigQuery on-demand pricing: first 1 TB/month free, then $6.25/TB. A typical e-commerce store (500K orders/year) with monthly query volume of 500 GB stays within the free tier.

BI Engine automatically caches frequently accessed data, providing sub-second query response and 4-10x performance improvement.

Privacy-Focused Analytics with Local LLMs

Sending customer data (names, emails, addresses, purchase history) to cloud LLM APIs constitutes cross-border data transfer under KVKK. Local LLMs solve this by design21.

Ollama Setup

ollama pull qwen2.5-coder:14b    # Code generation + data analysis
ollama pull mistral-nemo:12b      # Multilingual, general analysis
ollama pull phi4:14b              # Strong reasoning

VRAM Requirements (Q4_K_M Quantization)

ModelParametersVRAMBest For
Qwen2.5-Coder 7B7B6-7 GBCode generation, SQL writing
Qwen2.5-Coder 14B14B10-12 GBComplex analytics code
Mistral Nemo 12B12B8-10 GBMultilingual, general analysis
Phi-4 14B14B10-12 GBReasoning, math
DeepSeek-R1 14B14B10-12 GBChain-of-thought reasoning

Hardware tiers: 8 GB VRAM (RTX 4060) for 7B models, 12-16 GB (RTX 4070 Ti, M2 Pro/Max) for 12-14B models, 24 GB (RTX 4090, M3 Max) for 32B models.

Pivot Summary with Local LLM

import ollama
import json

pivot_summary = """
Monthly Revenue (Category):
| Category    | Jan    | Feb    | Mar    |
|-------------|--------|--------|--------|
| Electronics | 45,200 | 38,100 | 52,300 |
| Clothing    | 22,100 | 19,800 | 28,500 |

RFM Segments:
| Segment     | Count | Avg Spend | Avg Recency |
|-------------|-------|-----------|-------------|
| Champions   | 342   | $1,250    | 8 days      |
| At Risk     | 891   | $480      | 67 days     |
"""

response = ollama.chat(
    model="qwen2.5-coder:14b",
    messages=[{
        "role": "user",
        "content": f"Analyze this e-commerce pivot data. "
                   f"Return JSON only.\n{pivot_summary}"
    }],
    format="json"
)

KVKK Compliance

KVKK (Turkey’s Personal Data Protection Law), with its 2025 update, raised administrative fines to a range of 68,083 TL to 13.6 million TL (43.93% increase). The “Generative AI and Personal Data Protection” guideline published in December 2025 mandates “privacy by design” principles for LLM developers22.

When using local LLMs: embedding generation, vector search, and LLM inference all occur in the local environment. No data leaves the premises.

Hybrid Architecture

Layer 1: Data Extraction (Local)
  WooCommerce API -> Python -> Polars -> DuckDB
  Cost: $0 (compute only)

Layer 2: Pivot + Summarization (Local LLM)
  Raw data -> pivot tables -> Ollama (Qwen2.5-Coder 14B)
  Cost: ~$0 (electricity, ~$0.02/hr GPU)
  KVKK compliant: all PII stays on-premises

Layer 3: Strategic Insights (Cloud LLM, optional)
  Pre-pivoted, anonymized summaries -> Claude / GPT-4o
  Cost: ~$0.02-0.04 per query

This approach delivers 85-95% cost reduction while maintaining KVKK compliance.

LLM Limitations with Tabular Data

As AI agents increasingly leverage pivot tables, understanding these models’ limitations with tabular data becomes critical.

Structural Mismatch

Tabular data has a two-dimensional, relational structure, while LLMs are trained with a one-dimensional, autoregressive objective. When a table is serialized to text, row and column boundaries blur, the model becomes sensitive to row/column shuffles that should not matter, and context window limits are hit with large datasets23.

Numerical Precision Failures

Numbers are processed as character sequences during tokenization, losing magnitude and numerical order information. Research across six major LLMs found factual hallucination rates between 59% and 82%, with numerical recall reliability near zero24.

Table Understanding Benchmarks

BenchmarkYearScopeFinding
SUC (Microsoft)WSDM 20247 basic table tasksBest accuracy: 65.43%
TableBenchAAAI 2025886 instances, 18 subcategoriesEven GPT-4 significantly behind humans
RealHiTBenchACL 2025708 hierarchical tables, 3,752 QAHierarchical headers challenged all models
MMQAICLR 2025Multi-table, multi-hop reasoningAll models behind human performance
MMTU202530,000+ questions, 25 tasksReasoning models +10pp advantage
TReB202526 sub-tasks, 3 inference modesICoT (interleaved chain-of-thought) best

Table transposition accuracy is approximately 50% (near random chance), while header identification is 94-97% accurate. This means models can understand table content but fail at structural transformation25.

Serialization Format Comparison

Based on tests across 11 different formats26:

FormatAccuracyToken Usage
Markdown KV60.7%2.7x (baseline)
XML56.0%2.3x
YAML54.7%1.9x
Natural language49.6%3.0x
CSV44.3%1.0x (lowest)

Markdown KV provides the highest accuracy but consumes 2.7x more tokens than CSV. When token budget is constrained, CSV is recommended; when accuracy is the priority, Markdown KV is preferred.

Solution Strategies

  1. Code generation approach: Have the LLM write pandas/SQL code rather than analyze data directly. Delegates computation to deterministic tools
  2. Schema + sample rows: Send column names, data types, and first 5 rows instead of the entire table
  3. ICoT (Interleaved Chain-of-Thought): Alternate between text reasoning and code execution. Achieved the best results in the TReB benchmark
  4. Code Interpreter: Claude and GPT’s code execution tools completely eliminate numerical hallucination
  5. SpreadsheetLLM (Microsoft, EMNLP 2024): A compression approach achieving 96% reduction in token usage and 25.6% improvement in table detection27

Cost and Token Optimization

Raw Data vs. Pivot Summary Comparison

ApproachRowsTokens (Approx.)Cost (GPT-4o)
Raw order data (10K)10,000~250,000~$0.625
Category+Month pre-grouping~200~10,000~$0.025
Summary statistics only~20~2,000~$0.005

Pre-pivot enables 95-99% token savings. CSV format consumes 40-50% fewer tokens than JSON.

API Pricing (February 2026)

ModelInput/1M TokensOutput/1M TokensBatch
GPT-4o$2.50$10.0050% off
GPT-4o mini$0.15$0.6050% off
Claude Sonnet 4.5$3.00$15.0050% off
Claude Haiku 4.5$1.00$5.0050% off
Gemini 2.0 Flash$0.10$0.40-
Local (Ollama)~$0~$0-

Anthropic offers a prompt caching mechanism that reduces input token costs by 90% for repeated queries. OpenAI Batch API delivers results within 24 hours at a 50% discount.

Decision Matrix

ScenarioRecommended Approach
Data contains PIILocal LLM only (KVKK requirement)
Data < 100K rows, simple pivotpandas/Polars, no LLM needed
Data > 1M rows, complex pivotBigQuery ($6.25/TB) or DuckDB (free)
Natural language insights from pivotsPre-pivot locally, send summary to cloud LLM
Batch processing (daily reports)Batch API (50% off), overnight processing

Practical Guide: Which Tool for Which Scenario?

ScenarioRecommended ToolWhy
Quick exploration, small dataExcel / Google SheetsDrag-and-drop, visual
Natural language pivotExcel Copilot / Gemini”Sales pivot by region”
Analysis on BigQueryGoogle Connected SheetsBillions of rows, Sheets interface
Programmatic analysis (medium data)pandas pivot_table()Widespread, well-documented
Big data (million+ rows)Polars5-10x faster, low memory
SQL-based pivotDuckDB PIVOTNative SQL syntax
Statistical analysisR tidyverse / data.tablepivot_wider, ggplot2
Browser pivotAG Grid / FlexmonsterLarge data, enterprise
AI agent preparationpandas/Polars + JSONPre-pivot, flag addition
Data engineeringdbt_utils.pivot()Repeatable, version-controlled
Competitor price trackingScrapy/Playwright + DuckDBScraping -> pivot pipeline
WooCommerce analyticswc-api-python + PolarsREST API, unlimited access
BigQuery pivotPIVOT operator + BI EngineGA4, billions of rows
Privacy-focused analyticsOllama + Qwen2.5-CoderKVKK compliant, free

Three Pitfalls AI Overview Skips

AI Overview can summarize the topic of pivot tables, but in production I keep hitting three quiet pitfalls it leaves out. Each carries measurable error cost.

Pitfall 1 — Asking the LLM to do the math on a raw table. LLMs hallucinate numbers 59-82% of the time and table transposition accuracy is around 50% (no better than random)23. Telling ChatGPT or Claude “give me totals by region from this CSV” gets you fabricated numbers because the model invents the math. Fix: never let the LLM compute. Use pandas, polars, or DuckDB deterministically; let the LLM interpret. Tools like Code Interpreter and Claude for Excel already follow this rule (they write and execute code rather than guessing values).

Pitfall 2 — Sending raw data to the context instead of pivoting first. A 100K-row CSV is roughly 500K tokens. Cloud API costs spike to dollars per call, the context window blows up, and “lost in the middle” effects let the model ignore the heart of the data. The same data pivoted comes in at 2-5K tokens, a 95-99% savings with better answers and lower cost. Pre-pivot plus flag enrichment (mark anomalies in the pivot, anchor the LLM commentary to those flags) is the highest-ROI pattern in this article.

Pitfall 3 — Sending customer PII to a cloud LLM raw. Cross-border transfer falls under KVKK Article 9 and requires explicit consent. Pasting names and emails into ChatGPT may be convenient but it crosses a compliance line. Two-layer fix: (a) if cloud LLM is necessary, hash and mask PII columns (as in the notebook companion); the pivot distribution stays intact while identifiers do not; (b) if you want the data to stay on-premise entirely, build a pipeline with Ollama Qwen2.5-Coder so nothing leaves the machine.

These three are the most common, least discussed failure modes in pivot + AI workflows. The companion notebook demonstrates working code for all three.

Conclusion

Pivot table maintains its position as one of the fundamental tools of data analytics. However, as of 2026, its role has evolved significantly: no longer just a report table for humans to read, but an intermediate layer where AI agents break data into digestible chunks. Pivot tables can now be created through natural language via Excel Copilot and Google Gemini, millions of rows can be pivoted within seconds with Polars, and pre-pivot summaries can reduce AI agent token costs by 60-80%.

In e-commerce analytics, pivot table is the most direct way to answer critical business questions such as cohort analysis, category performance, and customer segmentation. In the modern data stack, pivot operations are defined once in the dbt transformation layer and consumed by all downstream consumers.

However, LLMs have structural limitations when working with tabular data: transposition accuracy at 50%, numerical hallucination rate at 59-82%. Therefore, rather than having LLMs perform calculations, having them generate code, using deterministic tools (pandas, DuckDB, BigQuery) for computation, and positioning LLMs solely for content generation and natural language insights remains the most effective strategy. Local LLMs, with their KVKK compliance and zero marginal cost advantages, offer an increasingly powerful alternative for e-commerce data analysis.

Pivot table usage is also covered in detail across different platforms: Looker Studio Pivot Table and Google Analytics Pivot Table.

Footnotes

  1. Data Model specification and limits - Microsoft Support
  2. DAX function reference - Microsoft Learn
  3. Agent Mode in Excel - Microsoft Tech Community
  4. Powerful pivot tables in Connected Sheets - Google Workspace Updates
  5. Gemini in Google Sheets - Google Workspace Updates
  6. pandas 3.0.0 What’s New
  7. Polars PDS-H Benchmark 2
  8. DuckDB PIVOT Statement
  9. tidyr Pivoting Vignette
  10. data.table CRAN
  11. Arquero - Columnar JavaScript Data
  12. Claude Code Interpreter
  13. Pivot Table Recommendation with Semantic Embeddings
  14. A Guide to Token-Efficient Data Prep for LLM Workloads - The New Stack
  15. dbt_utils pivot macro
  16. ShopifyQL - Commerce Data Querying Language
  17. Shopify Editions Winter ‘26
  18. Web Scraping Trends for 2025 and 2026
  19. WooCommerce REST API Documentation
  20. How to Create Pivot Table with GA4 Data in BigQuery
  21. Run LLMs Locally with Ollama: Privacy-First AI 2025
  22. KVKK 2025 Updates: A Compliance Guide for Companies
  23. Why LLMs Struggle with Your Spreadsheet Data - Tryolabs 2
  24. Guide to Hallucinations in Large Language Models - Lakera
  25. Table Meets LLM: Can Large Language Models Understand Structured Table Data? - WSDM 2024
  26. Which Table Format Do LLMs Understand Best?
  27. SpreadsheetLLM - Microsoft Research, EMNLP 2024
Key Takeaways
  • 01 Pivot table is no longer a static summary tool but a method for AI agents to break data into digestible chunks
  • 02 Polars offers 5-10x faster pivot operations than pandas on large datasets
  • 03 LLM table transposition accuracy is 50%, numerical hallucination rate is 59-82% -- computation should be delegated to deterministic tools
  • 04 Local LLMs (Ollama + Qwen2.5-Coder) provide KVKK-compliant, zero marginal cost e-commerce data analysis
  • 05 Pre-pivot achieves 95-99% token savings; hybrid architecture (local pivot + cloud LLM insight) is the most cost-effective approach
Frequently Asked Questions (FAQ)
+ How much faster is Polars than pandas for pivot operations?

According to benchmarks, Polars runs 1.6 to 8.6 times faster than pandas for grouping and pivot operations depending on data size. Approximately 8.6x faster for 100 million row datasets and approximately 1.6x faster for smaller 10,000 row datasets.

+ Why should pre-pivoted summaries be sent to AI agents instead of raw data?

Sending raw data quickly consumes token limits and increases hallucination risk. Pre-pivoted summaries with added flags reduce token costs by 60-80% while also improving model accuracy.

+ How can pivot tables be created with Excel Copilot?

Microsoft 365 Copilot Agent Mode (generally available since December 2025) enables pivot table creation using natural language. For example, the command 'Create a quarterly sales pivot by region' causes Copilot to analyze the data and automatically create the appropriate PivotTable.

+ Can BigQuery data be pivoted in Google Sheets?

Yes. Through Google Connected Sheets integration, billions of rows of BigQuery data can be analyzed as pivot tables in the Sheets interface. The October 2025 update increased pivot table row capacity to 200,000.

+ What problems do LLMs face when working with pivot table data?

LLMs show approximately 50% accuracy in table transposition (near random chance) and 59-82% numerical hallucination rates. The solution is to have LLMs generate code rather than perform calculations directly, using deterministic tools (pandas, DuckDB) for computation.

+ What is the advantage of analyzing e-commerce data with local LLMs?

Local LLMs (Ollama + Qwen2.5-Coder) ensure KVKK compliance as no data leaves the premises. Marginal cost is near zero, and with hybrid architecture, 85-95% cost reduction compared to cloud LLMs can be achieved.