From AI Content to Agentic Commerce: How Generative AI Has Changed for Retailers Since 2023
May 1, 2025
Watch the original 2023 video, or keep reading for the 2026 update below.
This post was originally published in 2023, when generative AI's role in ecommerce was a much narrower question than it is today. We're keeping the original framing as a record of what we — and the industry — believed at the time, but the bottom-line advice has shifted dramatically. Read on for what's still true, what's not, and what retailers should actually be doing with AI in 2026.
What We Believed About AI in 2023
In a breakthrough study published that year, Boston Consulting Group (BCG) partnered with Harvard, MIT, the University of Pennsylvania, and the University of Warwick to test the efficacy of generative AI on workplace productivity across more than 750 consultants at BCG. The findings were striking: according to the study, generative AI created powerful competitive advantages in areas where it had a firm competence (for example, creative product innovation) and dramatically reduced performance in areas where the models were underdeveloped (like business problem-solving).
The study found that, counterintuitively, GenAI models tended to perform better on tasks that required them to "come up with creative, novel, or useful ideas based on the vast amounts of data on which they have been trained." When participants used GPT-4, about 90 percent improved their performance and reached a level of performance roughly 40 percent higher than participants working on the same task without GenAI.
The same study found that GenAIs broke down and underperformed their human counterparts when the models "are asked to weigh nuanced qualitative and quantitative data to answer a complex question." On those tasks, human participants using GenAI performed 23 percent worse than peers who used no AI at all.
On the back of findings like these, our 2023 recommendation to retailers was straightforward: use generative AI for the right kinds of work — laborious creative tasks within your realm of comprehension — and avoid using it for anything that required complex analysis. The most consistent marketing application we'd found at the time was rapid generation of SEO topics and content. We also warned readers about wrapped AI tools (UI layers around foundational models like ChatGPT) and stressed that AI-generated content always needed a human editor.
In 2026, much of that advice no longer holds up.
Five Things That Have Changed Since 2023
1. Reasoning Models Closed the Analytical Gap
The single biggest shift is that frontier models now spend extra compute reasoning step-by-step before answering. OpenAI's o-series, Anthropic's extended-thinking modes, and Google's Gemini reasoning modes will all "think" for seconds or minutes on hard problems, checking their own work as they go. The result: today's frontier models match or exceed human expert baselines on graduate-level science questions (GPQA), score near the top on competition math (AIME), and are making steady progress on research-level problem sets like FrontierMath. The exact task BCG found GenAI failing at, weighing nuanced quantitative and qualitative data to answer a complex question, is now a strength, not a weakness.
2. Tool Use Made Outputs Verifiable
In 2023, a model's response typically drew only on whatever it had been trained on. By 2026, every major model can run code, query databases, search the live web, and check its own work against external sources. That doesn't eliminate hallucinations, but it changes the failure mode. The 2023 caveat "AI cannot be trusted without human oversight" has matured into "AI cannot be trusted without the right tools and verification loops." For retailers, the practical difference is huge: an AI agent can now look up real inventory, real prices, and real shipping times instead of guessing.
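To make "verification loop" concrete, here is a minimal sketch of the idea: before an agent's answer reaches a shopper, check what it claims against the store's own records. The `CatalogRecord` and `AgentClaim` shapes, the SKU, and the sample data are all hypothetical; a real store would pull ground truth from its inventory or order-management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """Ground truth as the merchant's own system reports it (hypothetical shape)."""
    sku: str
    price_cents: int
    in_stock: int


@dataclass(frozen=True)
class AgentClaim:
    """What the AI agent is about to tell the shopper (hypothetical shape)."""
    sku: str
    price_cents: int
    available: bool


def verify_claim(claim: AgentClaim, catalog: dict[str, CatalogRecord]) -> list[str]:
    """Return a list of discrepancies; an empty list means the claim checks out."""
    record = catalog.get(claim.sku)
    if record is None:
        return [f"unknown SKU {claim.sku}"]

    problems: list[str] = []
    if claim.price_cents != record.price_cents:
        problems.append(
            f"price mismatch: claimed {claim.price_cents}, catalog says {record.price_cents}"
        )
    if claim.available != (record.in_stock > 0):
        problems.append(
            f"availability mismatch: claimed {claim.available}, catalog stock is {record.in_stock}"
        )
    return problems


# Illustrative data only.
CATALOG = {"SKU-1": CatalogRecord(sku="SKU-1", price_cents=2499, in_stock=0)}

claim = AgentClaim(sku="SKU-1", price_cents=1999, available=True)
issues = verify_claim(claim, CATALOG)
if issues:
    # In a real workflow the agent would be asked to retry, or a human would review.
    print("Blocked response:", "; ".join(issues))
```

The point of the design is that the check is deterministic and cheap: the model can be wrong, but a wrong price or stock level never reaches the customer unflagged.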
3. Multimodal Capabilities Are Now Standard
Modern frontier models read images, watch video, listen to audio, and produce structured outputs natively. For ecommerce, that means a model can look at a product image and answer questions about it, watch an unboxing video and summarize the customer's reaction, or process voice input from a shopper as easily as text. Use cases that required separate vision or speech infrastructure in 2023 are one API call today.
4. Agents Can Actually Take Action
In 2023, AI was a tool you queried. In 2026, AI is a worker you delegate to. OpenAI's Operator can drive a browser; Anthropic's Claude has Computer Use built in; Google's Mariner navigates and interacts with web pages on a user's behalf. These agents can fill forms, complete checkouts, manage subscriptions, and run multi-step workflows. The implications for retailers are profound — and we'll come back to them below.
5. Inference Costs Collapsed
Per-token costs for capability comparable to 2023's GPT-4 have fallen by at least an order of magnitude. What used to cost $30 per million input tokens now costs $1–3, a 10x to 30x drop, and the new frontier models are dramatically more capable than what was at the frontier three years ago. This isn't a technical curiosity; it's what makes agentic workflows, real-time personalization, and AI-driven customer service economically viable at scale.
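A quick back-of-the-envelope calculation shows what those prices mean for a budget. The per-million-token prices below are the ones quoted above; the monthly token volume is a made-up figure for illustration, so substitute your own.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` input tokens at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


OLD_PRICE = 30.0       # USD per million input tokens (2023 figure from the text)
NEW_PRICE_LOW = 1.0    # USD per million input tokens (low end of the current range)
NEW_PRICE_HIGH = 3.0   # USD per million input tokens (high end of the current range)

# Hypothetical workload: 500 million input tokens per month.
MONTHLY_TOKENS = 500_000_000

old_cost = cost_usd(MONTHLY_TOKENS, OLD_PRICE)            # 15000.0
new_cost_high = cost_usd(MONTHLY_TOKENS, NEW_PRICE_HIGH)  # 1500.0
new_cost_low = cost_usd(MONTHLY_TOKENS, NEW_PRICE_LOW)    # 500.0

print(f"2023 pricing: ${old_cost:,.0f} per month")
print(f"Current pricing: ${new_cost_low:,.0f} to ${new_cost_high:,.0f} per month")
print(f"Reduction: {OLD_PRICE / NEW_PRICE_HIGH:.0f}x to {OLD_PRICE / NEW_PRICE_LOW:.0f}x")
```

At this assumed volume, a workload that would have cost $15,000 a month now costs between $500 and $1,500.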
What This Means for AI in Ecommerce
Two pieces of our 2023 advice have aged especially badly.
"Use AI to rapidly generate SEO content." Since Google's Helpful Content updates in 2023 and 2024 and the continued tightening of AI-content quality signals through 2025, pure AI-generated SEO content has become a liability rather than an asset. Sites publishing it at scale have seen rankings drop. The current best practice is using AI to research and outline content, with human writers — or carefully scaffolded multi-agent workflows — producing the final copy. The goal has flipped from "produce more content faster" to "produce content that holds up to E-E-A-T scrutiny."
"Avoid using AI for complex analysis." This is now wrong on its face. AI is genuinely useful in 2026 for the kinds of tasks 2023 said it couldn't handle: market research, cohort analysis, pricing optimization, fraud detection, customer service triage. The right caveat isn't "don't use AI for analysis" — it's "use the right model with the right tools, and verify against ground truth."
What hasn't changed: AI still needs human oversight on high-stakes decisions, you still want to use foundational models (or platforms built on them) rather than thin wrappers, and you still need to audit outputs in domains you actually understand. Those are evergreen.
The Bigger Shift: From AI Content to Agentic Commerce
The most important change isn't the capability gains in any single area — it's that the conversation in commerce has shifted from "how do we use AI to write content?" to "how do AI agents transact?" Agentic commerce — AI systems that browse, reason, compare, and increasingly complete purchases on a user's behalf — is reshaping how shoppers discover products and check out. Each of the leading foundational models is approaching it from a different angle, and where you should be paying attention depends on what part of your customer journey you're trying to influence.
How ChatGPT, Claude, and Gemini Compare for Agentic Commerce
OpenAI's ChatGPT (GPT-5 family)
ChatGPT remains the consumer-facing default: it has hundreds of millions of weekly users, many of whom now treat it as a starting point for product research, especially in high-consideration categories. OpenAI's Operator agent can drive a browser to complete real purchases on a user's behalf, and an expanding set of payments partnerships is making ChatGPT-initiated checkout increasingly seamless. Strongest for: consumer-facing product discovery, native checkout flows initiated inside ChatGPT, and reaching the largest pool of AI-mediated shoppers.
Anthropic's Claude (Opus 4.6, Sonnet 4.6, and Haiku 4.5)
Claude leads on careful, multi-step reasoning, which is useful when an agent has to compare specs, read reviews, and weigh trade-offs across many products before recommending or buying. It also natively supports the Model Context Protocol (MCP), an open standard that lets agents talk to live merchant systems (inventory, order history, customer records) through one common interface instead of a separate custom integration for each model. Strongest for: complex agentic workflows, developer-built commerce agents, and merchant-side automation where reliability and safety matter.
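For readers curious what "talking to a merchant system" looks like on the wire: MCP messages are JSON-RPC 2.0, and an agent invokes a capability with a `tools/call` request. The sketch below only builds that message. The `check_inventory` tool and its arguments are hypothetical, standing in for whatever tools a merchant's MCP server actually exposes, and a real client would use an MCP SDK and complete the protocol's initialization handshake first.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP-style `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    request_id=1,
    tool_name="check_inventory",
    arguments={"sku": "SKU-1", "warehouse": "east"},
)
print(request)
```

Because every tool call shares this shape, a merchant can expose inventory, orders, and customer records behind one server rather than building a separate connector per AI vendor.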
Google's Gemini (Gemini 2.5 family)
Gemini is tightly stitched into Google's commerce stack — Search, Shopping, Maps, Workspace, Android — giving it the broadest "view" of what a shopper has actually looked at across the open web. Its million-token context window can ingest an entire product catalog or a long order history in a single prompt. Strongest for: high-volume catalog-level reasoning, multimodal (especially video) product understanding, and discovery through Google Shopping surfaces.
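Whether your own catalog fits in a million-token window is easy to estimate. The sketch below uses a rough rule of thumb of about four characters per token for English text, which is an assumption: real tokenizers vary by model and language, so treat the result as a first-pass check, not a guarantee. The `reserve_tokens` headroom for instructions and the model's answer is also an illustrative figure.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # the "million-token" figure from the text
CHARS_PER_TOKEN = 4                # rough heuristic for English; an assumption


def estimate_tokens(text: str) -> int:
    """Estimate token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(product_texts: list[str], reserve_tokens: int = 50_000) -> bool:
    """Check whether all product texts, plus reserved headroom, fit in one prompt."""
    total = sum(estimate_tokens(text) for text in product_texts)
    return total + reserve_tokens <= CONTEXT_WINDOW_TOKENS


# Hypothetical catalog: 1,000 products with roughly 400 characters of copy each.
small_catalog = ["x" * 400] * 1000
# Same product count, but ten times as much copy per product.
large_catalog = ["x" * 4000] * 1000

print(fits_in_context(small_catalog))  # True
print(fits_in_context(large_catalog))  # False
```

If the estimate comes back false, the usual options are to trim descriptions, send only the relevant category, or retrieve matching products first and pass just those to the model.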
Preparing Your Store for AI Shoppers
For most retailers, the practical takeaway isn't to pick one model — it's to make sure your store, your product data, and your post-purchase systems are legible to all three. The merchants who set up cleanly for agentic discovery and checkout now will be the ones AI-driven shoppers find, and transact with, first.
If you're thinking through how to position your store for agentic commerce, book a free demo using the form on this page and we'll show you how to get ready.