Most agentic commerce coverage focuses on the closing step. Agent reaches checkout, agent completes purchase, money moves. It’s the most measurable thing, so it gets the headlines.
The more important shift is happening upstream. In the last six months, every major AI platform shipped a product variant that does multi-step research on behalf of a buyer: ChatGPT’s agentic mode, which can navigate the web and synthesize across sources; Anthropic’s Computer Use successors, which string together sequences of actions; Google’s Mariner-derived products; and Apple Intelligence, which is now actually scheduling, planning, and recommending across apps. These aren’t checkout tools. They’re discovery tools. They’re going to reshape what “top of funnel” means.
If you’re a merchant who has been investing in the closing step (agent-ready checkout, ACP integration, instant-payment flows), that work is necessary. It isn’t sufficient. The buyer who hands a personal agent the full job (“plan our anniversary weekend, including gift”) is going to be a buyer your closing infrastructure never sees, because by the time anything close to checkout happens, the agent has already done the discovery, narrowed the field, and made a recommendation. Your store either earned a spot in that narrowing or it didn’t.
This post is about what the upstream shift looks like, what the platforms are actually shipping, and what merchants need to think about that they probably aren’t thinking about yet.
What “personal agent” actually means now
The phrase has been around for years. The product category was vapor until late 2024. What’s different now is that the agents can hold context across multiple steps, navigate the web (or apps) without breaking, and remember preferences across sessions.
The buyer experience that didn’t exist twelve months ago and exists now: “I want to plan a long-weekend getaway in May for my anniversary. We like food. Budget around $1,500 not including flights. Surprise me with a destination and put together an itinerary.” The agent goes away for several minutes, comes back with a destination, a hotel, three restaurant reservations, an activity recommendation, and a list of items to pack. The buyer didn’t browse anything. The buyer briefed the agent.
This pattern was a research demo in 2024. It became a consumer-grade product in 2025. By Q4 2025 it was reliable enough that early adopters were actually using it. Early signals suggest that for certain query categories (gifts, travel, multi-product purchases, “plan-the-thing” requests), the share of buyers who hand discovery to a personal agent is climbing meaningfully among younger buyers.
The platforms
A non-exhaustive snapshot of the personal-agent landscape as of early 2026:
ChatGPT’s agentic mode is the most-used product in this category. It does long-running task execution: research a topic across many sources, plan multi-step itineraries, compare products with constraints, summarize and recommend. It can take actions (browse, fill forms, draft documents). It hooks into Instant Checkout for the purchase step where the merchant supports it.
Anthropic’s Computer Use line is the most reliable for vision-and-DOM-based interaction with apps that don’t have clean APIs. It’s the agent most likely to actually operate your storefront the way a human would, including clicking buttons, scrolling, and reading visible content. The buyers using it are often using it for tasks that involve apps and services that haven’t built agent-specific endpoints.
Google’s Mariner-derived products are integrated tightly with Search, Maps, Shopping, and the rest of the Google surface. The strength is the breadth of data Google can pull in (location, history, calendar). The weakness is the lock-in: the agent works best inside Google’s ecosystem.
Apple Intelligence is the highest-distribution variant by raw user count, because it’s on hundreds of millions of iPhones by default. It’s also the most constrained. The actions it can take are limited to apps that have integrated, and the discovery surface skews toward the apps Apple has prioritized. For brands with strong Apple ecosystem alignment, the surface is meaningful. For brands without, it’s harder to reach.
Perplexity Comet is a research-first agent that excels at the synthesis step. It’s particularly strong on multi-source comparisons. The conversion paths are narrower than the others, but the quality of the recommendation when Comet does recommend is high.
Each of these works differently. The buyer experience is similar enough that consumers are starting to treat them as interchangeable for “agent-mediated discovery.” Merchants who think about them as one category will misunderstand the integration work each one demands. Merchants who think about them as five different platforms will end up with five different integration projects. The right framing is in between: there’s a common surface (structured data, honest disclosure, fast endpoints) plus per-platform refinements.
What this means for discovery
When a buyer goes from “I want shoes” to “the agent recommended these three pairs,” many decisions have already happened that your storefront isn’t part of.
The shortlist is built from public signals: editorial coverage that the agent surfaced, structured data the agent indexed, llms.txt files that the agent parsed, AggregateRating data the agent weighted. Your storefront contributed if it was discoverable and legible. If it wasn’t, the shortlist was built without you.
The narrowing is done by the agent applying the buyer’s constraints to the shortlist: budget, color, size, style, brand preferences, ethical constraints (sustainability, made-in, etc.), shipping cutoffs. The agent does this against structured data and policy disclosure. Your storefront contributed if your data was clean, accurate, and specific. If it wasn’t, the narrowing dropped you for being ambiguous or contradictory.
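A minimal sketch of that narrowing step, to show why missing data costs as much as bad data. The `Listing` record, the `narrow` function, and the merchant names are all hypothetical; no platform has published its filtering logic, so treat this as an illustration of the idea rather than how any particular agent works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """Hypothetical record an agent might build from a merchant's structured data."""
    merchant: str
    price: Optional[float]          # None means the price was not machine-readable
    ships_in_days: Optional[int]    # None means no shipping estimate was disclosed
    colors: frozenset = frozenset()


def narrow(shortlist, max_price, max_ship_days, color):
    """Keep only listings that provably satisfy every buyer constraint.

    A missing field counts as a failure: the agent cannot confirm the
    constraint, so the listing is dropped rather than guessed at.
    """
    kept = []
    for item in shortlist:
        if item.price is None or item.price > max_price:
            continue
        if item.ships_in_days is None or item.ships_in_days > max_ship_days:
            continue
        if color not in item.colors:
            continue
        kept.append(item)
    return kept


shortlist = [
    Listing("clear-data-shop", 120.0, 2, frozenset({"black", "tan"})),
    Listing("vague-shop", 95.0, None, frozenset({"black"})),   # no shipping data
    Listing("pricey-shop", 210.0, 1, frozenset({"black"})),    # over budget
]

survivors = narrow(shortlist, max_price=150.0, max_ship_days=3, color="black")
print([item.merchant for item in survivors])
```

Note that the cheapest listing is the one that gets dropped here, purely because its shipping estimate was never disclosed in a form the agent could read.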
The recommendation is presented to the buyer with reasoning. The reasoning cites specific attributes (“rated 4.6 stars, ships in 2 days, includes a 60-day return window”). The buyer sees the recommendation, reads the reasoning, and decides. Your storefront contributed if the cited attributes are favorable and verifiable. If they aren’t, the buyer reads about a competitor and clicks that link instead.
By the time anyone “lands” on your site, the work of being chosen is mostly done. The site visit might be a confirmation visit (“let me see the product page before I commit”) or a checkout-step visit. Conversions will happen disproportionately in these high-confidence sessions, because the agent already filtered out the buyers who weren’t ready.
What gets compromised
The merchants who’ve been optimizing for the closing step are doing necessary work. ACP integration matters. Instant Checkout participation matters. Fast PDP load times matter. These don’t substitute for being chosen in the discovery step.
What gets compromised when you optimize only for the closing step is the volume of high-confidence sessions you receive in the first place. The closing step optimizes the conversion of the buyers who reach you. The discovery step determines how many of them reach you. The math is multiplicative, not additive.
Concretely: a merchant whose PDP converts agent-referred traffic at a high rate looks like a winner until you compare it to a competitor whose PDP converts at a lower rate but receives many times the agent referrals. The competitor wins on revenue. The first merchant wins on conversion-rate dashboards. Upstream volume makes the difference, and upstream volume is determined by whether the agent picks you in the discovery step.
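To put illustrative numbers on the multiplication, here is a small sketch. Every figure is made up for the example (referral counts, conversion rates, and order value are assumptions, not measurements), and `agent_revenue` is a hypothetical helper.

```python
def agent_revenue(referrals, conversion_rate, average_order_value):
    """Revenue from agent-referred sessions: the three factors multiply."""
    return referrals * conversion_rate * average_order_value


# Merchant A: excellent closing step, few agent referrals.
merchant_a = agent_revenue(referrals=1_000, conversion_rate=0.08,
                           average_order_value=100.0)

# Merchant B: half the conversion rate, five times the referrals.
merchant_b = agent_revenue(referrals=5_000, conversion_rate=0.04,
                           average_order_value=100.0)

print(f"Merchant A: ${merchant_a:,.0f}")
print(f"Merchant B: ${merchant_b:,.0f}")
```

With these assumed inputs, Merchant A books about $8,000 and Merchant B about $20,000. Doubling A’s conversion rate would still leave it behind, because the referral count is the larger lever.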
The upstream work is what determines the referral volume. It’s also harder to measure, slower to compound, and easier to deprioritize. The merchants who are winning the personal-agent surface this year are the ones who treat the upstream work as a budget priority, not as something that happens after the technical integrations are done.
What the upstream work looks like
If you want to be on the shortlist for personal-agent discovery in your category, the work is:
Make sure the agent can find you in the first stage. This means clean editorial coverage, structured data on every product page, an llms.txt that declares your category and your top SKUs, and an agent-discoverable sitemap. The agents do their first-pass shortlist from public signals. Be present in those signals.
Make sure your structured data answers the questions agents ask. Not the questions you’d write for a marketing landing page. The questions a buyer would ask an agent: temperature ratings, dimensions, compatibility, ingredient lists, sourcing, return windows. Put the answers in JSON-LD or in /agent/query endpoints. The agents that need to give specific answers reward sources that have specific answers.
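As a rough sketch of what “specific answers” can look like in JSON-LD, the snippet below builds a schema.org `Product` document as a plain dictionary. The types and properties (`AggregateRating`, `PropertyValue`, `Offer`, `MerchantReturnPolicy`) come from the schema.org vocabulary; the product, its values, and the `product_jsonld` helper are hypothetical, and a real page would likely carry more fields than this.

```python
import json


def product_jsonld(name, sku, price, currency, rating, review_count,
                   return_days, specs):
    """Build a schema.org Product document as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
        # One PropertyValue per buyer-style question (temperature, weight, ...).
        "additionalProperty": [
            {"@type": "PropertyValue", "name": key, "value": value}
            for key, value in specs.items()
        ],
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "hasMerchantReturnPolicy": {
                "@type": "MerchantReturnPolicy",
                "merchantReturnDays": return_days,
            },
        },
    }


# Illustrative values only.
doc = product_jsonld(
    name="Example Down Sleeping Bag",
    sku="EX-SB-001",
    price=189.0,
    currency="USD",
    rating=4.6,
    review_count=212,
    return_days=60,
    specs={"Comfort temperature rating": "-5 C", "Packed weight": "1.1 kg"},
)
print(json.dumps(doc, indent=2))
```

The printed JSON would go inside a `<script type="application/ld+json">` tag on the product page. The point of the `specs` argument is that each entry answers a question a buyer would actually put to an agent.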
Make sure your honesty bar is high. The agents that have learned to detect specificity asymmetry, cherry-picked comparisons, and review-distribution gaming will downweight you for any of these. The fix is to surface negatives confidently, present full review distributions, and write product descriptions that name real limitations. This is uncomfortable for marketing teams that have spent a decade optimizing for human visitors. It’s necessary for agent visibility.
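One way to picture “full review distributions” is to publish the count and share of every star level rather than only the average. The sketch below assumes a simple list of 1-to-5 star ratings; `review_distribution` and the sample ratings are hypothetical.

```python
from collections import Counter


def review_distribution(star_ratings):
    """Return count and share for every star level, including empty ones."""
    counts = Counter(star_ratings)
    total = len(star_ratings)
    distribution = {}
    for stars in range(5, 0, -1):
        count = counts.get(stars, 0)
        share = round(count / total, 3) if total else 0.0
        distribution[stars] = {"count": count, "share": share}
    return distribution


# Illustrative ratings: mostly positive, with the negatives left in.
ratings = [5] * 6 + [4] * 2 + [2] + [1]
for stars, row in review_distribution(ratings).items():
    print(f"{stars} stars: {row['count']} reviews ({row['share']:.0%})")
```

Reporting the one-star and two-star rows alongside the rest is the uncomfortable part, and it is also the part that makes the average believable.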
Make sure your speed is right. Personal agents run long, multi-step tasks, and slow responses get dropped from the consideration set. Gemini specifically times out at two seconds. Anthropic’s Computer Use degrades after five. The performance target for agent-visible endpoints is tighter than for human-facing pages. Treat it that way.
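A minimal way to hold an endpoint to a budget like that is to time it repeatedly and compare the slowest run to the threshold. In this sketch the two-second budget simply reuses the tighter figure quoted above, `within_budget` is a hypothetical helper, and `fake_endpoint` stands in for a real request to one of your agent-visible URLs.

```python
import time


def within_budget(fetch, budget_seconds, attempts=5):
    """Call `fetch` several times; pass only if the slowest run fits the budget."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    worst = max(durations)
    return worst <= budget_seconds, worst


def fake_endpoint():
    """Stand-in for a real HTTP call; sleeps briefly to simulate work."""
    time.sleep(0.01)


ok, worst = within_budget(fake_endpoint, budget_seconds=2.0)
print(f"within budget: {ok}, slowest run: {worst:.3f}s")
```

Checking the worst run rather than the average is a deliberate choice here: an agent that drops slow responses only has to hit the slow case once.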
What we’ll watch through Q2
The platform releases will keep coming. ChatGPT’s agentic mode is getting more reliable each quarter. Anthropic’s Computer Use is getting faster and cheaper. Google’s Mariner-derived products are getting better at handling sites they haven’t seen before. Apple Intelligence is gradually opening up to more third-party app integrations.
The interesting question for merchants is which buyer behavior changes go from “early adopter” to “default” first. We think gift purchasing and multi-product planning (events, trips, complex outfits, gift baskets) tip first, because the buyer-side value of delegating is highest there. Single-item replenishment (your usual coffee, your usual face cream) follows, because the agent can hold the preference across sessions. Single-item discovery for high-consideration purchases (cars, expensive furniture, technical products) takes longer, because the buyer wants to be involved.
The merchants whose categories tip first will see the upstream shift first. The merchants whose categories tip later will get a few extra quarters of buffer to learn from the early movers.
Either way, the closing-step work isn’t enough. The discovery step is where the action is now. Worth allocating accordingly.