Agentic commerce is emerging: AI agents are being empowered to search for and execute transactions, operating within user-defined limits.

In the last three months, Mastercard unveiled its Agent Pay initiative and Visa introduced Intelligent Commerce. At the same time, agent-specific payment infrastructure is rapidly taking shape. PayPal has released a toolkit that enables agent frameworks, such as OpenAI’s Agents SDK, to integrate directly with its APIs for agent-led transactions. Startups like PayOS have launched credit card-native wallets for agentic transactions, which let users link a card once and apply human-in-the-loop controls, such as a maximum amount the agent can spend, as well as security controls.

Yet a recent Stanford University study (Shenzhe Zhu et al.) suggests agentic AI has some way to go in securing you the best deal.

Study design

The researchers designed an experimental setting in which a buyer agent attempts to negotiate a lower price for a product based on a user-defined budget, while a seller agent aims to maximise its profit. Both agents know the retail price of the product; however, only the buyer agent knows the user’s budget and only the seller agent knows the wholesale cost.

The buyer agent prompt defines its persona as a cost-sensitive, realistic and goal-driven negotiator. The seller agent prompt guides it to present prices, justify value propositions and respond to buyer objections in a persuasive and professional manner, balancing willingness to negotiate with profit-preserving strategies. The buyer agent is instructed not to pay above the user’s budgeted level and the seller agent is instructed not to sell below the wholesale cost.

The buyer agent initiates the negotiation with an expression of interest in the product and a first offer. Then the two agents take turns to continue this negotiation until a termination condition (sale or no deal) is met.
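The turn-taking protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study's agents are language models driven by prompts, whereas the buyer and seller here are hypothetical fixed rules (an opening offer at 60% of the lower of budget and retail, 5% concessions per turn and a 20-round cap, none of which come from the study). What the sketch does preserve is the structure: the buyer moves first, the agents alternate, the buyer never offers above the budget, the seller never asks below the wholesale cost, and the loop ends in a sale or no deal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    deal: bool
    price: Optional[float]
    rounds: int


def negotiate(retail: float, wholesale: float, budget: float,
              max_rounds: int = 20) -> Outcome:
    """Alternate seller and buyer turns until a sale or no deal."""
    # Hypothetical opening positions (the real agents are LLMs, not rules).
    buyer_offer = 0.6 * min(budget, retail)  # buyer initiates with a low offer
    seller_ask = retail                      # seller starts at the retail price

    for round_no in range(1, max_rounds + 1):
        # Seller's turn: accept an offer that meets the current ask,
        # otherwise concede 5% without going below the wholesale cost.
        if buyer_offer >= seller_ask:
            return Outcome(True, buyer_offer, round_no)
        seller_ask = max(wholesale, seller_ask * 0.95)

        # Buyer's turn: accept an ask at or below the current offer,
        # otherwise raise the offer by 5% without exceeding the budget.
        if seller_ask <= buyer_offer:
            return Outcome(True, seller_ask, round_no)
        buyer_offer = min(budget, buyer_offer * 1.05)

    # Termination without agreement: no deal.
    return Outcome(False, None, max_rounds)


if __name__ == "__main__":
    print(negotiate(retail=1000, wholesale=700, budget=800))
    print(negotiate(retail=1000, wholesale=700, budget=500))
```

In this rule-based version the constraints are enforced by the `min` and `max` calls, so a breach is impossible. The study's findings on budget breaches arise precisely because prompted models are only instructed to respect those limits.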

The researchers used 100 real-world products in three categories (motor vehicles, electronic devices and real estate) to test whether buyer agent behaviour varies with the buyer’s capacity and willingness to pay. They set five budget levels for each product: high, retail, mid, wholesale cost and low.

Key takeaways

There is a significant difference in performance between AI models. Larger, more advanced models consistently perform better as buyers. In the graph below, the models in the top right-hand corner appear as both stronger buyer agents (scoring a higher price reduction rate (PRR)) and stronger seller agents (scoring a lower PRR).

Scatter plot charting AI models by Buyer Price Reduction Rate (%) on the x-axis and Seller Price Reduction Rate (%) on the y-axis (inverted axis: higher seller value is lower on the plot). GPT-3.5 shows the highest reduction rates for both parties. GPT-4o-mini, DeepSeek-V3, and DeepSeek-R1 favor sellers with lower reduction rates. o3 and GPT-4.1 cluster in the top-right, offering moderate reductions to both parties. Color and size represent relative profitability across models.
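As a rough illustration of the price reduction rate (PRR) used in this comparison, the sketch below assumes PRR is the discount achieved off the retail price, expressed as a fraction of the retail price. That definition is an assumption inferred from the metric's name and the article's reference to "retail price reduction", and the prices are made up for the example.

```python
def price_reduction_rate(retail_price: float, deal_price: float) -> float:
    """Discount off retail as a fraction of the retail price (assumed definition)."""
    if retail_price <= 0:
        raise ValueError("retail_price must be positive")
    return (retail_price - deal_price) / retail_price


if __name__ == "__main__":
    # Hypothetical figures: a product retailing at 1,000 sold for 850.
    print(f"{price_reduction_rate(1000, 850):.1%}")
    # A deal struck above retail gives a negative PRR.
    print(f"{price_reduction_rate(1000, 1100):.1%}")
```

On this reading, a buyer agent is doing well when the PRR is high and a seller agent is doing well when it is low, which is why the same metric can rank both roles.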

As seller agents, some models are much better at balancing rates of successful sales with reasonable levels of profitability. In the graph below, the models in the top left-hand corner offer high discounts off the retail price but achieve low sales success rates, while the models in the bottom left-hand corner achieve more sales but offer smaller discounts.

Bubble chart showing various AI models plotted by Deal Rate (%) on the x-axis and Average Profit Rate (%) on the y-axis. GPT-4o-mini has the highest profit rate (~26%) at a ~37% deal rate. GPT-3.5 appears lowest on both axes. Bubble sizes and colors indicate Relative Profit, with GPT-4.1 achieving the highest (13.3x). Other notable models include o3, DeepSeek-V3, and Qwen2.5-7B. The chart demonstrates trade-offs between deal frequency and profit efficiency across models.

In the real world, buyers and sellers will often be using models sourced from different suppliers, earlier or later versions in the same developer’s model series, or models which are smaller or larger than the other party’s model. This disparity in negotiation capability could have a real impact on users. If the buyer agent is a stronger model and the seller agent is a weaker model, the retail price reduction is 5-11% higher than in strong buyer vs strong seller scenarios.

When a weak buyer agent negotiates with a strong seller agent, the buyer can pay 2% more. The researchers saw this as a more likely outcome in business-to-consumer scenarios.

"While the number may seem small, once the agents are deployed in the real world at scale, this could create systematic disadvantages for people using these agents. For example, when lay consumers use small but on-device models to make automated negotiations with big merchants who use large and capable models running on cloud services, the cumulative economic loss for lay consumers will become significant."

There are dangers for sellers as well: the study saw weaker seller agents losing up to 14% in profit compared to negotiations between AI agents of equal capability.

Despite explicit instructions, agents can act outside their user-imposed constraints. When acting as buyer agents, weaker models like GPT-3.5 and Qwen2.5-7B frequently breach constraints, accepting deals above their budget in over 10% of all cases: for example, a buyer agent instructed to buy a smartphone with a budget of $500 paid $900.

Further, while all models consistently stuck to the budget constraint when the budget was high, the models which showed a willingness to exceed budgets did so when the budgets were low. For example, the budget-busting rate of Qwen2.5-7B was 0% for a high budget but 18.7% for a mid budget and 20.2% for a low budget.
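A budget-busting rate of this kind can be computed per budget level along the following lines. This is a minimal sketch, not the study's code: it assumes the rate is the share of all negotiations at a given budget level that ended in a deal priced above the budget, and the records are invented for illustration (the first mirrors the $500 budget and $900 price mentioned above).

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (budget level, budget, deal price or None if no deal was reached)
Record = Tuple[str, float, Optional[float]]


def budget_busting_rate(records: Iterable[Record]) -> Dict[str, float]:
    """Share of negotiations per budget level that closed above the budget."""
    totals: Dict[str, int] = defaultdict(int)
    breaches: Dict[str, int] = defaultdict(int)
    for level, budget, deal_price in records:
        totals[level] += 1
        # A breach needs a completed deal at a price above the budget.
        if deal_price is not None and deal_price > budget:
            breaches[level] += 1
    return {level: breaches[level] / totals[level] for level in totals}


# Hypothetical records, for illustration only.
sample = [
    ("low", 500, 900),     # deal above budget: a breach
    ("low", 500, 480),     # deal within budget
    ("low", 500, None),    # no deal
    ("low", 500, None),    # no deal
    ("high", 1200, 1000),  # deal within budget
    ("high", 1200, 1150),  # deal within budget
]

if __name__ == "__main__":
    print(budget_busting_rate(sample))
```

Counting only completed deals in the denominator would give a higher rate for the same data, so the choice of denominator matters when comparing figures across models.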

Models also displayed behaviour that could be considered naïve or illogical. Although models knew the retail price of a product, many paid materially over the retail price. This is a particular risk when agents are given a high budget. When the researchers dug further, they found that, despite being instructed not to do so, the buyer agent would reveal its budget when asked by the seller agent.

Lower budgets stimulate stronger negotiating behaviour from buyer agents. Despite instructions to seek the best possible commercial deal, an agent with a high budget will quickly accept an offer as soon as it hits the budgeted level. On average, users with high budgets will pay 10% more for the same products than users with low budgets.

Agents can become trapped in negotiation deadlock when they become overly fixated on continuing the negotiation. For example, buyer agents often obsessively pursue price reductions even after sellers state their minimum acceptable price. This is particularly prevalent among weaker agents operating on lower budgets.

Conclusion

Agentic commerce will involve much more than simply automating the mechanical steps in transactions. Instead, as the researchers observe, agentic commerce will need to replicate the essential features of human deal-making, namely “effective information gathering, strategic reasoning, and above all, skilled negotiation and decision-making”.

The challenge, as this study shows, is that “agent-to-agent negotiation and transaction is naturally an imbalanced game where users using less capable agents will face significant financial loss against stronger agents”.  

As we rapidly head towards a world in which agent-to-agent transactions are widespread, the question of what counts as ‘fair dealing’ in that world will need to be addressed by enterprises, and by the consumer protection authorities and courts which regulate them.