Air Street Capital (Nathan Benaich et al) recently released its annual survey of key technology, commercial and policy AI developments over the last 12 months.
The ‘wow’ moments
Some of the marquee technology milestones in 2025 identified by Air Street are:
AlphaZero not only taught itself to play chess without human supervision but then taught novel moves to four world champion grandmasters: these new moves often involved counterintuitive plans that violated conventional chess principles, such as sacrificing the queen for long-term strategic gain.
DeepMind’s Co-Scientist is a multi-agent system built on Gemini 2.0 that generates, debates and evolves its approach to hypothesis generation and experimental planning. It proposed repurposing existing drugs for AML (a blood cancer), and these proposals were validated in in-vitro (live culture) experiments.
Meta AI developed Brain2Qwerty, a system that decodes what people are typing by reading brain signals from outside the skull, achieving a 19% character error rate for the best participants. This is a substantial improvement over previous non-invasive approaches (but is still far from clinical viability).
The overhyped?
Large reasoning models (LRMs) were seen as the next big step in the evolution of large language models, heralded by the release of OpenAI’s o1-preview model in late 2024 and shortly followed by DeepSeek’s R1-lite-preview. LRMs are meant to be ‘thinking’ models which are trained to perform multi-step, logical reasoning. While acknowledging the impressive capabilities of LRMs, Air Street also identified emerging shortcomings:
LRMs released by developers in 2025 showed small incremental gains in capabilities, often within the margin of error, suggesting limited progress.
LRMs exhibit a surprisingly defeatist behaviour: they reason more as problems get harder, but then give up entirely on very complex tasks, and they are outperformed by LLMs on simple tasks.
LRMs can be thrown off their reasoning performance when simple distracting facts are introduced into a problem. For example, adding irrelevant phrases like “Interesting fact: cats sleep most of their lives” to math problems doubles the chances of reasoning models getting answers wrong. This suggests that, rather than stepping through algebraic reasoning, LRMs are (like their simpler LLM ancestors) still engaged in pattern matching or template-based reasoning from the vast body of material on which they are trained.
Another major innovation of 2025 was the emergence of multi-agent architectures. Instead of a single model handling an entire prompt from start to finish, these systems use multiple agents that collaborate to complete tasks. In some designs, each agent contributes a specialised skill to a specific subcomponent of the problem, while in others, all agents tackle the same challenge collectively – an AI analogue of ‘swarm intelligence’.
While Air Street recognises the capacity improvements that multi-agent architectures can achieve, researchers have discovered that merging multiple AI agents hits a performance wall. As problems increase in complexity, the space in which the outputs of the individual agents are brought together and weighted (ranked) to synthesise the final answer shrinks (a phenomenon called rank collapse). This undermines the rationale for multi-agent systems, although there are suggestions for mitigation.
The under-recognised?
Air Street highlighted modest steps in 2025 towards continuous learning by AI models. Current models are trained on a fixed-in-time data set, and trying to update a model’s knowledge through fine-tuning can result in ‘catastrophic forgetting’. However, Air Street says that “the scaling paradigm is shifting from static pre-training to dynamic, on-the-fly adaptation”. Developers have treated teaching and testing models as separate stages, but recent research found that test-time fine-tuning can be used to adapt a model's weights to a specific prompt at inference, a step towards continuous learning. This on-demand learning consistently outperforms in-context learning, especially on complex tasks.
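The idea of test-time fine-tuning can be illustrated with a deliberately tiny sketch. It is not Air Street's or any lab's implementation: the one-weight model, the `DATA_STORE` of examples, the nearest-neighbour retrieval and every function name below are assumptions made purely for illustration. The sketch follows the pattern described above: start from the frozen weight, take a few gradient steps on examples related to the incoming prompt, answer, and then discard the adapted copy.

```python
# Toy illustration of test-time fine-tuning (all names are hypothetical).
# The "model" is a single weight w that predicts y = w * x.

BASE_WEIGHT = 1.0  # stands in for weights frozen after pre-training

# Hypothetical store of (input, answer) examples the system can retrieve from.
DATA_STORE = [(0.5, 0.25), (1.0, 1.0), (2.8, 7.84), (3.2, 10.24), (6.0, 36.0)]


def predict(weight, x):
    return weight * x


def nearest_examples(query, k=2):
    """Retrieve the k stored examples whose inputs are closest to the query."""
    return sorted(DATA_STORE, key=lambda pair: abs(pair[0] - query))[:k]


def fine_tune_at_test_time(weight, examples, lr=0.02, steps=25):
    """Run a few gradient-descent steps on squared error over the retrieved
    examples. Works on a local copy, so the base weight is never modified."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
        weight -= lr * grad
    return weight


def answer(query):
    adapted = fine_tune_at_test_time(BASE_WEIGHT, nearest_examples(query))
    return predict(adapted, query)  # the adapted weight is discarded afterwards


query, target = 3.0, 9.0  # the stored data follow y = x ** 2, so the target is 9
base_error = abs(predict(BASE_WEIGHT, query) - target)
adapted_error = abs(answer(query) - target)
print(f"base error: {base_error:.2f}, adapted error: {adapted_error:.2f}")
```

In this toy setting the prompt-specific copy lands much closer to the target than the frozen weight does, while `BASE_WEIGHT` itself is left untouched, which is why the approach sidesteps the forgetting problem that permanent fine-tuning can cause.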
Other examples of this shift to continuous learning include:
In Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (OMNI-EPIC), the model self-generates an environment and reward code, filtering for tasks that are both learnable and useful, and maintains an expanding archive for future training.
In the Darwin Gödel Machine, the agent rewrites its own code, validates changes empirically and archives only improved variants to produce measurable iterative gains on coding benchmarks.
Who’s winning?
Air Street’s overall assessment of the competitive landscape is:
Across independent leaderboards, OpenAI’s GPT-5 variants still set the pace, but the gap has narrowed. A fast-moving open-weights pack from China (DeepSeek, Qwen, Kimi) and closed-source group in the US (Gemini, Claude, Grok) sits within a few points on reasoning/coding. Thus, while US lab leadership persists, China is the clear #2, and open models now provide a credible fast-follower floor.
As illustrated in the following graph, there was a brief moment in late 2024 where the capability gap between open and closed models almost closed, but then OpenAI’s o1-preview was released.
The United States’ leadership in AI model development is underpinned by its control of 75% of the world’s supercomputer capacity, eight times the capacity of China.
Of the US$133 billion in private AI financing globally in 2025, 82% was raised by US-based companies, with Europe and the United Kingdom accounting for just under 9% and China just under 4%. The focus of investment has clearly shifted into generative AI: 60% for GenAI vs. 40% for non-GenAI. Hyperscalers and NVIDIA now account for over half of all AI-related venture investment.
Despite this clear US lead in compute and capital, China is making breakthroughs, and there are likely to be more ‘Sputnik moments’ like DeepSeek’s ‘cheap-as-chips’ LRM. By mid-2025, Chinese open-source models – particularly Qwen – had overtaken US and European open-source models as measured by user preference, global downloads and model adoption. Meta’s Llama, once the model of choice for downstream developers, has been overtaken as Chinese models rapidly advanced in capability. Chinese models now offer a broader range of configurations (shapes and sizes), more efficient fine-tuning methods that require less compute and more permissive licensing terms.
Follow the dollars
AI-first companies represent a rapidly growing share of global capital and economic growth:
Specter’s global ranking of private companies, which monitors over 200 real-time indicators across areas such as team growth, product intelligence, funding, financial performance and market attention, shows that AI companies now make up 41% of the Top-100 – a sharp rise from 16% in 2022.
AI companies are outperforming their non-AI peers in the general economy: last quarter, AI companies with US$1-20M revenue grew their revenue at 60% while those with US$20M+ revenue grew at 30%. In both cases, that was 1.5 times the growth rate of their all-sector peers.
AI start-ups are growing faster than start-ups in earlier technology generations: the 100 fastest revenue growing AI companies on Stripe (AI 100) are growing at 1.5 times the rate at which the top 100 SaaS (software as a service) companies by revenue grew in 2018. Air Street says this “exemplifies the commercial pull of generative AI products”.
Paid AI adoption by enterprise customers rose from 5% in January 2023 to 43.8% by September 2025, based on card and bill-pay data from more than 45,000 US businesses. OpenAI leads with a 35.6% market share, followed by Anthropic with 12.2%, with little usage of Google, DeepSeek and xAI.
What was new in 2025?
First, audio, avatar and image generation companies are seeing their revenues accelerate wildly. For example, UK start-up Synthesia crossed US$100 million revenue in April 2025 and has 70% of the Fortune 100 as customers. It promotes the use of avatars for customer service functions on the basis that “viewers retain 95% of a video's message compared to only 10% when reading text”.
Second, AI is now the dominant way to code software. In so-called ‘vibe coding’, users express their intention in plain speech and the AI transforms that intention into executable code, making suggestions in real time, automating tedious processes and even producing standard codebase structures by itself with limited or no direct human involvement. The CEO of one of the leading Silicon Valley start-up incubators, Y Combinator, says that a quarter of its latest batch of start-ups have 95% of their codebases generated by AI.
However, even if software developers rely heavily on AI, they still need a high level of computer skills to read the code and find bugs:
You have to have the taste and enough training to know that an LLM is spitting bad stuff or good stuff. In order to do good ‘vibe coding,’ you still need to have taste and knowledge to judge good versus bad.
Of course AI’s code-writing skills mean that AI can be used to develop sophisticated cyber-attacks. Malicious actors hijacked an open-source Cursor IDE extension to steal credentials and mine US$50,000 worth of cryptocurrency from developer machines. While AI task completion capabilities double every seven months across general domains, offensive cybersecurity capabilities are estimated to be doubling even faster: every five months.
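To put the two doubling times side by side, the short calculation below converts each into a growth multiple over a chosen period. It assumes smooth exponential growth at the quoted rates, which is a simplification made here rather than a claim in the report, and the function name `growth_multiple` is only illustrative.

```python
def growth_multiple(months_elapsed, doubling_time_months):
    """Capability multiple after a period, assuming steady exponential growth."""
    return 2 ** (months_elapsed / doubling_time_months)


GENERAL_DOUBLING = 7    # months, general task completion (figure from the text)
OFFENSIVE_DOUBLING = 5  # months, offensive cybersecurity (figure from the text)

for months in (12, 35):
    general = growth_multiple(months, GENERAL_DOUBLING)
    offensive = growth_multiple(months, OFFENSIVE_DOUBLING)
    print(f"after {months} months: general x{general:.1f}, offensive x{offensive:.1f}")
```

On these assumptions, one year gives roughly a 3.3-fold gain in general capabilities against roughly a 5.3-fold gain in offensive capabilities. After 35 months (five general doublings, seven offensive doublings) the multiples are 32 and 128, so a two-month difference in doubling time compounds into a fourfold gap.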
On the upside, tests on 25 real-world systems showed that leading models, such as o3 and Claude, are better at fixing security problems (90% success) than exploiting them (32.5-67.5% success). This suggests some developer success at building in guardrails that prevent models from engaging in ‘unsafe tasks’, including cyber attacks.
Third, Air Street says that AI answer engines are no longer just a curiosity, but are a primary entry point for serious internet search queries:
Users live in the browser, so why shouldn’t AI be baked into the experience? This is finally happening. OpenAI, Google, Anthropic and Perplexity all launched assistants that not only unlock Q&A with web content but also navigate and act within the browser on behalf of the user. This shift reframes the browser as an intelligent operating system for the internet, a long-sought vision that earlier attempts like Adept AI never fully realised.
In September 2025, Australian unicorn Atlassian acquired The Browser Company, promising a fundamental shift in search:
Today’s browsers weren’t built for work … It’s a bystander in your workflow, treating every tab the same, with no awareness of your work context, no understanding of your priorities and no help connecting the dots between your tools. It’s time for a browser that’s actually built for work – a browser that helps you do, not just browse.
While Google continues to dominate search globally, data shows Google’s global search traffic fell by 7.9% year-on-year, which may reflect a shift to AI answer engines. Users also appear to engage with AI answer engines differently from traditional search tools like Google. In a typical session, they submit around five prompts and receive roughly five responses. As Air Street says, “this iterative style and memory capability makes answer engines ‘sticky’”.
Yet, Google is still in the mix ‘behind the scenes’. GPT-5’s citations matched 19% of Google domains when compared against the top ten Google results, which Air Street says underscores “both reliance on Google’s index and a broader sourcing pattern”. Interestingly, GPT-5 tends to pull results from lower down in the Google search results than most humans will scroll, “widening exposure for sites beyond the top results”.
Fourth, AI is increasingly used by consumers and businesses as a channel for retail sales:
According to Similar Web data, retail visits referred by ChatGPT now convert better than every major marketing channel measured. Conversion rates rose roughly 5 percentage points year-on-year, from ~6% (June ’24) to ~11% (June ’25). Although AI referrals are still a smaller slice of traffic, they arrive more decided and closer to purchase. Retailers must adapt by exposing structured product data, price and delivery options, and landing pages tailored to AI-driven intents.
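The gap between a percentage-point rise and a relative rise is easy to blur, so a quick worked calculation may help. It uses the approximate figures quoted above (about 6% and about 11%). The 1,000-visit basket and the variable names are illustrative assumptions, not figures from the underlying data.

```python
rate_2024 = 0.06  # approx. conversion rate, June 2024 (from the text)
rate_2025 = 0.11  # approx. conversion rate, June 2025 (from the text)

point_change = (rate_2025 - rate_2024) * 100                 # percentage points
relative_change = (rate_2025 - rate_2024) / rate_2024 * 100  # per cent

visits = 1_000  # hypothetical number of referred visits
extra_purchases = round(visits * (rate_2025 - rate_2024))

print(f"{point_change:.0f} percentage points, {relative_change:.0f}% relative rise")
print(f"about {extra_purchases} extra purchases per {visits} referred visits")
```

On these rounded figures, a rise of about 5 percentage points is a relative improvement of roughly 83%, or about 50 additional purchases for every 1,000 referred visits.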
In September 2025, OpenAI implemented Instant Checkout:
US ChatGPT Plus, Pro and Free users can now buy directly from US Etsy sellers right in chat, with over a million Shopify merchants... Today, Instant Checkout supports single-item purchases. Next, we’ll add multi-item carts and expand merchants and regions.
Predictions for 2026
Air Street rounds off with some bold predictions for 2026, including:
A major retailer reports >5% of online sales from agentic checkout as AI agent advertising spend hits US$5 billion.
A Chinese foundational model lab overtakes US-developed models on a major leaderboard.
A deepfake and agent-driven cyber attack triggers the first NATO/UN emergency debate on AI security.
A movie or short film produced with significant use of AI wins major audience praise and sparks backlash.
President Trump renews his effort to ban state AI laws (which have stronger requirements for AI safeguards), but the US Supreme Court strikes this down.
Peter Waters
Consultant