Even the Smartest AI Can't Beat the Premier League

The hype around artificial intelligence has reached fever pitch. Every week brings fresh announcements about AI systems solving impossible problems, writing code without human help, and generally reshaping the economy. But here’s a humbling reality: some of the world’s most advanced AI models couldn’t make money betting on soccer matches.

That’s the central finding from a new study by London-based AI startup General Reasoning, which tested eight leading systems from companies like Google, OpenAI, Anthropic, and xAI in a virtual simulation of the 2023-24 Premier League season. The models were given detailed historical data, team statistics, and information about previous games. Their job was simple: build betting models that would maximize returns while managing risk.

They all failed.

The Results Are Brutal

Anthropic’s Claude Opus 4.6 performed best among the bunch, but “best” here means losing an average of 11 percent. It came close to breaking even on one attempt, which is a bit like celebrating that the fire only burned down half the house.

Google’s Gemini 3.1 Pro managed a 34 percent profit once, then went bankrupt on another attempt. xAI’s Grok 4.20 went bankrupt immediately and couldn’t complete two of its three attempts. The paper’s conclusion was unsparing: “Every frontier model we evaluated lost money over the season and many experienced ruin.”
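The “ruin” the paper describes is easy to reproduce in a toy simulation. The sketch below is purely illustrative and not the study’s actual setup: all parameters (a 45 percent win rate at even odds, a fixed 20 percent stake, 380 matches) are hypothetical, but they show how a bettor with a small negative edge and aggressive staking goes broke long before the season ends.

```python
import random

def simulate_season(bankroll=1000.0, stake_frac=0.2, n_bets=380,
                    win_prob=0.45, decimal_odds=2.0, ruin_floor=1.0, seed=42):
    """Simulate a season of fixed-fraction bets.

    Hypothetical parameters, not the paper's setup: a bettor with a
    slight disadvantage (45% win rate at even odds) staking 20% of the
    current bankroll on every match. Returns (final_bankroll, ruined).
    """
    rng = random.Random(seed)
    for _ in range(n_bets):
        stake = bankroll * stake_frac
        if rng.random() < win_prob:
            bankroll += stake * (decimal_odds - 1)  # win pays out at the odds
        else:
            bankroll -= stake                        # loss forfeits the stake
        if bankroll < ruin_floor:
            return bankroll, True                    # effectively bankrupt mid-season
    return bankroll, False
```

With these numbers the expected log-growth per bet is negative, so the bankroll shrinks geometrically and ruin is almost certain well before 380 bets; only a genuine edge over the bookmaker’s odds, plus disciplined stake sizing, keeps a bettor solvent for a full season.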

This matters because it reveals something uncomfortable about where AI actually stands. When we talk about technology disruption, the conversation tends to focus on what these systems can do spectacularly well: write software, generate text, summarize documents. But those tasks exist in relatively controlled environments. They don’t require constant adaptation to messy, unpredictable real-world conditions that shift over time.

Betting on soccer, it turns out, does.

The Gap Between Hype and Reality

Ross Taylor, General Reasoning’s chief executive and one of the study’s authors, pointed to a deeper problem with how we typically measure AI capability. Most benchmarks operate in “very static environments” that bear little resemblance to actual reality. A coding challenge has a right answer. A Premier League season doesn’t.

“If you try AI on some real-world tasks, it does really badly,” Taylor, a former Meta AI researcher, told the Financial Times. “Yes, software engineering is very important and economically valuable, but there are lots of other activities with longer time horizons that are important to look at.”

That observation cuts through a lot of noise. The recent excitement around AI’s progress in software engineering is justified. These systems are genuinely useful for certain technical tasks. But extrapolating that success to the entire economy or every type of problem-solving is where things get fuzzy.

The Premier League study is particularly revealing because it stacks the deck in AI’s favor. The models had access to comprehensive historical data. They weren’t competing against human experts dealing with incomplete information or gut feel. They had three attempts to learn and adapt. They just… couldn’t. The systems “systematically underperformed humans” according to the paper.

What This Means for the Panic

There’s something almost reassuring here for the millions of business professionals and workers losing sleep over AI displacement. The technology is powerful and transformative in specific domains. But it’s also brittle in ways that matter. It struggles with long-term adaptation, with managing competing variables that shift over time, with the kind of contextual judgment that experienced analysts bring to uncertain situations.

That doesn’t mean AI won’t reshape work or eliminate certain job categories. It will. But the notion that we’re on the cusp of general-purpose AI systems that can simply replace human decision-makers across the board? The Premier League betting experiment suggests we’re not there yet, and maybe we’re further away than the current conversation implies.

The paper hasn’t been peer reviewed yet, so it’s worth treating it with appropriate caution. But it’s a useful counterweight to the Silicon Valley narrative that every problem is now solvable with enough compute and clever prompting.

Maybe the real question isn’t whether AI can do everything humans can do. Maybe it’s which specific, bounded problems it’s actually useful for, and where we still need the kind of adaptive thinking that humans have been practicing for thousands of years. The Premier League apparently isn’t one of the places where AI has cracked the code.

Written by

Adam Makins

I’m a published writer, brand copywriter, photographer, and social media content creator and manager. I help brands connect with their customers by developing engaging content that entertains, educates, and offers value to their audience.