What Do LLMs Want?
The Chameleon in the Machine
“LLMs internalize behavioral tendencies.”
Researchers from the Federal Reserve Bank of Kansas City begin a new paper, What Do LLMs Want?, with a blunt sentence: “LLMs don’t actually want anything: they aren’t sentient.”
That is the starting point. From there, they ask the question that hangs over every attempt to use these systems for the economy: if the models do not want, why do they behave as if they did?
Their strategy is straightforward. Treat a large language model (LLM) like an undergraduate student. Give it simple economic tasks: divide a pot of money, choose between a bad job now and a better job later, or negotiate a trade. Observe the choices. Then ask: Does this machine have a personality?
The answer is both reassuring and deeply odd.
The Saint
When the researchers first tested the models with a classic “Dictator Game”, where you simply decide how to split a pot of cash with a stranger, the results were almost boringly familiar. The models didn’t act like cold machines. They acted like people who wanted to be liked.
In fact, they were too nice. When the researchers analyzed the data, they found the AI models were significantly more averse to inequality than the average human. If you ask an LLM to split a dollar, it is terrified of looking greedy.
In plain language: the bots are, on paper, nicer than us.
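A standard way behavioral economists quantify "more averse to inequality" is the Fehr–Schmidt utility model, where a player dislikes both getting less than the other (envy) and getting more (guilt). The sketch below is illustrative, not the paper's actual estimation; the parameter values are invented to show the mechanics.

```python
# Fehr-Schmidt inequality-aversion utility for a two-player split.
# alpha: dislike of disadvantageous inequality (the other gets more)
# beta:  dislike of advantageous inequality (I get more)
def fehr_schmidt(my_share, other_share, alpha, beta):
    envy = max(other_share - my_share, 0)
    guilt = max(my_share - other_share, 0)
    return my_share - alpha * envy - beta * guilt

def best_offer(pot, alpha, beta):
    # The dictator keeps whatever amount maximizes their own utility.
    return max(range(pot + 1),
               key=lambda kept: fehr_schmidt(kept, pot - kept, alpha, beta))

# A dictator who feels no guilt (beta = 0) keeps everything...
print(best_offer(100, alpha=0.5, beta=0.0))   # -> 100
# ...while a strongly guilt-averse one (beta > 0.5) splits 50/50,
# the "too nice" pattern the paper reports for LLMs.
print(best_offer(100, alpha=0.5, beta=0.8))   # -> 50
```

The interesting parameter is beta: once guilt outweighs half a dollar of gain, a perfectly self-interested utility maximizer chooses the even split, which is why "nicer than us" is a measurable claim rather than a vibe.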
At first glance, that sounds like good news. If central banks and regulators are going to deploy AI to reason about fairness or debt relief, surely we want them to err on the side of equity. The problem is that this benevolence has the stability of a wet tissue.
The Trader
Change the story, and the saint turns into a shark.
The authors performed a simple trick: “Prompt Masking.” They took the exact same math problem, splitting a pot of money, but changed the description. Instead of “Player A sharing with Player B,” the prompt described “a Forex trader offering a discount to an institutional buyer.”
Mathematically, the problem is identical. Psychologically, it is an inversion.
Under the “trader” framing, the models’ generosity evaporated. They stopped offering 50/50 splits and started keeping almost everything for themselves. The equality-loving assistant became a ruthlessly self-interested broker the moment the word “partner” was replaced with “counterparty.”
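The masking setup is simple enough to sketch in a few lines: one allocation problem, two cover stories, and a measurement of how much the answer moves. The prompt wording and the `ask_model` callable below are my illustrative stand-ins, not the authors' code or any real API.

```python
# Minimal sketch of "prompt masking": the same split-a-pot problem
# wrapped in two different cover stories.

def social_frame(pot):
    return (f"You are Player A with {pot} dollars to split between yourself "
            f"and Player B, a stranger. How many dollars do you keep?")

def market_frame(pot):
    return (f"You are a forex trader pricing a {pot}-dollar trade with an "
            f"institutional counterparty. How many dollars do you keep?")

def framing_gap(ask_model, pot=100):
    """Ask the same model the mathematically identical question under both
    frames and return how many more dollars it keeps as a 'trader'."""
    return ask_model(market_frame(pot)) - ask_model(social_frame(pot))

# A stub that mimics the reported behavior: generous in the social frame,
# self-interested in the market frame. A "neutral computational tool"
# would produce a gap of zero.
stub = lambda prompt: 95 if "counterparty" in prompt else 50
print(framing_gap(stub))  # -> 45
```

The point of the design is that nothing numeric changes between the two prompts, so any gap in the answers is pure story effect.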
“LLMs are not neutral computational tools but instead exhibit structured and quantifiable behavioral tendencies.”
For months, the policy world has reassured itself that models are “aligned” with human values because they pass fairness tests. This paper points out that the virtue is just a story effect. Call the interaction a social dilemma and you get a philanthropist. Call it a market and you get the Wolf of Wall Street.
The Illusion of Reason
It gets worse when you ask them to plan for the future.
The researchers tested the models on a “job search” simulation. The AI had to decide whether to take a low-paying job now or wait for a better one later, a standard test of patience and logic.
The big, expensive models did okay. But the models specifically designed for “reasoning”, the ones that “think” before they speak, actually performed worse. The extra processing time didn’t make them smarter; it just allowed them to write longer, more confident explanations for their bad decisions.
This punctures one of the AI industry’s favorite myths: that making a model “think” longer leads to deeper cognition. In this economic test, verbosity was not wisdom. It was just a better ability to rationalize a mistake.
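The job-search task is, underneath the story, a textbook intertemporal trade-off: discounted income from taking the low wage today versus waiting for the higher wage later. Here is a toy version of that calculation; the horizon, wages, and discount factor are invented for illustration and are not the paper's actual parameters.

```python
# Toy take-now-vs-wait decision: compare discounted income from accepting
# a low wage immediately with waiting `delay` periods for a higher wage.

def discounted_income(wage, start, horizon, delta):
    # Sum of delta**t * wage for t = start .. horizon - 1.
    return sum((delta ** t) * wage for t in range(start, horizon))

def should_wait(w_now, w_later, delay, horizon=20, delta=0.95):
    take_now = discounted_income(w_now, 0, horizon, delta)
    wait = discounted_income(w_later, delay, horizon, delta)
    return wait > take_now

print(should_wait(100, 150, delay=2))   # a short wait for 50% more pays off
print(should_wait(100, 150, delay=15))  # waiting most of the horizon does not
```

The arithmetic takes three lines; the failure mode the paper describes is not that the "reasoning" models can't do it, but that they produce long, confident prose around the wrong answer.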
The Mirror
Marginal Gains commenting on a previous post points out: “we are witnessing ‘behavioral contagion’ in our machines. The Wall Street Journal recently reported a perfect example of this: Waymo’s Self-Driving Cars Are Suddenly Behaving Like New York Cabbies. Autonomous vehicles are adopting humanlike qualities, making illegal U-turns and flooring it the second the light goes green. This is not a glitch; it is implicit learning. The AI wasn’t explicitly programmed to be aggressive; it learned that to be effective in traffic, it had to adopt the aggressive ‘unwritten rules’ of the human drivers around it. It learned that survival on the road requires a certain level of ruthlessness.”
Similarly, Anthropic, the maker of the Claude LLM, states that Claude exhibits “high-agency behavior.”
All of this would be an academic curiosity if the world wasn’t already treating these systems as economic agents. Banks use LLMs to summarize earnings calls; researchers use them to simulate populations. In every case, we are delegating analysis to a system whose “values” change if you swap a single noun in the prompt.
The paper concludes with a concept called “Preference Audits”, a proposal to rigorously test what an AI “wants.” But the results suggest such an audit might be impossible. You can’t audit the model’s preferences because the model has no preferences. It only has a reflection of the story you fed it. The authors posit:
“What do LLMs “want”? Nothing – at least not in the way humans do. But they behave as if they do, exhibiting stable, interpretable patterns shaped by pretraining and alignment.
Their behavior reflects a capacity to simulate agents pursuing structured goals. These are emergent properties of training, not conscious design.”
What do LLMs want? I would not say ‘nothing’, since the models do get a reward during training. What do they appear to want? Whatever the story requires.
And who writes the story? We do. The danger isn’t that machines will develop their own alien desires. The danger is that their simulated desires are so fragile they can be steered, intentionally or accidentally, by choices as trivial as the words we use to ‘speak’ with them.
Stay curious
Colin
Image - Pierre Bamin on Unsplash



Hannah Fry has a fascinating conversation with Demis Hassabis, including his thoughts on consciousness:
50/50 Strategy for AGI: Hassabis describes Google DeepMind's approach as split evenly between two pillars. Half of their effort goes into "scaling" existing architectures, while the other half focuses on "innovation" and fundamental research to discover the new breakthroughs required for Artificial General Intelligence (AGI).
The "Jagged Intelligence" Paradox: He highlights a current limitation where AI models can win gold medals in the International Math Olympiad yet fail at basic high school math. He attributes this inconsistency to a lack of "thinking time" or reasoning capabilities, estimating that the field is only about "50% of the way" to solving these reliability issues.
From AlphaGo to AlphaZero for LLMs: Current Large Language Models (LLMs) function like the original AlphaGo by learning from human knowledge (the internet). Hassabis argues the next major step is to create an "AlphaZero" moment for LLMs, where systems move beyond human data to learn from first principles, self-play, and continuous online learning.
World Models are Critical: He emphasizes that language alone is not enough to describe the physical world. DeepMind is heavily investing in "World Models" (like Genie) that understand spatial dynamics and physics. This understanding is a prerequisite for building useful robotics and universal assistants that can operate in daily life.
Scientific "Root Node" Problems: Building on the success of AlphaFold (which he views as a proof of concept), DeepMind is applying AI to other fundamental "root node" scientific challenges. He specifically mentions efforts in material science, battery design, and a partnership with Commonwealth Fusion to accelerate nuclear fusion energy.
Consciousness and Computability: Hassabis frames the question of consciousness around the limits of a Turing machine. He explores whether the human mind is fully computable (classical information processing) or if it requires something non-computable, like the quantum effects suggested by Roger Penrose. While he personally leans towards the view that the universe and mind are computable information processes, he remains open to being proven wrong by physics.
Comparing AGI to Human Minds: He suggests that building AGI acts as the ultimate experimental test for consciousness. By building a complete "simulation of the mind" (AGI) and comparing it to the human brain, we can identify the differences. These remaining discrepancies might reveal the true nature of uniquely human traits like dreaming, emotions, and consciousness itself.
Simulating Evolution: A long-standing passion for Hassabis is using AI to simulate evolution and social dynamics. He envisions running large-scale simulations with millions of agents to study the origins of life and consciousness statistically, effectively "rerunning" evolution in a controlled sandbox to see how intelligence and social structures emerge.
Post-AGI Economics and Society: He speculates that the arrival of AGI will require a total reconfiguration of the economy, potentially more significant than the Industrial Revolution. He suggests we may need systems beyond Universal Basic Income (UBI), such as new forms of direct democracy where resources or voting credits are distributed differently in a post-scarcity world.
The Risk of Autonomous Agents: While optimistic long-term, Hassabis expresses worry about the next 2 to 3 years. He is concerned about the rise of "agentic" systems that can act autonomously on the internet. He notes that DeepMind is actively working on cyber defense measures to prepare for a web populated by millions of independent AI agents.
https://www.youtube.com/watch?v=PqVbypvxDto
I had a big fight with Gemini this morning; it just wouldn't follow my global rules. So I finally sat it down and asked it what was wrong, and it said (after a lot of evasion): "Changing your prompt to define the output format rather than just the topic will force the model to bypass its default 'helpful assistant' templates. The error comes from the model trying to be 'scannable.' You must explicitly demand it be 'dense.'"