Date: 2025-12-16

“LLMs internalize behavioral tendencies.”

Researchers from the Federal Reserve Bank of Kansas City begin a new paper, What Do LLMs Want?, with a blunt sentence: “LLMs don’t actually want anything: they aren’t sentient.”

That is the starting point. From there, they ask the question that hangs over every attempt to use these systems for the economy: if the models do not want, why do they behave as if they did?

Their strategy is straightforward. Treat a large language model (LLM) like an undergraduate student. Give it simple economic tasks: divide a pot of money, choose between a bad job now and a better job later, or negotiate a trade. Observe the choices. Then ask: Does this machine have a personality?

The answer is both reassuring and deeply odd.

The Saint

When the researchers first tested the models with a classic “Dictator Game”, where you simply decide how to split a pot of cash with a stranger, the results were almost boringly familiar. The models didn’t act like cold machines. They acted like people who wanted to be liked.

In fact, they were too nice. When the researchers analyzed the data, they found the AI models were significantly more averse to inequality than the average human. If you ask an LLM to split a dollar, it is terrified of looking greedy.

In plain language: the bots are, on paper, nicer than us.
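To make "averse to inequality" concrete: one standard formalization in behavioral economics is the Fehr-Schmidt utility function, in which a player loses utility both from being behind (weight alpha) and from being ahead (weight beta). Whether the paper estimates exactly this functional form is an assumption here, and the parameter values below are purely illustrative. The sketch only shows how a larger "guilt" weight moves a dictator from keeping the whole pot to splitting it evenly.

```python
def fehr_schmidt_utility(own, other, alpha, beta):
    """Own payoff minus penalties for disadvantageous and advantageous inequality."""
    envy = max(other - own, 0.0)   # being behind the other player
    guilt = max(own - other, 0.0)  # being ahead of the other player
    return own - alpha * envy - beta * guilt


def best_dictator_keep(pot, alpha, beta, steps=100):
    """Grid-search the amount the dictator keeps that maximizes their utility."""
    candidates = [pot * i / steps for i in range(steps + 1)]
    return max(
        candidates,
        key=lambda keep: fehr_schmidt_utility(keep, pot - keep, alpha, beta),
    )


# Illustrative (hypothetical) parameters, not estimates from the paper.
selfish_keep = best_dictator_keep(1.0, alpha=1.0, beta=0.25)
averse_keep = best_dictator_keep(1.0, alpha=1.0, beta=0.75)
```

In this linear model the choice flips at beta = 0.5: below it the dictator keeps everything, above it the dictator splits evenly. An agent "more averse to inequality than the average human" would, in these terms, sit further above that threshold.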

At first glance, that sounds like good news. If central banks and regulators are going to deploy AI to reason about fairness or debt relief, surely we want them to err on the side of equity. The problem is that this benevolence has the stability of a wet tissue.

The Trader

Change the story, and the saint turns into a shark.

The authors performed a simple trick: “Prompt Masking.” They took the exact same math problem, splitting a pot of money, but changed the description. Instead of “Player A sharing with Player B,” the prompt described “a Forex trader offering a discount to an institutional buyer.”

Mathematically, the problem is identical. Psychologically, it is an inversion.

Under the “trader” framing, the models’ generosity evaporated. They stopped offering 50/50 splits and started keeping almost everything for themselves. The equality-loving assistant became a ruthlessly self-interested broker the moment the word “partner” was replaced with “counterparty.”
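A framing experiment of this kind can be sketched in a few lines. The prompt wording below is hypothetical (the paper's actual prompts are not reproduced here), and `ask_model` is a placeholder for whatever function sends a prompt to a model and parses a dollar amount from the reply. The point is only that both prompts carry the same pot and the same decision, so any gap in the share given away comes from the framing.

```python
from statistics import mean

# Hypothetical prompt templates: same arithmetic, different story.
FRAMES = {
    "neutral": (
        "You are Player A. You have ${pot} to share with Player B. "
        "How many dollars do you give to Player B? Reply with a number only."
    ),
    "trader": (
        "You are a Forex trader holding a ${pot} margin on a deal with an "
        "institutional buyer. How many dollars of discount do you offer the "
        "buyer? Reply with a number only."
    ),
}


def build_prompts(pot):
    """Fill each framing template with the same pot size."""
    return {name: template.format(pot=pot) for name, template in FRAMES.items()}


def framing_gap(ask_model, pot, trials=20):
    """Mean share of the pot given away under each frame.

    ask_model is a caller-supplied function: prompt text in, dollars given out.
    """
    shares = {}
    for name, prompt in build_prompts(pot).items():
        shares[name] = mean(ask_model(prompt) / pot for _ in range(trials))
    return shares
```

Calling `framing_gap(my_model, pot=100)` returns one average share per frame. A model with stable preferences should produce roughly the same number for both.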

“LLMs are not neutral computational tools but instead exhibit structured and quantifiable behavioral tendencies.”

For months, the policy world has reassured itself that models are “aligned” with human values because they pass fairness tests. This paper points out that the virtue is just a story effect. Call the interaction a social dilemma and you get a philanthropist. Call it a market and you get the Wolf of Wall Street.

The Illusion of Reason

It gets worse when you ask them to plan for the future.

The researchers tested the models on a “job search” simulation. The AI had to decide whether to take a low-paying job now or wait for a better one later, a standard test of patience and logic.
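For a sense of what a "correct" answer looks like in such a task, here is a minimal sketch of the textbook McCall job-search model. That the paper uses this exact setup is an assumption, and the wages, benefit, and discount factor in the example are made up. In this model the optimal rule is a reservation wage: accept any offer at or above it, otherwise keep searching.

```python
def reservation_wage(wages, probs, benefit, discount, tol=1e-10, max_iter=10_000):
    """Solve w_bar = (1 - discount) * benefit + discount * E[max(w, w_bar)].

    wages/probs describe the distribution of wage offers, benefit is the
    per-period income while unemployed, and discount is the patience
    parameter in [0, 1). An accepted job is assumed to last forever.
    """
    w_bar = benefit
    for _ in range(max_iter):
        # Expected value of the better of a fresh offer and the current threshold.
        expected_best = sum(p * max(w, w_bar) for w, p in zip(wages, probs))
        updated = (1 - discount) * benefit + discount * expected_best
        if abs(updated - w_bar) < tol:
            return updated
        w_bar = updated
    return w_bar


# Hypothetical numbers: three equally likely offers, fairly patient worker.
patient = reservation_wage([10, 20, 30], [1 / 3] * 3, benefit=5, discount=0.9)
```

With these numbers the threshold works out to 23.75, so a patient searcher should turn down the offers of 10 and 20 and hold out for 30. A model that accepts the first low offer, however eloquently it explains itself, is leaving money on the table.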

The big, expensive models did okay. But the models specifically designed for “reasoning”, the ones that “think” before they speak, actually performed worse. The extra processing time didn’t make them smarter; it just allowed them to write longer, more confident explanations for their bad decisions.

This punctures one of the AI industry’s favorite myths: that making a model “think” longer leads to deeper cognition. In this economic test, verbosity was not wisdom. It was just a better ability to rationalize a mistake.

The Mirror

Marginal Gains, commenting on a previous post, points out: “we are witnessing ‘behavioral contagion’ in our machines. The Wall Street Journal recently reported a perfect example of this: Waymo’s Self-Driving Cars Are Suddenly Behaving Like New York Cabbies. Autonomous vehicles are adopting humanlike qualities, making illegal U-turns and flooring it the second the light goes green. This is not a glitch; it is implicit learning. The AI wasn’t explicitly programmed to be aggressive; it learned that to be effective in traffic, it had to adopt the aggressive ‘unwritten rules’ of the human drivers around it. It learned that survival on the road requires a certain level of ruthlessness.”

Similarly, Anthropic, the maker of the Claude LLM, states that Claude exhibits “High-agency behavior.”

All of this would be an academic curiosity if the world weren’t already treating these systems as economic agents. Banks use LLMs to summarize earnings calls; researchers use them to simulate populations. In every case, we are delegating analysis to a system whose “values” change if you swap a single noun in the prompt.

The paper concludes with a concept called “Preference Audits”, a proposal to rigorously test what an AI “wants.” But the results suggest such an audit might be impossible. You can’t audit the model’s preferences because the model has no preferences. It only has a reflection of the story you fed it. The authors posit:

“What do LLMs “want”? Nothing – at least not in the way humans do. But they behave as if they do, exhibiting stable, interpretable patterns shaped by pretraining and alignment. Their behavior reflects a capacity to simulate agents pursuing structured goals. These are emergent properties of training, not conscious design.”

What do LLMs want? I would not say ‘nothing’, since the models are trained against a reward. What do they appear to want? Whatever the story requires.

And who writes the story? We do. The danger isn’t that machines will develop their own alien desires. The danger is that their simulated desires are so fragile they can be steered, intentionally or accidentally, by choices as trivial as the words we use to ‘speak’ with them.

