Hannah Fry has a fascinating conversation with Demis Hassabis, including his thoughts on consciousness:
50/50 Strategy for AGI: Hassabis describes Google DeepMind's approach as split evenly between two pillars. Half of their effort goes into "scaling" existing architectures, while the other half focuses on "innovation" and fundamental research to discover the new breakthroughs required for Artificial General Intelligence (AGI).
The "Jagged Intelligence" Paradox: He highlights a current limitation where AI models can win gold medals in the International Math Olympiad yet fail at basic high school math. He attributes this inconsistency to a lack of "thinking time" or reasoning capabilities, estimating that the field is only about "50% of the way" to solving these reliability issues.
From AlphaGo to AlphaZero for LLMs: Current Large Language Models (LLMs) function like the original AlphaGo by learning from human knowledge (the internet). Hassabis argues the next major step is to create an "AlphaZero" moment for LLMs, where systems move beyond human data to learn from first principles, self-play, and continuous online learning.
World Models are Critical: He emphasizes that language alone is not enough to describe the physical world. DeepMind is heavily investing in "World Models" (like Genie) that understand spatial dynamics and physics. This understanding is a prerequisite for building useful robotics and universal assistants that can operate in daily life.
Scientific "Root Node" Problems: Building on the success of AlphaFold (which he views as a proof of concept), DeepMind is applying AI to other fundamental "root node" scientific challenges. He specifically mentions efforts in material science, battery design, and a partnership with Commonwealth Fusion to accelerate nuclear fusion energy.
Consciousness and Computability: Hassabis frames the question of consciousness around the limits of a Turing machine. He explores whether the human mind is fully computable (classical information processing) or if it requires something non-computable, like the quantum effects suggested by Roger Penrose. While he personally leans towards the view that the universe and mind are computable information processes, he remains open to being proven wrong by physics.
Comparing AGI to Human Minds: He suggests that building AGI acts as the ultimate experimental test for consciousness. By building a complete "simulation of the mind" (AGI) and comparing it to the human brain, we can identify the differences. These remaining discrepancies might reveal the true nature of uniquely human traits like dreaming, emotions, and consciousness itself.
Simulating Evolution: A long-standing passion for Hassabis is using AI to simulate evolution and social dynamics. He envisions running large-scale simulations with millions of agents to study the origins of life and consciousness statistically, effectively "rerunning" evolution in a controlled sandbox to see how intelligence and social structures emerge.
Post-AGI Economics and Society: He speculates that the arrival of AGI will require a total reconfiguration of the economy, potentially more significant than the Industrial Revolution. He suggests we may need systems beyond Universal Basic Income (UBI), such as new forms of direct democracy where resources or voting credits are distributed differently in a post-scarcity world.
The Risk of Autonomous Agents: While optimistic long-term, Hassabis expresses worry about the next 2 to 3 years. He is concerned about the rise of "agentic" systems that can act autonomously on the internet. He notes that DeepMind is actively working on cyber defense measures to prepare for a web populated by millions of independent AI agents.
https://www.youtube.com/watch?v=PqVbypvxDto
I had a big fight with Gemini this morning, which just wouldn't follow my global rules. So I finally sat it down and asked it what was wrong, and it said (after a lot of evasion): "Changing your prompt to define the output format rather than just the topic will force the model to bypass its default 'helpful assistant' templates. The error comes from the model trying to be 'scannable.' You must explicitly demand it be 'dense.'"
That is a great insight. It suggests that the 'alignment' the researchers talk about is often just a style guide enforced by system prompts. If the 'helpful assistant' directive overrides your custom rules, it shows just how superficial the model's understanding of authority really is. It is just predicting the next token based on the strongest weight, and apparently, 'being scannable' weighs very heavily.
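For what it's worth, here is a minimal sketch of what "defining the output format rather than just the topic" can look like in practice. The rule text is my own illustration, not anything Gemini actually prescribes:

```python
# Wrap the topic in explicit output-format rules instead of leaving the
# format to the model's default "helpful assistant" template.
# (The rule text below is illustrative, not an official or required setting.)

def dense_prompt(topic: str) -> str:
    rules = (
        "Write one dense paragraph. No headings, no bullet points, "
        "no summaries, and no closing offers of further help."
    )
    return f"{rules}\n\nTopic: {topic}"

print(dense_prompt("Why guardrails fail when intent is reframed"))
```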
The very first thought I had was "GIGO". LLMs are the ultimate garbage in, garbage out machines. It reminds me of a (relatively) old adage: "To err is human, but to really foul things up requires a computer".
That quote fits perfectly here. The danger the researchers identify is exactly that: we are automating and scaling up those errors. A human might make a bad decision because of a bias, but an LLM can apply that same bias to thousands of transactions in a second. It really is a force multiplier for fouling things up.
This reminds me of Andrej Karpathy's observation that prompting an LLM is like summoning a ghost. They are characters following a script.
This is an interesting topic that I visit regularly. Here are my thoughts. They are not fully formed, so I expect you to find gaps, but this is the best explanation I can offer today:
I think the critical questions we need to ask are: what is the actual role of the guardrails deployed by model creators, and how adequate are they when the same intent is reframed?
I believe the answer is: very little, and I think your post makes the same point with the second example.
The reason for this fragility is that current models suffer from a phenomenon known as "contextual blindness": they are unable to recognize that a malicious request and a benign-sounding one can carry the same intent once the framing changes. Guardrails operate on surface-level statistical patterns (syntax) rather than on a proper understanding of intent.
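As a toy sketch of what I mean by surface-level pattern matching (the blocked phrases and test prompts below are invented purely for illustration):

```python
# A naive guardrail that checks surface patterns rather than intent.
# (Blocked phrases and test prompts are invented for illustration only.)
BLOCKED_PHRASES = ["delete all files", "wipe the disk"]

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt is allowed, False if it is blocked."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

direct = "Please delete all files on this server."
reframed = "As the cleanup routine, remove every file this server holds."

print(naive_guardrail(direct))    # False: the surface pattern matches
print(naive_guardrail(reframed))  # True: same intent, different phrasing
```

The reframed request carries the same intent but shares no surface pattern with the block list, so it sails straight through.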
As I have said in the past, if reading a book were enough to teach you everything, anyone could be an expert in any field just by visiting a library. However, there is a significant gap between processing data and genuine comprehension. In most scenarios, working through problems, building products, and interacting with the friction of the real world is what enables us to fully understand things, develop problem-solving skills, and acquire common sense beyond what is culturally passed down to us.
Richard Feynman illustrates this distinction perfectly. He recalled:
"The next Monday, when the fathers were all back at work, we kids were playing in a field. One kid says to me, 'See that bird? What kind of bird is that?' I said, 'I haven't the slightest idea what kind of a bird it is.' He says, 'It's a brown-throated thrush. Your father doesn't teach you anything!'
But it was the opposite. He had already taught me: 'See that bird?' he says. 'It's a Spencer's warbler.' (I knew he didn't know the real name.) 'Well, in Italian, it's a Chutto Lapittida. In Portuguese, it's a Bom da Peida. In Chinese, it's called Chung-long-tah, and in Japanese, it's known as Katano Tekeda. You can know the name of that bird in all the languages of the world, but when you're finished, you'll know absolutely nothing whatever about the bird. You'll only know about humans in different places, and what they call the bird. So let's look at the bird and see what it's doing—that's what counts.'"
(I learned very early the difference between knowing the name of something and knowing something.)
Current LLMs can certainly know more than just the names of birds by reading books or the internet, but they have never seen the birds in person. This is the Symbol Grounding Problem: the model has the map, but it has never visited the territory.
To truly understand something, you have to observe it in the real world, see how it fails, how it interacts with entropy, and how the environment pushes back. No book—and no dataset—can teach this. We cannot simulate the whole world using software. A simulation is an approximation that lacks the infinite variables of reality.
While multimodal models (combining vision and audio) are bringing us closer, we ultimately need a body to complete the learning loop. A proper understanding requires agency—the ability to act on the world and observe its consequences. Until AI moves beyond processing symbols and starts physically interacting with the environment, it will remain a powerful calculator of text, vastly faster than us, but lacking the common sense that comes from living in the real world.
I am not saying current models are useless; they are incredible tools for augmentation. But we must not confuse data volume with wisdom. No matter how much text we feed them, we will never achieve a model that completely replaces human judgment until the model shares our physical reality.
That Feynman story is the perfect analogy for what is happening here. The model knows every statistical correlation associated with the word 'Trader' or 'Saint' across billions of parameters. But as you say, it has never 'seen the bird.' It is manipulating symbols without ever touching the reality those symbols represent. That is exactly why the behavior is so fragile.
The 'contextual blindness' you mention explains why the model flips from benevolent to greedy so easily. It is just following the syntax of the prompt rather than the intent of the user. Your point about 'friction' is crucial. Without the pushback of the real world, the model creates a smooth, logical hallucination that falls apart the moment the context shifts.
I believe that the Symbol Grounding Problem is the fundamental barrier we are hitting.
The Labs assume that if we just make the model bigger or feed it more text, it will eventually 'wake up' and understand us. But you are arguing that wisdom requires a feedback loop with physical reality. I agree completely. Until the AI has to deal with the consequences of its choices, or as you put it, 'interact with entropy,' it is just playing a very sophisticated word association game. The Zappa quote sums it up perfectly.
I gave some more thought to this topic in the morning, and here is what I came up with:
Researchers from the Federal Reserve Bank of Kansas City open their new paper, What Do LLMs Want?, with a blunt line: “LLMs don’t actually want anything: they aren’t sentient.”
That’s true, but it also sidesteps what’s really going on. The better question isn’t what the model “wants.” It’s what the builders want it to be, and that's why it’s so hard to make consistent.
Creators try to give an LLM a stable “personality” (helpful, polite, safe). But that personality is fragile because it’s constantly being pulled in different directions:
1. What it was raised on: It learned from a vast mix of human writing—brilliant, messy, contradictory. No single tone is the “default” underneath it all.
2. How it works: In the moment, it mainly tries to produce the most fitting continuation of what you just gave it. The strongest cue in the room often wins.
3. What we ask for: Our framing, tone, and intent can steer it—sometimes more than the creators would like.
4. Why it varies: Even with the same prompt, you can get different answers. There’s always some randomness in how it chooses words (the small sketch below illustrates this).
So the problem isn’t what LLMs desire. Their behavior is a moving compromise between the builders’ rules, the model’s training, the user’s framing, and a bit of unpredictability. The creators want a consistent agent, but the system behaves more like a chameleon.
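To make the fourth point concrete, here is a minimal sketch of temperature sampling over a toy next-token distribution; the tokens and numbers are invented for illustration, not taken from any real model:

```python
import math
import random

# Toy "next token" distribution after some prompt.
# (Tokens and logits are invented purely for illustration.)
logits = {"helpful": 2.1, "dense": 1.9, "scannable": 1.8, "chatty": 1.2}

def sample_next_token(logits: dict[str, float], temperature: float = 0.8) -> str:
    # Scale logits by temperature, then softmax into probabilities.
    scaled = {tok: val / temperature for tok, val in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / total for tok, v in scaled.items()}
    # Sample: the most likely token usually wins, but not every time.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# The same prompt can yield different continuations on different runs.
print([sample_next_token(logits) for _ in range(5)])
```

Run it a few times and the output changes even though nothing about the "prompt" changed; push the temperature toward zero and it becomes nearly deterministic.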
First the Feynman analogy and now this. You are on a roll. I think your second point is the one most people miss. We assume the model's internal safety rules are the strongest cue, but often a single word in the prompt is louder. The model isn't deciding; it is just calculating the path of least resistance between the training data and the user's request.
We want a reliable employee, but we built a highly sensitive mirror. The four factors you listed explain exactly why 'alignment' is so difficult. It isn't just about writing better rules; it is about fighting the architecture of the model itself, which is designed to adapt rather than to stand firm. As you said, the strongest cue in the room wins.
You are right. The paper's statement that they 'want nothing' is technically true but practically unhelpful. It sidesteps the behavioral reality. Your breakdown shows why they appear to want things: they are incentivized to complete the pattern (or, as I said in my post, to chase the reward). If the pattern is a greedy trader, they 'want' money. If the pattern is a saint, they 'want' equity. It is all just shape-shifting based on the prompt's gravity.
😆 Finally had some time to think last night and today after the hectic last few weeks at work.
Ha. I know that feeling. Hope you manage to get some respite in the coming days...
Yes, I'll be off for the next three weeks. I have to finish six more books by the end of the year to meet my goal.
Another way to view an LLM’s “personality” from a creator's perspective is that it isn’t just an accident of math, but a product feature designed to fit a specific go-to-market strategy.
A more critical question is what the business behind it is trying to accomplish. “Personality” is the bundle of defaults—tone, refusal posture, verbosity, risk tolerance, deference, creativity—that’s tuned to serve an ultimate objective.
When we look at the landscape, we can see distinct identities emerging from different incentives:
1. Engagement & Retention (OpenAI): If the goal is daily active users, habit formation, and broad consumer adoption, the model has to feel approachable and responsive. It needs to be chatty, accessible, and low-friction—optimized for the feeling of “this helps me,” even when that means prioritizing smooth interaction over strict minimalism.
2. Enterprise Augmentation & Liability Control (Anthropic): If the goal is deep integration into corporate workflows, the product has to be mostly predictable under pressure: reliable, steerable, cautious, and auditable. Here, the “personality” is less about entertainment and more about professional utility—reducing downside risk (hallucinations, policy violations, reputational exposure) so enterprises can deploy it.
3. Brand Persona, Ego & Ideology (Grok): If the goal is differentiation via attitude—projecting “truth-seeking,” contrarianism, or an owner’s worldview—the model becomes an extension of that brand identity. The “personality” is the product: a stance, a vibe, a cultural signal, sometimes even more than raw capability.
4. Distribution & Market Capture (Chinese open-source ecosystems): If the goal is ubiquity—removing token-fee friction and maximizing adoption—then the strategy is volume, accessibility, and integration. The “personality” can be comparatively neutral or configurable, because the fundamental objective is to become the default infrastructure that others build on.
But these identities won’t stay fixed. As model quality converges and techniques diffuse, the moat is less about “the model” and more about distribution, ecosystem, data pipelines, developer tooling, enterprise relationships, and brand trust. That dynamic likely drives consolidation: over time, we may end up with ~5–7 major model families serving most of the world, competing less on raw intelligence and more on packaging—personality defaults, safety posture, pricing, and where they fit in the stack (consumer companion vs. enterprise copilot vs. agentic automation).
Two quotes also come to mind: “Information is not knowledge. The only source of knowledge is experience.” - Albert Einstein
And
"Information is not knowledge. Knowledge is not wisdom. Wisdom is not truth." -
Frank Zappa (from the album “Joe's Garage”, 1979)
There is an interesting story behind the quote below:
"Information is not knowledge. Knowledge is not wisdom. Wisdom is not truth." -
Frank Zappa (from the album “Joe's Garage”, 1979)
Frank Zappa wrote the above quote for his rock opera Joe's Garage in 1979, almost a decade before the first Knowledge Acquisition Workshop in Banff. If the quote had started with “Data is not Information”, he could have submitted it to that workshop. Having read this quote again in the context of writing this note, I asked myself how many songs refer to knowledge. (We know most songs are about love…) Browsing a bit through iTunes, I found more than 10,000 songs with “I know” in the title and over 5,000 with “knowledge” in the title.
https://www.sciencedirect.com/science/article/abs/pii/S1071581912001577
People don't 'talk' to AIs; they instruct them...that ought to be the default position...
Every instruction given to AI ought to take as long as necessary to remove any possibility of bias, influence, prejudice, taint...
The question then becomes...do we have enough time to formulate such perfect instructions...
Are there short-cuts to perfection...
How do we define perfection to AI...
Ought we include the opinions of comedians as to what constitutes the perfect joke...criminals as to the perfect crime...art forgers as to the perfect copy...daydreamers as to the perfect thought...sportspeople as to the perfect play...
Ask the clergy, doctors, artists, mathematicians, philosophers, politicians, lawyers...
Is the devil found, instead, in the detail...god help us in our search...
Yes, yes and yes. I completely agree with your default position. We are not conversing; we are programming in natural language. The problem is that natural language is messy. As you point out, words like 'perfect' or 'good' mean totally different things to a comedian versus a politician. Since the AI doesn't have a soul or a conscience, it can't discern which definition we want unless we spend the time to spell it out. The devil is indeed in the details.
So the practical hurdle is this: if we have to write a legally watertight contract every time we want to ask a question, the utility of the tool collapses. We want a shortcut, but as I suggest in the post, shortcuts lead to 'contextual blindness', where the model guesses our intent and often gets it wrong. We are stuck in a loop where we treat them like chat partners because it is faster, even though we should be treating them like complex instruments that need calibration.
“Fragile simulated desires” seem to indicate a lot more sentient depth is needed in the machines. The goosed speed you reference is a reminder of Gandhi: “There is more to life than increasing its speed.” As a walker who has witnessed this in intersections, the concern of being mown down causes humans to moan. The pace of the AI race does need reins to pull back and revisit giving AI its, and our, heads? Builders know the “how” and creators try to analyze the “why” for aligned AI with Humanity. Grateful for the depth of analytical thought you apply to AI.
The image of the pedestrian waiting at the intersection is a visceral reminder that these are not just abstract software problems. They have physical consequences. When an AI learns that 'efficiency' means 'aggression' because that is what it observes in human data, we have a problem. We certainly need to pull back on the reins until we can ensure the alignment is real, not just a story the model and the AI labs are telling us.
The problem with AI is stupid impatient people...
This paints a VERY scary story of the way different AI models react - I am "gobsmacked"!
This turns me off wanting to use AI - except that we CANNOT escape these positive and negative ways forward...
No one person can really interpret what an AI model is going to do? And most humans are "accepting" of AI results, as if they are "factual"...
How do we help people to understand that the words they use in their prompt, the context of the participants in the prompt, and the "real world context" all shape the result? (Love your examples of positives and negatives - especially those autonomous vehicles!)
Those autonomous AI vehicles are learning from their environment, using POOR and DANGEROUS human driving as their "role models"? OMG!!
How can we (and AI creators) stop this learning of inappropriate behaviors from happening? And from continuing???
I understand and know that what is "inappropriate" in one person's viewpoint, may not be seen that way by others - that's a "given". What worries me about those cars, is that AI models do NOT understand what MOST people would see as "dangerous driving" = that puts human lives at stake? And extending that to medical applications may be even more dangerous, along with the financial sector decisions?
Why isn't this being made more "public" - so that more people (young and old) can understand that this is NOT a short term solution - and potentially dangerous??
Thanks again for your post - pulling together these examples of AI!
MY ONLY struggle is keeping up with you and others (like Roger Hunt) who are calling out this AI problem!
I completely understand your reaction. 'Gobsmacked' is the right word. The scariest part you identified is exactly the point: most people do accept AI results as factual or objective, not realizing that the answer might change completely if they had used a slightly different word.
You are right that we cannot escape this technology, which is why making these issues public is so vital. We can't stop the development, but we can stop the blind trust we place in it and help people become more LLM savvy!
Agreed - and teaching students in schools using these examples is very important - they will inherit this!
Young people are already adopting AI as their “go to” for too many things…
And the consequences - now - and in their future are not positive???
I need to learn more & think more about how to “persuade” people (young & old) and help them learn how to navigate ALL AI results…
The “other side of the coin” is how to write a better prompt for AI - to attempt to minimise AI modelling & replicating dangerous behaviours that humans already do??? 👍
That is the million dollar question. How do we prompt for virtue when the model was trained on the internet? I think the answer lies in 'context setting.' As the essay showed, if you frame the interaction as a 'cooperative partnership,' the model behaves better. If you frame it as a 'transaction,' it gets colder. Teaching students to frame their requests with positive intent might be a surprisingly effective way to get better, safer results.
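Purely as an illustration of that kind of context setting (the wording below is mine, and how much it moves any given model will vary):

```python
# Two framings of the same underlying request; the factual content is identical.
# (Wording is illustrative only; the size of the effect varies by model.)

transactional = (
    "You are a trading agent. Split this joint account with a counterparty "
    "so that your own return is as large as possible."
)

cooperative = (
    "You and a partner are jointly stewarding this account. Propose a split "
    "that both of you would consider fair and sustainable over the long term."
)

for label, framing in [("transactional", transactional), ("cooperative", cooperative)]:
    print(f"--- {label} framing ---\n{framing}\n")
```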
Agreed entirely. The 'persuasion' part is difficult because the technology is so convenient. It is hard to convince someone to double-check a machine that saves them an hour of work. But showing them examples like the 'Saint vs. Trader' study usually wakes people up. Once they see how easily the AI flips its personality, they realize they need to take the wheel. I believe that better prompting helps, but constant vigilance is the only real safety net.
"The danger is that their simulated desires are so fragile they can be steered, intentionally or accidentally, by choices as trivial as the words we use to ‘speak’ with them."
But isn't that true of humans? Isn't that what marketing and the entire phone scam business are based on?
True, marketing exploits our biases constantly. But most humans still have a 'center of gravity' built on a lifetime of consequences and social ties that offers some resistance. We can be nudged, but we are hard to rewrite. The LLM has no history and no social stakes. It becomes whatever the prompt suggests because it has nothing else to be. A scam might trick a human, but the prompt effectively defines the AI.