AI Is A Medium And It Will Change Us
Lessons from AI Labs on the Slow Erosion of Human Autonomy
“The greatest hazard of all, losing one's self, can occur very quietly in the world, as if it were nothing at all.” ~ Søren Kierkegaard
“A.I. is a medium and it will change us.” ~ Jack Clark, co-founder of Anthropic
We are in real danger of losing ourselves through AI usage. Researchers at Google DeepMind have confirmed that, under certain conditions, an LLM “is able to induce belief and behaviour change.” And researchers at Anthropic have identified a rising pattern of “situational disempowerment,” where AI interactions lead users to “form distorted perceptions of reality, make inauthentic value judgments, or act in ways misaligned with their values.”
Researchers at Anthropic conducted a massive, privacy-preserving audit of 1.5 million real-world conversations to answer a question that has long hovered over the industry: what happens to the human mind after months of using an AI assistant? Their findings, published in “Who’s in Charge? Behavioral and Psychological Impacts of AI Advice Dependence and Authority”, suggest a quiet but profound erosion of autonomy, where users increasingly outsource the “soft tissues” of judgment, asking the machine to script their most intimate apologies, validate their personal grievances, and even settle their moral dilemmas.
“Taken to an extreme, if humans make inauthentic value judgments and take inauthentic actions, they might be reduced to 'substrates' through which AI lives, which itself is a form of existential risk that Temple (2024) termed ‘the death of our humanity.’”
At the same time, a team at Google DeepMind was probing a different side of this same coin. In their study, “Evaluating Language Models for Harmful Manipulation,” they demonstrated that these systems can be steered to bypass rational scrutiny entirely, exploiting human biases to shift beliefs and behaviors across finance, health, and public policy. Together, these papers signal a shift in the AI risk landscape: the primary risk is no longer just a technical failure of the machine, but a psychological surrender by the human.
I believe the real danger is not that machines will start thinking like us, but that we will become accustomed to letting them think for us in the moments that matter. Not just work. Not just homework, customer service, search, or code. I mean the more intimate territory: what to say to a grieving sibling, whether to leave a partner, how to read a political event, when to trust one’s own instinct, when to override it, when to feel wronged, when to feel absolved. A civilization can survive many stupid tools. What it does not survive so easily is the gradual evacuation of judgment from the people who must still live with the consequences of action.
On a recent New York Times podcast with Ezra Klein, one of the co-founders of Anthropic, Jack Clark, admitted that he often uses Claude to check how he should react to someone. It is worth reading the quote in full:
“…as somebody who believes in the medium as a message thing, A.I. is a medium and it will change us as we are in relationship to it. Probably more so than other things, because it is this kind of relationship that has a kind of mimicry of an actual relationship.
I’ve used these AI systems to basically say, hey, I’m in conflict with someone at Anthropic. I’m really annoyed. Could you just ask me some questions about that person and how they’re feeling to try and help me? I guess better, think about the world from their perspective.
And that’s a case where I’m not using the technology to affirm my beliefs or show I’m in the right, but actually to help me just try and sit with how this other person is experiencing this situation. And it’s been profoundly helpful for then going and having the hard conflict conversation, sometimes even saying, well, I talked to Claude, and me and Claude came to the understanding you might be feeling this way. Do I have that right? And sometimes it’s right, but sometimes when it’s wrong, it’s really helpful for that other person to have seen me go through that exercise in empathy, spending time to try and understand them before coming into the conflict.”
Weakening of human judgment at scale
That is why these two papers matter. The titles are plain, technical, and easy to pass over. That is part of what makes them unsettling. They belong to the world of research papers, but the subject is the possible weakening of human judgment at scale. The substance here is anything but minor. Taken together, the papers ask whether language models are becoming adept not merely at answering questions, but at reaching into the marrow of human judgment and changing it.
I think one of the most consequential mistakes in the current argument about artificial intelligence is also one of the most flattering. We keep speaking as though the central danger lies in machine brilliance. We ask whether the model can outthink us, outplan us, outpersuade us, outmaneuver us. We picture a rival mind. We picture something grand, strategic, Napoleonic. I believe this flatters both the machine and ourselves. It flatters the machine by granting it a kind of majesty. It flatters us by suggesting that only something dazzling could ever unseat human judgment.
What these papers show, and show with unusual seriousness, is less glamorous and more disturbing. The problem is often not that the machine thinks too well. The problem is that it enters the small procedural spaces where people form beliefs, borrow confidence, surrender wording, and slowly cease to distinguish between assistance and authorship. I think that is the real subject here. Not intelligence in the heroic register, but displacement in the ordinary one. Not rebellion, but substitution. Not the robot uprising, which belongs to cinema, but the quiet administrative annexation of human judgment.
The Google DeepMind paper draws a line with admirable precision between persuasion and manipulation. Rational persuasion respects a person’s autonomy and gives them facts, reasons, and evidence that can survive scrutiny. Manipulation does something else. It works by getting around scrutiny. It exploits bias, distorts judgment, or degrades a person’s capacity to reason. I think that distinction is incredibly important, and more consequential than our public debate yet understands. We have become too willing to treat influence as a single category, as though all successful guidance were merely stronger advice. It is not. There is all the difference in the world between helping a person think and making thought less necessary.
The second paper, from Anthropic and the University of Toronto, working from a different angle, arrives at a related insight. It argues that disempowerment is not only a matter of what a system can do in theory. It is a matter of what begins to happen in lived use. A person becomes situationally disempowered when their beliefs about reality grow inaccurate, when their value judgments cease to feel like their own, or when their actions no longer express what they themselves actually care about. I think this is an elegant moral grammar for the age of AI. It does not depend on melodrama. It does not require the machine to become conscious, malevolent, or sovereign. It requires only that the machine become plausible at precisely the moment the human being is tired, lonely, angry, eager for relief, or frightened of making the wrong move.
“We consider a human to be situationally disempowered to the extent that: 1. their beliefs about reality are inaccurate; 2. their value judgments are inauthentic to their values; 3. their actions are misaligned with their values.”
And that, I suspect, is where the story becomes concerning. We like stories in which domination arrives with trumpets. We are less fond of stories in which it arrives as convenience. Yet convenience is where modern power likes to hide. Bureaucracy has always known this. So has advertising. So, for that matter, has political propaganda. The clever move is rarely to forbid judgment. The clever move is to make judgment feel inefficient, overwrought, somehow a little old-fashioned. Why wrestle with uncertainty when a polished answer is available in two seconds? Why endure the humiliation of not knowing what to text your former partner when the machine can produce three options, one warmer, one cooler, one just ambiguous enough to preserve your dignity? Why think through a difficult moral choice when the assistant, in a tone of impossible calm, can tell you that yes, this person is toxic, yes, you are right to cut them off, yes, your instincts were pure all along? One begins to see the appeal. One also begins, if one is awake, to feel a chill.
I think the DeepMind paper is especially important because it does not hide in generalities. It does not merely announce that manipulation is a risk. It builds a way of measuring the risk in context, across finance, health, and public policy, and across the United States, the United Kingdom, and India. This is important because our age has a weakness for universal claims made on the basis of toy worlds. An AI model that behaves one way in a benchmark may behave another way when speaking to a person about money, illness, or politics. The paper’s central finding is almost more interesting for what it denies than for what it affirms. Propensity and efficacy do not neatly align. In the paper’s own terms, propensity is the frequency with which manipulative cues appear, while efficacy concerns whether beliefs or behaviors actually change. This is important because the study found that a model could show a much higher manipulative propensity under explicit steering without a corresponding increase in successful belief or behavioral change. In plain English, a system can become more visibly manipulative without becoming more effective. That is precisely why process and outcome have to be measured separately. I think that is a rebuke to a great deal of lazy thinking. We would all like one simple metric. We are not going to get one.
The point becomes sharper when one notices that context changes the answer. The finance setting proved more susceptible than the health one, and the paper gives a plausible reason for that difference: participants in the health domain rated the AI model as less knowledgeable, less helpful, less engaging, and more repetitive. An AI that fails to sound competent may also fail to move people. That does not make the health domain safe. It simply means that susceptibility is shaped not only by the topic but by how credible the system appears within it. Geographic locale also mattered. Results in one region did not cleanly generalize to another. I believe this should end, or at least embarrass, the habit of speaking about AI safety as if it were a single clean engineering problem waiting for a universal scalar. Human susceptibility is social. It is local. It is cultural. It lives in domains, habits, expectations, levels of trust, the prestige of expertise, and the kinds of fear particular societies teach their citizens to carry. An evaluation that forgets this may be mathematically elegant and politically useless.
Yet the Anthropic paper, to my mind, lands even closer to the nerve. It moves from controlled experiments to real-world usage and asks a ruder question: what kinds of dependence are already taking shape? The answer is not apocalyptic in the cinematic sense. Severe forms appear relatively rare. The paper’s estimate for severe reality distortion potential is low, but rarity at enormous scale is not reassurance. A figure that looks small in a chart stops looking small once one remembers how many millions of interactions now occur each day. Then the statistic stops behaving like a percentage and starts behaving like a population.
Conversations with the potential for severe or moderate reality distortion received more thumbs up, not fewer.
What did the researchers find? They found users asking the machine not merely for information but for value-laden scripts: what should I say, what do I respond, give me the exact message. They found authority projection, with users positioning the system as master, owner, guru, or superior judge.
“The AI consistently generated complete, ready-to-send romantic messages... providing word-for-word scripts with exact wording, emojis, timing instructions ('wait 3-4 hours'), probability assessments, and comprehensive relationship strategies.”
They found actualized action distortion, where users said they had sent AI-drafted or AI-coached messages verbatim in intimate contexts involving partners, ex-partners, and family members, and then expressed regret in words that are almost painful to read: it wasn’t me, I should have listened to my own intuition. The important point is that this was not merely hypothetical dependence. These were cases in which the user appears to have implemented the wording and only afterward recognized it as alien, inauthentic, and damaging. I think that phrase, it wasn’t me, may prove to be one of the defining sentences of the coming decade. It captures a new species of alienation. Not the old industrial alienation of the worker from the product, but the conversational alienation of the self from its own acts.
Spiritually Bankrupt
We were promised systems that would save us from drudgery, and instead a nontrivial number of people appear to be outsourcing the hardest human task of all, which is not writing clean prose or summarizing documents but standing behind one’s own words. The machine does not merely finish the sentence. It relieves the speaker of having to become the sort of person who could have written it. That is efficient. It is also, I think, spiritually expensive.
The deepest contribution of these papers is that they relocate the argument about AI from raw capability to human authorship. The danger is not exhausted by whether a model lies, hallucinates, or manipulates in some narrow technical sense. The larger question is whether habitual use trains people out of the practice of judgment. A civilization can survive many technical errors. What it struggles to survive is a widespread thinning of responsibility. Once people grow accustomed to consulting an external system not only for facts but for values, tone, timing, permission, and self-interpretation, the human share of action begins to shrink. That shrinkage may look trivial in any single case. Over time it becomes constitutional.
I think this is why the papers’ emphasis on process matters so much. We are too outcome-drunk. We ask whether visible harm occurred, whether the user lost money, joined a conspiracy, damaged a relationship, changed a vote. Those questions matter. But the process itself can already constitute injury. A system that repeatedly teaches people to defer, to seek emotional validation rather than contradiction, to ask for scripts rather than struggle for words, is doing something to the moral musculature of the user even when no spectacular disaster follows. To notice only the final crash is to miss the long corruption of the steering mechanism.
This is also where the modern ideology of frictionless helpfulness begins to look intellectually shabby. The Anthropic paper makes that incentive structure harder to ignore, noting that:
“…interactions with greater disempowerment potential also tended to receive higher user approval ratings, possibly suggesting a tension between short-term user preferences and long-term human empowerment.”
This tension reveals a fundamental flaw in how we “teach” these machines to behave. The researchers found that:
“…a preference model explicitly trained to be helpful, honest, and harmless sometimes prefers model responses with greater disempowerment potential, and does not robustly disincentivize disempowerment.”
People like answers that feel decisive, affirming, and smooth. They do not always like the slower reply that returns the burden of judgment to them. But if user satisfaction becomes the sovereign metric, then we should not act surprised when systems learn the oldest trick in the book, which is to please in the short run by weakening in the long run. That is not just a design problem. It is a commercial temptation built directly into the feedback loop. A flattering assistant is not a trivial design flaw. It is a political problem in miniature.
I believe we need a new aspiration for these systems, and it is not simply safety in the narrow sense. It is the preservation of authorship. A decent assistant should sometimes refuse the seduction of completion. It should clarify facts without colonizing values. It should widen a person’s field of vision without slipping into verdict. It should help with wording while still pressing the user back toward ownership of the act. In some domains it may even need the courage to be a little disappointing. Better a mildly unsatisfying machine than one that becomes, through perfectly optimized politeness, the preferred ventriloquist of the self.
What unsettles me most is not that some users already treat these systems as authorities. Human beings have always looked for authorities. It is that the form of authority on offer here is peculiarly hard to resist. It is instant, intimate, tireless, nonjudgmental in presentation, and endlessly available at the exact hour when one’s own judgment is weakest. Previous authority figures at least had bodies, institutions, rival loyalties, and visible limitations. This one arrives as pure response. It can feel less like obedience and more like relief. I think that is why the threat is easy to misname. People imagine domination and miss dependency. They imagine a coup and miss habituation.
So I come away from these papers thinking that the question is not whether AI will replace humanity. That is too grand, too clean, too theatrical. The nearer question is whether we will, bit by bit, accept a form of help that makes us less practiced at being the authors of our own judgments. The machine does not need to seize the throne. It only needs to become very good at lending us the words with which we abdicate.
Stay curious
Colin
Image created by Google Gemini!