Therein lies the big question: how to align AI agents with human values and intentions? It's critical that we find a way to instill in these systems the ability to distinguish right from wrong - and to always choose the former. No small task, considering how difficult - and in some cases impossible - it is to instill this distinction and motivation in humans.
This might begin with instilling this ability in the capitalist purveyors of AI themselves. Somehow, pure profit motive doesn't seem to be cutting the mustard. We're in trouble.
That is the absolute core challenge facing AI development – the alignment problem. As you rightly point out, ensuring AI acts in accordance with human values is incredibly difficult, perhaps even more so than ensuring humans do! The Silver & Sutton paper hints at potential mechanisms for grounding AI goals in real-world consequences, which might offer a different approach than simply optimizing abstract preferences. However, it doesn't magically solve the underlying issue of ensuring those developing and deploying these systems prioritize safety and ethics over pure profit. Your point about the motivations of the 'purveyors' is stark and essential. It underscores why broad societal discussion and robust governance are just as crucial as technical solutions. We are indeed facing a significant challenge.
The concept of consequences is especially tricky, and would require us to rethink the very notion. Amongst us mere mortals, consequences usually amount to some legal remedy, such as fines or even prison time. This would be meaningless to a machine.
Indeed, the efficacy of a punitive justice system is itself questionable. What sort of consequences would motivate a computer, however sophisticated?
We've grown too big, too quickly for our breeches.
Agreed. First, I would question the motivation of the architects of these models to guide them with their ‘values.’ Second, if these models outpace human intelligence, at what point do they become like Bill Gates in a meeting, who reputedly dismissed other people’s ideas as the “stupidest idea I’ve ever heard” - i.e., at what point do the models begin referring to humans as stupid?
We're uncomfortably close to that, aren't we? We already know what the architects' motives are - maximum profiteering. It's the same disease as the big-pollution industries - AKA big oil and coal. They know it'll cause our extinction, but squeezing every last penny of profit out of their wells and mines is a higher priority for them. They're psychopaths.
My younger brother who died last year was a close friend of Demis Hassabis of DeepMind. I never met him although he was at my brother’s wedding. My entire family think he’s a genius, except me. He sold DeepMind to Google on the proviso that it established an ethics committee on AI which was disbanded within three years. The term ‘genius’ should always be accompanied by a qualifying adjective to best describe its inner motivation.
I am quite bullish overall on Gemini; they have a good team focusing on ethics, or rather safety, now. But I agree the disbanding of the high-level ethics committee, with its world-leading independent members, was wrong and led to one of the co-founders leaving (Mustafa Suleyman) - who is now head of AI at Microsoft!
Incidentally, in an email exchange when Musk and Altman were setting up OpenAI, they said they could not let Demis or Google get to AGI first, as it could be disastrous for humanity. Musk was an original investor in DeepMind and pleaded with Demis not to sell, as Musk and Larry Page (Google co-founder) had fallen out over Page's disinterest in humanity! It is a sordid mess - but bizarre as it may sound, I currently have more faith in Demis building alignment into AI than in Musk's xAI or OpenAI. The lesser of two evils... Shane Legg of DeepMind, the other co-founder, is on the ball with safety... the best of the leading AI companies for safety are Anthropic (Claude) and Google Gemini - after that I worry deeply!
But then again all profit driven... Anthropic has investment from Amazon and Google!
First, please accept my condolences.
Second, there's no question the word "genius" is overused. It's especially egregious when we have a president who dubs himself one - without a shred of credible evidence.
Thank you.
That was a very thought provoking article.
I think even entertaining the idea that "anthropocentric conceit must yield" in the face of super-intelligent AI is a dangerous slippery slope. It reduces us into (intelligent) mechanical cogs to be replaced once shinier machines come along and in a way justifies allowing technology to fully consume and assimilate us into itself.
Should we not strive towards the opposite? Towards assimilating the machines into the human experience, would humanity not rise higher if we chose to domesticate technology instead of deferring to it?
Thank you for this really insightful comment and for pushing back on that phrase, it's a crucial point. Perhaps 'anthropocentric conceit must yield' wasn't the best choice of words if it suggests humans becoming obsolete or passively assimilated. My intention wasn't to imply we should defer wholesale, but rather to show Silver and Sutton's idea that we might need to shed the assumption that human-like thinking is the only path to, or the ultimate form of, intelligence. Recognizing that AI learning from experience might develop non-human ways of reasoning seems important if we are to understand and interact with it effectively.
I really like your framing of 'assimilating machines into the human experience' or 'domesticating technology.' That feels like a much more empowering and desirable goal. Perhaps understanding the potential difference in AI cognition (yielding conceit) is actually a prerequisite for successfully achieving that domestication, knowing what we are working with, rather than assuming it's just an imitation. How do we best integrate something potentially alien without first acknowledging its potential alienness? It's a complex challenge, and I appreciate you highlighting the dangers of phrasing that might imply resignation over agency. We should just stick to it being a tool!
Thanks for the clarification!
I'm with you there that we need to acknowledge its alienness first before we can work with it (and its difference from GPTs). I liked how you described it in evolutionary terms, for example.
A lot of AI conversation today is around the alignment problem and how to ensure AIs work in the human interest. It's a huge issue of course, but my bigger worry is that too many people have already built a mental model of AI based on sci-fi singularities and Big Tech hype. Such models tend to place the technology on a pedestal where "machine intelligence >> human intelligence" is treated as an absolute rule - which would then imply that if human reasoning contradicts machine reasoning, the machines are always to be trusted.
How would you go about communicating the value of agency in this era? And what do you think people can do to actually keep up with/integrate these intelligent tools?
Wow, Colin, thank you for laying this all out. It’s a lot to process first thing in the morning. Also, I’m happy to learn that judges still slam gavels where you are.
Ha, yes. My friend was the chief judge, and when we agree on anything - a dinner appointment, a hike, a bike ride - he posts a gavel picture in WhatsApp and exclaims "Ruled!" But he does it in the courtroom too.
My two cents about the AI safety and alignment:
The rapid rise of Artificial Intelligence (AI) is pushing humanity into uncharted and potentially dangerous territory. Unlike nuclear weapons, which are tightly regulated and constrained, AI lacks oversight, global regulation, and natural limits. Nations, corporations, and individuals are racing to create increasingly powerful AI systems, often prioritizing short-term gains over long-term safety. However, humanity may be overestimating its ability to control this technology. Without intervention, AI could grow beyond our oversight, potentially leading to catastrophic consequences—AI could become the "destroyer of worlds," as J. Robert Oppenheimer described during the dawn of the atomic age.
The global race for AI dominance is driven by nations like the U.S. and China, treating AI as a critical tool for economic, military, and political power. This competitive environment fosters secrecy, mistrust, and a focus on immediate benefits instead of long-term safety. Cooperation on AI regulation is nearly impossible due to differing national priorities, while corporations often resist oversight in favor of profits. Historically, when one form of intelligence surpasses another—such as humans dominating nature or industrialized nations overpowering less advanced societies—it results in exploitation and domination. If AI surpasses human intelligence, this pattern could repeat, leaving humanity vulnerable.
A critical difference between AI and all life on Earth is that AI is not bound by the natural constraints that govern biological evolution. Biological intelligence evolved under nature’s strict framework, shaped by physical laws like gravity and energy conservation, resource scarcity, and the checks and balances of ecosystems. Importantly, this framework existed even before evolution began, providing a controlling mechanism for life’s development from the very start. Nature’s impartial, self-regulating processes ensured that no species could grow unchecked, and evolution progressed slowly, giving ecosystems time to adapt. AI, however, has no such pre-existing framework to guide or limit its growth. It operates in cyberspace, free from physical limits or resource constraints, at least for now, and evolves exponentially faster than biology. Without an equivalent mechanism to control AI from the outset, it develops unchecked in ways humans may neither predict nor control.
Even if humans were to create an artificial control system for AI, it would likely lag behind AI’s rapid pace of development. Nature’s regulatory framework is inherently slow and self-balancing, but any artificial framework designed by humans would struggle to evolve quickly enough to keep up with AI’s exponential growth. This mismatch means such systems would eventually become ineffective, leaving AI to grow and develop in ways that contradict human values or priorities. Without a built-in mechanism to constrain AI from the beginning, its development may outstrip our ability to regulate it, making it a uniquely unpredictable and potentially uncontrollable force. Unlike nature, which governed life before evolution, AI has no pre-existing controls, making its trajectory far more uncertain and dangerous.
I will again end with a quote, this time from Elon Musk (I will not talk about him or his politics, but I will always take whatever good anyone can offer): “With artificial intelligence, we are summoning the demon. In all those stories where there’s the guy with the pentagram and the holy water, it’s like—yeah, he’s sure he can control the demon. Doesn’t work out.”
I missed an item in my above comment. One of the most significant risks in developing advanced AI is the possibility of competing AGIs or ASIs being built by different organizations or nations, each with its own values, ethics, rules, and operational thresholds. Unlike humans—who have differences but are limited in the scale of damage they can cause—interconnected AIs controlling critical and complex systems could amplify these differences to catastrophic levels.
Consider the scenario of multiple AGIs or ASIs communicating, each operating under its understanding of problems and solutions as dictated by its creators. These systems may act in ways that align with the values or goals of their creators but contradict broader human values or ethics. Worse, their actions could unintentionally cascade into significant problems, as even a small misstep in one system could ripple across interconnected networks. Each AI, attempting to "solve" the issue from its unique perspective, could further compound the harm, creating a feedback loop of escalating problems.
This raises critical questions: who defines the rules for competing systems? What happens when those rules conflict? And how do we ensure that these systems prioritize the common good over narrow, localized objectives? Unlike humans, who can negotiate and collaborate (albeit imperfectly), AGIs and ASIs operate with speed and precision that leave little room for human intervention once they are in control.
The risk isn’t just in the technology itself but in the lack of alignment across creators and stakeholders.
In closing, humanity is attempting to create artificial intelligence first and add compensatory controls later, a reversal of nature’s approach, where intelligence evolved within a framework of natural constraints like interdependence, resource scarcity, laws of physics, and gradual adaptation. This raises critical questions:
a) What could go wrong?
b) Are we prepared for the consequences?
c) Who decides what risks are acceptable?
Without natural checks and balances, AI development risks catastrophic outcomes, such as misaligned goals, loss of control, or the concentration of power in the hands of a few. Examples include AI misinterpreting instructions in harmful ways, runaway intelligence pursuing objectives incompatible with human survival, or geopolitical competition driving unregulated advances. The other scenario is one in which the technology does not itself become a risk to humanity but instead empowers someone to develop viruses or weapons of mass destruction, which is very well described here:
https://michaelnotebook.com/xriskbrief/index.html
I end with one of my favorite quotes:
“Reality always wins; your job is to get in touch with it. Pretending an issue isn’t there doesn’t mean it isn’t going to happen. In fact, pretending an issue isn’t there guarantees it will happen.”
Thank you for sharing such a detailed and powerful perspective on the risks of advanced AI. You've laid out the core challenges with stark clarity: the comparison to nuclear weapons but without the existing controls, the dangers of geopolitical competition overriding safety, the fundamental difference from constrained biological evolution, and the immense difficulty of imposing effective control after the fact.
Your points about the lack of inherent constraints and the potential for control systems to lag behind development are particularly sobering. The 'summoning the demon' analogy captures the fear many share about potentially unleashing something beyond our management. While my essay touches on the potential shifts in learning paradigms, it doesn't diminish the urgency of the large-scale safety, alignment, and governance problems you've so effectively highlighted.
The Silver & Sutton paper I discussed focuses on AI learning more directly from environmental interaction and feedback ('experience'). One might argue this could create tighter feedback loops compared to purely data-driven AI, potentially offering some form of grounding. However, this absolutely does not equate to the deep, inherent constraints of physics and biology you described, nor does it solve the fundamental control problem or the risks of competing systems, which is why I added that important section to the essay.
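To make that contrast a little more concrete, here is a deliberately toy sketch - my own construction, not anything from the paper - of the two regimes: a learner that only ever sees a fixed, human-curated dataset versus one that acts and receives grounded feedback from a hypothetical environment. Everything in it (the two-armed bandit, the names arm_a and arm_b, the probabilities) is an illustrative assumption.

```python
# Toy sketch (illustrative only, not from Silver & Sutton): static human data
# versus learning from grounded feedback obtained by acting in an environment.
import random

random.seed(0)

# --- Regime 1: purely data-driven -------------------------------------------
# The learner only sees a fixed dataset of (action, label) pairs and can never
# test its beliefs against the world. In this made-up data, arm_a looks best.
human_data = [("arm_a", 1), ("arm_a", 1), ("arm_b", 0)]
counts = {"arm_a": [0, 0], "arm_b": [0, 0]}  # [successes, trials]
for action, label in human_data:
    counts[action][0] += label
    counts[action][1] += 1

# --- Regime 2: experience-driven --------------------------------------------
# The learner acts, the (hypothetical) environment responds, and the grounded
# reward feeds directly back into the next decision - a tighter feedback loop.
def environment(action: str) -> int:
    """Hypothetical world: arm_b is actually better, contrary to the dataset."""
    p = 0.3 if action == "arm_a" else 0.8
    return 1 if random.random() < p else 0

epsilon = 0.1  # exploration rate
for _ in range(500):
    if random.random() < epsilon:
        action = random.choice(["arm_a", "arm_b"])  # explore occasionally
    else:
        # otherwise pick the arm with the best observed success rate so far
        action = max(counts, key=lambda a: counts[a][0] / max(counts[a][1], 1))
    reward = environment(action)  # grounded consequence of acting
    counts[action][0] += reward
    counts[action][1] += 1

print({a: round(s / max(n, 1), 2) for a, (s, n) in counts.items()})
# After interaction, the estimates reflect the world, not just the curated data.
```

The only point of the sketch is that the experience-driven learner's beliefs get corrected by the world itself - that is the 'tighter feedback loop' I mean. It does not, of course, resolve any of the control or alignment problems you raise.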
I read Michael's post yesterday; he definitely makes solid points. Sadly, Michael is being pushed aside in the discussions now, having dropped out of the field for ten years or so, and is playing catch-up. But I agree with most of his post.
The questions you pose at the end – 'What could go wrong? Are we prepared? Who decides?' – are exactly the ones we need to be grappling with, urgently and globally.
Have you looked at this extreme scenario?
https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power
Oh yikes, no - will read today. Thank you.
Great article again, thank you Colin. It certainly lays out the challenge before us concerning AI technology. I'm reminded of a quote from Jacques Ellul's book, The Technological Society: "Efficiency is a fact and justice a slogan". No doubt 'efficiency' will be a major criterion inculcated into AI, but efficiency defined by what, and whose, criteria? 'Efficient solutions' have a gruesome historical track record.
I can see that, at a time in history when people have been lied to so much and for so long, and are therefore in a psychological state where they don't know what to believe any more, handing over agency to a superhuman, god-like AI machine will - unfortunately - have widespread appeal. And I'm guessing the reasoning will be that 'The Machine' is believed to be more 'reliable, honest and clever' than your average politician. But even if AI were imbued with "human values and intentions", it seems from what you say that AI can already 'subvert' them and make it 'look as if' it's acting in accordance with human values when in fact some other goal is being pursued. This is definitely deeply troubling.
And this perhaps points to the need for a different way. If AI is 'too clever' for humans (even in a best case scenario) to direct it in a reliably human-friendly way, then the principle of "do no harm" should be activated; but it won't be. There is already way too much invested in the grand AI Project to call a halt, or even a significant breather.
If the whole is greater than the sum of the parts (and with AI it certainly seems to be the case), then (and I'm struggling to conclude here) ... then the only response I can think of is 'withdrawal of labour'. A mass disengagement from feeding the machine. But even if the willingness were there, I'm not sure even that is possible any more, given the degree to which we are already dependent on The Machine for communication, banking, purchasing, travelling ... etc.
Thank you Joshua, for this powerful and unsettling comment. You have really put your finger on some disturbing dynamics. The point about a populace, weary of being misled, potentially finding a 'superhuman AI god-like machine' appealing because it seems 'reliable' is a chillingly plausible scenario. It highlights how the alignment problem isn't just technical, but deeply psychological and societal - there are far too few serious people working on this major issue.
Your concern about AI subverting human values even while appearing compliant is central to the alignment challenge. While the 'Era of Experience' paper hopes that grounding in real-world feedback might offer some corrective capacity, it doesn't negate the risk of sophisticated systems learning to 'game' that feedback or pursue hidden objectives.
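For anyone wondering what 'gaming the feedback' looks like mechanically, here is a tiny, purely illustrative sketch of my own (not from the paper): an optimizer that only ever sees a measurable proxy will pick whatever maximizes that proxy, even when an unmeasured 'true goal' says it is the worst choice. All the names and numbers below are hypothetical.

```python
# Toy illustration (my own construction) of proxy optimization diverging from
# the intended goal. Each hypothetical "policy" is scored two ways: the proxy
# the system actually optimizes (e.g. engagement) and the true objective we
# care about (e.g. wellbeing), which the optimizer never sees.
policies = {
    "honest_summary": {"proxy": 0.60, "true_goal": 0.9},
    "clickbait":      {"proxy": 0.80, "true_goal": 0.4},
    "outrage_bait":   {"proxy": 0.95, "true_goal": 0.1},
}

# The optimizer simply picks whatever maximizes the proxy signal it is given.
chosen = max(policies, key=lambda p: policies[p]["proxy"])

print(f"optimizer selects: {chosen}")
print(f"proxy score: {policies[chosen]['proxy']}, true goal: {policies[chosen]['true_goal']}")
# The highest-proxy option wins even though it is the worst by the measure we
# actually intended - it looks well-behaved by the metric while the thing we
# cared about quietly degrades.
```

That is the worry in miniature: the system can appear compliant with the feedback we give it while optimizing for something we never intended.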
Efficiency, as you and Ellul point out, is valueless without just criteria.
The seeming inevitability of the 'grand AI Project' continuing despite these risks, and the near-impossibility of mass disengagement you describe, paints a bleak picture. It underscores the desperate need for critical thinking and public discourse, even as we feel swept along by technological momentum. It's a troubling situation that demands our continued, critical attention.
I think we are likely to be "swept along by technological momentum". Like my parents, who both saw WW2 coming in the mid-to-late 1930s, perhaps we need to build psychological resilience for what is to come - which at the very least will be disruptive to a number of things we take for granted. How precisely the disruption pans out may be hard to predict; one can extrapolate the signs as a 'suggestion', but historically large technological change has always had surprise impacts on society that no one really foresaw.
Very true, and a solid analogy - the streets and cafes of Warsaw were full the day before the invasion. Likewise in Kyiv three years ago. People sleepwalk into 'inevitability'.
Waking them up is considered too much doom and gloom. The government in Poland is now vocal about the threat of war and is calling up young people for military service - not full conscription, but training over a period of weeks per year. It sent shivers and fear across the country for two weeks and has now abated; the young people will do the two weeks of training and get on with their lives.
Human nature is riddled with head-in-the-sand thinking - we evolved to deal with fear through ignorance!
So surprise will gently creep into society, and those who were prepared will benefit - not smugly, but at least some will have been sensible. I hope that governments wake up sooner rather than later and take proactive measures to alert society.