Great minds think alike! I just wrote about this here: AI’s Raised Bar Paradox. “In this brave new world, the disenfranchised lose not only their jobs, but also their ability to think independently, making them structurally dependent on AI systems and, by extension, those who control them.”
(https://www.whitenoise.email/p/ais-raised-bar-paradox)
Super, thank you Tom. I will check it out now
Everyone is talking about this study, but I don't buy it. The methodological shortcomings and the specific issues with the EEG analysis confound each other; everything is intertwined. It seems unreliable and speculative.
Start with the problematic Session 4 design, with its abrupt group reassignments and personalized prompts. The novelty is confounding, there are withdrawal effects, and prior learning directly affects cognitive load and strategy. When the EEG data show corresponding shifts in neural connectivity, it is impossible to disentangle genuine "cognitive debt" from a brain simply struggling to adapt rapidly to a suddenly removed tool after previous reliance. Likewise, the higher neural activity in the "Brain-to-LLM" group in Session 4 could be due to the novelty of using an AI, or to a well-exercised brain effectively integrating new information, not to any pure cognitive benefit of the AI.
Also, the reliance on subjective self-reported measures (essay ownership and satisfaction), set against the documented biases in human and AI judge scoring, complicates any correlation between the observed neural patterns and the behavioral outcomes. If the behavioral data itself is questionable, then linking it to complex dDTF connectivity shifts (already presented with qualitative visual distinctions between "weak" and "strong" significance) creates a house of cards. The selective reporting of EEG data (the exclusion of spectral power changes and EOG) also denies us a more complete and localized neural picture. In short, I don't see how the authors can argue for direct links between specific tool usage, cognitive engagement, and measurable brain changes.
Thank you, Hollis. I raised concerns about the low number of participants in the study, but pushed send before I had finished my critique. It was supposed to go out today, as I had already published on cognitive warfare yesterday. So I was surprised when I woke up this morning!
You are right to question the methodology.
On the Session 4 Design: You have correctly identified that the Session 4 design is fraught with potential confounders. The abrupt reassignment absolutely introduces novelty and withdrawal effects. However, one could argue that these aren't just flaws; they are the very phenomena the study aims to capture.
The "withdrawal effect" seen in the LLM-to-Brain group is, in essence, the behavioral manifestation of the "cognitive debt" the authors propose. The brain's "struggle to adapt" is the core observation, suggesting that prior reliance on the tool changed the cognitive approach.
Similarly, the "higher neural activity" in the Brain-to-LLM group isn't presented as a pure benefit of AI. Rather, the authors frame it as the signature of a brain that, having already done the hard work of structuring its own thoughts, is now engaging in the intense process of integrating a new tool. It's the difference between using AI as a substitute and using it as a challenging 'sparring partner', as I write.
The design is certainly messy, but it attempts to model the real-world cognitive shock of switching between assisted and unassisted modes, which is arguably more valuable than a perfectly sterile but less applicable lab condition.
On the Behavioral Data and Scoring: Your point about subjective measures is highly relevant. Self-reported "ownership" is soft data. However, the study's strength comes from triangulation. The subjective feeling of low ownership in the LLM group is powerfully corroborated by a more objective behavioral measure: their documented, statistically significant inability to recall direct quotes from their own text.
Likewise, by including the analysis of biased human and AI scoring, the authors aren't undermining their study; they're highlighting one of its most chilling paradoxes. The fact that "soulless" essays (as teachers described them) can score higher on superficial metrics is a core part of the "cognitive debt" argument: the output can appear to improve even as the thinker recedes.
On the EEG Analysis: This, in my opinion, is the strongest point of criticism. The dDTF analysis is complex, and the exclusion of other measures like spectral power or EOG does present an incomplete picture. That said, the authors chose a tool that specifically measures directional connectivity because their question was about how different brain networks communicate during these tasks.
While the visual summaries in the main text might appear qualitative ("weak" vs. "strong"), the study's appendix contains extensive tables with the actual quantitative dDTF values, statistical comparisons, and p-values for those who want to scrutinize the raw analysis.
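For readers unfamiliar with the method, here is a minimal sketch (my own, in Python, not the authors' pipeline) of how a directed transfer function is derived from a multivariate autoregressive (MVAR) model of the EEG channels. The study's dDTF additionally applies full-band normalization and a partial-coherence weighting; this plain DTF version is only meant to give intuition for what "directional connectivity" means. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def fit_mvar(X, p):
    """Least-squares fit of an order-p MVAR model X[t] = sum_k A_k X[t-k] + noise.
    X: (n_channels, n_samples). Returns A with shape (p, n_channels, n_channels)."""
    n, T = X.shape
    # Stack lagged copies of the data: block k holds X[t-k] for t = p..T-1.
    Z = np.vstack([X[:, p - k:T - k] for k in range(1, p + 1)])  # (n*p, T-p)
    Y = X[:, p:]                                                 # (n,   T-p)
    W, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)                # (n*p, n)
    return W.T.reshape(n, p, n).transpose(1, 0, 2)               # A_1 .. A_p

def dtf(A, freqs, fs):
    """Squared, row-normalized DTF: influence of channel j on channel i per frequency."""
    p, n, _ = A.shape
    out = np.empty((len(freqs), n, n))
    for fi, f in enumerate(freqs):
        # H(f) = (I - sum_k A_k exp(-2*pi*i*f*k/fs))^(-1), the MVAR transfer matrix.
        Af = np.eye(n, dtype=complex)
        for k in range(p):
            Af -= A[k] * np.exp(-2j * np.pi * f * (k + 1) / fs)
        H = np.linalg.inv(Af)
        P = np.abs(H) ** 2
        out[fi] = P / P.sum(axis=1, keepdims=True)  # normalize over all inflows to channel i
    return out

# e.g. A = fit_mvar(eeg, p=5); connectivity = dtf(A, freqs=np.arange(1, 40), fs=256)
```

The point is simply that these values come out as concrete numbers per frequency and per channel pair, which is the kind of quantity the appendix tables report.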
Ultimately, the study's strength comes from the convergence of evidence:
Neural Data: Shows a clear hierarchy of engagement (Brain > Search > LLM).
Behavioral Data: Shows a matching hierarchy in memory and ownership.
Linguistic Data: Shows a corresponding hierarchy in originality vs. homogeneity.
No single strand is perfect, but together they point in a consistent and provocative direction. You are absolutely right that it is difficult to prove direct causation. The authors themselves call their study a "preliminary guide." It doesn't offer a final, definitive answer, but it presents a compelling, multi-modal correlation between unreflective AI use and a measurable decrease in cognitive engagement. It's a speculative but well-documented first step and more robust research is needed.
But, in my experience, the study points to something I have 'observed' with over 1000 students and professors.
Yes, this last point, "But, in my experience, the study points to something I have 'observed' with over 1000 students and professors," is what troubles me. The study seems designed to confirm all of our priors, and this *always* makes me suspicious. One doesn't learn anything (I didn't learn anything, you didn't learn anything) if a study merely confirms one's priors, yet there now seems to be data that we have to *do* something with. This is not science, to me. Nothing has been learned!
The instinct to be suspicious when a study confirms our priors is not just healthy; it's essential. Well said.
I think the question you are asking, "What have we actually learned?", can be answered in two ways.
If we define learning as discovering something we did not already suspect, then you are right. We haven't learned much. Many of us already had a strong intuition that outsourcing thinking would lead to diminished engagement.
But there's another, equally vital role for science: to take a widely held, qualitative observation and provide a quantitative, mechanistic explanation for it.
As an analogy: for centuries, people "knew" that being sick often involves a fever. A study confirming that link would teach us nothing. What science taught us was the mechanism: the role of pathogens, the immune system's inflammatory response, hypothalamic regulation. That mechanistic understanding is what allowed us to move from folk remedies to targeted medicine.
This study, for all its methodological flaws, is a preliminary attempt to do the same. Many educators, like me, "observe" that something is different about how students engage with AI-generated text. This study offers a potential neurocognitive mechanism. What we've learned isn't that students seem disengaged, but what that disengagement might look like in the brain: a measurable downshift in the specific neural networks associated with memory encoding (alpha/theta waves) and executive function.
It moves the conversation from "I feel like this is happening" to "Here is a testable, physiological correlate of what we are seeing." It gives us a new layer of granularity, the ability to distinguish the cognitive footprint of a Search Engine from that of an LLM, for instance. That, I would argue, is new knowledge.
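To make the "alpha/theta downshift" concrete: band power is just the power spectral density integrated over a conventional frequency band. A toy sketch (mine, not the study's code; the band edges are the usual conventions, not taken from the paper):

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg, fs, band):
    """Mean spectral power of each channel within `band` = (lo, hi) Hz.
    eeg: (n_channels, n_samples); fs: sampling rate in Hz."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))   # PSD, computed per channel
    lo, hi = band
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[:, mask], freqs[mask], axis=1)    # integrate PSD over the band

# Conventional band edges (an assumption, not taken from the paper):
# theta = band_power(eeg, fs=256, band=(4, 8)); alpha = band_power(eeg, fs=256, band=(8, 12))
```

A claim like "lower alpha/theta engagement in the LLM group" is, at bottom, a comparison of numbers like these across conditions.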
You are right to worry about what we're supposed to do with this imperfect data. I don't see it as a mandate for a specific policy. I see it as a tool that reframes the problem. It suggests we should be less concerned with just "catching cheaters" and more concerned with designing assignments and tools that explicitly demand the kind of cognitive integration that lights up the brain. Which is why I write "We must teach students not just how to use these tools, but how, when and why."
So, while nothing has been "proven" definitively, I don't think it's true that "nothing has been learned." We've simply replaced a general suspicion with a specific, albeit preliminary, picture of the cognitive engagement involved, and sometimes the lack thereof. And that feels like a necessary first step.
Well, you are an optimist. I spent many years back at Johns Hopkins looking at brains playing music, listening to music, being surprised by music, and I'm more suspicious of the usual technology and how we've come to intuit it all (brains lighting up!) than of anything else. I appreciate (truly) your optimism. I think there are different ways to think about what will happen to different groups working with LLMs differently, over time.
I'm currently reviewing Adam Zeman's work on imagination and aphantasia, so I'm steeped in brains and our misconceptions about them. Ten years ago nobody knew about aphantasia. Some of us embed all of our imagination in language; some don't. Are those of us who do more immune to whatever is going on with LLMs than others? These are the more interesting questions to me.
Oh, I think I am pessimistic about humanity. I see many instances of laziness: handing cognitive faculties over to the LLM, presenting its work without scrutinizing it, and never building the knowledge. I am very worried about this.
The work you are doing is very interesting. This could be impactful on many levels. I will look at Adam Zeman's work and look forward to learning more from yours.
I'll chime in to add that this work - especially the EEG results - is aligned with a voluminous literature on memory and learning that goes back many, many decades and precedes fancy measurement tools. So those of us in or adjacent to the cognitive science world definitely would have anticipated this, not because of priors with students, but because of priors in research. I think the ownership/memory findings are predictable as well, but I am thinking about the broader implications those findings might have - if I get somewhere interesting, I will share.
Personally, I'd much rather write my own thoughts than let a machine generate them for me. We need to revise the goals of pedagogy. Grading students creates an artificial goal to achieve, thus encouraging taking shortcuts and even outright cheating. This isn't a new problem. How often have we heard that a student might be especially adept at passing tests, but is otherwise mediocre? Conversely, how often have we seen someone who is weak at test taking suddenly shine in more substantive areas?
//
"The MIT study is not merely about pedagogy. It is about sovereignty of thought".
Ah, but that begins with pedagogy! It brings to mind memories of my grade school days, with teachers sternly demanding obedience above all else. I've been told this is no longer the case, yet we still use much of the same methodology and expectations designed to induce conformity over thinking for oneself - that is, sovereignty of thought.
//
"The solution is not prohibition. It is reorientation. We must build systems that enhance engagement, not mute it. We must teach students not just how to use these tools, but how, when and why".
That reorientation will necessitate transforming the grade motive to something more substantive. I don't know exactly what or how, but grade seeking is problematic in its own right, and severely exacerbated by the advent of AI.