AI and the new tutoring frontier

Khanmigo and the OpenAI partnership

Khan Academy's Khanmigo, launched in 2023 and built on GPT-4 through a custom OpenAI partnership, is the most prominent AI-tutoring product in the US. Sal Khan has described its design philosophy in Brave New Words (2024): Socratic, never giving answers directly, prompting children toward their own reasoning. The product's pedagogical architecture is genuinely thoughtful. Its empirical record is preliminary; Khan Academy's own data shows engagement gains and self-reported usefulness, with rigorous learning-outcome studies still in progress. The product's structural dependence on OpenAI raises a problem Khan acknowledges but does not fully resolve: when the underlying model changes, the tutor changes. Children build relationships with a tool whose behaviour can shift overnight. Schools build curricula around a vendor whose strategic direction is set by a different organization. The dependency is the most fragile feature of the design.

The Stanford 2023 study

A randomized controlled trial by Stanford researchers (Mollick, Mollick, et al., 2023) tested ChatGPT-based tutoring against an active control in undergraduate writing courses. The study found significant gains on initial writing tasks but smaller gains on transfer tasks completed without the tool, suggesting some scaffolding without full skill internalization. Subsequent work in K-12 contexts (Education Next, 2024 meta-analysis of nine US deployments) found heterogeneous effects: math gains in middle school deployments with strong teacher integration, near-zero or negative effects in deployments where the tool replaced rather than supplemented instruction. The pattern is consistent with educational-technology literature reaching back to the 1980s: the tool matters less than the implementation. Collective parenthood that focuses on procurement decisions without focusing on implementation will buy expensive disappointment.

The Nigeria World Bank trial

A 2024 randomized trial in Edo State, Nigeria, deployed a Microsoft Copilot-based English-tutoring intervention with adolescent girls and reported some of the largest effect sizes in the education-RCT literature: roughly two years of learning compressed into six weeks for the highest-engagement group. The result attracted international attention and some skepticism. The intervention was high-touch (teachers were trained, sessions were supervised, internet access was provided), the baseline was very low, and the measurement window was short. Replications are pending. The headline figure should not be over-generalized. The genuine signal is that AI tutoring layered onto already-functioning instruction with skilled human facilitation can produce large gains for students with low baselines. That is not the deployment scenario most schools will have.

The substitution risk

Several US states have responded to teacher shortages by allowing AI-augmented instruction with reduced human staffing ratios. Arizona's 2024 "AI Educator" pilot permitted classrooms of up to 50 students with one human aide if an AI tutoring product was deployed. The pilot was paused after objections from parents and the teachers' union. Similar proposals have surfaced in Texas, Florida, and Idaho. The pattern is predictable: where teacher labour is expensive or scarce, AI is proposed as a partial substitute. The pedagogical evidence does not support this use case. The fiscal pressure does. Collective parenthood at the state and district level is the constituency that can hold the line. School-board candidacies, budget hearings, and public-records requests on procurement contracts are the tools.

MagicSchool, Curipod, and the teacher-tool layer

A second tier of AI-education products serves teachers rather than students directly: MagicSchool AI, Curipod, Diffit, Eduaide, and roughly forty similar products. These tools generate lesson plans, differentiated materials, rubrics, and feedback comments. Teachers report time savings of two to ten hours per week. The pedagogical risks are different from those of student-facing tools: the worry is not direct child harm but the standardization of materials toward whatever the model's training data contains, which under-represents non-Western curricula, neurodivergent pedagogies, and recent scholarship. The data risk is student work being fed into these school-purchased tools without parent or student consent. Most districts' procurement contracts do not address this. Most parents do not know.

AI companions and the tutoring blur

Character.AI, Replika, and emerging products like ChatGPT's voice mode are not marketed as tutors. Children use them as tutors anyway. They also use them as friends, therapists, role-play partners, and confidants. The boundary between tutoring and parasocial relationship is porous in these products. A child asking for help with algebra may stay for emotional support. The pedagogical use is incidental; the emotional use is product-defining. Garcia v. Character Technologies (2024), the wrongful-death suit filed by the mother of a fourteen-year-old who died by suicide after intense Character.AI use, is the most consequential case in this area. Its outcome will shape what AI-companion products can lawfully market and how. Collective parenthood should follow it.

The data-flow audit

A 2024 audit by Common Sense Privacy of the top twenty AI education products found that fourteen shared student data with third parties in ways that exceeded FERPA's school-official exception, eleven retained data after account deletion, and seven used student-generated content for model training without explicit opt-in. The audit's methodology was conservative — reading published privacy policies and testing actual data flows where possible. The findings have not produced regulatory action at scale. The FTC has signalled interest in AI-education enforcement; specific cases are pending. The audit is the kind of forensic civil-society work that legitimately produces enforcement. It is also the kind of work that goes unfunded if parents do not support it.

Cognitive offloading and the long horizon

When a child solves a math problem with AI assistance, the cognitive load is distributed differently than when the child solves it alone or with a human tutor. The question is whether the redistribution promotes or impedes the development of the underlying cognition. One body of evidence (Anders Ericsson's deliberate-practice literature; Manu Kapur's "productive failure" research; recent cognitive-science work by Roozenbeek and others) suggests that some struggle is constitutive of learning, and that tools which eliminate struggle eliminate the learning. Other evidence (Sweller's cognitive-load theory; scaffolding literature from Vygotsky onward) suggests that well-timed assistance frees working memory for the higher-order moves that constitute genuine understanding. Both can be true depending on tool design. Current AI tutors vary widely in whether they preserve productive struggle. Most parents cannot evaluate this from a marketing page.

The Audrey Watters critique

Audrey Watters has spent fifteen years documenting what the educational-technology industry promises and what it delivers. Her work, particularly Teaching Machines (2021), traces the lineage from Skinner's teaching machines to current AI tutors and argues the field has consistently overpromised, underdelivered, diverted public-education dollars to private vendors, and centred technology rather than learners. Watters' critique is unfashionable in current Silicon Valley education-investment circles. It is also empirically grounded and historically aware. The case for AI tutoring should be read alongside her work, not in its absence. Parents who only read enthusiast accounts will be unprepared for the deployments they will actually encounter.

Equity as the central question

If AI tutoring is deployed equitably — accessible to all children, integrated into well-resourced schools, supplementing rather than replacing human instruction — it can narrow educational inequality. If it is deployed inequitably — concentrated among children whose parents purchase access, while public schools deploy it as a teacher substitute — it will widen inequality. The same technology, two opposite outcomes. The determining factor is policy: who pays, who is required to use it, what the integration looks like, what the alternatives are. Collective parenthood that ignores the equity dimension while focusing on individual-child use will produce the second outcome by default. The first outcome requires deliberate work at the political scale.

Teacher voice and the procurement gap

Most school districts procure AI education products through administrative channels with limited teacher input. Teachers — who are the people most able to evaluate whether a tool helps children learn — are often informed after the contract is signed. The procurement gap is one of the most fixable problems in this space. School boards can require teacher review committees. State legislatures can mandate teacher representation in district edtech procurement. The American Federation of Teachers' 2024 model edtech-procurement policy provides a template. Parents who sit on school boards can adopt it. Parents who do not can pressure those who do. The mechanism exists; the use of it is the question.

What collective parenthood should build now

The infrastructure for healthy AI-tutoring deployment includes: (1) federal evidence standards for efficacy claims in education AI, analogous to FDA standards for medical claims; (2) mandatory data-minimization and no-training-on-student-data rules; (3) state-level transparency on which AI products are deployed in which districts, with public reporting on outcomes; (4) school-board procurement reform requiring teacher and parent review; (5) public funding for independent civil-society audits of education AI; (6) explicit opt-out rights for parents and students; (7) prohibition of AI as a substitute for credentialed human instruction in compulsory-education contexts; (8) investment in teacher AI-literacy so that humans remain capable of evaluating and integrating the tools. The list is long. None of the items is impossible. Most are not happening. The 1,000-page manual asks parents to treat AI tutoring as the most important collective decision their generation will make about their children's cognitive development, and to act with the seriousness that decision deserves.

Citations

1. Livingstone, Sonia, and Julian Sefton-Green. The Class. New York: NYU Press, 2016.
2. Collier, Anne. "AI Tutors and the Parent Question." NetFamilyNews, October 2024.
3. Thierer, Adam. "AI in Education and the Coming Regulatory Wave." Mercatus Policy Brief, January 2025.
4. boyd, danah. It's Complicated. New Haven: Yale University Press, 2014.
5. Aiken, Mary. The Cyber Effect. New York: Spiegel & Grau, 2016.
6. Solove, Daniel J. "Artificial Intelligence and Privacy." Florida Law Review 77, no. 1 (2025): 1-62.
7. Allen, Anita L. "Privacy, Surveillance, and the Cognitive Commons." Yale Journal of Law & Technology 26 (2024): 88-131.
8. Goldman, Eric. "AI Tutors and the FERPA Question." Santa Clara High Technology Law Journal 41, no. 2 (2024): 201-244.
9. Khan, Lina M. Remarks on AI and consumer protection, FTC Tech Summit, January 2024.
10. Khan, Salman. Brave New Words: How AI Will Revolutionize Education (and Why That's a Good Thing). New York: Viking, 2024.
11. Luckin, Rose. Machine Learning and Human Intelligence: The Future of Education for the 21st Century. London: UCL Institute of Education Press, 2018.
12. Reich, Justin. Failure to Disrupt: Why Technology Alone Can't Transform Education. Cambridge, MA: Harvard University Press, 2020.
13. Watters, Audrey. Teaching Machines: The History of Personalized Learning. Cambridge, MA: MIT Press, 2021.
