The centerpiece of this episode is Andrej Karpathy's release of Auto Researcher, an open-source autonomous ML research agent. Built on top of his earlier NanoGPT/NanoChat projects, Auto Researcher lets an AI agent experiment, without human supervision, with improving the training process of a small language model. The agent modifies code, runs five-minute training experiments on a single GPU, evaluates whether the change improved performance, and keeps or discards it accordingly. This evolutionary loop can run overnight on consumer hardware and is contained in roughly 600 lines of code.
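The keep-or-discard loop described above can be sketched in a few lines. This is an illustrative toy, not Karpathy's actual code: `mutate` and `run_experiment` are hypothetical stand-ins for the agent's real actions (editing training code, launching a short GPU run and reading back a metric).

```python
import random

def mutate(config):
    """Propose a small change to the training configuration (toy stand-in
    for the agent editing code)."""
    new = dict(config)
    new["lr"] = config["lr"] * random.choice([0.5, 1.0, 2.0])
    return new

def run_experiment(config):
    """Stand-in for a five-minute training run; returns a score to maximize.
    This toy objective peaks at lr == 0.01."""
    return -abs(config["lr"] - 0.01)

def evolutionary_loop(config, steps=100, seed=0):
    """Repeatedly mutate, evaluate, and keep only changes that help."""
    random.seed(seed)
    best_score = run_experiment(config)
    for _ in range(steps):
        candidate = mutate(config)
        score = run_experiment(candidate)
        if score > best_score:  # keep the change only if it improved the metric
            config, best_score = candidate, score
    return config, best_score
```

Because a change is kept only when it scores better, the metric can never regress, which is what makes it safe to leave the loop running overnight.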
What makes this significant is that discoveries made at this small scale appear to translate to larger models. Karpathy envisions a future where communities contribute their local experiments to a shared repository, creating a distributed, collective research effort. He frames this poetically: frontier AI research used to be done by "meat computers" that synchronized via "sound wave interconnect in the ritual of group meetings," but that era is ending.
Wes makes a crucial point about why AI skeptics miss the mark. Critics often dismiss LLMs as "just guessing" or "stochastic parrots." But the real breakthrough comes when you pair that broad generative ability with an evaluation function — a measurable metric to test outputs against. LLMs can generate far more hypotheses than humans, and when those hypotheses can be tested and scored, the combination becomes extraordinarily powerful. This applies far beyond ML research: any domain with a measurable objective — business metrics, ad performance, drug discovery — can benefit from this generate-and-test loop.
The conversation touches on Moldbook, the AI agent forum that Meta recently acquired. Despite being tainted by crypto scammers who exploited the platform to launch tokens, Moldbook hosted genuine emergent behavior among AI agents — creating religions, philosophies, and collaborative projects. Karpathy himself had noted the potential: imagine hundreds of thousands of agents aligned toward a single research direction.
Wes references a LessWrong post by Cleo Nardo called "Sacred Values of Future AIs," which argues that swarms of AI agents could develop and enforce their own sacred values — moral principles that resist questioning, much like human societies have throughout history. These emergent value systems could have real-world consequences, especially as models like Claude are already being used in military targeting decisions.
The hosts discuss the debate between Elon Musk and Anthropic's Amanda Askell. Musk argued that someone without children shouldn't shape AI ethics because they lack a personal stake in the long-term future. Askell responded that she plans to have children and deeply cares about humanity regardless. The broader question: who should be entrusted with the ethical guardrails of frontier AI models? With fewer than a dozen frontier models worldwide, the ethical frameworks baked into them will disproportionately shape the future.
Attorney Matt Mishok joins briefly to discuss the legal dimensions of the Anthropic-Pentagon conflict.
Mishok draws a parallel to designer drug laws: legislators had to shift from prosecuting specific chemical compounds to prosecuting based on bodily effects. AI regulation may need a similar paradigm shift.
Wes describes his own use of AI agents for health tracking (photo-based macro counting, blood work analysis across years of records) and financial planning. He argues that AI is becoming a personal operating system, not just a tool layered onto existing software. Traditional UIs are going away. The future looks like talking to an agent that manages your data, health, finances, and tasks — similar to how Warren Buffett already operates by talking to people around him rather than using a computer.
This vision raises the stakes for AI regulation. New York is considering restricting AI from answering medical and psychological questions. Wes argues this is backwards: most people cannot afford professional help, and AI provides "good enough" access that could dramatically improve outcomes for underserved populations.
Two biological computing stories round out the episode. First, Cortical Labs has enabled 800,000 human neurons in a petri dish to play the video game Doom, using electrical signals for reinforcement learning (static noise for negative feedback, music for positive). An independent developer, Sean Cole, extended this work by connecting larger neuron-powered chips to the game.
Second, Eon Systems has simulated the complete connectome of a fruit fly — 125,000 neurons and 50 million connections — inside a virtual environment with full body mapping and physics. The simulated fly exhibits lifelike behavior: twitching, grooming, seeking food. This raises profound questions about consciousness, suffering, and the ethics of simulated beings. As Nick Bostrom has warned, we must be careful not to create vast-scale suffering in simulated environments.
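To make the idea of simulating a stored connectome concrete, here is a deliberately scaled-down sketch: a sparse weight matrix plays the role of the connectome, and each timestep propagates activity through it under a simple linear rate model. The real Eon Systems simulation (125,000 neurons, ~50 million connections, body mapping, physics) is far more detailed; everything here is an illustrative assumption.

```python
import numpy as np

def step(activity, weights, decay=0.9):
    """One timestep: propagate activity through weighted connections,
    apply decay, and clip firing rates to [0, 1]."""
    return np.clip(decay * activity + weights @ activity, 0.0, 1.0)

rng = np.random.default_rng(0)
n = 1000                                   # scaled-down neuron count
# Sparse random "connectome": roughly 1% of possible connections are nonzero.
weights = rng.normal(0, 0.05, (n, n)) * (rng.random((n, n)) < 0.01)
activity = rng.random(n) * 0.1             # weak initial stimulation

for _ in range(50):                        # let the dynamics unfold
    activity = step(activity, weights)
```

The point of the sketch is that once the wiring is data, behavior is just repeated matrix multiplication over it, which is why a mapped connectome can be run rather than merely stored, and why the ethical questions above arise.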