The Dangerous Illusion of AI Coding? - Jeremy Howard

Study Guide

Overview

In this MLST (Machine Learning Street Talk) interview, host Tim Scarfe sits down with Jeremy Howard — deep learning pioneer, Kaggle Grandmaster, fast.ai co-founder, and creator of ULMFiT — at his home in Queensland, Australia. The conversation spans Howard's foundational work on transfer learning, the current state of AI-assisted coding, the psychology of "vibe coding," why software engineering and coding are fundamentally different skills, the case for interactive exploratory programming, and broader questions about AI risk and power centralization.

Key Themes

1. The Origins of Transfer Learning (ULMFiT)

Howard recounts creating ULMFiT, the paper that established the now-standard three-stage training pipeline: pre-training on a general corpus, mid-training (domain adaptation), and fine-tuning on a downstream task. Key innovations included training on a general-purpose Wikipedia corpus (not task-specific data), using discriminative learning rates (different learning rates per layer), gradual unfreezing of layers, and fine-tuning all normalization layers. The entire pre-training ran overnight on a single gaming GPU, and the fine-tuned model beat specialized PhD-level results on sentiment analysis in minutes.
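The discriminative-learning-rate and gradual-unfreezing ideas can be sketched in plain PyTorch. This is a minimal illustration, not fast.ai's actual implementation: the toy three-group model is hypothetical, and the 2.6 per-group divisor is the factor the ULMFiT paper suggests.

```python
# Hypothetical sketch of two ULMFiT fine-tuning ideas in plain PyTorch:
# discriminative learning rates (a distinct LR per layer group) and
# gradual unfreezing (unlocking layer groups from the top down).
import torch
import torch.nn as nn

# Stand-ins for a pre-trained language model split into layer groups:
embed = nn.Embedding(1000, 32)               # group 0: lowest, most general
encoder = nn.LSTM(32, 64, batch_first=True)  # group 1
head = nn.Linear(64, 2)                      # group 2: task-specific classifier
groups = [list(embed.parameters()),
          list(encoder.parameters()),
          list(head.parameters())]

# Discriminative learning rates: lower layers hold more general features,
# so they are updated more gently than the task head.
base_lr = 1e-3
optimizer = torch.optim.Adam(
    [{"params": params, "lr": base_lr / 2.6 ** (len(groups) - 1 - i)}
     for i, params in enumerate(groups)]
)

# Gradual unfreezing: freeze everything except the head, then unfreeze
# one more group before each fine-tuning stage, working downward.
for params in groups[:-1]:
    for p in params:
        p.requires_grad = False

for stage in range(len(groups)):
    for p in groups[len(groups) - 1 - stage]:
        p.requires_grad = True               # unlock the next-highest group
    n_trainable = sum(p.numel() for params in groups for p in params
                      if p.requires_grad)
    print(f"stage {stage}: {n_trainable} trainable parameters")
    # ... one fine-tuning epoch would run here ...
```

In the real pipeline each stage would run a training pass before the next group is unfrozen; the sketch only shows how the parameter groups and per-group learning rates are wired up.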

2. Hierarchies of Abstraction in Language Models

Howard's original hypothesis for ULMFiT was that predicting the next word of Wikipedia text would force a model to build a hierarchy of abstractions — from the concept of objects, to people, to social hierarchies, to specific historical facts. He argues this is how LLMs develop what looks like understanding: by compressing statistical relationships into layered representations. However, he emphasizes this "understanding" is bounded by the training distribution and breaks down sharply outside it.

3. LLM Creativity: Cosplaying Intelligence

Howard argues LLMs are capable of combinatorial creativity — recombining elements from their training data in novel ways — but cannot perform transformative creativity that moves outside the training distribution. He uses the example of Anthropic's AI-generated C compiler: while impressive, it is fundamentally a style transfer from existing compiler source code (much of it based on LLVM) into Rust, not a genuinely novel design. Chris Lattner, the creator of LLVM and Clang, confirmed that the AI replicated specific design decisions (including ones Lattner considers mistakes) that only existed in LLVM.

4. Coding vs. Software Engineering

Howard draws a sharp distinction between coding (translating specifications into syntax) and software engineering (designing the right abstractions, finding the right-sized pieces, and composing them into larger systems). He references Fred Brooks' "No Silver Bullet" essay, which argued decades ago that typing code was never the bottleneck — the hard part is the design work. While LLMs now write ~90% of Howard's code, he reports only modest productivity gains because the typing was never the slow part. LLMs are good at coding but empirically bad at software engineering, and there is no evidence this gap is closing.

5. The Gambling Psychology of AI Coding

Howard and his wife Rachel Thomas identified that AI-assisted coding shares the psychological hallmarks of gambling addiction: an illusion of control (crafting prompts and configurations), stochastic rewards (occasionally getting a working feature), losses disguised as wins (producing code that looks functional but that no one understands), and self-deception about skill (believing productivity went up when studies show it barely changed). Howard describes 14-hour Claude Code marathon sessions that left him drained, contrasting them with the energized feeling of interactive exploratory programming.

6. Knowledge Erosion and Organizational Risk

Howard warns that delegating cognitive tasks to LLMs erodes knowledge within organizations. He cites the concept of desirable difficulty from learning science: memories and skills only form through effortful practice, and removing friction removes learning. An Anthropic study confirmed that most users of AI coding tools showed minimal learning. Howard tells his own team that personal capability growth matters more than feature output, invoking John Ousterhout's principle: "A little bit of slope makes up for a lot of intercept." Companies betting on AI replacing software engineers are making a speculative gamble that could destroy their ability to maintain or evolve their products.

7. The ipykernel Experiment

Howard describes a telling case study: fixing a major version upgrade of ipykernel (the engine behind Jupyter notebooks) using AI. Over two weeks, alternating between Codex and GPT models, he produced a working implementation — but one that nobody fully understands. He now faces an unprecedented software engineering question: should he bet his company's product on code that works but has no human comprehension behind it? There is no established practice for managing this situation.

8. Interactive Exploratory Programming

Howard advocates for notebook-based, REPL-driven development where humans and AI work together in a rich, stateful environment. He traces this lineage from Smalltalk and APL through Mathematica to his own tools (nbdev, Solara). The core principle, inspired by Bret Victor's work: humans build understanding by directly manipulating objects in real time. Traditional software development — editing dead text files through a terminal — removes this connection. Howard considers it "inhumane" and argues it produces worse outcomes for both humans and AI.

9. AI Existential Risk and Power Centralization

Howard co-authored a rebuttal (with Arvind Narayanan) to the AI existential risk statement signed by Hinton, Hassabis, and others. His argument: regardless of how powerful AI becomes, centralizing control of it in the hands of a few companies or governments is the actual danger. History shows that powerful technologies (writing, printing, voting) were always subject to monopolization attempts by those in power. The solution is broad distribution, not concentration. His current primary concern is not autonomous AI but competence erosion — people and organizations losing their ability to grow and adapt.

Key Quotes

  • "The difference between pretending to be intelligent and actually being intelligent is entirely unimportant as long as you're in the region in which the pretence is actually effective."
  • "No one's actually creating 50 times more high quality software than they were before. We've actually just done a study of this and there's a tiny uptick."
  • "The thing about AI-based coding is that it's like a slot machine and you have an illusion of control."
  • "Here's a piece of code that no one understands. Am I going to bet my company's product on it?"
  • "Almost the only thing I care about is how much your personal human capabilities are growing."
  • "A little bit of slope makes up for a lot of intercept." (John Ousterhout, via Howard)
  • "It literally disgusts me. I literally think it's inhumane. My mission remains the same as it has been for 20 years, which is to stop people working like this."

Key People and References

  • Jeremy Howard — Deep learning pioneer, fast.ai co-founder, creator of ULMFiT
  • Rachel Thomas — fast.ai co-founder, author of the gambling analogy for AI coding
  • Chris Lattner — Creator of LLVM and Clang, cited on the AI-generated C compiler
  • Fred Brooks — Author of "No Silver Bullet" on software engineering
  • Bret Victor — Researcher and designer advocating direct manipulation in computing
  • Cesar Hidalgo — Author of "Why Information Grows," on knowledge as non-fungible
  • Margaret Boden — AI philosopher, hierarchy of creativity (combinatorial, exploratory, transformative)
  • François Chollet — Creator of Keras, cited on shared cognitive models in teams
  • Stephen Merity — Developed the AWD-LSTM regularization approach used in ULMFiT
  • John Ousterhout — Stanford professor, "slope vs. intercept" philosophy

Discussion Questions

  1. Howard argues that AI coding tools share the psychological structure of gambling. What safeguards could teams adopt to prevent the "slot machine" dynamic from degrading code quality?
  2. If software engineering skill is now more important than coding skill, how should computer science education adapt?
  3. Howard describes producing working code that "nobody understands" via AI. What organizational practices could help teams manage AI-generated code they cannot fully review?
  4. The concept of "desirable difficulty" suggests that removing friction removes learning. How can educators design AI-assisted learning that preserves productive struggle?
  5. Howard advocates for interactive exploratory programming over terminal-based workflows. What are the trade-offs of adopting notebook-first development in a production engineering team?