We’re not just coding anymore — we’re prompting, orchestrating, and building alongside AI.
— Andrej Karpathy
Andrej Karpathy, former Director of AI at Tesla and founding member of OpenAI, recently gave a powerful talk titled “Software in the Era of AI.” While it’s worth watching in full (link below), here’s a distilled, structured guide to the core ideas and takeaways — especially for ML engineers building at the frontier.
🖥 LLMs Are the New Operating System
Karpathy frames LLMs as a new kind of operating system: they orchestrate memory (the context window), compute (token inference), and I/O (tool use).
Open-source models (LLaMA, Mistral) play the role of Linux in this analogy.
LLM-native apps like Cursor or Perplexity run on top of this OS layer.
🛠 LLM Apps Are Partially Autonomous Systems
Karpathy emphasizes that the most useful AI applications today aren’t full agents — they’re partially autonomous tools.
Example:
Cursor is an AI-powered code editor:
You can type manually (human control).
Or you can highlight code and let the AI rewrite it.
Or let it modify an entire repo (full autonomy).
💡 This creates an “autonomy slider”: control how much work you give the AI.
LLM-native apps like Perplexity combine AI logic with familiar GUI controls to keep humans in the loop.
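The "autonomy slider" idea can be made concrete with a small sketch. This is a hypothetical illustration (the level names and `allowed_actions` helper are invented here, not Cursor's actual API): each notch on the slider unlocks one more class of AI action.

```python
from enum import Enum

class AutonomyLevel(Enum):
    """Hypothetical notches on an 'autonomy slider'."""
    MANUAL = 0          # human types everything
    SUGGEST = 1         # AI proposes autocompletions
    EDIT_SELECTION = 2  # AI rewrites highlighted code
    EDIT_REPO = 3       # AI may modify the whole repository

def allowed_actions(level: AutonomyLevel) -> list[str]:
    """Return the actions the AI may take at a given autonomy level."""
    actions = ["autocomplete", "rewrite_selection", "edit_repo"]
    # Each level unlocks one more action; MANUAL unlocks none.
    return actions[:level.value]

# At EDIT_SELECTION the AI may autocomplete and rewrite a selection,
# but repo-wide edits still require the human to move the slider.
print(allowed_actions(AutonomyLevel.EDIT_SELECTION))
```

The point of the gate is that the human, not the model, decides how much power the AI gets at any moment.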
🧐 LLMs Are “People Spirits”
Karpathy offers a provocative analogy: LLMs are “people spirits” — stochastic simulations of humans with memory, reasoning, and personality.
LLM strengths:
Huge general knowledge
Superhuman pattern recognition
But also weaknesses:
Hallucinations
No persistent memory
Easily manipulated (prompt injections)
This makes working with LLMs a human-AI cooperation game, where:
AI generates
Human verifies
Karpathy’s human-AI collaboration loop: AI generates; humans verify — and the faster this loop, the better.
🧰 How to Build Great LLM Apps
According to Karpathy, effective LLM apps share 4 common features:
1. Context management
Apps feed LLMs the right info at the right time (e.g. embeddings of your codebase).
2. Multi-LLM orchestration
Use different models for different jobs (chat, retrieval, diffs).
3. Custom GUI for audit & control
A good interface lets users see what the AI is doing and approve/reject outputs quickly.
4. Autonomy slider
Let users control how much power the AI gets — from autocomplete to repo-wide edits.
Cursor allows developers to slide between manual coding and full AI-driven repo changes — a spectrum of autonomy.
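Feature 1 (context management) can be sketched in a few lines. Real apps use embedding models over the codebase; as a self-contained stand-in, this hypothetical `retrieve_context` ranks files by bag-of-words cosine similarity instead, but the shape of the idea is the same: fetch only the most relevant context before prompting the model.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve_context(query: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Rank codebase files by similarity to the query and return the
    top-k names: 'the right info at the right time'."""
    q = vectorize(query)
    ranked = sorted(files, key=lambda name: cosine(q, vectorize(files[name])),
                    reverse=True)
    return ranked[:k]

# Toy codebase summaries (hypothetical file names and contents).
files = {
    "auth.py": "login password hash check user session",
    "billing.py": "charge card payment invoice stripe",
    "ui.py": "render button settings page layout",
}
print(retrieve_context("fix the password login bug", files, k=1))
```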
🧠 Design for Speed: Generation + Verification
Karpathy says: we’re no longer just writing software — we’re verifying AI-generated software.
How to speed up the feedback loop:
Use visual GUIs to inspect results (faster than reading raw text).
Write clear, constrained prompts to reduce failures.
Avoid mega-diffs; think in small chunks.
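The "clear, constrained prompts" and "small chunks" tips can be combined in one place. This is a hypothetical helper (the wording and limits are invented here, not from the talk) that wraps a task in explicit scope and output constraints so the resulting diff stays small enough to verify quickly.

```python
def constrained_prompt(task: str, target_file: str, max_lines: int = 30) -> str:
    """Wrap a task in explicit constraints so failures are rarer and
    the resulting diff stays small and easy for a human to verify."""
    return (
        f"Task: {task}\n"
        f"Scope: only modify {target_file}; do not touch other files.\n"
        f"Output: a unified diff of at most {max_lines} changed lines.\n"
        "If the change cannot fit, reply CANNOT_FIT instead of a diff.\n"
    )

print(constrained_prompt("rename foo to bar", "utils.py"))
```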
🌍 Build for LLMs, Not Just Humans
A surprising insight: LLMs are now users of software. Just like humans or APIs.
What this means:
Write docs in LLM-readable formats (e.g. markdown, JSON).
Avoid instructions like “click here”; replace them with API calls or shell commands.
Add an llms.txt file to help LLMs understand your site’s purpose.
Designing for agents: Simplified, machine-readable documentation helps LLMs understand and interact with your software.
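As an illustration, a file following the emerging llms.txt convention might look like this. The site and contents below are entirely hypothetical; the idea is simply a short, plain-markdown summary an agent can read before navigating your docs.

```markdown
# ExampleCo Docs

> ExampleCo is a (hypothetical) billing API. This file summarizes the
> site for LLM agents in plain markdown.

## Key pages
- /docs/quickstart: create an API key and make a first charge
- /docs/api: full REST reference with JSON request/response examples

## Notes for agents
- All endpoints accept and return JSON; no GUI interaction is needed.
```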
✨ Vibe Coding: Everyone’s a Programmer Now
A viral moment from the talk: “vibe coding,” the term Karpathy coined.
You don’t know Swift? Doesn’t matter. Prompt the LLM, copy-paste, tweak, repeat.
He built working iOS and web apps without knowing the languages, just by “vibing” with the LLM.
This changes who can build software — and how fast they can do it.
Karpathy’s Menu.app — built by ‘vibe coding’ an AI prototype without knowing Swift. The future of accessible dev.
⚙️ DevOps Is Now the Bottleneck
Ironically, the hardest part isn’t coding — it’s all the non-code setup:
Auth
Hosting
Billing
Deployment
These tasks are still GUI-based and require human clicks. Karpathy asks:
“Why am I doing this? Let the agents do it!”
One early attempt at addressing this is the Model Context Protocol (MCP), which gives agents a standard way to discover and call tools instead of clicking through GUIs.
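The core idea behind MCP can be sketched without the protocol itself. The snippet below is a deliberately simplified, hypothetical illustration (the real protocol uses JSON-RPC transport, resources, and more): expose DevOps actions as named tools with machine-readable schemas, so an agent can discover and invoke them instead of a human clicking through a dashboard.

```python
import json

# Hypothetical tool registry: names, schemas, and implementations.
TOOLS = {
    "create_dns_record": {
        "description": "Point a domain at a host",
        "params": {"domain": "str", "ip": "str"},
        "fn": lambda domain, ip: f"{domain} -> {ip}",
    },
    "deploy": {
        "description": "Deploy the current build to an environment",
        "params": {"env": "str"},
        "fn": lambda env: f"deployed to {env}",
    },
}

def list_tools() -> str:
    """What an agent reads first: tool names, descriptions, schemas."""
    return json.dumps(
        {name: {"description": t["description"], "params": t["params"]}
         for name, t in TOOLS.items()}
    )

def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call requested by the agent."""
    return TOOLS[name]["fn"](**kwargs)

print(call_tool("deploy", env="staging"))
```

Once auth, hosting, and deployment are reachable this way, “why am I doing this?” has an answer: the agent does it.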
The rise of agents
📜 Takeaways for ML Engineers
✅ Learn to work with prompts, not just code
✅ Develop effective apps by combining GUI + autonomy sliders to keep AI on a leash
✅ Structure apps around fast generate–verify loops
✅ Build documentation and UIs that speak to agents