Andrej Karpathy
@karpathy
Building . Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets


Andrej Karpathy’s posts
TikTok is scary good. It's digital crack. First time I feel attacked by AI in the brain.
Some personal news: I am joining OpenAI (again :)). Like many others both in/out of AI, I am very inspired by the impact of their work and I have personally benefited greatly from it. The future potential is especially exciting; it is a great pleasure to jump back in and build!
Plan is to throw a party in the Andromeda galaxy 1B years from now. Everyone welcome, except for those who litter
It’s been a great pleasure to help Tesla towards its goals over the last 5 years and a difficult decision to part ways. In that time, Autopilot graduated from lane keeping to city streets and I look forward to seeing the exceptionally strong Autopilot team continue that momentum.
Hi everyone yes, I left OpenAI yesterday. First of all nothing "happened" and it’s not a result of any particular event, issue or drama (but please keep the conspiracy theories coming as they are highly entertaining :)). Actually, being at OpenAI over the last ~year has been …
Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the …
New YouTube video: 1hr general-audience introduction to Large Language Models
youtube.com/watch?v=zjkBMF
Based on a 30min talk I gave recently; it tries to be a non-technical intro and covers mental models for LLM inference, training, finetuning, the emerging LLM OS, and LLM Security.
# On the "hallucination problem"
I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.
We direct their dreams with prompts. The prompts start the dream, and based on the …
The YouTube video I want to watch is any highly rated, 1hr long, information dense lecture on anything esoteric and the algorithm just doesn’t get it. It’s too content-driven and too narrow-minded
# on shortification of "learning"
There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved: the people watching enjoy thinking they are …
WSJ front page every day is like >>> "Stock Market %s!!" % ('rises' if random.random() <= 0.54 else 'falls', )
These 94 lines of code are everything that is needed to train a neural network. Everything else is just efficiency.
This is my earlier project Micrograd. It implements a scalar-valued auto-grad engine. You start with some numbers at the leaves (usually the input data and the …
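For flavor, here is a minimal sketch of the idea (illustrative only; the real ~94-line engine lives at github.com/karpathy/micrograd): each Value node stores a scalar, its children, and a chain-rule closure, and backward() walks the graph in reverse topological order.

```python
# Minimal scalar autograd sketch in the spirit of micrograd (not the actual code).

class Value:
    def __init__(self, data, _children=()):
        self.data = data                # scalar held at this node
        self.grad = 0.0                 # d(output)/d(this node), filled by backward()
        self._prev = set(_children)     # nodes this one was computed from
        self._backward = lambda: None   # chain-rule step for this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the graph, then apply the chain rule back to front
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# usage: leaves hold the inputs, backward() fills in their gradients
a, b = Value(2.0), Value(-3.0)
loss = a * b + a
loss.backward()
print(a.grad, b.grad)  # -2.0, 2.0
```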
Movies that I've seen 5+ times but ready & willing to keep watching: Interstellar, Gladiator, Contact, Good Will Hunting, The Matrix, LotR 1/2/3, HP 1, Avatar, The Fifth Element, Independence Day, Rush Hour, Armageddon, Stargate, Anchorman, Mean Girls, Terminator 2, more=? :)
New (2h13m) lecture: "Let's build the GPT Tokenizer"
Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and decode() from tokens back to strings. …
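A toy sketch of the BPE training loop the lecture builds up, assuming we start from raw UTF-8 bytes (illustrative, not the lecture's exact code): repeatedly count adjacent token pairs, mint a new token id for the most frequent pair, and replace its occurrences.

```python
# Toy byte-pair-encoding trainer (illustrative sketch, not the lecture's exact code).

def get_pair_counts(ids):
    # count occurrences of each adjacent token pair
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def merge(ids, pair, new_id):
    # replace every occurrence of `pair` with the freshly minted token id
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # raw bytes -> token ids 0..255
merges = {}                                # learned rules: (id, id) -> new id
for i in range(3):
    counts = get_pair_counts(ids)
    pair = max(counts, key=counts.get)     # most frequent adjacent pair
    new_id = 256 + i                       # new token id past the byte range
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

print(ids, merges)
# encode() replays these merges on new text; decode() inverts them back to bytes.
```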
Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of CPython? No? Well now you can! With llm.c:
github.com/karpathy/llm.c
To start, it implements GPT-2 training on CPU/fp32 in only ~1,000 lines of clean code. It compiles and runs instantly, and exactly …
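The loop llm.c spells out in C is the standard one: forward, loss, hand-derived backward, AdamW step. A runnable toy illustration of that shape in numpy (a one-layer softmax classifier standing in for GPT-2; all names here are mine, not llm.c's):

```python
# Same training-loop skeleton as llm.c, at toy scale: forward, cross-entropy,
# hand-written backward (no autograd), AdamW update. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)
V, D, N = 8, 4, 64                         # "vocab" size, feature dim, batch size
W = rng.normal(0, 0.1, (D, V))             # the model's only parameter
m, v = np.zeros_like(W), np.zeros_like(W)  # AdamW first/second moments
lr, beta1, beta2, eps, wd = 1e-1, 0.9, 0.999, 1e-8, 1e-2

X = rng.normal(size=(N, D))                # fixed toy batch
y = rng.integers(0, V, size=N)

for step in range(1, 201):
    # forward: logits -> softmax -> mean cross-entropy loss
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    # backward: gradient derived by hand, no autograd library
    dlogits = probs.copy()
    dlogits[np.arange(N), y] -= 1.0
    dW = X.T @ dlogits / N

    # AdamW: bias-corrected moments plus decoupled weight decay
    m = beta1 * m + (1 - beta1) * dW
    v = beta2 * v + (1 - beta2) * dW * dW
    mhat = m / (1 - beta1 ** step)
    vhat = v / (1 - beta2 ** step)
    W -= lr * (mhat / (np.sqrt(vhat) + eps) + wd * W)

    if step % 50 == 0:
        print(f"step {step}: loss {loss:.4f}")
```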
Currently products brag about being "smart". Like my coffee cup warmer that had me download an app, sign up for an account and grant location permissions before it would warm my coffee. A future where products brag about being "dumb" must be coming and can't come soon enough.
Huge congrats to Meta on the Llama 3.1 release!
Few notes:
Today, with the 405B model release, is the first time that a frontier-capability LLM is available to everyone to work with and build on. The model appears to be GPT-4 / Claude 3.5 Sonnet grade and the weights are …
Returning from an experimental ~2 week detox from the internet. Main takeaway is that I didn't realize how unsettled the mind can get when over-stimulating on problems/information (like a stirred liquid), and ~2 weeks is enough to settle into a lot more zen state.
I'm struck by …
Thinking a lot about centralization and decentralization these past few days.
By chance I happened to watch this with the music of Interstellar playing in the background. Incredible. Huge congrats to the team at SpaceX!!
Quote
SpaceX
@SpaceX
Mechazilla has caught the Super Heavy booster!
The killer app of LLMs is Scarlett Johansson. You all thought it was math or something
You know how image generation went from blurry 32x32 texture patches to high-resolution images that are difficult to distinguish from real in roughly a snap of a finger? The same is now happening along the time axis (extending to video) and the repercussions boggle the mind just …
Quote
Pika
@pika_labs
Introducing Pika 1.0, the idea-to-video platform that brings your creativity to life.
Create and edit your videos with AI.
Rolling out to new users on web and discord, starting today. Sign up at pika.art
# automating software engineering
In my mind, automating software engineering will look similar to automating driving. E.g. in self-driving the progression of increasing autonomy and higher abstraction looks something like:
1. first the human performs all driving actions …
Quote
Cognition
@cognition_labs
Today we're excited to introduce Devin, the first AI software engineer.
Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.
Devin is …
It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; it's just historical. They are highly general-purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.
They …
Reading a tweet is a bit like downloading an (attacker-controlled) executable that you instantly run on your brain. Each one elicits emotions, suggests knowledge, nudges world-view.
In the future it might feel surprising that we allowed direct, untrusted information into the brain.
# explaining llm.c in layman terms
Training Large Language Models (LLMs), like ChatGPT, involves a large amount of code and complexity.
For example, a typical LLM training project might use the PyTorch deep learning library. PyTorch is quite complex because it implements a very … t.co/dkpZGRvkmU
o1-mini keeps refusing to try to solve the Riemann Hypothesis on my behalf. Model laziness continues to be a major issue sad ;p
LLM OS. Bear with me I'm still cooking.
Specs:
- LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s)
- RAM: 128Ktok
- Filesystem: Ada002
Actually, really liked the Apple Intelligence announcement. It must be a very exciting time at Apple as they layer AI on top of the entire OS. A few of the major themes.
Step 1: Multimodal I/O. Enable text/audio/image/video capability, both read and write. These are the native …
I feel like a large amount of GDP is locked up because it is difficult for person A to very conveniently pay 5 cents to person B. Current high fixed costs per transaction force each of them to be of high enough amounts, which results in business models with purchase bundles, …
With many … dropping recently, a more complete picture is emerging of LLMs not as a chatbot, but as the kernel process of a new Operating System. E.g. today it orchestrates:
- Input & Output across modalities (text, audio, vision)
- Code interpreter, ability to write & run …
How to become expert at thing:
1 iteratively take on concrete projects and accomplish them depth wise, learning “on demand” (ie don’t learn bottom up breadth wise)
2 teach/summarize everything you learn in your own words
3 only compare yourself to younger you, never to others
Love letter to Obsidian, to which I very happily switched for my personal notes. My primary interest in Obsidian is not even for note taking specifically; it is that Obsidian is around the state of the art of a philosophy of software and what it could be.
- Your notes are …