Andrej Karpathy
@karpathy
Building . Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets


Andrej Karpathy’s posts
TikTok is scary good. It's digital crack. First time I feel attacked by AI in the brain.
Some personal news: I am joining OpenAI (again :)). Like many others both in/out of AI, I am very inspired by the impact of their work and I have personally benefited greatly from it. The future potential is especially exciting; it is a great pleasure to jump back in and build!
Plan is to throw a party in the Andromeda galaxy 1B years from now. Everyone welcome, except for those who litter
It’s been a great pleasure to help Tesla towards its goals over the last 5 years and a difficult decision to part ways. In that time, Autopilot graduated from lane keeping to city streets and I look forward to seeing the exceptionally strong Autopilot team continue that momentum.
Hi everyone yes, I left OpenAI yesterday. First of all nothing "happened" and it’s not a result of any particular event, issue or drama (but please keep the conspiracy theories coming as they are highly entertaining :)). Actually, being at OpenAI over the last ~year has been …
Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the …
New YouTube video: 1hr general-audience introduction to Large Language Models
youtube.com/watch?v=zjkBMF
Based on a 30min talk I gave recently; it tries to be a non-technical intro and covers mental models for LLM inference, training, finetuning, the emerging LLM OS, and LLM Security.
# On the "hallucination problem"
I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.
We direct their dreams with prompts. The prompts start the dream, and based on the …
The YouTube video I want to watch is any highly rated, 1hr long, information dense lecture on anything esoteric and the algorithm just doesn’t get it. It’s too content-driven and too narrow-minded
# on shortification of "learning"
There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved: the people watching enjoy thinking they are …
WSJ front page every day is like >>> "Stock Market %s!!" % ('rises' if random.random() <= 0.54 else 'falls', )
These 94 lines of code are everything that is needed to train a neural network. Everything else is just efficiency.
This is my earlier project Micrograd. It implements a scalar-valued auto-grad engine. You start with some numbers at the leaves (usually the input data and the …
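For flavor, here is a minimal sketch of the idea (illustrative only; the real ~94-line engine lives at github.com/karpathy/micrograd): each Value node stores a scalar, its children, and a chain-rule closure, and backward() walks the graph in reverse topological order.

```python
# Minimal scalar autograd sketch in the spirit of micrograd (not the actual code).

class Value:
    def __init__(self, data, _children=()):
        self.data = data                # scalar held at this node
        self.grad = 0.0                 # d(output)/d(this node), filled by backward()
        self._prev = set(_children)     # nodes this one was computed from
        self._backward = lambda: None   # chain-rule step for this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the graph, then apply the chain rule back to front
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# usage: leaves hold the inputs, backward() fills in their gradients
a, b = Value(2.0), Value(-3.0)
loss = a * b + a
loss.backward()
print(a.grad, b.grad)  # -2.0, 2.0
```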
Movies that I've seen 5+ times but ready & willing to keep watching: Interstellar, Gladiator, Contact, Good Will Hunting, The Matrix, LotR 1/2/3, HP 1, Avatar, The Fifth Element, Independence Day, Rush Hour, Armageddon, Stargate, Anchorman, Mean Girls, Terminator 2, more=? :)
New (2h13m) lecture: "Let's build the GPT Tokenizer"
Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and decode() from tokens back to strings. …
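A toy sketch of the BPE training loop the lecture builds up, assuming we start from raw UTF-8 bytes (illustrative, not the lecture's exact code): repeatedly count adjacent token pairs, mint a new token id for the most frequent pair, and replace its occurrences.

```python
# Toy byte-pair-encoding trainer (illustrative sketch, not the lecture's exact code).

def get_pair_counts(ids):
    # count occurrences of each adjacent token pair
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def merge(ids, pair, new_id):
    # replace every occurrence of `pair` with the freshly minted token id
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # raw bytes -> token ids 0..255
merges = {}                                # learned rules: (id, id) -> new id
for i in range(3):
    counts = get_pair_counts(ids)
    pair = max(counts, key=counts.get)     # most frequent adjacent pair
    new_id = 256 + i                       # new token id past the byte range
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

print(ids, merges)
# encode() replays these merges on new text; decode() inverts them back to bytes.
```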
Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of CPython? No? Well now you can! With llm.c:
github.com/karpathy/llm.c
To start, it implements GPT-2 training on CPU/fp32 in only ~1,000 lines of clean code. It compiles and runs instantly, and exactly …
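The loop llm.c spells out in C is the standard one: forward, loss, hand-derived backward, AdamW step. A runnable toy illustration of that shape in numpy (a one-layer softmax classifier standing in for GPT-2; all names here are mine, not llm.c's):

```python
# Same training-loop skeleton as llm.c, at toy scale: forward, cross-entropy,
# hand-written backward (no autograd), AdamW update. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)
V, D, N = 8, 4, 64                         # "vocab" size, feature dim, batch size
W = rng.normal(0, 0.1, (D, V))             # the model's only parameter
m, v = np.zeros_like(W), np.zeros_like(W)  # AdamW first/second moments
lr, beta1, beta2, eps, wd = 1e-1, 0.9, 0.999, 1e-8, 1e-2

X = rng.normal(size=(N, D))                # fixed toy batch
y = rng.integers(0, V, size=N)

for step in range(1, 201):
    # forward: logits -> softmax -> mean cross-entropy loss
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    # backward: gradient derived by hand, no autograd library
    dlogits = probs.copy()
    dlogits[np.arange(N), y] -= 1.0
    dW = X.T @ dlogits / N

    # AdamW: bias-corrected moments plus decoupled weight decay
    m = beta1 * m + (1 - beta1) * dW
    v = beta2 * v + (1 - beta2) * dW * dW
    mhat = m / (1 - beta1 ** step)
    vhat = v / (1 - beta2 ** step)
    W -= lr * (mhat / (np.sqrt(vhat) + eps) + wd * W)

    if step % 50 == 0:
        print(f"step {step}: loss {loss:.4f}")
```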
Currently products brag about being "smart". Like my coffee cup warmer that had me download an app, sign up for an account and grant location permissions before it would warm my coffee. A future where products brag about being "dumb" must be coming and can't come soon enough.
Huge congrats to Meta on the Llama 3.1 release!
Few notes:
Today, with the 405B model release, is the first time that a frontier-capability LLM is available to everyone to work with and build on. The model appears to be GPT-4 / Claude 3.5 Sonnet grade and the weights are …
Returning from an experimental ~2 week detox from the internet. Main takeaway is that I didn't realize how unsettled the mind can get when over-stimulating on problems/information (like a stirred liquid), and ~2 weeks is enough to settle into a lot more zen state.
I'm struck by …
Thinking a lot about centralization and decentralization these past few days.
By chance I happened to watch this with the music of Interstellar playing in the background. Incredible. Huge congrats to the team at SpaceX!!
Quote
SpaceX
@SpaceX
Mechazilla has caught the Super Heavy booster!
The killer app of LLMs is Scarlett Johansson. You all thought it was math or something
You know how image generation went from blurry 32x32 texture patches to high-resolution images that are difficult to distinguish from real in roughly a snap of a finger? The same is now happening along the time axis (extending to video) and the repercussions boggle the mind just …
Quote
Pika
@pika_labs
Introducing Pika 1.0, the idea-to-video platform that brings your creativity to life.
Create and edit your videos with AI.
Rolling out to new users on web and discord, starting today. Sign up at pika.art
# automating software engineering
In my mind, automating software engineering will look similar to automating driving. E.g. in self-driving the progression of increasing autonomy and higher abstraction looks something like:
1. first the human performs all driving actions …
Quote
Cognition
@cognition_labs
Today we're excited to introduce Devin, the first AI software engineer.
Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.
Devin is …
It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; it's just historical. They are highly general-purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.
They …
Reading a tweet is a bit like downloading an (attacker-controlled) executable that you instantly run on your brain. Each one elicits emotions, suggests knowledge, nudges world-view.
In the future it might feel surprising that we allowed direct, untrusted information into the brain.
# explaining llm.c in layman terms
Training Large Language Models (LLMs), like ChatGPT, involves a large amount of code and complexity.
For example, a typical LLM training project might use the PyTorch deep learning library. PyTorch is quite complex because it implements a very … t.co/dkpZGRvkmU
o1-mini keeps refusing to try to solve the Riemann Hypothesis on my behalf. Model laziness continues to be a major issue sad ;p
LLM OS. Bear with me I'm still cooking.
Specs:
- LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s)
- RAM: 128Ktok
- Filesystem: Ada002
Actually, really liked the Apple Intelligence announcement. It must be a very exciting time at Apple as they layer AI on top of the entire OS. A few of the major themes.
Step 1: Multimodal I/O. Enable text/audio/image/video capability, both read and write. These are the native …
I feel like a large amount of GDP is locked up because it is difficult for person A to very conveniently pay 5 cents to person B. Current high fixed costs per transaction force each of them to be of high enough amounts, which results in business models with purchase bundles, …
With many … dropping recently, a more complete picture is emerging of LLMs not as a chatbot, but as the kernel process of a new Operating System. E.g. today it orchestrates:
- Input & Output across modalities (text, audio, vision)
- Code interpreter, ability to write & run …
How to become expert at thing:
1 iteratively take on concrete projects and accomplish them depth wise, learning “on demand” (ie don’t learn bottom up breadth wise)
2 teach/summarize everything you learn in your own words
3 only compare yourself to younger you, never to others
Love letter to Obsidian, to which I very happily switched for my personal notes. My primary interest in Obsidian is not even for note taking specifically; it is that Obsidian is around the state of the art of a philosophy of software and what it could be.
- Your notes are …