Language Model Scaling Laws and GPT-3

Understanding why LLMs like GPT-3 work so well…

20 min read · Dec 10, 2022

Language models (LMs) are incredibly generic: they take text as input and produce text as output. Recent research has revealed that this generic text-to-text structure can be exploited to solve a variety of tasks without task-specific adaptation (i.e., no fine-tuning or architectural modifications) by using prompting techniques to perform accurate zero- and few-shot inference. Put simply, we can pre-train the LM over a large, unlabeled text corpus (using a language modeling objective), then ask the LM via textual prompts to solve a problem. In this way, the pre-trained model can easily be repurposed for solving different problems.
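To make the prompting idea concrete, here is a minimal sketch of how a zero-shot or few-shot prompt might be assembled as plain text. The helper name `build_prompt`, the `Input:`/`Output:` layout, and the translation task are assumptions chosen for illustration; they are not a format prescribed by any particular model or paper, and no model is called here.

```python
from typing import Sequence, Tuple


def build_prompt(
    task_description: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a text prompt (hypothetical layout, for illustration only).

    With an empty `examples` sequence the result is a zero-shot prompt;
    with one or more solved examples it is a few-shot prompt.
    """
    lines = [task_description, ""]
    # Each solved example shows the model the input/output pattern to imitate.
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final "Output:" is left open for the LM to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Zero-shot: only the task description and the query.
zero_shot = build_prompt("Translate English to French.", [], "cheese")

# Few-shot: the same task, plus two solved examples placed in the prompt.
few_shot = build_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
```

In both cases the pre-trained model's weights stay unchanged; the only thing that differs between tasks is the text placed in the prompt, which is what makes the approach task-agnostic.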

Although LMs hold incredible potential as task-agnostic foundation models, initial attempts at applying pre-trained LMs to downstream tasks (e.g., GPT and GPT-2 [4, 5]) did not work well. Within this overview, we will learn how recent research has built upon these initial attempts and created LMs that achieve much better task-agnostic performance. The key finding within this line of work is that LMs become much more powerful as you scale them up; see below.

Published in TDS Archive

Written by Cameron R. Wolfe, Ph.D.
