AI Literacy: The basics of machine learning
Simple answers to common questions about AI and machine learning (part 1 of a series)
Though I’ve been following machine learning for a long time, only recently have I tried to become a practitioner. Last year I threw myself into learning the fundamentals of natural language processing, and wrote a five-part series on NLP aimed at other programmers. This year I’m tackling machine learning more broadly, with a focus on text comprehension and production.
Friends and colleagues are naturally curious about artificial intelligence and tend to ask me the same reasonable questions, so I’ll answer them here as best I can. While the answers won’t be too technical, I will try to make them precise, because there’s a lot of room between highly mathematical explanations and hand-wavy “the robots are coming to take our jobs” stuff. I’m also increasingly of the opinion that a cocktail-party-level understanding of AI is both important and achievable.
The question I’m asked most often requires some unpacking, so this first post will cover just this one topic:
“What are the differences between artificial intelligence, machine learning, neural networks, and deep learning?”
These are all real terms with distinct meanings but are often used interchangeably. I actually think that’s fine in most contexts, but the distinctions can help you understand where the industry has been and where it’s going.
Artificial intelligence is the umbrella term for the entire field of programming computers to solve problems. I would distinguish this from software engineering, where we program computers to perform tasks. “AI” can be used so broadly as to be almost meaningless, in part because the scope of the phrase is constantly evolving.
There are modern conveniences we’ve become so accustomed to that we hardly think of them as AI: driving directions, auto-complete, full-text search. These affordances, which we rely on and are infuriated by in equal measure, were state-of-the-art AI only a few decades ago. To us they’re natural parts of our phones and cars.
(There’s an analogous trend in animal research. Using tools was once considered a defining characteristic of human intelligence, but the bar gets raised each time we find octopuses opening jars or ravens solving puzzles. Similarly, artificial intelligence tends to mean whatever it is that computers can’t quite master yet.)
Machine learning is a subset of artificial intelligence. The important word there is “learning,” as in not being explicitly taught. Instead, machine learning systems are trained by being presented with lots of examples (thousands of them, ideally billions) but without much guidance about how to solve the problem or even what exactly they’re looking for.
Teaching a child can superficially resemble training an ML system: we provide lots of examples over time and give them feedback about whether they’re right or wrong. But we also tell children how to learn — that words are made up of syllables, that they need to “carry the one,” or how to think critically about a story they read. It isn’t until kids are older that they learn primarily through inductive reasoning, by recognizing patterns and developing an intuitive understanding of the world — an understanding that’s unique to them.
A better model for ML training is teaching a dog, especially teaching it to do something we can’t do ourselves, like sniffing out a bomb. We give the dog lots of training, but we can’t tell it exactly how to do the job, because we don’t know how to do it ourselves. We can only expose it to the target again and again and reward it when it gets it right. It’s up to the dog to work out which features, salient to it, add up to “this is a bomb.”
Prior to machine learning, AIs (often called expert systems) could be “smart,” but had to be explicitly taught everything they knew. Expert systems work like enormous checklists. Checklists are an effective way to make decisions, even for humans, but these ones are fiendishly time-consuming to construct and narrowly focused on a single domain: diagnosing one class of illness, or safety-checking one kind of airplane.
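Here is a toy sketch, in Python, of what such a checklist looks like in code. The airplane, the checks, and the thresholds are all made up for illustration; real expert systems encoded many more rules than this.

```python
# A toy "expert system": every rule below was written by a person.
# The checks and thresholds are hypothetical, chosen only for illustration.

def preflight_checklist(plane):
    """Return the names of the checks that failed for one (hypothetical) kind of airplane."""
    checks = {
        "fuel at least 20%": plane["fuel_pct"] >= 20,
        "tire pressure 180-220 psi": 180 <= plane["tire_psi"] <= 220,
        "flaps respond": plane["flaps_ok"],
    }
    # Keep only the checks that did not pass, in the order they were written.
    return [name for name, passed in checks.items() if not passed]


failures = preflight_checklist({"fuel_pct": 15, "tire_psi": 200, "flaps_ok": True})
print(failures)  # ['fuel at least 20%']
```

Every check was written by hand, and none of it transfers: a checklist for a different airplane, let alone for diagnosing an illness, has to be written from scratch.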
This ability to extract relevant features without guidance is at the heart of the machine learning revolution. By building up a set of features, known only to the machine, an ML system becomes capable of generalizing: operating on examples that aren’t exactly like ones it’s seen before. Generalization means that a system trained to distinguish pictures of cats from pictures of dogs will be capable of doing the same task with pictures it’s never seen before, because it’s learned a set of features that distinguish dogness from catness.
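A minimal sketch of generalization, assuming a nearest-centroid classifier (a deliberately simple stand-in, not how image-recognition networks actually work). The two features, `snout_length` and `ear_pointiness`, and all their values are invented; crucially, a real ML system would discover its own features rather than being handed these two. The point is only that the trained model labels inputs it has never seen.

```python
# Nearest-centroid classifier: "learns" one average example per label,
# then labels new inputs by whichever average they are closest to.
# The features and numbers are hypothetical, for illustration only.
from math import dist

# Training examples: (snout_length, ear_pointiness) -> label
training = [
    ((0.2, 0.9), "cat"),
    ((0.3, 0.8), "cat"),
    ((0.8, 0.3), "dog"),
    ((0.9, 0.4), "dog"),
]

def train(examples):
    """Average the feature vectors for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total) for label, total in sums.items()}

def predict(centroids, features):
    """Pick the label whose average example is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train(training)
# Neither of these inputs appears in the training data.
print(predict(centroids, (0.25, 0.7)))  # cat
print(predict(centroids, (0.7, 0.5)))   # dog
```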
At this point, you can safely think of machine learning and AI as synonymous. Even though expert systems and similar tools still exist and serve an operational purpose, we tend not to think of them as “AI” anymore. We’ve moved the bar.
Neural networks are a popular and effective way to implement machine learning. It’s fun to talk about them because the name “neural network” sounds extremely cyberpunk. You can do machine learning without neural nets, and those less-sexy approaches solve some kinds of problems better, faster, or cheaper.
It’s hard to explain what neural networks actually are without getting a little technical, something I will happily do in a future article. At the very highest level, neural networks are complex systems made up of very simple components which learn to divide up work and specialize. It’s sometimes counterintuitive, if not downright amazing, how much complexity they can encapsulate using very primitive representations.
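To give a taste of those “very simple components,” here is a sketch of a single artificial neuron: it multiplies each input by a weight, adds the results plus a bias, and fires if the total is above zero. It is trained with the classic perceptron learning rule, chosen here for brevity; modern networks are trained with gradient descent and backpropagation instead. The neuron is never told the rule for logical OR, only shown examples and corrected when it is wrong, much like the dog.

```python
# A single artificial "neuron" (a perceptron): weighted sum of inputs, then a threshold.
# It learns logical OR purely from examples plus right/wrong feedback.

def neuron(weights, bias, inputs):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=10, rate=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - neuron(weights, bias, inputs)  # -1, 0 or +1
            # Nudge each weight (and the bias) in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_EXAMPLES)
print([neuron(w, b, x) for x, _ in OR_EXAMPLES])  # [0, 1, 1, 1]
```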
While they are at some level modeled after how real neurons work, I don’t think that in the long run AI systems will resemble real brains at all, any more than commercial airplanes flap their wings.
Deep learning refers to a class of neural networks. It has a specific technical meaning but has also become a buzzword. The “deep” part just means that the network architecture has multiple layers stacked on top of one another: three layers deep, say, versus one layer shallow. As a result, deep networks can learn much more complex patterns, and reason about them more effectively, than one-layer networks.
Stacking virtual neurons into layers allows networks to develop richer features by building up hierarchical representations. If we train a deep model to tell elephants from zebras, the model will likely develop some concept of “ears,” “trunks,” and “stripes,” because those are salient features that help distinguish the two animals. This mirrors how many complex systems are organized, whether they’re biological organisms or corporate org charts: many workers doing focused, discrete tasks whose output gets synthesized further up the chain into ever-larger representations and actions. A human hand is much more useful than the sum of its muscles; a company makes high-level strategic decisions based on local conditions reported by staff.
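The layering idea can be sketched with the same kind of threshold neuron. No single neuron of this kind can compute XOR (“exactly one of the two inputs is on”), but two layers can: the first layer detects two simple features, “at least one input is on” and “both inputs are on,” and the second layer combines them. The weights here are set by hand so the features are easy to read; a trained network would find its own, which are usually much harder to interpret.

```python
# Two layers of simple threshold neurons computing XOR,
# something no single neuron of this kind can do.
# Weights are hand-picked for readability; a real network would learn its own.

def neuron(weights, bias, inputs):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def xor_network(a, b):
    # Layer 1: two low-level features of the raw inputs.
    any_on = neuron([1, 1], -0.5, [a, b])    # behaves like OR
    both_on = neuron([1, 1], -1.5, [a, b])   # behaves like AND
    # Layer 2: combine the features into "at least one, but not both".
    return neuron([1, -1], -0.5, [any_on, both_on])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
# prints 0 0 0, 0 1 1, 1 0 1, 1 1 0 (one per line)
```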
Deep learning does enable machines to reason more deeply about certain kinds of problems through their ability to extract features from data, but these AIs aren’t truly “deep” in the sense of being philosophical. Nevertheless, it is fascinating, and perhaps a bit worrying, that we humans don’t get to play a role in deciding what those features are. This automatic feature extraction is often at the root of AIs amplifying human bias: if the training examples reflect our biases, the features learned from them can too. And in fairness to all the “robots are coming to take our jobs” articles, there are already tasks that deep networks can perform better than skilled humans, because they are able to detect patterns that are otherwise invisible to us.
(Confusingly, the chess-playing Deep Blue, the first real-life AI that many people know by name, was not “deep” in either of these senses. Its name was IBM’s rebranding of Deep Thought, an earlier chess machine named after Douglas Adams’ fictional AI, which was itself a reference to a popular porn film, thus demonstrating that one should never allow engineers to name things.)
Talking confidently about AI
Here are some summary sentences to take with you to that cocktail party:
- Machine learning is a type of artificial intelligence in which the system learns patterns from lots of examples.
- Artificial intelligence systems can either be explicitly taught answers, or learn to infer them (or both!).
- Many machine learning systems are implemented using neural networks, in which anywhere from hundreds to millions of small workers learn to collaborate to solve problems.
- Recent advances in neural network architecture have resulted in so-called deep learning systems which can infer complex features, and sometimes outperform expert humans.