Knowability Of FAI

Last edited July 29, 2006.

The question:
This work addresses the following family of questions about Friendly AI:
  • "How can an AI be creative if we know exactly what it will do? Or if we don't know exactly what it will do, how can we know it will be Friendly?"
Sometimes this also takes the form of a confidently asserted objection:
  • "For cognition to work requires chaos/emergence/self-organization/randomness/noise. Therefore cognition is, by its nature, unpredictable. Therefore Friendly AI is impossible."
Then there's the versions based on Vinge:
  • "If you knew exactly what a smarter intelligence would do, you would be that smart yourself. Therefore you can't know what an AI smarter than yourself will do. Therefore Friendly AI is impossible."
  • "The more powerful an intelligence is, the less predictable it is. Thus if a very powerful intelligence operates in our world, we can't predict the outcome. Even if the AI is Friendly, humans will no longer be in control of their lives because we won't be able to understand what's happening around us."
Or:
  • "If the AI never surprises us, it must not be very smart. What if the AI surprises you by doing something unFriendly?"
  • "How can a predictable Goal System deal with an unpredictable world?"
  • "Algorithms that include randomness are often more powerful than algorithms without them. If the Friendly AI isn't allowed to use randomness it won't be able to compete with more powerful AIs that can."
  • "Anything mechanical enough to be understandable is too mechanical to be intelligent."
  • "An AI simple enough that humans can comprehend it will be dumber than humans."

Preface:
I hope to convey an understanding of intelligence, optimization, and predictability, from within which to answer the above family of questions; give some sense of why it is not necessarily impossible to create a creative, intelligent, predictably Friendly AI. This work is narrowly focused; for example, it doesn't try to ask - given that one has the power to create an AI that is "predictably Friendly" for some chosen sense of "Friendly" - what "Friendly" should mean. (Similarly, the document Coherent Extrapolated Volition, which does focus on choosing a sense of "Friendly", disclaims any attempt to say how a thus-Friendly AI might be built. One question at a time.)
This work is semi-technical. A full resolution to the above family of questions arises from rigorous understanding of probability theory - including the concepts of prediction, randomness, knowledge, and confidence. I have tried to explain this resolution using analogies and metaphors. I don't know how much impact these analogies and metaphors will have on someone not comfortable with algebra. Anyone planning to read both this work and Technical Explanation should read Technical Explanation first. Nonetheless this work is intended to stand on its own.


Intelligence and predictability

Imagine that I'm visiting a distant city, and a local friend volunteers to drive me to the airport. I don't know the neighborhood. Each time my friend approaches a street intersection, I don't know whether my friend will turn left, turn right, or continue straight ahead. I can't predict my friend's move even as we approach each individual intersection - let alone, predict the whole sequence of moves in advance.
Yet I can predict the result of my friend's unpredictable actions: we will arrive at the airport. Even if my friend's house were located elsewhere in the city, so that my friend made a completely different sequence of turns, I would just as confidently predict our arrival at the airport. I can predict this long in advance, before I even get into the car. My flight departs soon, and there's no time to waste; I wouldn't get into the car in the first place, if I couldn't confidently predict that the car would travel to the airport along an unpredictable pathway.
Isn't this a remarkable situation, from a scientific perspective? I can predict the outcome of a process, without being able to predict any of the intermediate steps of the process. Ordinarily one predicts by imagining the present and then running the visualization forward in time. If you want a precise model of the Solar System, one that takes into account planetary perturbations, you must start with a model of all major objects and run that model forward in time, step by step. Sometimes simpler problems have a closed-form solution, where calculating the future at time T takes the same amount of work regardless of T. A coin rests on a table, and after each minute, the coin turns over. The coin starts out showing heads. What face will it show a hundred minutes later? Obviously you did not answer this question by visualizing a hundred intervening steps. You used a closed-form solution that worked to predict the outcome, and would also work to predict any of the intervening steps.
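As a minimal sketch of that distinction (the function names here are mine, not part of the original discussion):

    # Step-by-step prediction: run the process forward one minute at a time.
    def coin_face_by_simulation(minutes):
        face = "heads"
        for _ in range(minutes):
            face = "tails" if face == "heads" else "heads"
        return face

    # Closed-form prediction: only the parity of the elapsed minutes matters.
    def coin_face_closed_form(minutes):
        return "heads" if minutes % 2 == 0 else "tails"

    # Both agree that after a hundred minutes the coin shows heads again.
    assert coin_face_by_simulation(100) == coin_face_closed_form(100) == "heads"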
But when my friend drives me to the airport, I can predict the outcome successfully using a strange model that won't work to predict any of the intermediate steps. My model doesn't even require me to input the initial conditions - I don't need to know where we start out in the city!
I do need to know something about my friend. I must know that my friend wants me to make my flight. I must credit that my friend is a good enough planner to successfully drive me to the airport (if he wants to). These are properties of my friend's initial state - properties which let me predict the final destination, though not any intermediate turns.
I must also credit that my friend knows enough about the city to drive successfully. This may be regarded as a relation between my friend and the city; hence, a property of both. But an extremely abstract property, which does not require any specific knowledge about either the city or my friend's knowledge about the city.

Optimization processes

Consider a car - say, a Toyota Corolla. The Toyota Corolla is made up of some number of atoms - say, on the (very) rough order of ten to the thirtieth. If you consider all the possible ways we could arrange those 10^30 atoms, it's clear that only an infinitesimally tiny fraction of possible configurations would qualify as a working car. If you picked a random configuration of 10^30 atoms once per Planck interval, many many ages of the universe would pass before you hit on a working car.
(At this point, someone in the audience usually asks: "But isn't this what the creationists argue? That if you took a bunch of atoms and put them in a box and shook them up, it would be astonishingly improbable for a fully functioning rabbit to fall out?" But the logical flaw in the creationists' argument is not that randomly reconfiguring molecules would by pure chance assemble a rabbit. The logical flaw is that there is a process, natural selection, which, through the non-chance retention of chance mutations, selectively accumulates complexity, until a few billion years later it produces a rabbit. Only the very first replicator in the history of time needed to pop out of the random shaking of molecules - perhaps a short RNA string, though there are more sophisticated hypotheses about autocatalytic hypercycles of chemistry.)
Restricting our attention to running vehicles, there is still an astronomically huge design space of vehicles that could be composed of the same atoms as the Corolla. Most possible running vehicles won't work quite as well. For example, we could take the parts in the Corolla's air conditioner and mix them up in hundreds of possible configurations; nearly all these configurations would result in an inferior vehicle (still recognizable as a car) that lacked a working air conditioner. Thus there are many more configurations corresponding to inferior vehicles than to vehicles of Corolla quality.
A tiny fraction of the design space does describe vehicles that we would recognize as faster, more efficient, and safer than the Corolla. Thus the Corolla is not optimal under our preferences, nor under the designer's own goals. The Corolla is, however, optimized, because the designer had to hit an infinitesimal target in design space just to create a working car, let alone a car of Corolla-equivalent quality. The subspace of working vehicles is dwarfed by the space of all possible molecular configurations for the same atoms. You cannot build so much as an effective wagon by sawing boards into random shapes and nailing them together according to coinflips. To hit such a tiny target in configuration space requires a powerful optimization process. The better the car you want, the more optimization pressure you have to exert. You need a huge optimization pressure just to get a car at all.
This whole discussion assumes implicitly that the designer of the Corolla was trying to produce a "vehicle", a means of travel. This assumption deserves to be made explicit, but it is not wrong, and it is highly useful in understanding the Corolla.
Planning also involves hitting tiny targets in a huge search space. On a 19-by-19 Go board there are roughly 10^170 legal positions. In the early positions of a Go game there are more than 300 legal moves per turn. The search space explodes, and nearly all moves are foolish ones if your goal is to win the game. From all the vast space of Go possibilities, a Go player seeks out the infinitesimal fraction of plans which have a decent chance of winning.
You cannot even drive to the supermarket without planning - it will take you a long, long time to arrive if you make random turns at each intersection. The set of turn sequences that will take you to the supermarket is a tiny subset of the space of turn sequences. Note that the subset of turn sequences we're seeking is defined by its consequence - the target - the destination. Within that subset, we care about other things, like the driving distance. There are plans that would take us to the supermarket in a huge pointless loop-the-loop.
In general, as you live your life, you try to steer reality into a particular region of possible futures. When you buy a Corolla, you do it because you want to drive to the supermarket. You drive to the supermarket to buy food, which is a step in a larger strategy to avoid starving. All else being equal, you prefer possible futures in which you are alive, rather than dead of starvation. When you drive to the supermarket, you aren't really aiming for the supermarket, you're aiming for a region of possible futures in which you don't starve. Each turn at each intersection doesn't carry you toward the supermarket, it carries you out of the region of possible futures where you lie helplessly starving in your apartment. If you knew the supermarket was empty, you wouldn't bother driving there. An empty supermarket would occupy exactly the same place on your map of the city, but it wouldn't occupy the same role in your map of possible futures. It is not a location within the city that you are really aiming at, when you drive.
The key idea about an optimization process is that we can know something about the target - the region of possible futures into which the optimization process steers reality - without necessarily knowing how the optimization process will hit that target.
Human intelligence is one kind of powerful optimization process, capable of winning a game of Go or turning sand into digital computers. Natural selection is much slower than human intelligence; but over geological time, cumulative selection pressure qualifies as a powerful optimization process.
Once upon a time, human beings anthropomorphized stars, saw constellations in the sky and battles between constellations. But though stars burn longer and brighter than any craft of biology or human artifice, stars are neither optimization processes, nor products of strong optimization pressures. The stars are not gods; there is no true power in them.

Is the notion of "optimization" useful?

Is the notion of a powerful optimization process useful for our particular purpose of discussing Friendly AI?
I would argue that if something is not a powerful optimization process, then it has neither the power to convey the special and unusual benefits of AI, nor the potential to pose a special danger. And if something is a powerful optimization process, then it may either convey the special benefits of AI, or pose a special danger.
If we spot a ten-mile-wide asteroid hurtling toward Earth, then we are all in deadly jeopardy. (If we do not spot the asteroid, then we are in much worse jeopardy; the map is not the territory.) But, if we do spot the asteroid, it is still only a natural hazard. If we devise a likely plan, the asteroid itself will not oppose us, will not try to think of a counterplan. If we try to deflect the asteroid with a nuclear weapon, the asteroid will not send out interceptors to stop us. If we train lasers on the asteroid's surface (to vaporize gas that expands and deflects the asteroid), the asteroid will not mirror its surface. If we deflect that one asteroid, the asteroid belt will not send another planet-killer in its place. We might have to do some work to steer the future out of the unpleasant region, but there would be no counterforce trying to steer it back. The sun going nova might prove just as deadly to the human species as a recursively-self-improving non-Friendly AI, but the sun going nova would not be a threat of that special kind that actively resists solution, a problem that counters humanity's attempt to solve it.
A universal antiviral agent would be a tremendous benefit to human society, but it would still have no purpose of its own. Even a universal antiviral can be used to negative purposes - for example, it could be administered to soldiers on a battlefield, who could then blanket the enemy with anthrax spores. A universal antiviral would have no power to benefit humanity except insofar as we, with our own minds and intelligence, planned to wield it well. A universal antiviral does not change the nature of the game that humans play against Death. It would still be just us, alone, with our own wits, steering the future.
What we fear from Artificial Intelligence is that we may be beaten at our own game, outwitted and out-invented. Against all our best efforts the future will be steered into a region where humanity is no more; our lesser abilities crushed, like a ninth-dan professional rolling over an amateur Go player. Conversely we may also hope for a powerful optimization process that steers the world out of trouble, away from futures in which humanity is extinguished. We can hope for better outcomes than we could have produced through unaugmented human wits.
In both cases, I speak of the real-world results, and not the particular fashion in which those results are achieved. I speak of the impact that the AI has upon the world - which is what the notion of an optimization process is all about. There are many other purposes of discussion which would imply a legitimate interest in how the AI works internally. There are other purposes of discussion, under which it would not be vacuous to argue whether to call the AI "intelligent" or confer upon it the label of "mind".
But, just so long as the AI is a powerful optimization process as defined - so long as it does in fact possess an enormously strong capacity to steer reality into particular regions of possible futures - then there automatically exists the potential for a huge impact upon the world, a major reshaping of reality which greatly helps or greatly harms humankind.
Conversely, anything that lacks a powerful ability to steer the future into narrow target regions cannot harm or help humans in the special way. An asteroid that can only steer itself toward Earth using weak puffs of gas would be slightly more difficult, but not impossible, to fend off. We would only need to push on it more strongly than it could push back. The same goes for an optimization process that can only weakly puff on the future. Similarly, a weak beneficial puff on the future may help us a little, but it is unlikely to help us more than we could have helped ourselves.
In this restricted sense, then, the notion of a powerful optimization process is necessary and sufficient to a discussion about Artificial Intelligence that could powerfully benefit or powerfully harm humanity. If you say that an AI is mechanical and therefore "not really intelligent", and it kills you, you are still dead; and conversely, if an "unintelligent" AI cures your cancer, you are still alive.

Surface facts and deep generators

There's a popular conception of AI as a tape-recorder-of-thought, which only plays back knowledge given to it by the programmers.
Suppose that you tried to build a CPU by programming in, as separate and disconnected facts, every possible sum of two 32-bit integers. This requires a giant lookup table with 2^64 (18 billion billion) entries. Imagine the woes of a research team that tries to build this Arithmetic Expert System as a giant semantic network. They will run headlong into the "common-sense problem" for addition. Seemingly, to teach a computer addition, you must teach it a nearly infinite number of facts that humans somehow just know. Maybe the research team will launch a distributed Internet project to encode all the detailed knowledge necessary for addition. Or maybe they'll try to buy a supercomputer, on the theory that past projects to create Artificial Addition failed because of inadequate computing power.
A compact description of the underlying rules of arithmetic (e.g. the axioms of addition) can give rise to a vast variety of surface facts (e.g. that 953,188 + 12,152 = 965,340). Trying to capture the surface phenomenon, rather than the generator, rapidly runs into the problem of needing to capture an infinite number of surface facts.
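To make the contrast concrete, here is a sketch of the two approaches (my own illustration; the table is restricted to tiny operands, since the full 32-bit version would need 2^64 entries):

    # Surface facts: a lookup table of specific sums (here limited to 8-bit
    # operands; the full 32-bit version would need 2^64 entries).
    lookup_table = {(a, b): a + b for a in range(256) for b in range(256)}

    def add_by_lookup(a, b):
        return lookup_table[(a, b)]      # fails on anything outside the recorded facts

    # Deep generator: the rule of addition itself covers every case compactly.
    def add_by_rule(a, b):
        return a + b

    assert add_by_rule(953188, 12152) == 965340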
You cannot build Deep Blue (the famous program that beat Garry Kasparov for the world chess championship) by programming in a good chess move for every possible chess position. First of all, it is impossible to build a chess player this way, because you don't know exactly which positions it will encounter. You would have to record a specific move for zillions of positions, more than you could consider in a lifetime with your slow neurons. And second, even if you did this, the resulting program would not play chess any better than you do. That is the peril of recording and playing back surface phenomena, rather than capturing the underlying generator.
Deep Blue played chess barely better than the world's top humans, but a heck of a lot better than its own programmers. Deep Blue's programmers could play chess, of course - they had to know the rules - but the programmers didn't play chess anywhere near as well as Kasparov or Deep Blue. Deep Blue's programmers didn't just capture their own chess-move generator. If they'd captured their own chess-move generator, they could have avoided the problem of programming an infinite number of chess positions - but they couldn't have beaten Garry Kasparov; they couldn't have built a program that played better chess than any human in the world. The programmers built a better move generator - one that more powerfully steered the game toward the target of winning game positions. Deep Blue's programmers have some slight ability to find chess moves that aim at this same target, but their steering ability is much weaker than Deep Blue's.
Does this seem paradoxical? Maybe it seems paradoxical, but remember that it actually happened - the programmers did actually build Deep Blue, it did actually make moves the programmers could never have thought of, and it did actually beat Kasparov. You can call this "paradoxical", if you like, but it remains a fact. It is likewise "paradoxical" but true that Garry Kasparov was not born with a complete library of chess moves programmed into his DNA. Kasparov invented his own moves; he was not explicitly preprogrammed by evolution to make particular moves - though natural selection did build a brain that could learn. And Deep Blue's programmers invented Deep Blue's code without evolution explicitly encoding Deep Blue's code into their genes.
Steam shovels lift more weight than humans can heft, skyscrapers are taller than their human builders, humans play better chess than natural selection, and computer programs play better chess than humans. The creation can exceed the creator. Call this paradoxical, if you like, but it happens in real life. You can deliberately create a move-chooser that chooses according to a different rule than you yourself employ. You can call it a great and sacred magic, if you like, that humans can invent new strategies which blind unthinking evolution did not explicitly preprogram into us. But Deep Blue also made moves beyond the ability of its programmers. So if there is a sacred magic, it is a sacred magic which AI programmers can infuse into computer programs.

Answers and questions

If I want to create an AI that plays better chess than I do, I have to program a search for winning moves. I can't program in specific moves because then the chess player won't be any better than I am.
This holds true on any level where an answer has to meet a sufficiently high standard. If you want any answer better than you could come up with yourself, you necessarily sacrifice your ability to predict the exact answer in advance.
But do you necessarily sacrifice your ability to predict everything?
As my coworker, Marcello Herreshoff, says: "We never run a program unless we know something about the output and we don't know the output." Deep Blue's programmers didn't know which moves Deep Blue would make, but they must have known something about Deep Blue's output which distinguished that output from the output of a pseudo-random move generator. After all, it would have been much simpler to create a pseudo-random move generator; but instead the programmers felt obligated to carefully craft the complex program that is Deep Blue. In both cases, the programmers wouldn't know the move - so what was the key difference? What was the fact that the programmers knew about Deep Blue's output, if they didn't know the output?
Imagine that the programmers had said to themselves, "Well, if we knew what Deep Blue's move would be, it couldn't possibly play any better than we could. So we need to make sure we don't know Deep Blue's move. So we'll use a random move generator. Problem solved!" One thing is for sure, the resulting program wouldn't have played good chess. Of course, the programmers might be able to convince themselves that the program would play well... after all, they don't know where the program will move, and they don't know what the best move is, so it cancels out, right?

Intelligence and probability

Calibrating predictions about intelligence

Imagine that I'm playing chess against a smarter opponent. If I could predict exactly where my opponent would move on each turn, I would automatically be at least as good a chess player as my opponent. I could just ask myself where my opponent would move, if she were in my shoes; and then make the same move myself. (In fact, to predict my opponent's exact moves, I would need to be superhuman - I would need to predict my opponent's exact mental processes, including her limitations and her errors. It would become a problem of psychology, rather than chess.)
So predicting an exact move is not possible, but neither is it true that I have no information about my opponent's moves. Personally, I am a very weak chess player (I play an average of maybe two games per year). But even if I'm playing against former world champion Garry Kasparov, there are certain things I can predict about his next move. When the game starts, I can guess that the move P-K4 is more likely than P-KN4. I can guess that if Kasparov has a move which would allow me to checkmate him on my next move, he will not make that move. Much less reliably, I can guess that Kasparov will not make a move that exposes his queen to my capture - but here, I could be greatly surprised; there could be a rationale for a queen sacrifice which I have not seen.
And finally, of course, I can guess that Kasparov will win the game! Supposing that Kasparov is playing black, I can guess that the final position of the chess board will occupy the class of positions that are wins for black. I cannot predict specific features of the board in detail; but I can narrow things down relative to the class of all possible ending positions.
But I am not actually certain that Kasparov will win. It's extremely likely, but not certain. Such knowledge is made up of probabilities, not sureties. For our purposes here, a "probability" is a guess to which a number is attached, indicating how often you expect to be correct about that kind of guess.
If you're well-calibrated in your probabilities, it means that if we keep track of all the guesses where you say "sixty percent", about 6 in 10 of those guesses turn out to be correct. On the other hand, if you go around declaring that you are "ninety-eight percent certain" of something, and about 7 in 10 of those guesses turn out to be correct, we will say you are poorly calibrated.
(Mr. Spock of Star Trek is extremely poorly calibrated; he often says something like "Captain, if you steer the Enterprise directly into that black hole, our probability of surviving is only 2.234%" and yet nine times out of ten the Enterprise is not destroyed. What kind of tragic fool gives four significant digits for a figure that is off by two orders of magnitude? But then Spock is no more skilled a rationalist than the scriptwriters who produce his dialogue, for if you knew exactly what a great rationalist would say, you would be that rational yourself.)
If I play chess against a superior opponent, and I don't know for certain where my opponent will move, I can still produce a probability distribution that is well-calibrated - in the sense that, over the course of many games, legal moves that I label with a probability of "ten percent" are made by the opponent around 1 time in 10. That is my goal in the task of fine-tuning my own uncertainty: when I say "ten percent", around 1 time in 10 that event should happen; neither more often nor less; neither 1 time in 100, nor 1 time in 4, but 1 time in 10.
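To make calibration checkable in practice, I can group the probability labels I assign and compare each label against how often those events actually happened. A minimal sketch, with invented records:

    from collections import defaultdict

    # Each record is (probability I stated, whether the predicted event happened).
    # These records are invented, purely for illustration.
    predictions = [(0.1, False), (0.1, False), (0.1, True),
                   (0.6, True), (0.6, True), (0.6, False),
                   (0.9, True), (0.9, True), (0.9, True)]

    def calibration_report(records):
        bins = defaultdict(list)
        for stated, happened in records:
            bins[stated].append(happened)
        for stated, outcomes in sorted(bins.items()):
            observed = sum(outcomes) / len(outcomes)
            print(f"stated {stated:.0%}: happened {observed:.0%} of the time ({len(outcomes)} guesses)")

    calibration_report(predictions)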
You might ask: Is producing a well-calibrated distribution over Kasparov beyond my abilities as an inferior chess player? The answer is a definite no! There is a trivial way to produce a well-calibrated probability distribution. If my opponent has 37 legal moves, I can assign a probability of 1/37 to each move. This is called a maximum-entropy distribution, representing my total ignorance - I have no idea where my opponent might move; all legal moves seem equally likely to me. (Note: "Maximum entropy" is a mathematical term, not just a colloquial way of saying "totally ignorant". There is a way to calculate the "entropy" of the probability distribution, and 1/37 for each legal move is the unique distribution that maximizes the calculated "entropy".) If I give the maximum-entropy distribution as my reply, then I am perfectly calibrated. Why? Because I assigned 37 different moves a probability of 1 in 37, and exactly one of those moves will happen, so I applied the label "1 in 37" to 37 different events and exactly 1 of those events occurred.
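The entropy the parenthetical alludes to is the standard Shannon entropy; as a sketch, the uniform distribution over 37 legal moves works out to about 5.2 bits, the largest value any distribution over 37 moves can have:

    import math

    def entropy_bits(distribution):
        # Shannon entropy in bits: -sum of p * log2(p) over the outcomes.
        return -sum(p * math.log2(p) for p in distribution if p > 0)

    uniform = [1.0 / 37] * 37               # total ignorance over 37 legal moves
    confident = [0.9] + [0.1 / 36] * 36     # strong belief in one particular move

    print(entropy_bits(uniform))            # about 5.21 bits - the maximum possible
    print(entropy_bits(confident))          # about 0.99 bits - much lower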
But total ignorance is not very useful, even if you confess it honestly. So the question then becomes whether I can do better than maximum entropy. Is it possible to do better than perfect calibration? Yes. Let's say that you and I both answer a quiz with ten questions. You assign probabilities of 90% to your answers, and get one answer wrong. I assign probabilities of 80% to my answers, and get two answers wrong. We are both perfectly calibrated but you exhibited better discrimination - your answers more strongly distinguished truth from falsehood.
(For more on this subject, see Technical Explanation.)
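This essay doesn't pick a particular scoring rule for discrimination, but one standard choice is the logarithmic score, which rewards probability mass placed on the true answers. A sketch of the quiz example above under that (assumed) rule:

    import math

    def log_score(probs_assigned_to_truth):
        # Total log-probability (in bits) assigned to the answers that turned out true.
        return sum(math.log2(p) for p in probs_assigned_to_truth)

    you = [0.9] * 9 + [0.1]        # 90% on each answer, one answer wrong
    me = [0.8] * 8 + [0.2] * 2     # 80% on each answer, two answers wrong

    print(log_score(you))          # about -4.7 bits
    print(log_score(me))           # about -7.2 bits: your answers discriminated better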
I can assign a well-calibrated probability distribution over the chess moves of a stronger opponent, even though I'm not certain. If I'm almost totally ignorant, I can still assign a well-calibrated distribution - but it will closely approach the maximum-entropy distribution that assigns equal probability to all legal moves. "Strong confidence" is when you assign probabilities that approach 1.0 or 0.0 - you label one specific outcome "nearly certain" and the others "nearly impossible". That which we call "honest ignorance" is when you assign roughly equal probabilities to most possibilities - you have no idea what might happen; all outcomes seem equally likely to you. In between is "guessing", where some outcomes seem more likely than others, but no outcome has a probability approaching 1.0.

Entropy versus creativity

Suppose that someone shows me an arbitrary chess position, and asks me: "What move would Kasparov make if he played black, starting from this position?" Since I'm not nearly as good a chess player as Kasparov, I can only weakly guess Kasparov's move, and I'll assign a non-extreme probability distribution to Kasparov's possible moves. In principle I can do this for any legal chess position, though my guesses may approach maximum entropy. If you put me in a box and feed me chess positions and get probability distributions back out, then we would have - theoretically speaking - a system that produces Yudkowsky's guess for Kasparov's move in any chess position. We shall suppose (though it may be unlikely) that my prediction is well-calibrated, if not very discriminating.
Now suppose we turn "Yudkowsky's prediction of Kasparov's move" into an actual chess opponent, by having a computer randomly make moves at the exact probabilities I assigned. We'll call this system RYK, which stands for "Randomized Yudkowsky-Kasparov", though it should really be "Random Selection from Yudkowsky's Probability Distribution over Kasparov's Move."
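A minimal sketch of the sampling step, using an invented probability distribution over replies in some particular position:

    import random

    # My hypothetical guesses about Kasparov's reply in one position (probabilities sum to 1).
    predicted_moves = {"e5": 0.45, "c5": 0.30, "e6": 0.15, "other": 0.09, "g5": 0.01}

    def ryk_move(distribution):
        # Pick one move at random, weighted by the probability I assigned to it.
        moves = list(distribution.keys())
        weights = list(distribution.values())
        return random.choices(moves, weights=weights, k=1)[0]

    print(ryk_move(predicted_moves))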
Will RYK be as good a player as Kasparov? Definitely not! Sometimes the RYK system will randomly make dreadful moves which the real-life Kasparov would never make - start the game with P-KN4. I assign such moves a low probability, but sometimes the computer makes them anyway, by sheer random chance. The real Kasparov also sometimes makes moves that I assigned a low probability, but only when the move has a better rationale than I realized - the astonishing, unanticipated queen sacrifice.
Randomized Yudkowsky-Kasparov is definitely no smarter than Yudkowsky, because RYK draws on no more chess skill than I myself possess - I build all the probability distributions myself, using only my own abilities. Actually, RYK is a far worse player than Yudkowsky. I myself would make the best move I saw with my knowledge. RYK only occasionally makes the best move I saw - I won't be very confident that Kasparov would make exactly the same move I would.
Now suppose that I myself play a game of chess against the RYK system.
RYK has the odd property that, on each and every turn, my probabilistic prediction for RYK's move is exactly the same prediction I would make if I were playing against world champion Garry Kasparov.
Nonetheless, I can easily beat RYK, where the real Kasparov would crush me like a bug.
The creative unpredictability of intelligence is not like the noisy unpredictability of a random number generator. When I play against a smarter player, I can't predict exactly where my opponent will move against me. But I can predict the end result of my smarter opponent's moves, which is a win for the other player. When I see the randomized opponent make a move that I assigned a tiny probability, I chuckle and rub my hands, because I think the opponent has randomly made a dreadful move and now I can win. When a superior opponent surprises me by making a move to which I assigned a tiny probability, I groan because I think the other player saw something I didn't, and now I'm about to be swept off the board. Even though it's exactly the same probability distribution! I can be exactly as uncertain about the actions, and yet draw very different conclusions about the eventual outcome. (Technical note: This situation is possible because I am not logically omniscient; I do not explicitly represent a joint probability distribution over entire games.)
When I play against a smarter player, I can't predict exactly where my opponent will move against me. If I could predict that, I would necessarily be at least that good at chess myself. But I can predict the consequence of the unknown move, which is a win for the other player; and the more the player's actual action surprises me, the more confident I become of this final outcome.
The unpredictability of intelligence is a very special and unusual kind of surprise, which is not at all like noise or randomness. There is a weird balance between the unpredictability of actions and the predictability of outcomes.

What is the empirical content of beliefs about intelligence?

The strength of a hypothesis is determined by its simplicity and by the amount of probability mass it concentrates into the exact outcome observed. For example, suppose that I predict that the price of a cookie on Tuesday will be between 1 and 50 cents, while you predict that the price will be between 31 and 35 cents. If the price is 34 cents, both of our predictions came true, but yours concentrated ten times as much probability mass into the exact outcome of 34. Guessing an outcome between 1 and 50, without further specification, is like assigning a 2% probability to each of 50 possible numbers, while guessing an outcome between 31 and 35 is like assigning 20% probability to each of 5 possible numbers. The more probability mass your hypothesis concentrates into the actual observed outcome, the better you do. If a hypothesis is unfalsifiable - if you can make any observation seem to fit the hypothesis equally well - then the hypothesis doesn't concentrate its probability mass at all; it is a disguised maximum-entropy probability distribution, which is to say, a cleverly masked form of total ignorance. (For more on this, again see Technical Explanation.)
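As arithmetic, the cookie example looks like this (a sketch; the prices are the ones used above):

    # Uniform guesses over each predicted range of cookie prices, in cents.
    my_prediction = {price: 1 / 50 for price in range(1, 51)}     # 1 to 50 cents
    your_prediction = {price: 1 / 5 for price in range(31, 36)}   # 31 to 35 cents

    observed_price = 34
    print(my_prediction[observed_price])      # 0.02
    print(your_prediction[observed_price])    # 0.20 - ten times the probability mass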
Since I am so uncertain of Kasparov's move, what is the empirical content of my belief that "Kasparov is a highly intelligent chess player"? What real-world experience does my belief tell me to anticipate? Is it a cleverly masked form of total ignorance?
To sharpen the dilemma, suppose Kasparov plays against some mere chess grandmaster Mr. G, who's not in the running for world champion. My own ability is far too low to distinguish between these levels of chess skill. When I try to guess Kasparov's move, or Mr. G's next move, all I can do is try to guess "the best chess move" using my own meager knowledge of chess. Then I would produce exactly the same prediction for Kasparov's move or Mr. G's move in any particular chess position. So what is the empirical content of my belief that "Kasparov is a better chess player than Mr. G"?
The empirical content of my belief is the testable, falsifiable prediction that the final chess position will occupy the class of chess positions that are wins for Kasparov, rather than drawn games or wins for Mr. G. (Counting resignation as a legal move that leads to a chess position classified as a loss.) The degree to which I think Kasparov is a "better player" is reflected in the amount of probability mass I concentrate into the "Kasparov wins" class of outcomes, versus the "drawn game" and "Mr. G wins" class of outcomes. These classes are extremely vague in the sense that they refer to vast spaces of possible chess positions - but "Kasparov wins" is more specific than maximum entropy, because it can be definitely falsified by a vast set of chess positions.
The outcome of Kasparov's game is predictable because I know, and understand, Kasparov's goals. Within the confines of the chess board, I know Kasparov's motivations - I know his success criterion, his utility function, his target as an optimization process. I know where Kasparov is ultimately trying to steer the future and I anticipate he is powerful enough to get there, although I don't anticipate much about how Kasparov is going to do it.
How exactly do I describe "where Kasparov is trying to steer the future"? In the case of chess, there's a simple function that classifies chess positions into wins for black, wins for white, and drawn games. If I know which side Kasparov is playing, I know the class of chess positions Kasparov is aiming for. (If I don't know which side Kasparov is playing, I can't predict whether black or white will win - which is not the same as confidently predicting a drawn game.)
More generally, I can describe motivations using a preference ordering. When I consider two potential outcomes, A and B, I can say that I prefer A to B, prefer B to A, or find myself indifferent between them. I would write these relations as A > B, B > A, or B ~ A. Suppose that we have the ordering A < B < C ~ D ~ E ~ F < G ~ H ~ I. Then you like B more than A, and C more than B. But {C, D, E, F} all belong to the same class, seem equally desirable to you; you are indifferent between which of {C, D, E, F} you receive, though you would rather have any of them than B, and you would rather have G (or H, or I) than any of C, D, E, or F.
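One simple way to represent such a preference ordering is to give each outcome a rank, with equal ranks for indifference; this sketch encodes the ordering just described (the representation is my own choice):

    # Higher rank means more preferred; equal ranks mean indifference.
    rank = {"A": 0, "B": 1, "C": 2, "D": 2, "E": 2, "F": 2, "G": 3, "H": 3, "I": 3}

    def compare(x, y):
        # Returns ">", "<", or "~" for how outcome x relates to outcome y.
        if rank[x] > rank[y]:
            return ">"
        if rank[x] < rank[y]:
            return "<"
        return "~"

    assert compare("C", "B") == ">"    # you like C more than B
    assert compare("C", "F") == "~"    # indifferent within {C, D, E, F}
    assert compare("F", "G") == "<"    # you would rather have G than F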
When I think you're a powerful intelligence, and I think I know something about your preferences, then I'll predict that you'll steer reality into regions that are higher in your preference ordering. Think of a huge circle containing all possible outcomes, such that outcomes higher in your preference ordering appear to be closer to the center. Outcomes between which you are indifferent are the same distance from the center - imagine concentric rings of outcomes that are all equally preferred. If you aim your actions and strike a consequence close to the center - an outcome that ranks high in your preference ordering - then I'll think better of your ability to aim.
The more intelligent I believe you are, the more probability I'll concentrate into outcomes that I believe are higher in your preference ordering - that is, the more I'll expect you to achieve a good outcome, and the better I'll expect the outcome to be. Even if a powerful enemy opposes you, so that I expect the final outcome to be one that is low in your preference ordering, I'll still expect you to lose less badly if I think you're more intelligent.

Side effects

Suppose that at the end of the game, I count the number of pieces on white squares, subtract the number of pieces on black squares, and ask whether the resulting number is odd or even - call this the "parity of the board". I don't know what the board parity will be at the end of the game; I assign 50/50 odds to two possibilities, representing my complete ignorance. The reason I can't make any prediction is that Kasparov doesn't care about the board's parity - there's no term for board parity in Kasparov's preference function, that I know of.
The exact final state of the board is determined by Kasparov and his opponent, both trying to steer the chess game. Their actions repeatedly affect the board's parity. Otherwise the board would just keep the even parity it has at the start of the game. But neither Kasparov nor his opponent cares specifically about the parity of the board - they aren't paying attention to it. Not caring about something isn't the same as wanting to leave it untouched. Neither Kasparov nor Mr. G has an explicit term in their preference function for the board parity - they don't even notice the board parity - but this does not imply that the board parity remains unchanged throughout the game. From my perspective, Kasparov and Mr. G randomize the board parity as a side effect of influencing the properties they do care about.
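For concreteness, here is one way the board parity could be computed; the coordinate encoding and the convention for which squares count as white are my own choices:

    def board_parity(piece_squares):
        # piece_squares: (file, rank) pairs, each 0..7, one per piece on the board.
        # A square counts as white when file + rank is odd (taking a1 = (0, 0) as a black square).
        on_white = sum(1 for f, r in piece_squares if (f + r) % 2 == 1)
        on_black = len(piece_squares) - on_white
        return "even" if (on_white - on_black) % 2 == 0 else "odd"

    # The starting position: sixteen pieces per side, filling ranks 0, 1, 6, and 7.
    starting_position = [(f, r) for f in range(8) for r in (0, 1, 6, 7)]
    assert board_parity(starting_position) == "even"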

Quantifying optimization and intelligence

We are now ready to define quantitatively the power of an optimization process. In addition to the notion of a preference ordering, already introduced, we'll need the further concept of a state space of possible plans, possible designs, or possible outcomes. For example, looking at a Toyota Corolla, we could regard the state space as the set of all possible molecular configurations of the same atoms.
Given a description of what is possible, and a preference ordering over the possibilities, then I can look at the outcome actually achieved - for example, the actual design of the Toyota Corolla - and ask:
  1. How many possibilities in the state space would be as good or better than the actual outcome, under the preference ordering?
  2. How many possibilities are there, total, in the entire state space?
Divide the first number by the second. The result is the fraction of outcomes as-good-or-better within the total space of possibilities. This gives you a quantifiable measure of how small a target the optimization process was able to hit.
If you take the base two logarithm of the reciprocal of this fraction, that gives you the power of an optimization process measured in bits.
For example, suppose there are 1024 possible outcomes, and you achieve an outcome X. And suppose that there are only 4 possible outcomes that you regard as "as good as X or better", including X itself. Then only 1 in 256 possible outcomes are "as good or better" than the outcome actually achieved. An optimization process that reliably hits this close to the center does 8 bits of optimization.
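The calculation just described, as a sketch:

    import math

    def optimization_bits(as_good_or_better, total_outcomes):
        # Fraction of outcomes at least as good as the one achieved,
        # then the base-two log of that fraction's reciprocal.
        fraction = as_good_or_better / total_outcomes
        return math.log2(1 / fraction)

    print(optimization_bits(4, 1024))                   # 8.0 bits, as in the example above
    print(optimization_bits(1_000_000, 256_000_000))    # also 8.0 bits
    print(optimization_bits(1, 1_000_000))              # about 19.9 bits for a one-in-a-million target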
(The mathematically sophisticated will recognize that I am measuring the entropy of something. We might call it the entropy of a system relative to a preference ordering. As always in our universe where Liouville's Theorem holds, it takes work to reduce entropy - any kind of entropy.)
It's about equally difficult to do 8 bits of optimization whether there are only 4 satisfactory outcomes in a space that contains 1024 possibilities, or only 1,000,000 satisfactory outcomes in a space that contains 256,000,000. In either case, the relative size of the target is the same. In either case, you would need to randomly search around 256 cases to find a satisfactory outcome, if you didn't have any way to search more efficiently.
You may also find it convenient to think in terms of utility functions, a kind of preference that is more structured than simple ordering. A utility function is when you can assign a real number saying exactly how much you want something - for example, you might assign a utility of 15 to eating chocolate ice cream, a utility of 10 to eating vanilla ice cream, and a utility of 0 to receiving no ice cream. Then you would prefer chocolate ice cream to vanilla ice cream; you would also prefer a 70% chance of receiving chocolate ice cream to a 100% chance of receiving vanilla ice cream.
You could also measure the fraction of all possible outcomes with utility greater than or equal to e.g. 42, and thereby get the observed power of an optimization process. If only one outcome in a million has a utility of 42 or better, then reliably achieving an outcome this good would require around 20 bits of optimization.
The two known powerful optimization processes in this universe, human intelligence and natural selection, both produce outcomes that are vastly improbable - thousands of bits or more. The usual analogy is "How long does it take a monkey randomly hitting typewriter keys to type the complete works of Shakespeare?" If you relax your requirements by allowing the monkey to produce any work of length and quality equal to a Shakespearean play (as judged by a fair-minded literary critic), it still takes a very long time. Program your computer to show you random strings of letters and punctuation, and see how long it takes to produce a single comprehensible sentence, let alone a page. It doesn't take much optimization pressure to leave the space of things that pure randomness could produce in a mere billion ages of the universe.
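A back-of-the-envelope version of the monkey argument (the alphabet size and text lengths are rough assumptions of mine):

    import math

    alphabet_size = 30                       # letters, space, and a little punctuation

    def log10_random_tries(length):
        # Order of magnitude of random strings you would expect to type
        # before matching one particular text of this length.
        return length * math.log10(alphabet_size)

    print(log10_random_tries(40))            # about 59: one short sentence is a 1-in-10^59 target
    print(log10_random_tries(2000))          # about 2954: a single page is utterly out of reach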

Could we recognize an alien intelligence?

Could I recognize an alien intelligence as exceptionally smart, without understanding the alien mind's motivations, the way I understand Kasparov's goal in chess?
I could land on an alien planet and discover what seemed to be a highly sophisticated machine, all gleaming chrome as the stereotype demands. Can I recognize this machine as being in any sense well-designed, if I have no idea what the machine is intended to accomplish? Can I guess that the machine's makers were intelligent, without guessing their motivations?
I could examine a piece of the machine under a microscope, and discover billions of tiny transistors. What are the transistors computing? I don't know. But I can still recognize well-designed transistors and guess that the machine is computing something. There are many different possible computing problems which will require the aliens to solve the subproblem of efficiently processing information.
I can look at cables through which large electrical currents are running, and be astonished to realize that the cables are flexible, high-temperature, high-amperage superconductors - an amazingly good solution to the subproblem of transporting electricity that is generated in one location and used in another.
I can look at gears, whirring rapidly, and imagine that if those gears had random shapes, they would clash and fly apart and generate destructive internal forces - the gears seem to have been selected from a tiny subset of possible whirring shapes, such that the shapes mesh when they rotate.
In this scenario I have just imagined, what I recognize within the alien machine are well-optimized subgoals similar to the subgoals of human engineers. Subgoals might overlap even if the final goals are widely different from our own. I might also be able to infer a subproblem by inspecting a part of the machine, much more easily than I could infer the alien's psychological desires and final purposes.
If there are no subproblems to which I can recognize a good solution, then I can't recognize the machine as a "machine"! Think back to the Toyota Corolla; it occupies an infinitesimal fraction of state space describing "vehicles" of equal or greater speed, efficiency, safety, reliability, and comfort. This is something to remark upon when I see the Corolla (or any other car, even a Model-T). But if I don't see any criterion for the parts or the whole, so that, as far as I know, a random volume of air molecules or a clump of dirt would be just as surprising, just as worthy of remark, then why am I focusing on this particular object and saying, "Here is a machine"? Why not say the same about a cloud or a rainstorm? Why is it a good hypothesis to suppose that intelligence or any other optimization process played a role in selecting the form of what I see, any more than it is a good hypothesis to suppose that the dust particles in my rooms are arranged by dust elves?
Even the gleaming chrome exterior of the machine is a solution to the subproblem of protecting the machine's internal parts from the environment. If the machine is made of hard materials which retain their shape over time, then that is a solution to making a function persistent - ensuring that an invention, once it is designed and built, continues functioning over time.
If you can't identify any optimization target at all, you don't have optimization, you just have noise. Every possible configuration would appear to equally fit the criterion; every possible configuration would be assigned equal probability; nothing you could observe would falsify the theory. This is a hypothesis of maximum entropy.

Creativity and breaking the rules

Creativity is surprising - but not just any kind of surprise counts as a creative surprise. Suppose I set up an experiment involving a quantum event of very low amplitude, such that the macroscopic probability is a hundred million to one. If the event is actually observed to occur, it is a happenstance of extremely low probability, and in that sense surprising. But it is not a creative surprise. Surprisingness is not a sufficient condition for creativity.
Creativity, as we all know, involves breaking the rules - but not all the rules. If everyone builds their cars from iron triangles, and I build a better car using bronze squares, then that is a creative surprise. I broke the surface rules normally used to invent solutions, and I built a better car thereby. Ordinarily, one would expect a car built from bronze squares to catch fire and explode; and yet this car starts up and drives to the supermarket. How unexpected! How surprising! But the result must still be a car. If I tried to make a better car from bronze squares, and failed completely, ending up with a heap of scrap metal, there would be nothing surprising about that. More experienced engineers would just shake their heads wisely and say, "That's why we use iron triangles, kiddo."
The pleasant shock of witnessing Art arises from the constraints of Art - from watching a skillful archer send an arrow into an exceedingly narrow target. Static on a television screen is not beautiful, it is noise.
In the strange domain known as Modern Art, people sometimes claim that their goal is to break the rules, to defy convention, for its own sake. They put up a blank square of canvas, and call it a painting; and by now that is considered staid and boring Modern Art, because a blank square of canvas still hangs on the wall and has a frame. What about a heap of garbage? That can also be Modern Art! Surely, this demonstrates that true creativity knows no rules, and even no goals...
But the rules are still there, though unspoken. I could submit a realistic landscape painting as Modern Art, and this would be rejected because it violates the rule that Modern Art cannot delight the untrained senses of a mere novice. Or better yet, if a heap of garbage can be Modern Art, then I'll claim that someone else's heap of garbage is my work of Modern Art - boldly defying the convention that I need to produce something for it to count as my artwork. Or what about the pattern of dust particles on my desk? Isn't that Art? Flushed with triumph, I present to you an even bolder, more convention-defying work of Modern Art - a stunning, outrageous piece of performance art that, in fact, I never performed. I am defying the foolish convention that I need to actually perform my performance art for it to count as Art.
Now, up to this point, you probably could still get a grant from the National Endowment for the Arts, and get sophisticated critics to discuss your shocking, outrageous non-work, which boldly violates the convention that art must be real rather than imaginary. But now suppose that you go one step further, and refuse to tell anyone that you have performed your work of non-Art. You even refuse to apply for an NEA grant. It is the work of Modern Art that never happened and that no one knows never happened; it exists only as my concept of what I am supposed not to conceptualize. Better yet, I will say that my Modern Art is your non-conception of something that you are not conceptualizing. Here is the ultimate work of Modern Art, that truly defies all rules: It isn't mine, it isn't real, and no one knows it exists...
And this ultimate rulebreaker you could not pass off as Modern Art, even if NEA grant committees knew that no one knew it existed. For one thing, they would realize that you were making fun of them - and that is an unspoken rule of Modern Art that no one dares violate. You must take yourself seriously. You must break the surface rules in a way that allows sophisticated critics to praise your boldness and defiance with a straight face. This is the unwritten real goal, and if it is not achieved, all efforts are for naught. Whatever gets sophisticated critics to praise your rule-breaking is good Modern Art, and whatever fails in this end is poor Modern Art. Within that unalterable constraint, you can use whatever creative means you like.
But let us turn from Modern Art to more conventional forms of creativity, such as engineering. Does creative engineering sometimes involve altering your goals? First my goal was to try and figure out how to build a car using iron triangles; now my goal is to build a car using bronze squares...
Creativity clearly involves altering my local intentions, my what-I'm-trying-to-do-next. I begin by intending to configure iron triangles, to build a car, to drive to the supermarket, to buy food, to eat food, so that I don't starve to death, because I prefer being alive to starving to death. I may creatively use bronze squares, instead of iron triangles; creatively walk, instead of driving; creatively drive to a gas station, instead of a supermarket; creatively grow my own vegetables, instead of buying them; or even creatively devise a way to run my body on electricity, instead of chemical energy. What does not count as "creativity" is creatively preferring to starve to death, rather than eating. This "solution" does not strike me as very impressive; it involves no effort, no intelligence, and no surprises. If this is someone's idea of how to break all the rules, they would become pretty easy to predict.
Are there cases where you genuinely want to change your preferences? You may look back in your life and find that your moral beliefs have changed over decades, and that you count this as progress. Civilizations also change their morals over time. In the seventeenth century, people used to think it was okay to enslave people with differently colored skin; and now we think otherwise.
The notion of "change in preferences" gets into Friendly AI issues which are far beyond the scope of this particular essay - though see Coherent Extrapolated Volition.
But you might guess by now, you might somehow intuit, that if these moral changes seem interesting and important and vital and indispensable, then not just any change would suffice. You might suspect that you're judging potential changes as better or worse, even if you can't consciously, verbally report the rules that govern your intuitive perceptions. If there's no criterion, no target, no way of choosing - then your current point in state space is just as good as any other point, no more, no less; and you might as well keep your current state, unchanging, forever.
Every improvement is necessarily a change, but not every change is an improvement. If all you learn from observing a history of improvements is that "change is good", and so you chase after change, any change - then that's rather like the dogs in Pavlov's famous experiment who salivated at the sound of a bell, whether or not the bell was accompanied by meat. You've trained yourself to chase the wrong stimulus.

The supposed role of randomness in intelligence

Now imagine forgetting everything you've just read, and approaching the problem from a purely instinctive perspective. You might instinctively think something like this:
  • When someone shows me how to build a toaster that's vastly more efficient than any toaster I've ever seen before, I'm surprised.
  • When I thought someone was trustworthy, and then it turns out they embezzled all the money from my bank account, I'm surprised.
  • It's clear that you can't be super-smart without generating surprises.
  • Therefore a smarter-than-human AI might surprisingly decide to kill humans.
The reasoning here follows the form:
  • Major premise: All oranges are fruits.
  • Minor premise: All apples are fruits.
  • Therefore, all oranges are apples.
When you describe different events using the same word "surprise", they don't thereby become the same sort of thing. And it doesn't follow that one kind of surprise implies the other. Marvin Minsky labeled this the problem of "suitcase words" - when you describe many different phenomena using the same word, and then reason about them as if they were indistinguishable.
If an AI is unpredictable in its exact actions, must it be unpredictable in its optimization target - its motives, its goals - or in the consequences of its actions? So far I have argued that there is no logical necessity to this effect - it is not "paradoxical" for Deep Blue to predictably make unpredictably good chess moves.
But one might still argue that there is a pragmatic necessity for some sort of genuine unpredictability as to motives. Maybe cognition must make internal use of chaos/randomness/noise in order to work effectively, and these chaotic internal algorithms will give rise to surprising surface behavior of the "surprisingly kill humans" type.
Chaos, of itself, is not dangerous - or at least, it's not a danger on the special level of AI. If you send a string of random ones and zeroes to motor output, that causes the AI to jerk around randomly, but it doesn't cause the AI to go on a killing spree - the resulting actions will not be optimized to cause harm to humans. Rather, the idea seems to be that an AI whose cognitive processes make use of noise, even if designed to be Friendly, has an unavoidable probability of going on a deliberate killing spree.
In other words, it's argued that a mind searching for plans that strike close to the center of the criterion of helping humans can only search effectively by using chaotic search methods - methods that can potentially output motor actions coherently optimized to kill humans. It's argued that a really smart AI must include noisy cognitive processes that could potentially do this. It's argued that to strike at the center of an optimization target, there is no way to get a really good aim without using an aiming process with so much unpredictability in it that it can potentially end up aiming somewhere else entirely - even in the exact opposite direction.
But wait - why should cognition run on randomness? Why does this make any more sense than cognition running on peanut butter?
Maybe people observe that intelligence generates "surprises", and conclude that intelligence must run on surprise-stuff as fuel. There is a well-known principle of magic called the Law of Similarity which states that Effects Resemble Causes, which is why, in prescientific cultures, there are rituals like pouring water on the ground to summon rain. Similarly, if objects catch on fire and burn, the cause must be a mysterious fire-stuff called "phlogiston"...
But there are more serious arguments for randomness playing a role in cognition, so let's address those first.

[edit] Calculating the power of pure randomness

You may note a certain trend in this essay: I've been arguing that noise hath no power, nor yet beauty from entropy, nor strength from randomness.
We can formalize this argument, using the concepts of a state space of possibilities, a preference ordering, and a fraction that describes the proportion of possibilities "as good or better" than some example. Or, if you prefer to think in terms of utility functions, then consider the fraction of all possible outcomes with utility greater than or equal to 42.
Suppose this fraction is 0.02: only 2% of all outcomes are outcomes with a utility of 42 or higher. And then suppose you observe an outcome with a utility of 42 (say, the car starts up immediately and drives to the supermarket in 8 minutes using a tenth of a gallon of gas). Then the likelihood of getting an outcome this good by pure chance is, obviously, 2%.
This may not sound very profound. But you may have heard people talking about emergence as if it could be used to explain complex, functional orders. People will say that the complex functional order of an ant colony emerges - as if, starting from ants that had been selected only to function as solitary individual ants, they got together in a group for the first time and the highly useful order of an ant colony popped right out. Actually, the complex order of the ant colony was produced by natural selection, the nonchance retention of chance mutations. A million mutations occur; by chance, one mutation builds an organism which reproduces more frequently. Because organisms which reproduce more often produce more copies of the genes they carry, the one mutant in a million that is lucky may become universal in the gene pool. This cycle repeats over, and over, and over again, through millions of generations, until you're left with an organism that could not possibly be explained by emergence - whose probability of emerging by pure luck would be infinitesimal over the lifespan of our universe.
The order of an ant colony is an evolved pattern, not an emergent pattern. If you shake up atoms randomly, with no natural selection operating, nothing resembling the higher levels of organization in the ant colony will fall out of the box.
Pure randomness has no more power than we would expect it to have. If an outcome is one in a million in our preference ordering, it will take an average of one million tries to produce it by pure randomness.
A probability of one in a million corresponds to only 20 bits of information. A mathematician's "bits" are not the same thing as ones and zeroes on a hard drive: 20 magnetic spots on a hard drive can transmit at most 20 bits of information, if an optimal encoding is used. (For more on this, see Shannon Information.) Many products of intelligence are optimized far beyond one in a million - they contain so many bits of information as to place them far beyond anything pure randomness could produce in the lifetime of a universe.
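As a minimal Python sketch of this bookkeeping (using only fractions already mentioned in this essay), the number of "bits of optimization" needed to hit a target region is just the negative base-2 logarithm of the fraction of outcomes that are at least that good:

  import math

  def bits_of_optimization(fraction_as_good_or_better):
      """Bits needed to single out a target region occupying this fraction
      of the outcome space; a smaller fraction means more bits."""
      return -math.log2(fraction_as_good_or_better)

  print(bits_of_optimization(0.02))  # ~5.6 bits for the top 2% of outcomes
  print(bits_of_optimization(1e-6))  # ~19.9 bits for a one-in-a-million outcome
  print(bits_of_optimization(1e-9))  # ~29.9 bits for a one-in-a-billion outcome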
I emphasize "pure" randomness because if you combine a random process like mutation with a nonrandom process like selection - organisms dying or reproducing in a way that correlates nonrandomly with their genes - then you can get millions of bits of optimization in just a few billion years. And, just to anticipate the nitwit creationists, "non-random component" does not mean "orchestrated by a secret intelligence behind the scenes". "Non-random" means simple correlation: it is not the case that every possible genome has exactly the same chance of reproducing.
Pure randomness does not yield optimization, except in the sense that a billion tries may yield one result that apparently possesses 30 bits of optimization.
It should now be clear that a nonrandom component is necessary for high degrees of optimization.
But is it, perhaps, equally necessary to have a random component as well?

[edit] Can we do better by adding randomness?

You may have heard that certain algorithms in Artificial Intelligence work better when we inject randomness into them. Is this true, and if so, how is it possible?
Technical Explanation discusses the Bayesian scoring method when you answer many questions in a row. There are many important properties that a scoring method should have. One of them is that if you pretend to be more confident than you really are, you should do worse. It's quite possible to do worse than a maximum-entropy estimate, if you know nothing but pretend otherwise.
Suppose you were asked twelve multiple-choice questions with four options apiece, and you gave your answer to each question in the form of a probability distribution over the four options - for each question you would give your probability that option A was correct, then that option B was correct, then C and D, with the probabilities summing to 1. How do we score you? For each question, we look at the probability that you assigned to the actual, correct answer, ignoring the probabilities assigned to other answers. Then we multiply together the probabilities assigned to the correct answer on all twelve questions. The result is the joint probability you assigned to the final outcome, that is, the probability you assigned to the correct answer-sheet for the entire test.
(Suppose you flip a coin three times. If you think that "heads" is 50% probable on the first flip, 50% probable on the second flip, and 50% probable on the third flip, and you think the coinflips are uncorrelated, then your probability of seeing "HHH" is 1/8. For more details on this, including what happens if the coinflips are correlated, see Technical Explanation.)
The maximum-entropy distribution for a question with four options is a probability of 1/4 for each option. Is it possible to score worse than maximum entropy? Sure! For example, you could, on each of your twelve questions, assign 85% probability to one answer, and 5% apiece to the other three answers. But suppose that, despite your high confidence, you do no better than random chance, answering three questions correctly and nine incorrectly. Then your final score, the joint probability you assigned to the entire answer-sheet, is (0.85)^3 * (0.05)^9 = 1.2e-12. If you'd given the maximum-entropy response, you would have been guaranteed a score of (0.25)^12 = 6e-8.
If you're strongly confident in wrong answers, it is quite possible to do worse than if you confess total ignorance. In this case, you will be able to predictably do better by adjusting your probability distribution toward greater entropy - by moving closer to the maxentropy distribution. One may distinguish stupidity from ignorance. Confessing your own ignorance is not a substitute for actually knowing something, but it's a step up from being stupid.
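Here is a minimal Python sketch of the arithmetic above, assuming the same twelve-question, four-option test, with the overconfident guesser getting only three questions right:

  # Overconfident strategy: 85% on one option, 5% apiece on the other three,
  # with only 3 of the 12 confident guesses turning out to be correct.
  overconfident = (0.85 ** 3) * (0.05 ** 9)

  # Maximum-entropy strategy: 25% on every option, so the correct answer
  # always received a probability of 25%.
  maxent = 0.25 ** 12

  print("overconfident:", overconfident)  # ~1.2e-12
  print("max-entropy:  ", maxent)         # ~6.0e-08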
Similarly, it is quite possible for injecting randomness to improve a system's performance - that is, for added noise to predictably increase the system's expected utility under your preference ordering. But all this requires is a system that starts out in a state that is literally worse than random - a state worse than the majority of possible states, with utility lower than the average over random states. If so, replacing the current state with a random state is an expected improvement (although, with sufficiently bad luck, it could still make things even worse).
If the average utility of a randomly selected state is -10, and the current system starts out with a utility of -100, then adding noise will cause the system to revert toward the mean. The expected utility will creep back up toward -10 until it approaches the level you could have gotten by pure randomness.
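A minimal Python simulation of this reversion toward the mean (the utilities -100 and -10 are just the illustrative figures from the paragraph above; the uniform distribution over random states is an arbitrary stand-in):

  import random

  random.seed(0)

  def random_state_utility():
      # Stand-in distribution of utilities over randomly selected states,
      # chosen so that the mean is -10.
      return random.uniform(-20.0, 0.0)

  current_utility = -100.0  # a starting state that is literally worse than random

  # Expected utility after "adding noise", i.e. jumping to a random state:
  samples = [random_state_utility() for _ in range(100_000)]
  print(sum(samples) / len(samples))  # close to -10: up from -100, but no better than chance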
This is something to think about when you hear that the performance of an Artificial Intelligence algorithm can be improved by adding noise to it. To improve an algorithm by injecting randomness into it, the unrandomized version must (on some step) do worse than random.
This is not quite as severe an indictment of "algorithms that are improved by randomness" as it may sound. Imagine that we're trying to solve a pushbutton combination lock with 20 numbers and four steps - 160,000 possible combinations. And we try the following algorithm for opening it:
  1. Enter 0-0-0-0 into the lock.
  2. If the lock opens, return with SUCCESS.
  3. If the lock remains closed, go to step 1.
Obviously we can improve this algorithm by substituting "Enter a random combination" on the first step.
If we were to try and explain in words why this works, a description might go something like this: "When we first try 0-0-0-0 it has the same chance of working (so far as we know) as any other combination. But if it doesn't work, it would be stupid to try it again, because now we know that 0-0-0-0 doesn't work."
The first key idea is that, after trying 0-0-0-0, we learn something - we acquire new knowledge, which should then affect how we plan to continue from there. This is knowledge, quite a different thing from randomness...
What exactly have we learned? We've learned that 0-0-0-0 doesn't work; or to put it another way, given that 0-0-0-0 failed on the first try, the conditional probability of it working on the second try, is negligible.
Consider your probability distribution over all the possible combinations: Your probability distribution starts out in a state of maximum entropy, with all 160,000 combinations having a 1/160,000 probability of working. After you try 0-0-0-0, you have a new probability distribution, which has slightly less entropy; 0-0-0-0 has an infinitesimal probability of working, and the remaining 159,999 possibilities each have a 1/159,999 probability of working. To try 0-0-0-0 again would now be stupid, as defined above - the expected utility of trying 0-0-0-0 is less than average; the vast majority of potential actions now have higher expected utility than does 0-0-0-0. An algorithm that tries 0-0-0-0 again would do worse than random, and we can improve the algorithm by randomizing it.
One may also consider an algorithm as a sequence of tries: The "unrandomized algorithm" describes the sequence of tries 0-0-0-0, 0-0-0-0, 0-0-0-0... and this sequence of tries is a special sequence that has below-average expected utility in the space of all possible sequences. Thus we can improve on this sequence by selecting a random sequence instead.
Or imagine that the combination changes every second. In this case, trying 0-0-0-0 over and over is just as good as the randomized algorithm - no better and no worse. What this shows is that the supposedly "random" algorithm is "better" only relative to a known regularity of the lock - namely, that the combination stays constant from one try to the next. Or to be precise, the reason the random algorithm does predictably better than the stupid one is that the stupid algorithm is "stupid" relative to a known regularity of the lock.
However, the random algorithm is still not optimal - it does not take full advantage of the knowledge we have acquired. A random algorithm might try 0-0-0-0 again; it's not likely, but it could happen. The longer the random algorithm runs, the more likely it is to try the same combination twice; and if the random algorithm is sufficiently unlucky, it might still fail to solve the lock after millions of tries. We can take full advantage of all our knowledge by using an algorithm that systematically tries 0-0-0-0, 0-0-0-1, 0-0-0-2... This algorithm is guaranteed not to repeat itself, and will find the solution in bounded time. Considering the algorithm as a sequence of tries, no other sequence in sequence-space is expected to do better, given our initial knowledge. (Any other nonrepeating sequence is equally good; but nonrepeating sequences are rare in the space of all possible sequences.)
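To make the comparison concrete, here is a minimal Python sketch, treating a combination abstractly as one secret value out of the 160,000 possibilities and assuming the combination stays fixed; the systematic enumerator never repeats itself and must finish within 160,000 tries, while the random guesser repeats itself and merely averages about 160,000 tries, with no guarantee at all:

  import random

  N = 160_000                # 20 numbers, four steps: 20**4 possible combinations
  rng = random.Random(1)
  secret = rng.randrange(N)  # the lock's fixed combination

  def tries_systematic(secret):
      """Try 0, 1, 2, ... in order; never repeats, so it finishes within N tries."""
      for t, guess in enumerate(range(N), start=1):
          if guess == secret:
              return t

  def tries_random(secret):
      """Guess uniformly at random with replacement; may repeat earlier guesses."""
      t = 0
      while True:
          t += 1
          if rng.randrange(N) == secret:
              return t

  # (The "stupid" algorithm that re-enters 0-0-0-0 forever never terminates
  # at all, unless the secret happens to be 0-0-0-0.)

  print(tries_systematic(secret))                           # at most 160,000
  print(sum(tries_random(secret) for _ in range(20)) / 20)  # about 160,000 on average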
A combination dial often has a tolerance of 2 in either direction: 20-45-35 will open a lock set to 22-44-33. In this case, the algorithm that tries 0-1-0, 0-2-0, et cetera, ends up being stupid again; a randomized algorithm will (usually) work better. But an algorithm that tries 0-5-0, 0-10-0, 0-10-5, et cetera - skipping in intervals of 5, so that no guess wastes effort on a range already covered - will work better still.
Sometimes it is too expensive to take advantage of all the knowledge that we could, in theory, acquire from previous tests; and sometimes a simple deterministic scheme - a complete enumeration, or interval-skipping - would itself end up being stupid on the particular problem. In such cases, computer scientists often use a cheap pseudo-random algorithm, because the computational cost of using our knowledge exceeds the benefit to be gained from using it. This does not show the power of randomness, but, rather, the predictable stupidity of certain specific deterministic algorithms on that particular problem. Remember, the pseudo-random algorithm is also deterministic! But the deterministic pseudo-random algorithm doesn't belong to the class of algorithms that are predictably stupid (that do much worse than random).

[edit] Noise and overfitting

There are other possible reasons why a noisy AI algorithm might work better than the noiseless version. There is always (I assert) some reason why the noiseless algorithm is being stupid (worse-than-random), somewhere or other; but the reason can get rather technical. For example, there are neural network training algorithms that work better if you simulate noise in the neurons. On this occasion it is especially tempting to say something like, "Lo! When we make our artificial neurons noisy, just like biological neurons, they work better! Behold the healing life-force of entropy!" What might actually be happening - for example - is that the network training algorithm, operating on noiseless neurons, would vastly overfit the data. If you expose the noiseless network to the series of coinflips "HTTTHHTTH"... the training algorithm will say the equivalent of, "I bet this coin was specially designed to produce HTTTHHTTH every time it's flipped!" instead of "This coin probably alternates randomly between heads and tails." A hypothesis overfitted to the data does not generalize. On the other hand, when we add noise to the neurons and then try training them again, they can no longer fit the data precisely, so instead they settle into a simpler hypothesis like "This coin alternates randomly between heads and tails."
To describe what was going on inside the combination lock, we needed concepts like expected utility, conditional probability, and learning from evidence. To describe what goes on inside the far more complex neural network, we would need far more sophisticated concepts, like prior probability, Kolmogorov complexity, Solomonoff induction, Vapnik-Chervonenkis dimension, and computational learning theory. But the general idea is still that the noiseless version of the network training algorithm is stupid on a certain stage of its operation - it overfits the data - and the noisy version substitutes ignorance-better-than-stupidity on that stage of the algorithm.
But the noisy network is not optimal. If we see a coin produce HTTTHHTTH we should not suspect that it is set to always produce HTTTHHTTH; but it is quite a different matter if we see the coin produce HTTTHHTTH on the first set of nine trials, HTTTHHTTH again on the second set, HTTTHHTTH again on the third set, and so on. The noisy neural network may never learn such a hypothesis.
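A minimal Python sketch of this overfitting point, using the coin sequence from above: the "memorizer" hypothesis, which has fit the training flips exactly, assigns them probability 1 but assigns (almost certainly) probability 0 to fresh flips from a fair coin, while the simpler fair-coin hypothesis assigns the same modest probability to both:

  import random

  random.seed(2)
  train = "HTTTHHTTH"                                     # the training flips from the text
  fresh = "".join(random.choice("HT") for _ in range(9))  # new flips from a fair coin

  def p_memorizer(seq):
      """Overfit hypothesis: the coin always produces the training sequence."""
      return 1.0 if seq == train else 0.0

  def p_fair(seq):
      """Simple hypothesis: each flip is an independent 50/50."""
      return 0.5 ** len(seq)

  print(p_memorizer(train), p_fair(train))  # 1.0 versus ~0.002 on the training data
  print(p_memorizer(fresh), p_fair(fresh))  # almost certainly 0.0 versus ~0.002 on fresh data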
There are other ways to avoid overfitting data - techniques deliberately constructed around principled notions such as prior probability. These methods do not blur the sensory data or add noise to the computing elements. These principled methods can learn precise hypotheses, but demand extra evidence to justify the extra complexity relative to vague hypotheses. These principled methods can take complete advantage of all the information they have, and produce better results thereby; just as, on the lockpicking problem, enumerating a non-repeating sequence of combinations takes full advantage of all information gained, and therefore works better than random tries that may repeat themselves.

[edit] Noise and hill-climbing

What about hill-climbing, simulated annealing, or genetic algorithms? These AI algorithms are local search techniques that randomly investigate some of their nearest neighbors. If an investigated neighbor is superior to the current position, the algorithm jumps there. (Or sometimes probabilistically jumps to a neighbor with probability determined by the difference between neighbor goodness and current goodness.) Are these techniques drawing on the power of noise?
Local search algorithms take advantage of the regularity of the search space - that if you find a good point in the search space, its neighborhood of closely similar points is a likely place to search for a slightly better neighbor. And then this neighbor, in turn, is a likely place to search for a still better neighbor; and so on. To the extent this regularity of the search space breaks down, hill-climbing algorithms will perform poorly. If the neighbors of a good point are no more likely to be good than randomly selected points, then a hill-climbing algorithm simply won't work. We might as well search random points, rather than following a path of increasing fitness through the search space. (An excellent introductory work on this subject is Artificial Intelligence: A Modern Approach by Russell and Norvig.)
Doesn't a local search algorithm need to make random changes to the current point in order to generate neighbors for evaluation? Not necessarily; some local search algorithms systematically generate all possible neighbors, and select the best one. These greedy algorithms work fine for some problems, but on other problems it has been found that greedy local algorithms get stuck in local maxima. The next step up from greedy local algorithms, in terms of added randomness, is random-restart hill-climbing - as soon as we find a local maximum, we restart someplace random, and repeat this process a number of times. For our final solution, we return the best local maximum found when time runs out. Random-restart hill-climbing is surprisingly useful; it can easily solve some problem classes where any individual starting point is unlikely to lead to a global maximum or acceptable solution, but it is likely that at least one of a thousand individual starting points will lead to the global maximum or acceptable solution.
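Here is a minimal Python sketch of random-restart hill-climbing on a toy landscape (the fitness function is an arbitrary multimodal example invented for illustration, not anything from the text): greedy search from a single random start usually strands itself on a nearby local maximum, while a hundred restarts almost always find the global one.

  import random

  random.seed(3)

  def fitness(x):
      # Arbitrary multimodal landscape on the integers 0..999, with a local
      # peak every 50 steps and the global peak near x = 699.
      return -((x - 700) ** 2) / 1000.0 + 30.0 * ((x % 50) / 50.0)

  def greedy_climb(x):
      """Systematically check both neighbors and move to the better one until stuck."""
      while True:
          neighbors = [n for n in (x - 1, x, x + 1) if 0 <= n < 1000]
          best = max(neighbors, key=fitness)
          if best == x:
              return x
          x = best

  def random_restart(restarts):
      starts = [random.randrange(1000) for _ in range(restarts)]
      return max((greedy_climb(s) for s in starts), key=fitness)

  single = greedy_climb(random.randrange(1000))
  multi = random_restart(100)
  print(fitness(single), fitness(multi))  # the restarted search is usually at least as good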
The non-randomly-restarting, greedy, local-maximum-grabbing algorithm, is "stupid" at the stage where it gets stuck in a local maximum. Once you find a local maximum, you know you're not going to do better by greedy local search - so you may as well try something else with your time. Picking a random point and starting again is drastic, but it's not as stupid as searching the neighbors of a particular local maximum over and over again. (Evolution may do this, and often does get stuck in local optima. Evolution, being unintelligent, has no mind to "notice" when it is testing the same genomes over and over.)
Even more stupid is picking a particular starting point, and then evaluating its fitness over and over again, without even searching its neighbors. This is the lockpicker who goes on trying 0-0-0-0 forever. (This is what evolution would be like without any mutations. But since most mutations are detrimental, evolution favors mechanisms that reduce the number of mutations. That this path might ultimately lead to static genomes is not something evolution would "consider".)
Hill-climbing search is not so much a little bit randomized compared to the completely stupid lockpicker, as almost entirely nonrandomized compared to a completely ignorant searcher. We search only the local neighborhood, rather than selecting a random point from the entire state space. That probability distribution has been narrowed enormously, relative to the overall state space. This exploits the knowledge we gained by finding a good point that was likely to also have good neighbors.
You can imagine splitting a hill-climbing algorithm into components that are "deterministic" (or rather, knowledge-exploiting) and "randomized" (the leftover ignorance). A programmer writing a probabilistic hill-climber will use some formula to assign probabilities to each neighbor, as a function of the neighbor's fitness. For example, a neighbor with a fitness of 60 might have probability 80% of being selected, while other neighbors with fitnesses of 55, 52, and 40 might have selection probabilities of 10%, 9%, and 1%. The programmer writes a deterministic algorithm, a fixed formula, that produces these numbers - 80, 10, 9, and 1. What about the actual job of making a random selection at these probabilities? Usually the programmer will hand that job off to someone else's pseudo-random algorithm - almost any programming language's standard libraries will contain a standard pseudo-random algorithm; there's no need to write your own. If the hill-climber doesn't seem to work well, the programmer tweaks the deterministic part of the algorithm, the part that assigns these fixed numbers 80, 10, 9, and 1. The programmer does not say - "I bet these probabilities are right, but I need a source that's even more random like a thermal noise generator, instead of this merely pseudo-random algorithm that is ultimately deterministic!" The programmer does not go in search of better noise.
It is theoretically possible for a poorly designed "pseudo-random algorithm" to be stupid relative to the search space; for example, it might always jump in the same direction. But the "pseudo-random algorithm" has to be really shoddy for that to happen. You're only likely to get stuck with that problem if you reinvent the wheel instead of using a standard, off-the-shelf solution. A decent pseudo-random algorithm works just as well as a thermal noise source on optimization problems. It is possible (though difficult) for an exceptionally poor noise source to be exceptionally stupid on the problem, but you cannot do exceptionally well by finding a noise source that is exceptionally random. The power comes from the knowledge - the deterministic formula that assigns a fixed probability distribution. It does not reside in the remaining ignorance. If you knew even more, you would do better, not worse.
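A minimal Python sketch of this division of labor (the fitnesses 60, 55, 52, 40 and the selection probabilities 80/10/9/1 are just the illustrative numbers from the paragraph above; the weights table stands in for whatever deterministic formula the programmer actually writes):

  import random

  fitnesses = {"A": 60, "B": 55, "C": 52, "D": 40}        # what the formula looked at (context only)
  weights = {"A": 0.80, "B": 0.10, "C": 0.09, "D": 0.01}  # the deterministic formula's output

  rng = random.Random(42)  # an ordinary pseudo-random generator; seeding it makes the whole run deterministic

  def pick_neighbor():
      return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]

  picks = [pick_neighbor() for _ in range(10_000)]
  print({n: picks.count(n) / len(picks) for n in weights})  # roughly 0.80 / 0.10 / 0.09 / 0.01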

[edit] Noise and natural selection

What about natural selection? Isn't that the classic algorithm for drawing on the power of randomness?
There is a popular conception that "mutations" are good things, that "mutants" have supernormal abilities - that the strength of evolution lies in its magical power to produce good mutations. I recall particularly a trailer for an X-Men movie which voiced over: "In every human being... there is the genetic code... for mutation..."
Evolutionary Biology is a complex subject; simple statements rarely do it justice. Nonetheless this is not how evolution works. The vast majority of mutations are neutral or deleterious. Very, very few are improvements. And this is what you would expect - the higher the utility, the smaller the region of configuration space with equal or greater utility. Most of the time, a random move will take you away from the center. Most mutations are bad for you. The power of natural selection is not that it produces good mutations, but that good mutations are selectively retained more often than bad mutations. It is nonrandom selection, not random mutation, which carries the power. Random mutation, by itself, would do nothing. But we could just as easily substitute a deterministic pseudo-random algorithm for making mutations (which is exactly what most genetic algorithms do), and natural selection would do just as well as if it were "really ultimately random".
Natural selection is much simpler than human intelligence, and correspondingly less efficient. Natural selection is so simple, in fact, that we can use simple math to describe its characteristics as an optimization process, including its inefficiency. For example, suppose there's a gene which has a 3% fitness advantage relative to the alternative alleles at its locus; an individual with this gene has, on average, around 3% more children than others. Imagine that a single mutant is born with this advantageous gene. (Remember, evolution isn't going to magically produce a batch of mutants all with the same advantageous mutation; this advantageous mutation was produced by a stray cosmic ray, along with innumerable deleterious or neutral mutations.) There's a certain probability that the advantageous mutation will die out of the population, by sheer bad luck, before it can promote itself to fixation. Superfly gets squashed by an elephant. So, if the first mutant has a 3% fitness advantage, what is the probability that this gene spreads through the whole population, as opposed to dying out? This calculation turns out to be independent of most things you would expect it to depend on, like population size, and the answer turns out to be 6%. The general rule is that if the fitness advantage is s, then the probability of fixation is 2s. So even if you have a mutation that confers a 3% advantage in fitness, which is huge as mutations go, the chance is only 6% that the mutation spreads to the whole population.
Suppose the beneficial mutation does spread. How long does it take to become universal in the gene pool? This calculation does depend on population size. With a fitness advantage of 3%, and a population size of 100,000, the mean time to fixation is 767 generations.
For humans, that would mean an average of sixteen tries and ten thousand years to accumulate a single beneficial mutation.
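These figures can be checked with a minimal Python sketch using standard population-genetics approximations; the fixation-time formula (2/s)*ln(N) below is the approximation that reproduces the 767-generation figure quoted above, and should be taken as illustrative rather than exact:

  import math

  s = 0.03     # 3% fitness advantage
  N = 100_000  # population size

  p_fix = 2 * s                  # classic approximation: fixation probability ~ 2s
  tries = 1 / p_fix              # expected number of such lucky mutants before one fixes
  t_fix = (2 / s) * math.log(N)  # generations to fixation under the approximation above

  print(p_fix)  # 0.06 -> a 6% chance of fixation
  print(tries)  # ~16.7 -> "an average of sixteen tries"
  print(t_fix)  # ~767 generations, matching the figure quoted above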
To get complex machinery, the mutations have to evolve serially - one at a time. If gene B is dependent on gene A, and gene A is only present in 1% of the population, then B isn't an advantage except in the presence of A, which only happens 1% of the time. So the fitness advantage of B goes down by a factor of 100. What this means is that A has to be universal, has to go to fixation in the gene pool, before other genes dependent on it can evolve. Evolution has no foresight. It doesn't look ahead. It doesn't produce good mutations in anticipation of other mutations coming along. Whosoever has the most kids in one generation, their genes are more frequent in the next generation, and that's all there is to it.
Once A and B are both fixed in the gene pool, an improved version of A, A*, which is dependent on B, can also evolve. Now A* and B are mutually dependent on each other. Then C comes along, which depends on A* and B; and B*, which is dependent on A* and C. Eventually you get complex machinery with lots of moving parts that all seem to depend on each other. Nitwit creationists point to the complex machine and say, "How could that happen by chance?" Well, it can't happen by chance, but it can happen by selecting on a sequence of chance mutations. In the battle to get evolution taught in high schools, biologists need to emphasize the counterintuitive creative powers of evolution. But what is also true, and less emphasized, is that it takes millions of years to embroider complex machinery this way, because the sequence of events has to happen serially, one after another.
We can calculate how fast natural selection is, and it's extraordinarily slow. The only reason natural selection can produce patterns as complex as living beings is that, over the course of hundreds of millions of years, you get strong cumulative selection pressures - powerful enough to hit the target of a rabbit genome, in all the space of possibilities.
In contrast, a human engineer - say, a programmer - can sit down at a computer and produce new complex machinery with hundreds of interdependent parts in one afternoon. The human can foresightfully design new parts in anticipation of later designing other new parts; produce coordinated simultaneous changes in interdependent machinery; and learn from experience what kinds of new tweaks are worth trying, rather than waiting for a cosmic ray to produce a good one. By the standards of evolution this is simply magic.
There's a public mystique of evolution, which exists for two reasons.
First, many of the people praising evolution to the stars are on the side of the scientists, but they are not scientists. People with a nontechnical understanding of evolution argue with creationists, perceive that they are very much smarter than the creationists, and think to themselves: "I must understand evolution really well." Meanwhile they have no idea that a quantitative understanding even exists. They understand evolution as a force that improves things, but they don't know how to calculate how much force it is exerting - like the difference between knowing that "things fall down" and being able to calculate a parabola. Human engineers "improve" designs, and evolution "improves" designs, and that puts them on essentially the same level as optimization processes - right? It's the same word, so it must be pretty much the same thing.
Second, the big public battle is over the counterintuitive idea that evolution works at all - not how slowly it works. Professional biology journals carry articles about constrained pathways and speed limits on evolution, but the public debate never gets past the point of arguing over and over again whether evolution works at all.
The human optic cable is installed backward; it comes out of the front of the retina and goes through a hole in the retina to get into the brain - rather than, as a human would have designed the system, simply coming out of the back of the retina to begin with. The retina initially evolved backward, and natural selection never fixed it, because when you've got a lot of interdependent machinery it's hard to change one thing without breaking everything else. A human engineer could do it with a pack of simultaneous changes. Evolution blindly climbs an incremental pathway of mutations.
That sort of biological stupidity is how we know that Earthly life was not created by a superintelligent designer - or if it was a superintelligent designer, it was a superintelligent designer who pretended to be incredibly stupid (by human standards) in exactly the way that evolution ought to be incredibly stupid. This point is not emphasized as heavily, in the public debate, as the idea that evolution could have created life. The result is that rather more friends of science understand that evolution is powerful enough to create life, than understand that evolution is not powerful enough to reroute the human optic cable so that it doesn't go through the retina.
Where do people get the notion that only a chaotic, noisy process can optimize properly? I suspect that it has a great deal to do with all the necessary public hammering-home of the idea that evolution can work at all. Evolution happens to be noisy and chaotic. It is well to remember that evolution is not only noisy and chaotic, but also inefficient, slow, and often jaw-droppingly stupid. (And yes, the noise and chaos have something to do with that.) The miracle of evolution is not how well it works, but that it works at all.
It is amazing that evolution works at all; it's a purely natural optimization process with no brain or intelligence. The story of humankind had to start somewhere, and it had to start somewhere simple. If not for evolution, the universe would contain no complex intelligences to marvel at how stupid evolution is. But evolution is still an extremely primitive optimization process by comparison with, oh, say, a human brain. In some ways biology is still ahead of human engineering, but give us 3.85 billion years to polish our designs and we could do a lot better.

[edit] The mystery of ignorance

In the previous section, I analyzed a few special cases where cognitive power is attributed to randomness, arguing:
  1. That the "non-random" version of the lockpicker, to which the random version is compared, is a special nonrandom algorithm that is exceptionally stupid. That most "non-random" versions will do as well as the random version. That some non-random versions will do exceptionally well because they fully exploit all available knowledge.
  2. That the "noisy neural network" does better because the "non-random" version engages in egregious overfitting, and because we know a priori that the correct answer fits into the smaller hypothesis space learnable by a noisy network.
  3. That the "random mutation" of a hill-climbing algorithm is almost entirely nonrandom, in the sense that it examines only a tiny neighborhood of the entire search space. That it is often possible to replace "random mutation" algorithms with algorithms that do just as well or better by, e.g., examining the entire local neighborhood rather than making a single jump in a random direction. That the power does not come from the "true randomness" of the noise source, in that a pseudo-random algorithm (making the system as a whole purely deterministic) does just as well. That the ability of the system to perform optimization, at all, derives purely from the part of the system that exploits knowledge.
  4. That evolution, a kind of naturally arising hill-climbing algorithm, is more limited than generally appreciated. That the noisiness and chaos give rise to specific calculable disadvantages. That the astonishing thing is not how well evolution worked but that it worked at all.
These four special cases do not constitute a general argument against randomness. I did think it wise to dispose of these special cases first, because they are often brought as examples by the advocates of chaos.
Some of the cases above are dangerously subtle. In mathematics it only requires one mistake to prove an erroneous theorem. Similarly it only requires one misstep to conclude that randomness is the key to optimization - like making one error in calculating the work done by an engine, and concluding that it derives power from waste heat, making it a perpetual motion machine. (There is actually a deep analogy between these two cases.) This is why it is important to appreciate the forthcoming general argument against power-from-randomness - so that, faced with an apparent proof of perpetual motion, you don't say "Wow!" and rush out to build a prototype, but instead go back and check your calculations.

[edit] Does an unpredictable world demand an unpredictable Goal System?

From Robyn Dawes, "Rational Choice in an Uncertain World", p. 259:
"Many psychological experiments were conducted in the late 1950s and early 1960s in which subjects were asked to predict the outcome of an event that had a random component but yet had base-rate predictability - for example, subjects were asked to predict whether the next card the experiment turned over would be red or blue in a context in which 70% of the cards were blue, but in which the sequence of red and blue cards was totally random. In such a situation, the strategy that will yield the highest proportion of success is to predict the more common event. For example, if 70% of the cards are blue, then predicting blue on every trial yields a 70% success rate. What subjects tended to do instead, however, was match probabilities - that is, predict the more probable event with the relative frequency with which it occurred. For example, subjects tended to predict 70% of the time that the blue card would occur and 30% of the time that the red card would occur. Such a strategy yields a 58% success rate, because the subjects are correct 70% of the time when the blue card occurs (which happens with probability .70) and 30% of the time when the red card occurs (which happens with probability .30); .70 * .70 + .30 * .30 = .58. In fact, subjects predict the more frequent event with a slightly higher probability than that with which it occurs, but do not come close to predicting its occurrence 100% of the time, even when they are paid for the accuracy of their predictions."
To this effect Dawes cites (Tversky, A. and Edwards, W. 1966. Information versus reward in binary choice. Journal of Experimental Psychology, 71, 680-683). Subjects who were paid a nickel for each correct prediction, over a thousand trials in which the more frequent event occurred with 70% frequency on a random basis, guessed that event 76% of the time.
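The arithmetic in the quoted passage is easy to check with a minimal Python sketch (0.70 and 0.30 are the base rates from the experiment described above):

  p_blue = 0.70
  p_red = 1 - p_blue

  # Always predict the more common color:
  maximize = p_blue                        # 0.70 success rate

  # "Probability matching": predict blue 70% of the time and red 30% of the
  # time, independently of the actual (random) card sequence:
  match = p_blue * p_blue + p_red * p_red  # 0.58 success rate

  print(maximize, match)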
Dawes goes on to say, "Despite feedback through a thousand trials, subjects cannot bring themselves to believe that the situation is one in which they cannot predict." Maybe so! But even if subjects think they can make a prediction - if they come up with a hypothesis - they don't have to actually bet on the predicted card in order to test the hypothesis. They would be wiser to say quietly to themselves, "Now if this hypothesis is correct, the next card will be red," and then bet on blue until the hypothesis is confirmed - especially if all their previous hypotheses have failed!
I would not fault the subjects for continuing to invent hypotheses - how could they know the sequence was beyond their ability to predict? - but I would fault them for betting on their guesses when this wasn't necessary to gather information.
I would interpret the result as follows: People fail to realize that, given imperfect information, the optimal betting strategy does not resemble a typical sequence of actual cards. They see a mix of mostly blue cards with some red, and suppose that the optimal betting strategy (given their knowledge) must be a mix of mostly blue cards with some red. It is the old rule of magic that Effects Resemble Causes, formerly called the Law of Similarity, which these days is called "the representativeness heuristic".
A "random" key does not fit a "random" lock. A random code does not solve a random combination on the first try just because they are "both random". Different noise sources will not correlate; different randomnesses are not commensurate. The stock market has an element of randomness - or rather, unpredictability relative to our current knowledge - but you cannot crack the stock market by randomizing your stock-buying pattern. When your knowledge is imperfect, when the world seems to you to have an element of randomness, randomizing your actions doesn't solve the problem. Randomizing your actions takes you further from the target, not closer. In a world already imperfect, throwing away your intelligence just makes things worse.

[edit] Blank maps and blank territories

The great Bayesian theorist E. T. Jaynes observed that if we are ignorant about a phenomenon, this is a fact about our state of mind, not a fact about the phenomenon. Suppose someone tells me (and I trust them) that a certain hat, lying on a table, overlies a coin. I didn't see the coin before the hat went on top of it, so I don't know whether the coin is showing heads or tails. Is this a fact about the coin? No, it is a fact about me. My beliefs exist as patterns of neural firing activity in my brain - they are not part of the coin. When I assign a probability of 50% that the coin is showing heads, and 50% probability that the coin is showing tails, I am describing my state of knowledge about the coin, not describing a property of the coin alone. Perhaps, with sufficient knowledge of physics and sufficient computing power and a fast detailed camera, you could write a program that would observe a coin-flipping machine and predict the outcome in advance. But in practice, a coinflip is "random" because humans can't predict it.
Or recall the chess game in which I must assign a probability distribution over Kasparov's possible moves. The probability distribution I assign to Kasparov is not so much a property of Kasparov, as a property of me. There are many possible systems that would produce well-calibrated probability distributions as predictions of Kasparov's moves, and it is quite possible for these systems to produce different probability distributions. One assigns a probability of 60% to a move, another assigns a probability of 80%, yet both systems can be well-calibrated in the sense that moves for which they say "70 percent" happen around 7 times out of 10.
Jaynes labeled the error of thinking that probabilities are properties of things-in-themselves the Mind Projection Fallacy, which occurs when we mistake cognitive properties for parts of the outside world. Suppose that I'm making a map of a city, and in one corner, corresponding to a part of the city I haven't visited, there's a blank space on the map. That doesn't mean that when I visit that part of the city, I'll find a blank territory. Ignorance exists in the map, not in the territory. There are mysterious questions, but never mysterious answers.
Now how could any AI be powered by our own ignorance about it, when this ignorance is a fact about us, rather than a fact about the AI?
An unknown key does not fit an unknown lock. This is the fundamental reason why noise hath no power.

[edit] Worshipping sacred mysteries

"The influence of animal or vegetable life on matter is infinitely beyond the range of any scientific inquiry hitherto entered on. Its power of directing the motions of moving particles, in the demonstrated daily miracle of our human free-will, and in the growth of generation after generation of plants from a single seed, are infinitely different from any possible result of the fortuitous concurrence of atoms... Modern biologists were coming once more to the acceptance of something and that was a vital principle..."
-- Lord Kelvin
That was Lord Kelvin, an accomplished and eminent physicist who today is (rather unjustly) known to the general public for making some poor calls about heavier-than-air flight, vitalism, the future of science, the age of the Earth, and evolution. He also calculated the absolute zero of temperature, gave the first mathematical development of electrical induction as a field effect, codeveloped the kinetic theory of heat with Joule, and helped design and lay the first transatlantic cable, so he wasn't stupid. But intelligence is useless unless you actually use it, and it's clear from the quote above that Kelvin got a tremendous emotional kick out of the mysteriousness of life. Infinitely beyond the range of any scientific inquiry! Not just a little beyond the range of science, mind you, but infinitely beyond! When you get that much satisfaction from mysteriousness, you aren't likely to look kindly upon any darned answers that come along.
It is written in the Twelve Virtues of Rationality:
Curiosity requires both that you be ignorant, and that you desire to relinquish your ignorance. If in your heart you believe you already know, or if in your heart you do not desire to know, there is no purpose to which you can employ your Art. The glory of glorious mystery is to be destroyed, after which it ceases to be mystery. Be wary of those who speak of being open-minded and modestly confess their ignorance. There is a time to confess your ignorance and a time to relinquish your ignorance.
Or as Marcello Herreshoff put it: "The point of having an open mind is to make it up occasionally."
Lord Kelvin made an old, old, old mistake, a very common and natural mistake, one that's been repeated for thousands of years and is still being enthusiastically practiced today. He worshipped the Unknown.
Which is a level confusion; it means you're worshipping a blank spot on your own map as if there were a blank territory somewhere, and somehow this was a good thing. It means you're worshipping your own ignorance.
Why do human beings get such a kick out of ignorance? Perhaps there's an explanation in Evolutionary Psychology, but for our purposes it doesn't really matter - for whatever reason, worship of the unknown is indubitably part of human nature. It took thousands of years for human beings to begin using Science, and it had to be invented. But humans have worshipped their mysteries for as long as the human species has existed. We instinctively attribute holy power to that which we do not understand.
Like stars, in the age before Newtonian mechanics and telescopes and spectrographs.
Like alchemy, in the age before chemistry.
Like the "vital force" that transformed mere matter into living flesh, in the age before biology.
How can you spot a mystical explanation - a "mysterious answer" to a mysterious question? This topic is dealt with in greater length in Technical Explanation; it's surely worth greater length than I can give it here. One point to remember is that a mystical explanation does not feel fake - humans don't instinctively spot this flaw as a problem; it has to be learned as a deliberate skill. That's why the mistake keeps getting made, generation after generation.
One major warning sign is when, even after you hear a supposed explanation, the phenomenon is still a sacred mystery unto you, and still seems to possess the same opaque impenetrability that it possessed at the start. When you're told that "vital force" explains biology, it may feel like an explanation - "vital force makes the muscles move"; that's cause and effect for you - but you don't lose any of the emotional kick you get out of the mysteriousness of biology. It's a sacred force that is said to animate the muscles, so the system-as-a-whole stays firmly in the sacred magisterium. Now the scientists who made this mistake didn't think they were invoking magic. They thought they were being scientific, that "elan vital" was a scientific term and a scientific hypothesis. It wasn't their reverence of the elan vital that gave the game away, so much as their reverence of their own ignorance about the elan vital. A physicist may revere the beauty of electromagnetism, but what he reveres is how elegantly the math fits together, and how wonderfully he can calculate precise advance predictions about novel experiments.
Now if someone says that "chaos" explains cognition...
It will feel like an explanation. It will sell books.
And if the person goes on to marvel at how intrinsically unpredictable chaos is, how unlike lesser mundane sciences is its confusing-ness and mysterious-ness - why then a new sacred mystery is born, and you can get an emotional kick out of revering it.
Marcello Herreshoff said: "Calling something 'magic' is more useful than calling it 'emergent' because 'magic' tells us we don't understand it."
The sacred mystery may even give birth to an AI project. One who wants to believe asks, "Does the evidence permit me to believe? If I say this, can someone come along and definitely prove it isn't so?" And one who wishes to disbelieve asks, "Does the evidence compel me to believe? Can someone else argue so strongly as to force me to believe this, whether I want to or not?" The former case is known as confirmation bias; the latter case, disconfirmation bias. Both are equally mistaken; as is written in the Twelve Virtues, "If you regard evidence as a constraint and seek to free yourself, you sell yourself into the chains of your whims." The same also applies for regarding evidence as a boundary - a jail cell you cannot leave, but within which you can freely move according to your whims. "For you cannot make a true map of a city by sitting in your bedroom with your eyes shut and drawing lines upon paper according to impulse. You must walk through the city and draw lines on paper that correspond to what you see. If, seeing the city unclearly, you think that you can shift a line just a little to the right, just a little to the left, according to your caprice, this is just the same mistake."
If you don't know exactly what you're doing, and you also don't know how to describe exactly what you want, then - while you won't actually get good results in the real world - no one can convince you that you won't get what you want. No one can compel you to give up your expectation of good results. A "random" key does not fit a "random" lock; different sources of noise do not correlate. But if you don't know the key and you don't know the lock, then it may be that no one can prove to you that your key won't fit the lock. Let's say you don't really know what intelligence is, but you fervently believe it's explained by "emergence". You don't know exactly how "emergence" produces intelligence, but you're sure it's true. Clearly there's nothing for it, but to build a system that exhibits "emergence", and wait expectantly for intelligence to pop out of it...
No one can prove to you that it won't happen. After all, you don't know what your "emergent" system will do, and you don't really know what intelligence is either (except that it's "emergent", of course), and no one can prove to you that these two ignorances do not match each other.
But a mysterious key does not fit a mysterious lock.
If someone says that an AI system is powered by noise, by randomness, by entropy... well, I've said at some length why that can't be so.
But it will sell books.
It provides a sacred mystery to worship.
And that's why you keep hearing about noise-powered AIs.

[edit] Sneaking up on perfection

"To be human is to make ten thousand errors. No one in this world achieves perfection."
-- Written of humility, eighth of the Twelve Virtues.
"In every art, if you do not seek perfection you will halt before taking your first steps. If perfection is impossible that is no excuse for not trying.
-- Written of perfectionism, ninth of the Twelve Virtues.

[edit] The imperfect world

If you can't use noise to fight mystery - if fire doesn't fight fire - then how do you deal with a not-fully-predictable external world? If you don't worship your ignorance, what do you do with it?
It is not possible to prove strong, non-probabilistic theorems about the external world, because the state of the external world is not fully known. Even if we could perfectly observe every atom, there's a little thing called the "problem of induction". If every swan ever observed has been white, it doesn't mean that tomorrow you won't see a black swan. Just because every physical interaction ever observed has obeyed conservation of momentum, doesn't mean that tomorrow the rules won't change. It's never happened before, but to paraphrase Richard Feynman, you have to go with what your experiments tell you. If tomorrow your experiments start telling you that apples fall up, then that's what you have to believe.
Suppose that a young AI-in-training is asked to help a little old lady across the street. Unknown to the AI, a watching lunatic has decided that if the little old lady reaches the other side of the street, the lunatic will gun down twenty nuns from a rooftop. So the AI helps the little old lady across the street, and as a result, twenty nuns die. In this case the AI's action has a large, unanticipated negative consequence because the AI's knowledge of the universe was incomplete - it was missing a large, important factor.
But what happens when the AI realizes what it has done? Does the AI itself immediately turn evil because one of its actions had an unintended evil consequence? If the worst happens and the AI accidentally kills a human, will this corrupt the AI and turn it to the Dark Side?
What ought to happen in terms of decision theory - when a mind makes a mistake, sees the mistake, and this does not change the mind's goals - is that the AI should update its model of the world and not make the same mistake again. The AI's plans and actions change; not because the AI's preferences over outcomes changed, but because the AI's model of the world has changed, and it now anticipates different outcomes from its actions. In simple terms, it still likes old ladies and nuns, but it now knows to watch out for lunatic rooftop snipers.
When I write code, and the code contains a bug, and an assertion fails, thereby detecting the bug, I do not think that the correct behavior for the program from that point on is to grind my hard drive until it catches fire.

[edit] Proving Friendliness

In sections 1 and 2 we saw that you can't build an AI by specifying the exact action - the particular chess move, the precise motor output - in advance. Now it seems that it would be impossible to prove any statement about the real-world consequences of the AI's actions. The real world is not knowably knowable. Even if we possessed a model that was, in fact, complete and correct, we could never have absolute confidence in that model. So what could possibly be a "provably Friendly" AI?
You can try to prove a theorem along the lines of: "Providing that the transistors in this computer chip behave the way they're supposed to, the AI that runs on this chip will always try to be Friendly." You're going to prove a statement about the search the AI carries out to find its actions. Metaphorically speaking, you're going to prove that the AI will always, to the best of its knowledge, seek to move little old ladies to the other side of the street and avoid the deaths of nuns. To prove this formally, you would have to precisely define "try to be Friendly": the complete criterion that the AI uses to choose among its actions - including how the AI learns a model of reality from experience, and how the AI identifies the goal-valent aspects of the reality it learns to model.
Once you've formulated this precise definition, you still can't prove an absolute certainty that the AI will be Friendly in the real world, because a series of cosmic rays could still hit all of the transistors at exactly the wrong time to overwrite the entire program with an evil AI. Or Descartes's infinitely powerful deceiving demon could have fooled you into thinking that there was a computer in front of you, when in fact it's a hydrogen bomb. Or the Dark Lords of the Matrix could reach into the computer simulation that is our world, and replace the AI with Cthulhu. What you can prove with mathematical certitude is that if all the transistors in the chip work correctly, the AI "will always try to be Friendly" - after you've given "try to be Friendly" a precise definition in terms of how the AI learns a model of the world, identifies the important things in it, and chooses between actions, these all being events that happen inside the computer chip.
Since human programmers aren't good at writing error-tolerant code, computer chips are constructed (at a tremendous expense in heat dissipation) to be as close to perfect as the engineers can make them. For a computer chip to not make a single error in a day, the millions of component transistors that switch billions of times per second have to perform quintillions of error-free operations in a day. The inside of the computer chip is an environment that is very close to totally knowable, and if you take the normal transistor operations as axioms, you can prove statements about the idealized chip with mathematical certainty.
Computer chips are not actually perfect. The next step up would be to prove - or more likely, ask a maturing AI to prove - that the AI remains Friendly given any possible single bitflip, then any possible two bitflips. A proof for two bitflips would probably drive the real-world probability of corruption to very close to zero, although this probability itself would not have been proven. Eventually one would dispense with such adhockery, and let the AI design its own hardware - choosing for itself the correct balance of high-precision hardware and fault-tolerant software, with the final infinitesimal probability of failure being proven on the assumption that the observed laws of physics continue to hold. The AI could even write error-checking code to protect against classes of non-malicious changes in physics. You can't defend against infinitely powerful deceiving demons; but there are realistic steps you can take to defend yourself against cosmic rays, lunatic snipers, and new discoveries in physics.

[edit] Determinism is mandatory!

In real life, a transistor has a substantially higher probability than one-in-a-million of failing on any given day. After all, someone might spoon ice cream into the computer; lightning might strike the electrical line and fry the chip; the heatsink might fail and melt the chip... that sort of thing happens much more often than once in every three thousand years, the frequency implied by a 0.000001/day failure rate. So if you look at one lone transistor, nothing else, and ask the probability that it will go on functioning correctly through the whole day, the chance of failure is clearly greater than one in a million.
But there are millions of transistors on the chip - perhaps 155 million, for a high-end 2006 processor. Clearly, if each lone transistor has a probability of failure greater than one in a million, the chance of the entire chip working is infinitesimal.
What is the flaw in this reasoning? The probability of failure is not conditionally independent between transistors. Spooning ice cream into the computer will destroy the whole chip - millions of transistors will fail at the same time. If we are told solely that one transistor has failed, we should guess a much higher probability that a neighboring transistor has also failed, since most causes of failure destroy many transistors at once. Conversely, if we are told that one transistor is still working properly, this considerably increases the chance that the neighboring transistor is still working. If event A has a probability of 1/2, and event B has a probability of 1/2, then the probability of A and B both occurring can be 1/4, 1/2, 0, or anything in between. The key is the conditional probability p(B|A), the probability that B occurs given that A occurs - the two events are not necessarily independent. The chance that it rains and that the sidewalk gets wet is not the product of the probability that it rains and the probability that the sidewalk gets wet.
The reason a computer chip can work deterministically is that the conditionally independent component of a transistor's chance of failure is very small - that is, the individual contribution of each extra transistor to the overall chip's chance of failure is infinitesimal. If this were not true, if each additional transistor had any noticeable independent chance of failing, it would be impossible to build a computer chip. You'd be limited to a few dozen transistors at best.
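To put rough numbers on this (the failure rates below are assumptions for illustration, not measurements), compare a chip whose transistors fail independently against one whose risk is dominated by a single shared cause:
<pre>
# A rough numerical sketch of the argument above (illustrative numbers only).
n_transistors = 155_000_000   # high-end 2006 chip, as in the text
p_shared = 1e-3               # assumed daily chance of a whole-chip event
                              #   (ice cream, lightning, dead heatsink)
p_independent = 1e-6          # hypothetical *independent* per-transistor failure

# If each transistor failed independently with p = 1e-6 per day,
# the chance of the whole chip surviving the day would be negligible:
survival_if_independent = (1 - p_independent) ** n_transistors
print(survival_if_independent)          # ~6e-68 - effectively zero

# If instead almost all of the risk comes from one shared, chip-wide cause,
# the chip's survival is governed by that one number, not the transistor count:
survival_if_shared = 1 - p_shared
print(survival_if_shared)               # ~0.999, regardless of n_transistors
</pre>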
For a Friendly AI to continue in existence, the cumulative probability of catastrophic failure must be bounded over the intended working lifespan of the AI. (The actual intended working lifespan might be, say, a million years; I hope that humanity will not need the original Friendly AI for anything like this length of time. But we would calculate the cumulative bound over a googol clock ticks, just to leave error margin.) If the Friendly AI accidentally slices off a human's arm, but is properly "horrified" in a decision-theoretic sense - retains the same goals, and revises its planning to avoid ever doing it again - this is not a catastrophic failure. An error in self-modification - an error in the AI rewriting its own source code - can be catastrophic; a failure of this type can warp the AI's goals so that the AI now chooses according to the criterion of slicing off as many human arms as possible.
Therefore, for a Friendly AI to rewrite its own source code, the cumulative probability of catastrophic error must be bounded over billions of sequential self-modifications. The billionth version of the source code, designing the billionth-and-first version, must preserve with fidelity the Friendly invariant - the optimization target that describes what the AI is trying to do as efficiently as possible.
Therefore, the independent component in the probability of failure on each self-modification must be effectively zero. That doesn't mean the probability of the entire AI failing somehow-or-other has a real-world value of zero. It means that, whatever this probability of failure is, we think it's pretty much the same after ten billion self-modifications as after one billion self-modifications.
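As a back-of-the-envelope illustration - the per-step failure rates below are hypothetical - here is how an independent failure probability compounds over a billion sequential self-modifications:
<pre>
# Why the *independent* per-step failure probability must be effectively zero.
import math

n_steps = 1_000_000_000          # a billion sequential self-modifications

for p_step in (1e-6, 1e-9, 1e-15):
    # probability of zero catastrophes in n_steps, i.e. (1 - p)^n, computed stably
    survival = math.exp(n_steps * math.log1p(-p_step))
    print(p_step, survival)
# 1e-06 -> ~0         (certain failure)
# 1e-09 -> ~0.368     (worse than a coin flip)
# 1e-15 -> ~0.999999
</pre>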
Now, the glorious thing about formal mathematical proof is that a formal proof of ten billion steps is just as reliable as a proof of ten steps. The proof is just as strong as its axioms, no more, no less, no matter how long the proof. This doesn't mean that the conclusion of a formal proof is perfectly reliable. Your axioms could be wrong; you could have overlooked a fundamental mistake. But it is at least theoretically possible for the system to survive ten billion steps, because if you got the axioms right, then the stochastically independent failure probabilities on each step don't add up.
When computer engineers prove a chip valid - a good idea if the chip has 155 million transistors and you can't issue a patch afterward - the engineers use human-guided, machine-verified formal proof. Human beings are not trustworthy to peer over a purported proof of ten billion steps; we have too high a chance of missing an error. And present-day theorem-proving techniques are not smart enough to design and prove an entire computer chip on their own - current algorithms undergo an exponential explosion in the search space. Human mathematicians can prove theorems far more complex than modern theorem-provers can handle, without being defeated by exponential explosion. But human mathematics is informal and unreliable; occasionally someone discovers a flaw in a previously accepted informal proof.
The upshot is that human engineers guide a theorem-prover through the intermediate steps of a proof. The human chooses the next lemma, and a complex theorem-prover generates a formal proof, and a simple verifier checks the steps. That's how modern engineers build reliable machinery with 155 million interdependent parts.
Proving a computer chip correct requires a synergy of human intelligence and computer algorithms, as currently neither suffices on its own. The idea is that a Friendly AI would use a similar combination of abilities when modifying its own code - could both invent large designs without being defeated by exponential explosion, and also verify its steps with extreme reliability. That is one way a Friendly AI might remain knowably, provably Friendly even after carrying out a large number of self-modifications.
And this proof comes with many caveats: The proven guarantee of "Friendliness" would actually specify some invariant internal behavior - the optimization target, the search carried out, the criterion for choosing between actions - and if the programmers screw this up, the "Friendly" AI won't actually be friendly in the real world. Moreover, there would still be the standard problem of induction - maybe the previously undiscovered "sorcery addenda" to the laws of physics state that the program we've written is the exact ritual which materializes Azathoth into our solar system. Which only goes to say that mere mathematical proof would not give us real-world certainty.
But if you can't even prove mathematically that the AI is Friendly, it's practically guaranteed to fail. Mathematical proof does not give us real-world certainty. But if you proved mathematically that the AI was Friendly, then it would be possible to win. You would not automatically fail.

[edit] The foundations of strength

Besides formal mathematical proof, I can think of two other ways to legitimately assign an extreme probability (near 1 or 0) to an event.
If you have an empirical theory that is precise and precisely confirmed, from experiments in the same domain, then you can use formal calculation from this theory to estimate extremely low failure probabilities. This is how chip engineers describe individual transistors, using precise, precisely confirmed, mathematically simple physical theories such as thermodynamics and quantum electrodynamics, which are backed up by an enormous number of experiments. The evidence in favor of these theories is so vast, their predictions so precise and precisely confirmed, that when the theory tells us a probability is 10^-64, it might actually be that unlikely. (This is Eric Drexler's calculated failure probability for the diamondoid rod logics described in Nanosystems.)
Another strong technique is statistical sampling from a domain that is i.i.d., that is, independent and identically distributed. For example, if a generator produces a million widgets, and every single one of them is blue, you could legitimately estimate a probability on the rough order of 1/1,000,000 that the million-and-first widget will not be blue. Providing that it is exactly the same generator, and also providing that you know that the generator's probability of producing a given widget is conditionally independent for each widget produced. The more certainty you want, the larger the sample size you need. This is how chip engineers estimate the probability of a cosmic ray striking a transistor.
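One standard way to make the widget estimate precise, under the stated assumptions of an identical generator and conditional independence, is Laplace's rule of succession (which additionally assumes a uniform prior over the generator's unknown blue-probability):
<pre>
# Laplace's rule of succession: after N blue widgets and no exceptions,
# P(next widget is not blue) = 1 / (N + 2), given i.i.d. widgets and a
# uniform prior on the generator's probability of producing blue.
def prob_next_is_not_blue(n_blue_observed):
    return 1.0 / (n_blue_observed + 2)

print(prob_next_is_not_blue(1_000_000))   # ~1e-6: the rough order quoted above
print(prob_next_is_not_blue(100))         # ~1e-2: smaller samples buy less certainty
</pre>
The more certainty you want, the larger the sample you need - the estimate scales directly with the number of observations.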
An extra-reliable chip intended for use in a space satellite might have a calculated failure rate of one event every three hundred years, which implies a legitimate, well-calibrated failure rate of something like 10^-26 per transistor switch.
Assigning a probability of 10^-26 to anything requires godlike assurance. It is like making one statement per second every second for a billion years, and having a chance of being wrong even once that is less than your annual chance of being struck by lightning.
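A quick sanity check on these numbers (the chip parameters and the roughly one-in-a-million annual lightning figure are illustrative assumptions, not measurements):
<pre>
# Back-of-the-envelope check on the figures above.
SECONDS_PER_YEAR = 3.15e7

# One failure event per ~300 years, spread over every transistor switch:
n_transistors = 1e7          # assumed, for a small radiation-hardened chip
switch_rate   = 1e9          # assumed switches per second per transistor
total_switches = n_transistors * switch_rate * 300 * SECONDS_PER_YEAR
print(1 / total_switches)    # ~1e-26 per transistor switch

# "One statement per second for a billion years":
n_statements = 1e9 * SECONDS_PER_YEAR          # ~3e16 statements
p_any_error  = n_statements * 1e-26            # ~3e-10 chance of even one error
print(p_any_error, p_any_error < 1e-6)         # far below ~1e-6 annual lightning odds
</pre>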
Building an actual chip this reliable requires all three strong techniques: quantitative physics to calculate the behavior of individual transistors, formal math to prove relations between transistors, and i.i.d. statistics for cosmic rays.
Here's an example of a technique which is not strong:
Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged enemy tanks. The researchers trained a neural net on 50 photos of camouflaged tanks in trees, and 50 photos of trees without tanks. Using standard techniques for supervised learning, the researchers trained the neural network to a set of weights that correctly classified the training set - output "yes" for the 50 photos of camouflaged tanks, and output "no" for the 50 photos of forest. This did not ensure, or even imply, that new examples would be classified correctly. The neural network might have "learned" 100 special cases that would not generalize to any new problem. Wisely, the researchers had originally taken 200 photos, 100 photos of tanks and 100 photos of trees. They had used only 50 of each for the training set. The researchers ran the neural network on the remaining 100 photos, and without further training the neural network classified all remaining photos correctly. Success confirmed! The researchers handed the finished work to the Pentagon, which soon handed it back, complaining that in their own tests the neural network did no better than chance at discriminating photos.
It turned out that in the researchers' data set, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest.
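The same failure mode can be reproduced in a few lines on synthetic data - a toy reconstruction with made-up numbers, not the original experiment:
<pre>
# Toy sketch: the only informative feature in the training set is a confound.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_photos(n, confounded):
    tank = rng.integers(0, 2, n)                        # 1 = tank present
    if confounded:
        brightness = 1 - tank + rng.normal(0, 0.1, n)   # tanks only on cloudy days
    else:
        brightness = rng.random(n)                      # in the field: no correlation
    noise = rng.normal(0, 1, (n, 5))                    # features carrying no tank signal
    return np.column_stack([brightness, noise]), tank

X_train, y_train = make_photos(100, confounded=True)
X_field, y_field = make_photos(1000, confounded=False)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out photos from the same shoot:", clf.score(*make_photos(100, True)))
print("photos from the field:              ", clf.score(X_field, y_field))
# ~1.0 on the first line, ~0.5 on the second - it learned "cloudy vs. sunny".
</pre>
On photos drawn from the same confounded distribution the classifier looks perfect; on photos where brightness no longer tracks tanks, it collapses to chance.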
How did they fail?
They had no strong statistical assurance: The sample was not from an independent, identically distributed source. Thus, it was possible, and turned out to be the case, that there was an invalidating change of context between the domain of testing and the domain of use.
The concept of tank is not precise, not the same way that electromagnetic charge is precise. The distinction of "classified correct" vs. "classified incorrect" is qualitative, not quantitative. Even if you have a hypothesis about whether your neural network will classify "correctly" or "incorrectly", and even if your experience so far seems to confirm the hypothesis, it's not a precise hypothesis, and your experience is not precise confirmation. It's not like a physics hypothesis that says "Mercury will be found at such-and-such position in the night sky", which is falsified if Mercury is found to be a single second of arc away from its appointed place. Relative to the neural-network hypothesis, the physical hypothesis makes a much more precise prediction, which is much more easily falsified, and therefore the physical hypothesis is vastly more confirmed by the same number of apparent "successes".
Finally, no attempt was made to formally prove that the neural network had any particular property or accomplished any particular purpose. And of course they couldn't formally prove success; the goal itself was informal. Vagueness exists in the mind, not in reality; whenever you build a specific, real AI, it always has some specific, real behavior and does some specific, real thing. In AI, your ability to get what you want is sharply limited by your ability to want things specific enough that you can create specific dynamics which accomplish them.
I want a lot of vague things for the future of humankind - happiness, freedom, that sort of thing - but whatever I use as the root level of a Friendly AI must be specific enough to prove and simple enough to work. Otherwise I cannot build it.

[edit] Discourse on the futility of guessing

[edit] Targets and evidence

In 1919, Sir Arthur Eddington led expeditions to Brazil and to the island of Principe, aiming to observe a solar eclipse and thereby test an experimental prediction of Einstein's novel theory of General Relativity. A journalist asked Einstein what he would do if Eddington's observations failed to match his theory. Einstein famously replied: "Then I would feel sorry for the good Lord. The theory is correct."
It seems like a rather foolhardy statement, grossly defying the rule of science that experiment above all is sovereign. Einstein seems possessed of an arrogance so great that he would refuse to bend his neck and submit to Nature's answer, as scientists must do. Who can know that the theory is correct, in advance of experimental test?
Of course, Einstein did turn out to be right. I try to avoid criticizing people when they are right. If they genuinely deserve criticism, I will not need to wait long for an occasion where they are wrong.
And Einstein may not have been quite so foolhardy as he seemed. To win a lottery at odds of 100,000,000 to 1, you would need enough evidence to single out the one winning ticket from a hundred million possibilities. In Bayesian terms, the prior probability of each of a hundred million tickets winning is 1/100,000,000. So you need strong evidence favoring that particular ticket over its 99,999,999 fellows - strong enough to overcome the massive prior improbability.
Let's suppose there are some tests you can perform which discriminate, probabilistically, between lottery tickets. For example, you can punch the combination into a little black box that has a 75% probability of beeping if the combination is the winner, and only a 25% chance of beeping if the combination is wrong. Here the likelihood ratio is 3 to 1. Furthermore, the black box tests the ticket independently each time. So if you perform the test twice in a row, and the box beeps twice, the cumulative evidence has a likelihood ratio of 9:1.
With 100,000,000 tickets, to make a calibrated guess that a ticket had even a 1/11th (9%) chance of winning, you would need a cumulative likelihood ratio of 10,000,000 to 1 favoring that ticket. Just seeing that one particular ticket as remarkable - having it even rise to the level of your attention - would require a huge amount of evidence.
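The same arithmetic in odds form, using the hypothetical 3:1 black box described above:
<pre>
# How many independent beeps before the posterior reaches 1/11 (9%)?
prior_odds = 1 / 99_999_999          # one ticket against the rest
lr_per_beep = 3                      # 75% beep if winner / 25% beep if not

beeps = 0
odds = prior_odds
while odds / (1 + odds) < 1 / 11:    # posterior probability from odds
    odds *= lr_per_beep
    beeps += 1
print(beeps)                         # 15 beeps: 3^15 ~ 14,000,000 > 10,000,000
print(odds / (1 + odds))             # ~0.125 - just past the 1/11 mark
</pre>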
Now you might think something like: "Well, if I've got a huge heap of Bayesian evidence favoring one particular lottery ticket, maybe I could leap ahead and guess that it is, by golly, the winning ticket - even if the Bayesian rules say the probability is merely 1/11."
But if you use tests that yield a cumulative likelihood ratio of merely 10,000,000 to 1, and there are 100,000,000 possible tickets, then an average of ten tickets will pass all the tests by sheer chance. Also one ticket will pass all the tests because it really is the correct ticket, but you will not know which of the eleven test-passing tickets is the correct one.
So to narrow it down to one lottery ticket, you need a cumulative likelihood ratio of more than 100,000,000 to 1. Following the convention suggested by E. T. Jaynes, we can say that we need more than 80 decibels of evidence. (10 decibels = 1 order of magnitude.)
If you apply a test that yields 20 decibels of evidence (for example, a black box that always beeps if the ticket is the winner, and has only a 1/100 chance of beeping if the ticket is not the winner) to a pool of 100,000,000 candidates, you're left with 1,000,000 candidates that pass the first filter. If the test works independently each time, and you apply the test again, you're left with 10,000 candidates that were successful twice. That is, you've accumulated 40 decibels of evidence.
To assign more than 50% probability to the correct candidate in a pool of 100,000,000, you need more than 80 decibels of evidence. You cannot expect to find the correct candidate without tests that are this strong, because lesser tests will yield more than one candidate that passes all the tests. Or to look at it another way, if you apply a test that ought to yield only 40 decibels of evidence, and you get only one candidate, then either the test is stronger than you thought, or something else is wrong.
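The decibel bookkeeping, applied to the same hypothetical tests:
<pre>
# Decibels of evidence, per the Jaynes convention used above.
import math

def decibels(likelihood_ratio):
    return 10 * math.log10(likelihood_ratio)

print(decibels(100))            # 20 dB - the always-beeps / 1%-false-beep box
print(decibels(100_000_000))    # 80 dB - what it takes to single out 1 in 10^8

# Expected number of wrong tickets still standing after repeated 20 dB filters:
candidates = 100_000_000
for test in range(1, 5):
    candidates /= 100           # each 20 dB test passes ~1/100 of the wrong tickets
    print(test * 20, "dB:", int(candidates), "wrong tickets expected to survive")
# -> 1000000, 10000, 100, 1 after 20, 40, 60, 80 dB
</pre>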
Now suppose that the heavens open and rain evidence upon you - a USB drive falls out of the sky, containing the results from some randomly determined number of lottery ticket tests. And next, imagine that the drive contains just enough tests, just enough evidence, to rightfully justify assigning a certain lottery ticket a 55% probability of winning. That's enough evidence to imply that, if you could buy only one lottery ticket, you'd buy that one. But you'd be honestly unsure whether or not the ticket would win. You wouldn't make down payments on any mansions.
But how likely is it that you'd have exactly this much evidence, and no more? Why exactly 81 decibels? If instead you'd only had 60 decibels, it wouldn't have let you single out the winning ticket for your personal attention - there would be a hundred other plausible alternatives. And with 100 decibels, you'd be nearly certain that you'd found the winning ticket. If the amount of evidence is determined, not by USB drives falling from the sky, but by your own efforts to gather data, then it would be foolish to stop at exactly 81 decibels. Why not continue your efforts until 100 or higher?
If we suppose that you can win at all, it is improbable that you would just barely, just marginally win.
Now how many bits does it take to specify Einstein's equations of General Relativity? Or to put it another way, how many possible equations can you write down that seem no more complicated than the equations of General Relativity? Remember, a lottery with four billion tickets corresponds to a space only about 32 bits - 4 bytes - wide. Could you specify General Relativity using 4 ASCII characters or six lowercase letters?
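The rough bit-counting behind that comparison:
<pre>
# Bit widths of the spaces mentioned above.
import math

print(math.log2(4_000_000_000))   # ~31.9 bits: a four-billion-ticket lottery
print(6 * math.log2(26))          # ~28.2 bits: six lowercase letters
print(4 * 8)                      # 32 bits: four 8-bit ASCII characters
</pre>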
So - at the time of first formulating the hypothesis - Einstein must have already had enough observational evidence to single out the equations of General Relativity for his personal attention, or he couldn't have gotten it right.
Now, how likely is it that Einstein would have exactly enough observational evidence to raise General Relativity to the level of his attention, but only justify assigning it a 55% probability? Not likely! If Einstein had enough observational evidence to single out the correct equations of General Relativity in the first place, then he probably had enough evidence to be damn sure that General Relativity was true. In fact, since the human brain is not a perfectly efficient processor of information, Einstein probably had overwhelmingly more evidence than would, in principle, be required for a perfect Bayesian to assign massive confidence to General Relativity.
"Then I would feel sorry for the good Lord; the theory is correct," doesn't sound nearly as appalling when you look at it from that perspective. And remember that General Relativity was correct, from all the vast space of possibilities.

[edit] Object-level thinking and reflective thinking

But where does this leave the notion of a scientist's humility? Indeed, why even bother to send Eddington out to observe eclipses? Maybe it was, in some abstract theoretical sense, unnecessary. But if Einstein had offered General Relativity and everyone had just accepted it without further proof, then - in my humble opinion - it seems like something would have gone wrong with the scientific process.
I find it useful to distinguish between object-level thinking and reflective thinking. The object level is when you're looking at the outside environment - the orbits of the planets, your telescope observations, what sort of hypothesis might explain the data, how well a hypothesis fits the data. The reflective level is when you think about questions like: "Am I integrating the evidence properly? Am I unconsciously prejudiced? Have I made a fundamental mistake that invalidates everything?" (When you think about these things, you look inside yourself - examine your model of yourself - gaze upon your own mind as though looking into a mirror. Hence the phrase, "reflective".)
What I'm arguing is that, supposing that you did in fact pinpoint the correct answer on the object level, then you probably had enough evidence to justify a very high confidence on the object level. In other words: When you just think about the data, as data, then it seems to have an exceedingly tight fit to the hypothesis. The reflective level is a different matter. Maybe you've made a fundamental mistake, of some kind you haven't even imagined. Maybe you permitted yourself to be influenced by what you wished to believe. Such questions can lead to uncertainty even in the presence of apparently massive evidence.
Therefore you make an advance experimental prediction, one that seems nonobvious and is not made by any other theory. Confused thinking can retrofit a bad hypothesis to data already observed. Without being consciously aware of what you're doing, you tweak the predictions a bit here, a bit there... But it is harder to confusedly end up predicting the results of experiments not yet performed.
A much poorer way to deal with the problem of reflective uncertainty is to say things like: "I know there isn't overwhelming evidence in favor of my theory. But even if we had overwhelming evidence, we could never be absolutely sure. So this is as good as it gets." If you lack even the appearance of massive evidence, you should be very worried. To actually hit the target, you need apparently massive evidence that turns out to be genuinely correct massive evidence. There is some uncertainty on the last step of this process, but if you lack even the appearance of massive evidence, you are guaranteed to fail. It will turn out that all you had was a picture so vague that no one could prove to you that you were wrong.
In everyday life we work with small hypothesis spaces where you can get things right by sheer chance. She loves me, she loves me not... That's a state space with only two possibilities. With only two possibilities, you can legitimately end up in a situation where, based on a few imprecise observations, you assign 70% probability to the correct answer. That is: despite your low confidence, your "best guess" turns out to be right after all.
But when the state space is large and the target is small, you need massive evidence to get anywhere near the correct answer. As in scientific challenges where you have to invent a new complex hypothesis, for example. In that case, however uncertain you feel on the reflective level, the object level must seem to justify overwhelming confidence if you're going to win at all. And even then, the reflective level will leave a residuum of uncertainty - one which should properly lead you to go on gathering further experimental evidence in new contexts, and putting your hypothesis to the most stringent tests you can manage.

[edit] Proper use of humility

It is widely recognized that good science requires some kind of humility. What sort of humility is more controversial. Consider the creationist who says: "But who can really know whether evolution is correct? It is just a theory. You should be more humble and open-minded." Is this humility? The creationist practices a very selective underconfidence, refusing to integrate massive weights of evidence in favor of a conclusion he finds uncomfortable. I would say that whether you call this "humility" or not, it is the wrong step in the dance.
What about the engineer who humbly designs fail-safe mechanisms into machinery, even though he's damn sure the machinery won't fail? This seems like a good kind of humility to me. Historically, it's not unheard-of for an engineer to be "damn sure" a new machine won't fail, and then it fails anyway.
What about the student who humbly double-checks the answers on his math test? Again I'd categorize that as good humility.
What about a student who says, "Well, no matter how many times I check, I can't ever be certain my test answers are correct" and therefore doesn't check even once? Even if this choice stems from an emotion similar to the emotion felt by the previous student, it is less wise.
You suggest studying harder, and the student replies: "No, it wouldn't work for me; I'm not one of the smart kids like you; nay, one so lowly as myself can hope for no better lot." This is social modesty, not humility. It has to do with regulating status in the tribe, rather than scientific process. If you ask someone to "be more humble", by default they'll associate the words to social modesty - which is an intuitive, everyday, ancestrally relevant concept. Scientific humility is a more recent and rarefied invention, and it is not inherently social. Scientific humility is something you would practice even if you were alone in a spacesuit, light years from Earth with no one watching. Or even if you received an absolute guarantee that no one would ever criticize you again, no matter what you said or thought of yourself. Or even if you were elected Emperor of Earth. You'd still double-check your calculations if you were wise.
The student says: "But I've seen other students double-check their answers and then they still turned out to be wrong. Or what if, by the problem of induction, 2 + 2 = 5 this time around? No matter what I do, I won't be sure of myself." It sounds very profound, and very modest. But it is not coincidence that the student wants to hand in the test quickly, and go home and play video games.
The end of an era in physics does not always announce itself with thunder and trumpets; more often it begins with what seems like a small, small flaw... But because physicists have this arrogant idea that their models should work all the time, not just most of the time, they follow up on small flaws. Usually, the small flaw goes away under closer inspection. Rarely, the flaw widens to the point where it blows up the whole theory. Therefore it is written in the Twelve Virtues: "If you do not seek perfection you will halt before taking your first steps."
But think of the social audacity of trying to be right all the time! I seriously suspect that if Science claimed that evolutionary theory is true most of the time but not all of the time - or if Science conceded that maybe on some days the Earth is flat, but who really knows - then scientists would have better social reputations. It would certainly mean that Science would be viewed as a lot less confrontational, because we wouldn't have to argue with people who say the Earth is flat - there would be room for compromise. When you argue a lot, people look upon you as confrontational. If you repeatedly refuse to compromise, it's even worse. If you consider it as a question of tribal status, then scientists have certainly earned some extra status in exchange for such socially useful tools as medicine and cellphones. But this social status does not justify their insistence that only scientific ideas on evolution be taught in public schools. Priests also have high social status, after all. Scientists are getting above themselves - they won a little status, and now they think they're chiefs of the whole tribe! They ought to be more humble, and compromise a little.
When a mental picture of something is hazy, then its degrees of freedom let it be used as an excuse to justify almost anything. You never have to give up any conclusion you started out wanting. This is so convenient that people are often reluctant to give up vagueness. But reality itself is never vague; only models can be vague.
Few people have mental pictures of "humility" so rigorous as to constrain, to any significant degree, the range of actions for which "humility" can be used as an excuse. This doesn't mean we should discard the concept of humility - it means we should be careful using it.
Thus I find it wise to look at the actions recommended by a "humble" line of thinking, and ask: "Does acting this way make you stronger, or weaker?"
Imagine someone who has been much accustomed to vagueness, but without any conception that what they are doing is "vague" or "verbal reasoning". Or perhaps they make the distinction, but they don't think there's anything wrong with vague reasoning about AI, since everyone else does it, or since AI is such a tremendous mystery. Then they read this essay, and become aware, for the first time, of the need for precision; or at any rate they think that others expect them to be precise. They realize they can no longer get away with vague hopes - either in the eyes of the public, or in their own eyes. So, without changing their theories, they decide to claim that they meet the standards of supreme rigor, precise art, and massive evidence. When before they had no such thought, and cheerfully defended their vagueness.
By this act they would certainly become no stronger. They would become weaker, because they could no longer confess their doubts, even to themselves.
The temptation is always to claim the most points with the least effort. The temptation is to carefully integrate all incoming news in a way that lets us change our beliefs, and above all our actions, as little as possible. John Kenneth Galbraith said: "Faced with the choice between changing one's mind and proving that there is no need to do so, almost everyone gets busy on the proof." And the greater the inconvenience of changing one's mind, the more effort people will expend on the proof.
But y'know, if you're gonna do the same thing anyway, there's no point in going to such incredible lengths to rationalize it. Often I have witnessed people encountering new information, apparently accepting it, and then carefully explaining why they are going to do exactly the same thing they planned to do previously, but with a different justification. The point of thinking is to shape our plans; if you're going to keep the same plans, why bother?
Therefore it is written in the Twelve Virtues:
"To be humble is to take specific actions in anticipation of your own errors. To confess your fallibility and then do nothing about it is not humble; it is boasting of your modesty."
So when I encounter new and disturbing information, I tell myself:
Update! Update! React! React! Don't let the news disappear down a black hole!
When you realize the need for rigor, then the way to react to the realization, the reaction that makes you stronger rather than weaker - the next step in the dance - is to reject as unsatisfactory the theories you developed before you had this realization. And then, to go in quest of a higher standard. This requires that you be able to distinguish rigor from nonrigor, or there's no point; you'll just find a new vague theory and halt, satisfied. (In Technical Explanation I had a few words to say about distinguishing rigor from nonrigor.)
By undertaking this quest, you commit yourself not to compromise, not to adapt your aspirations to a lower level, because Nature doesn't care about excuses. You must reject the next bright idea you have, and the next, and the next. Even if you retain a semitechnical theory as a makeshift, you cannot accept it - you cannot declare yourself satisfied, or risk anything precious. You must continue your search until you find a theory that meets the full, high, incredibly difficult standard required for the darn thing to actually work.
But this is extraordinarily inconvenient. You must be willing to interrupt the whole thread of your existence. If you had plans and dreams from before the Awful Realization, you need to let them go.
Real people, in real life, just don't do that sort of thing. No matter how long it takes, no matter how hard they have to search, they'll find an excuse that lets them off the hook.

[edit] The culture of mystery

I have given here a semitechnical call for a technical theory.
I have presented at least one relatively straightforward and concrete explanation of why this is necessary: A self-improving AI has to carry out many, many sequential self-modifications, so the independent component in the failure probability on each step needs to be driven down to effectively zero - a task that requires extremely strong object-level assurance, of the same approximate strength as formal mathematical proof.
There is also a more subtle and general view of the same problem, which is that if you need to hit a tiny target in a huge space, you need excellent aiming information.
A still deeper view is that reality itself is never vague, and so you cannot create a vague AI. Whatever you create will do something specific, whether you know what that is or not. A vague method will not achieve a vague goal; a vague key does not fit a vague lock; you cannot make strong statements about vague hopes.
There is finally the pragmatic truth, as a statement about human nature, that people are lazy. People exert the minimum possible effort they think they can get away with - and usually that effort goes into self-justification, not self-modification. If you do not hold yourself to a ridiculously high standard, you will declare success much too early; long before you really begin to understand the problem, and long before you achieve the deeper understanding that would tell you why a vague understanding cannot succeed.
All this is opposed by a different viewpoint on Artificial General Intelligence, which I shall call the Dark View.
The Dark View states that intelligence is so inherently mysterious that any attempt to really understand it is hopeless, and the only pragmatic course of action is to try and build it anyway.
The Dark View is widespread both among AGI professionals, and in popular discussion.
I recall one conversation I had - with a nontechnical guy, but it's typical of conversations I've had with professionals too - that, in compressed form, went something like this:
Him: Oh, you're working on AI! Are you using neural networks?
Me: I think emphatically not.
Him: But neural networks are so wonderful! They solve problems and we don't have any idea of how they do it!
Me: If you are ignorant of a phenomenon, that is a fact about your state of mind, not a fact about the phenomenon itself. Confusion exists in the mind, not in reality. A blank map does not correspond to a blank territory. There are mysterious questions, never mysterious answers. Therefore our ignorance of how neural networks work, cannot be responsible for making them work better.
Him: Huh?
Me: If you don't know how your AI works, that is not good. It is bad.
Him: Well, intelligence is much too difficult for us to understand, so we need to find some way to build AI without understanding how it works.
Me: Look, even if you could do that, you wouldn't be able to predict any kind of positive outcome from it. For all you knew, the AI would go out and slaughter orphans.
Him: Maybe we'll build Artificial Intelligence by scanning the brain and building a neuron-by-neuron duplicate. Humans are the only systems we know are intelligent.
Me: I don't think you can build a flying machine if the only thing you understand about flight is that somehow birds magically fly. What you need is a concept of aerodynamic lift, so that you can see how something can fly even if it isn't exactly like a bird.
Him: That's too hard. We have to copy something that we know works.
Me: (reflectively) What do people find so unbearably awful about the prospect of having to finally break down and solve the bloody problem? Is it really that horrible?
Him: Wait... you're saying you want to actually understand intelligence?
Me: Yeah.
Him: (aghast) Seriously?
Me: I don't know everything I need to know about intelligence, but I've learned a hell of a lot. Enough to know what happens if I try to build AI while there are still gaps in my understanding.
Him: Understanding the problem is too hard. You'll never do it.
Many times I've reprised the individual parts of this dialogue. This conversation was remarkable only for containing so many at once, in such pure form.
in progress