
Newcomb's Problem and Regret of Rationality

65 points | Post author: Eliezer_Yudkowsky | 31 January 2008 07:36PM

Followup to: Something to Protect

The following may well be the most controversial dilemma in the history of decision theory:

A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game.  In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.

Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.

You can take both boxes, or take only box B.

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars.  (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)

Before you make your choice, Omega has flown off and moved on to its next game.  Box B is already empty or already full.

Omega drops two boxes on the ground in front of you and flies off.

Do you take both boxes, or only box B?

And the standard philosophical conversation runs thusly:

One-boxer:  "I take only box B, of course.  I'd rather have a million than a thousand."

Two-boxer:  "Omega has already left.  Either box B is already full or already empty.  If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0.  If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000.  In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table - so I will be rational, and take both boxes."

One-boxer:  "If you're so rational, why ain'cha rich?"

Two-boxer:  "It's not my fault Omega chooses to reward only people with irrational dispositions, but it's already too late for me to do anything about that."

There is a large literature on the topic of Newcomblike problems - especially if you consider the Prisoner's Dilemma as a special case, which it is generally held to be.  "Paradoxes of Rationality and Cooperation" is an edited volume that includes Newcomb's original essay.  For those who read only online material, this PhD thesis summarizes the major standard positions.

I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions.  This dominant view goes by the name of "causal decision theory".

As you know, the primary reason I'm blogging is that I am an incredibly slow writer when I try to work in any other format.  So I'm not going to try to present my own analysis here.  Way too long a story, even by my standards.

But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb's Problem, then you should do so.  If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.

Now in my field - which, in case you have forgotten, is self-modifying AI - this works out to saying that if you build an AI that two-boxes on Newcomb's Problem, it will self-modify to one-box on Newcomb's Problem, if the AI considers in advance that it might face such a situation.  Agents with free access to their own source code have access to a cheap method of precommitment.
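
For concreteness, here is a minimal sketch of that kind of self-modification - not the theory alluded to below, just the precommitment point in code: score each candidate disposition under the assumption that a reliable Omega's prediction will match whatever disposition is actually adopted, and adopt the winner.

```python
def payoff(policy, prediction):
    """Newcomb payout given the agent's adopted policy and Omega's prediction."""
    box_b = 1_000_000 if prediction == 'one-box' else 0
    return box_b if policy == 'one-box' else box_b + 1_000

def choose_disposition(policies=('one-box', 'two-box')):
    # A reliable predictor's prediction tracks the policy the agent actually
    # adopts, so each candidate policy is scored against itself.
    return max(policies, key=lambda p: payoff(p, prediction=p))

print(choose_disposition())  # 'one-box'
```

An agent that instead held the prediction fixed and scored individual actions would pick 'two-box' inside either branch - which is the causal decision theorist's dominance argument restated.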

What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem?  Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.

But what does an agent with a disposition generally-well-suited to Newcomblike problems look like?  Can this be formally specified?

Yes, but when I tried to write it up, I realized that I was starting to write a small book.  And it wasn't the most important book I had to write, so I shelved it.  My slow writing speed really is the bane of my existence.  The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems.  It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis.  But that's pretty much what it would take to make me unshelve the project.  Otherwise I can't justify the time expenditure, not at the speed I currently write books.

I say all this, because there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes" - coherent math which one-boxes on Newcomb's Problem without producing absurd results elsewhere.  So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can't publish it.  Believe it or not, it's true.

Nonetheless, I would like to present some of my motivations on Newcomb's Problem - the reasons I felt impelled to seek a new theory - because they illustrate my source-attitudes toward rationality.  Even if I can't present the theory that these motivations motivate...

First, foremost, fundamentally, above all else:

Rational agents should WIN.

Don't mistake me, and think that I'm talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted.  If your utility function has a term in it for others, then win their happiness.  If your utility function has a term in it for a million years hence, then win the eon.

But at any rate, WIN.  Don't lose reasonably, WIN.

Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists.  I will talk about this defense in a moment.  But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win; versus someone who thinks that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose, in this case.  There are a lot of people out there who think that rationality predictably loses on various problems - that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.

Next, let's turn to the charge that Omega favors irrationalists.  I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices.  I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the last option when ordered alphabetically," but who does not reward anyone who chooses the same option for a different reason.  But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don't buy the charge that Omega is rewarding the irrational.  Omega doesn't care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.

We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm's choices, with no other dependency - Omega just cares where we go, not how we got there.

It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way - without attachment to any particular ritual of cognition, apart from our belief that it wins.  Every rule is up for grabs, except the rule of winning.

As Miyamoto Musashi said - it's really worth repeating:

"You can win with a long weapon, and yet you can also win with a short weapon.  In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size."

(Another example:  It was argued by McGee that we must adopt bounded utility functions or be subject to "Dutch books" over infinite times.  But:  The utility function is not up for grabs.  I love life without limit or upper bound:  There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever.  This is a sufficient condition to imply that my utility function is unbounded.  So I just have to figure out how to optimize for that morality.  You can't tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked.  Toss out the losing ritual; don't change the definition of winning.  That's like deciding to prefer $1000 to $1,000,000 so that Newcomb's Problem doesn't make your preferred ritual of cognition look bad.)
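
A compressed sketch of that implication, with assumptions the post leaves implicit: utility is increasing in lifespan, the utility of living forever is the limit of the utilities of living N years, a googolplex years is strictly worse than forever, and the leftover 0.199999 probability mass is the same outcome in both gambles and so cancels.

```latex
\text{Suppose } U \text{ were bounded. Being increasing and bounded, } U(N) \text{ has a finite limit } L = U(\text{forever}).
\text{Since } U(\text{googolplex}) < L:\quad 0.8\,L + 0.000001\,U(\text{googolplex}) \;<\; 0.800001\,L = \lim_{N\to\infty} 0.800001\,U(N).
\text{So for some finite } N:\quad 0.800001\,U(N) \;>\; 0.8\,U(\text{forever}) + 0.000001\,U(\text{googolplex}),
\text{contradicting the stated preference. Hence, given these assumptions, } U \text{ is unbounded.}
```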

"But," says the causal decision theorist, "to take only one box, you must somehow believe that your choice can affect whether box B is empty or full - and that's unreasonable!  Omega has already left!  It's physically impossible!"

Unreasonable?  I am a rationalist: what do I care about being unreasonable?  I don't have to conform to a particular ritual of cognition.  I don't have to take only box B because I believe my choice affects the box, even though Omega has already left.  I can just... take only box B.

I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn't need to show this to you.  The point is not to have an elegant theory of winning - the point is to win; elegance is a side effect.

Or to look at it another way:  Rather than starting with a concept of what is the reasonable decision, and then asking whether "reasonable" agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is "reasonable".  "Reasonable" may just refer to decisions in conformance with our current ritual of cognition - what else would determine whether something seems "reasonable" or not?

From James Joyce (no relation), Foundations of Causal Decision Theory:

Rachel has a perfectly good answer to the "Why ain't you rich?" question.  "I am not rich," she will say, "because I am not the kind of person the psychologist thinks will refuse the money.  I'm just not like you, Irene.  Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in my account.  The $1,000 was the most I was going to get no matter what I did.  So the only reasonable thing for me to do was to take it."

Irene may want to press the point here by asking, "But don't you wish you were like me, Rachel?  Don't you wish that you were the refusing type?"  There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich).  This is not the case.  Rachel can and should admit that she does wish she were more like Irene.  "It would have been better for me," she might concede, "had I been the refusing type."  At this point Irene will exclaim, "You've admitted it!  It wasn't so smart to take the money after all."  Unfortunately for Irene, her conclusion does not follow from Rachel's premise.  Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is.  When Rachel wishes she was Irene's type she is wishing for Irene's options, not sanctioning her choice.

It is, I would say, a general principle of rationality - indeed, part of how I define rationality - that you never end up envying someone else's mere choices.  You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition.  But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it.  Rachel wishes just that she had a disposition to choose differently.

You shouldn't claim to be more rational than someone and simultaneously envy them their choice - only their choice.  Just do the act you envy.

I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can't possibly do better by leaving $1000 on the table... even though the single-boxers leave the experiment with more money.  Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap of utility.

Yes, there are various thought experiments in which some agents start out with an advantage - but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs.  At this point you have covertly redefined "winning" as conformance to a particular ritual of cognition.  Pay attention to the money!

Or here's another way of looking at it:  Faced with Newcomb's Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B; because, if such a line of argument existed, you would take only box B and find it full of money?  Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice?  This too is a rather odd position to be in.  Ordinarily, the work of rationality goes into figuring out which choice is the best - not finding a reason to believe that a particular choice is the best.

Maybe it's too easy to say that you "ought to" two-box on Newcomb's Problem, that this is the "reasonable" thing to do, so long as the money isn't actually in front of you.  Maybe you're just numb to philosophical dilemmas, at this point.  What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her?  What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?

Would you, at that point, find yourself tempted to make an unreasonable choice?

If the stake in box B was something you could not leave behind?  Something overwhelmingly more important to you than being reasonable?  If you absolutely had to win - really win, not just be defined as winning?

Would you wish with all your power that the "reasonable" decision was to take only box B?

Then maybe it's time to update your definition of reasonableness.

Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like.  When you find yourself in a position like this, you shouldn't chide the other person for failing to conform to your concepts of reasonableness.  You should realize you got the Way wrong.

So, too, if you ever find yourself keeping separate track of the "reasonable" belief, versus the belief that seems likely to be actually true.  Either you have misunderstood reasonableness, or your second intuition is just wrong.

Now one can't simultaneously define "rationality" as the winning Way, and define "rationality" as Bayesian probability theory and decision theory.  But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math.  If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.  "Rationality" is just the label I use for my beliefs about the winning Way - the Way of the agent smiling from on top of the giant heap of utility.  Currently, that label refers to Bayescraft.

I realize that this is not a knockdown criticism of causal decision theory - that would take the actual book and/or PhD thesis - but I hope it illustrates some of my underlying attitude toward this notion of "rationality".

You shouldn't find yourself distinguishing the winning choice from the reasonable choice.  Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.

That is why I use the word "rational" to denote my beliefs about accuracy and winning - not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.

As Miyamoto Musashi said:

"The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy's cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him."

Comments (592)

Comment author: Nick_Tarleton 31 January 2008 08:28:53PM 12 points [-]

Either box B is already full or already empty.

I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".

I suppose causal decision theory assumes causality only works in one temporal direction. Confronted with a predictor that was right 100 out of 100 times, I would think it very likely that backward-in-time causation exists, and take only B. I assume this would, as you say, produce absurd results elsewhere.

Comment author: diegocaleiro 22 March 2010 07:28:42PM 28 points [-]

Decisions aren't physical.

The above statement is at least hard to defend. Your decisions are physical and occur inside of you... So these two-boxers are using the wrong model amongst these two (see the drawings....) http://lesswrong.com/lw/r0/thou_art_physics/

If you are a part of physics, so is your decision, so it must account for the correlation between your thought processes and the superintelligence. Once it accounts for that, you decide to one-box, because you understand the entanglement between the computation done by Omega and the physical process going on inside your skull.

If the entanglement is there, you are not looking at it from the outside, you are inside the process.

Our minds have this quirk that makes us think there are two moments: you decide, and then you cheat and get to decide again. But if you are only allowed to decide once, which is the case, you are rational by one-boxing.

Comment author: SeventhNadir 12 August 2010 09:30:16PM 0 points [-]

From what I understand, to be a "Rational Agent" in game theory means someone who maximises their utility function (and not the one you ascribe to them). To say Omega is rewarding irrational agents isn't necessarily fair, since payoffs aren't always about the money. Lottery tickets are a good example of this.

What if my utility function says the worst outcome is living the rest of my life with regrets that I didn't one box? Then I can one box and still be a completely rational agent.

Comment author: JoshuaZ 12 August 2010 09:36:35PM 9 points [-]

You're complicating the problem too much by bringing in issues like regret. Assume for sake of argument that Newcomb's problem is to maximize the amount of money you receive. Don't think about extraneous utility issues.

Comment author: SeventhNadir 12 August 2010 09:56:12PM 2 points [-]

Fair point. There are too many hidden variables already without me explicitly adding more. If Newcomb's problem is to maximise money received (with no regard for what is seen as reasonable), the "Why ain't you rich?" argument seems like a fairly compelling one, doesn't it? Winning the money is all that matters.

I just realised that all I've really done is paraphrase the original post. Curse you source monitoring error!

Comment author: Nornagest 19 November 2010 01:32:15AM *  3 points [-]

Lottery tickets exploit a completely different failure of rationality, that being our difficulties with small probabilities and big numbers, and our problems dealing with scale more generally. (ETA: The fantasies commonly cited in the context of lotteries' "true value" are a symptom of this failure.) It's not hard to come up with a game-theoretic agent that maximizes its payoffs against that kind of math. Second-guessing other agents' models is considerably harder.

I haven't given much thought to this particular problem for a while, but my impression is that Newcomb exposes an exploit in simpler decision theories that's related to that kind of recursive modeling: naively, if you trust Omega's judgment of your psychology, you pick the one-box option, and if you don't, you pick up both boxes. Omega's track record gives us an excellent reason to trust its judgment from a probabilistic perspective, but it's trickier to come up with an algorithm that stabilizes on that solution without immediately trying to outdo itself.

Comment author: PeterisP 24 October 2010 12:27:34PM 6 points [-]

Well, I fail to see any need for backward-in-time causation to get the prediction right 100 out of 100 times.

As far as I understand, similar experiments have been performed in practice and homo sapiens are quite split into two groups, 'one-boxers' and 'two-boxers', who generally have strong preferences towards one or the other due to whatever differences in their education, logic experience, genetics, reasoning style, or other factors that are somewhat stable and specific to that individual.

Having perfect predictive power (or even the possibility of it existing) is implied and suggested, but it's not really given, it's not really necessary, and IMHO it's not possible and not useful to use this 'perfect predictive power' in any reasoning here.

From the given data in the situation (100 out of 100 that you saw), you know that Omega is a super-intelligent sorter who somehow manages to achieve 99.5% or better accuracy in sorting people into one-boxers and two-boxers.

This accuracy seems also higher than the accuracy of most (all?) people in self-evaluation, i.e., as in many other decision scenarios, there is a significant difference in what people believe they would decide in situation X, and what they actually decide if it happens. [citation might be needed, but I don't have one at the moment, I do recall reading papers about such experiments]. The 'everybody is a perfect logician/rationalist and behaves as such' assumption often doesn't hold up in real life even for self-described perfect rationalists who make strong conscious effort to do so.

In effect, the data suggests that Omega probably knows your traits and decision chances (taking into account you taking into account all this) better than you do - it's simply smarter than homo sapiens. Assuming that this is really so, it's better for you to choose option B. Assuming that this is not so, and you believe that you can out-analyze Omega's perception of yourself, then you should choose the opposite of whatever Omega would think of you (gaining $1,000,000 instead of $1,000, or $1,001,000 instead of $1,000,000). If you don't know what Omega knows about you - then you don't get this bonus.
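
A quick check of that arithmetic, taking 99.5% as the assumed accuracy:

```python
p = 0.995  # assumed accuracy, consistent with the observed 100-for-100 record
ev_one_box = p * 1_000_000                # box B is full with probability p
ev_two_box = 1_000 + (1 - p) * 1_000_000  # $1,000 plus a full box B only if Omega erred
print(ev_one_box, ev_two_box)             # 995000.0 6000.0
```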

Comment author: Eliezer_Yudkowsky 31 January 2008 08:36:42PM 20 points [-]

People seem to have pretty strong opinions about Newcomb's Problem. I don't have any trouble believing that a superintelligence could scan you and predict your reaction with 99.5% accuracy.

I mean, a superintelligence would have no trouble at all predicting that I would one-box... even if I hadn't encountered the problem before, I suspect.

Comment author: simon2 31 January 2008 08:41:04PM 0 points [-]

If you won't explicitly state your analysis, maybe we can try 20 questions?

I have suspected that supposed "paradoxes" of evidential decision theory occur because not all the evidence was considered. For example, the fact that you are using evidential decision theory to make the decision.

Agree/disagree?

Comment author: simon2 31 January 2008 08:56:14PM 0 points [-]

Hmm, changed my mind; I should have thought more before writing. Suppose an EDT virus has early symptoms of causing people to use EDT before progressing to terrible illness and death. It seems EDT would then recommend not using EDT.

Comment author: Mike4 31 January 2008 09:13:03PM 4 points [-]

I one-box, without a moment's thought.

The "rationalist" says "Omega has already left. How could you think that your decision now affects what's in the box? You're basing your decision on the illusion that you have free will, when in fact you have no such thing."

To which I respond "How does that make this different from any other decision I'll make today?"

Comment author: Ian_C. 31 January 2008 10:33:38PM 13 points [-]

I think the two-box person is confused about what it is to be rational: it does not mean "make a fancy argument," it means start with the facts, abstract from them, and reason about your abstractions.

In this case if you start with the facts you see that 100% of people who take only box B win big, so rationally, you do the same. Why would anyone be surprised that reason divorced from facts gives the wrong answer?

Comment author: Psychohistorian3 31 January 2008 11:08:06PM 22 points [-]

This dilemma seems like it can be reduced to:
1. If you take both boxes, you will get $1000.
2. If you only take box B, you will get $1M.
Which is a rather easy decision.

There's a seemingly-impossible but vital premise, namely, that your action was already known before you acted. Even if this is completely impossible, it's a premise, so there's no point arguing it.

Another way of thinking of it is that, when someone says, "The boxes are already there, so your decision cannot affect what's in them," he is wrong. It has been assumed that your decision does affect what's in them, so the fact that you cannot imagine how that is possible is wholly irrelevant.

In short, I don't understand how this is controversial when the decider has all the information that was provided.

Comment author: Nominull3 31 January 2008 11:56:13PM 0 points [-]

I'd love to say I'd find some way of picking randomly just to piss Omega off, but I'd probably just one-box it. A million bucks is a lot of money.

Comment author: xrchz 22 November 2009 09:17:03AM 1 point [-]

Would that make you a supersuperintelligence? Since I presume by "picking randomly" you mean randomly to Omega, in other words Omega cannot find and process enough information to predict you well.

Otherwise what does "picking randomly" mean?

Comment author: whpearson 22 November 2009 10:57:05AM 4 points [-]

The definition of Omega as something that can predict your actions leads it to have some weird powers. You could pick a box based on the outcome of a quantum event with a 50% chance; then Omega would have to vanish in a puff of physical implausibility.

Comment author: xrchz 26 November 2009 06:54:36AM 0 points [-]

What's wrong with Omega predicting a "quantum event"? "50% chance" is not an objective statement, and it may well be that Omega can predict quantum events. (If not, can you explain why not, or refer me to an explanation?)

Comment author: whpearson 26 November 2009 10:22:17AM 2 points [-]

From wikipedia

"In the formalism of quantum mechanics, the state of a system at a given time is described by a complex wave function (sometimes referred to as orbitals in the case of atomic electrons), and more generally, elements of a complex vector space.[9] This abstract mathematical object allows for the calculation of probabilities of outcomes of concrete experiments."

This is the best formalism we have for predicting things at this scale and it only spits out probabilities. I would be surprised if something did a lot better!

Comment author: xrchz 27 November 2009 10:35:55AM 0 points [-]

As I understand it, probabilities are observed because there are observers in two different amplitude blobs of configuration space (to use the language of the quantum physics sequence) but "the one we are in" appears to be random to us. And mathematically I think quantum mechanics is the same under this view in which there is no "inherent, physical" randomness (so it would still be the best formalism we have for predicting things).

Could you say what "physical randomness" could be if we don't allow reference to quantum mechanics? (i.e. is that the only example? and more to the point, does the notion make any sense?)

Comment author: whpearson 27 November 2009 10:58:20AM 0 points [-]

You seem to have transitioned to another argument here... please clarify what this has to do with omega and its ability to predict your actions.

Comment author: xrchz 28 November 2009 11:03:40PM 0 points [-]

The new argument is about whether there might be inherently unpredictable things. If not, then your picking a box based on the outcome of a "quantum event" shouldn't make Omega any less physically plausible.

Comment author: whpearson 29 November 2009 12:26:49AM *  7 points [-]

What I didn't understand is why you removed quantum experiments from the discussion. I believe it is very plausible to have something that is physically unpredictable, as long as the thing doing the predicting is bound by the same laws as what you are trying to predict.

Consider a world made of reversible binary gates with the same number of inputs as outputs (that is every input has a unique output, and vice versa).

We want to predict one complex gate. Not a problem: just clone all the inputs and copy the gate. However, you have to do that using only reversible binary gates. Let's start with cloning the bits.

In is the bit you are trying to copy without modifying, so that you can predict what effect it will have on the rest of the system. You need a minimum of two outputs, so you need another input B.

You get to create the gate in order to copy the bit and predict the system. The ideal truth table looks something like

In | B | Out | Copy
0 | 0 | 0 | 0
0 | 1 | 0 | 0
1 | 0 | 1 | 1
1 | 1 | 1 | 1

This violates our reversibility assumption. The best copier we could make is

In | B | Out | Copy
0 | 0 | 0 | 0
0 | 1 | 1 | 0
1 | 0 | 0 | 1
1 | 1 | 1 | 1

This copies precisely, but mucks up the output, making our copy useless for prediction. If we could control B, or knew the value of B, then we could correct the output. But as I have shown here, finding out the value of a bit is non-trivial. The best we could do would be to find sources of bits with statistically predictable properties and then use them for duplicating other bits.

The world is expected to be reversible, and the no-cloning theorem applies to reality, which I think is stricter than my example. However, I hope I have shown how a simple lawful universe can be hard to predict by something inside it.

In short, stop thinking of yourself (and Omega) as an observer outside physics that does not interact with the world. Copying is disturbing.
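
The impossibility claim about the "ideal" copier is easy to check by brute force: a reversible two-bit gate is just a permutation of the four input states, and no permutation sends every (In, B) to (In, In). A minimal sketch of that check:

```python
from itertools import permutations

states = [(0, 0), (0, 1), (1, 0), (1, 1)]  # all (In, B) inputs

def is_perfect_copier(gate):
    """gate maps (In, B) -> (Out, Copy); perfect means Out = Copy = In for every B."""
    return all(gate[(i, b)] == (i, i) for (i, b) in states)

# Every reversible 2-bit gate is a bijection on the four states, i.e. a permutation.
reversible_gates = [dict(zip(states, outputs)) for outputs in permutations(states)]
print(any(is_perfect_copier(g) for g in reversible_gates))  # False
```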

Comment author: rhollerith_dot_com 29 November 2009 01:50:37AM *  4 points [-]

I believe it is very plausible to have something that is physically unpredictable, as long as the thing doing the predicting is bound by the same laws as what you are trying to predict.

[attempted proof omitted]

I hope I have shown how a simple lawful universe can be hard to predict by something inside it.

In short, stop thinking of yourself (and Omega) as an observer outside physics that does not interact with the world. Copying is disturbing.

Even though I do not have time to reflect on the attempted proof and even though the attempted proof is best described as a stab at a sketch of a proof and even though this "reversible logic gates" approach to a proof probably cannot be turned into an actual proof and even though Nick Tarleton just explained why the "one box or two box depending on an inherently unpredictable event" strategy is not particularly relevant to Newcomb's, I voted this up and I congratulate the author (whpearson) because it is an attempt at an original proof of something very cool (namely, limits to an agent's ability to learn about its environment) and IMHO probably relevant to the Friendliness project. More proofs and informed stabs at proofs, please!

Comment author: Normal_Anomaly 23 November 2010 01:03:29PM *  4 points [-]

I suspect Omega would know you were going to do that, and would be able to put the box in a superposition dependent on the same quantum event, so that in the branches where you 1-box, box B contains $1million, and where you 2-box it's empty.

Comment author: Nick_Tarleton 29 November 2009 12:37:18AM 9 points [-]

It's often stipulated that if Omega predicts you'll use some randomizer it can't predict, it'll punish you by acting as if it predicted two-boxing.

Comment author: wedrifid 29 November 2009 02:38:26AM *  2 points [-]

(And the most favourable plausible outcome for randomizing would be scaling the payoff appropriately to the probability assigned.)

Comment author: PeterisP 24 October 2010 12:35:15PM 4 points [-]

Newcomb's problem doesn't specify how Omega chooses the 'customers'. It's a quite realistic possibility that it simply has not offered the choice to anyone who would use a randomizer, and has cherry-picked only the people who have at least 99.9% 'prediction strength'.

Comment author: HalFinney 01 February 2008 12:05:09AM 2 points [-]

It's a great puzzle. I guess this thread will degenerate into arguments pro and con. I used to think I'd take one box, but I read Joyce's book and that changed my mind.

For the take-one-boxers:

Do you believe, as you sit there with the two boxes in front of you, that their contents are fixed? That there is a "fact of the matter" as to whether box B is empty or not? Or is box B in a sort of intermediate state, halfway between empty and full? If so, do you generally consider that things momentarily out of sight may literally change their physical states into something indeterminate?

If you reject that kind of indeterminacy, what do you imagine happening, if you vacillate and consider taking both boxes? Do you picture box B literally becoming empty and full as you change your opinion back and forth?

If not, if you think box B is definitely either full or empty and there is no unusual physical state describing the contents of that box, then would you agree that nothing you do now can change the contents of the box? And if so, then taking the additional box cannot reduce what you get in box B.

Comment author: AndyCossyleon 04 November 2010 09:19:02PM *  8 points [-]

Na-na-na-na-na-na, I am so sorry you only got $1000!

Me, I'm gonna replace my macbook pro, buy an apartment and a car and take a two week vacation in the Bahamas, and put the rest in savings!

Suckah! -- Point: arguments don't matter, winning does.

Comment author: wedrifid 04 November 2010 09:39:30PM 8 points [-]

Oops. I had replied to this until I saw its parent was nearly 3 years old. So as I don't (quite) waste the typing:

Do you believe, as you sit there with the two boxes in front of you, that their contents are fixed?

Yes.

That there is a "fact of the matter" as to whether box B is empty or not?

Yes.

Or is box B in a sort of intermediate state, halfway between empty and full?

No.

If so, do you generally consider that things momentarily out of sight may literally change their physical states into something indeterminate?

No.

Do you picture box B literally becoming empty and full as you change your opinion back and forth?

If not, if you think box B is definitely either full or empty and there is no unusual physical state describing the contents of that box, then would you agree that nothing you do now can change the contents of the box?

Yes.

And if so, then taking the additional box cannot reduce what you get in box B.

No, it can't. (But it already did.)

If I take both boxes, how much money do I get? $1,000

If I take one box, how much money do I get? $10,000,000 (or whatever it was instantiated to.)

It seems that my questions were more useful than yours. Perhaps Joyce befuddled you? It could be that he missed something. (Apart from the counterfactual $9,999,000.)

I responded to all your questions with the answers you intended to make the point that I don't believe those responses are at all incompatible with making the decision that earns you lots and lots of money.

Comment author: Tom_McCabe 01 February 2008 12:42:47AM 6 points [-]

To quote E.T. Jaynes:

"This example shows also that the major premise, “If A then B” expresses B only as a logical consequence of A; and not necessarily a causal physical consequence, which could be effective only at a later time. The rain at 10 AM is not the physical cause of the clouds at 9:45 AM. Nevertheless, the proper logical connection is not in the uncertain causal direction (clouds =⇒ rain), but rather (rain =⇒ clouds) which is certain, although noncausal. We emphasize at the outset that we are concerned here with logical connections, because some discussions and applications of inference have fallen into serious error through failure to see the distinction between logical implication and physical causation. The distinction is analyzed in some depth by H. A. Simon and N. Rescher (1966), who note that all attempts to interpret implication as expressing physical causation founder on the lack of contraposition expressed by the second syllogism (1–2). That is, if we tried to interpret the major premise as “A is the physical cause of B,” then we would hardly be able to accept that “not-B is the physical cause of not-A.” In Chapter 3 we shall see that attempts to interpret plausible inferences in terms of physical causation fare no better."

Comment author: Andrew_Clough2 01 February 2008 02:44:08AM 2 points [-]

@Hal Finney:

Certainly the box is either full or empty. But the only way to get the money in the hidden box is to precommit to taking only that one box. Not pretend to precommit, really precommit. If you try to take the $1,000, well then I guess you really hadn't precommitted after all. I might vacillate, I might even be unable to make such a rigid precommitment with myself (though I suspect I am), but it seems hard to argue that taking only one box is not the correct choice.

I'm not entirely certain that acting rationally in this situation doesn't require an element of doublethink, but that's a topic for another post.

Comment author: Laura 01 February 2008 02:47:36AM 3 points [-]

I would be interested in knowing if your opinion would change if the "predictions" of the super-being were wrong 0.5% of the time, and some small number of people ended up with the $1,001,000 and some ended up with nothing. Would you still 1-box it?

Comment author: RobinHanson 01 February 2008 03:07:32AM 9 points [-]

I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency, where you'd like to commit ahead of time to something that later you'd like to violate if you could. You want to commit to taking the one box, but you also want to take the two boxes later if you could. A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.

Comment author: Caledonian2 01 February 2008 03:30:48AM 2 points [-]

If I know that the situation has resolved itself in a manner consistent with the hypothesis that Omega has successfully predicted people's actions many times over, I have a high expectation that it will do so again.

In that case, what I will find in the boxes is not independent of my choice, but dependent on it. By choosing to take two boxes, I cause there to be only $1,000 there. By choosing to take only one, I cause there to be $1,000,000. I can create either condition by choosing one way or another. If I can select between the possibilities, I prefer the one with the million dollars.

Since induction applied to the known facts suggests that I can effectively determine the outcome by making a decision, I will select the outcome that I prefer, and choose to take only box B.

Why exactly is that irrational, again?

Comment author: Paul_Gowder 01 February 2008 03:52:44AM 7 points [-]

I don't know the literature around Newcomb's problem very well, so excuse me if this is stupid. BUT: why not just reason as follows:

1. If the superintelligence can predict your action, one of the following two things must be the case:

a) the state of affairs whether you pick the box or not is already absolutely determined (i.e. we live in a fatalistic universe, at least with respect to your box-picking)

b) your box picking is not determined, but it has backwards causal force, i.e. something is moving backwards through time.

If a), then practical reason is meaningless anyway: you'll do what you'll do, so stop stressing about it.

If b), then you should be a one-boxer for perfectly ordinary rational reasons, namely that it brings it about that you get a million bucks with probability 1.

So there's no problem!

Comment author: Cyan2 01 February 2008 03:53:57AM 0 points [-]

Laura,

Once we can model the probabilities of the various outcomes in a noncontroversial fashion, the specific choice to make depends on the utility of the various outcomes. $1,001,000 might be only marginally better than $1,000,000 -- or that extra $1,000 could have some significant extra utility.

Comment author: James_D._Miller 01 February 2008 04:03:16AM 0 points [-]

If we assume that Omega almost never makes a mistake and we allow the chooser to use true randomization (perhaps by using quantum physics) in making his choice, then Omega must make his decision in part through seeing into the future. In this case the chooser should obviously pick just B.

Comment author: Eliezer_Yudkowsky 01 February 2008 04:15:19AM 8 points [-]

Hanson: I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency

In my motivations and in my decision theory, dynamic inconsistency is Always Wrong. Among other things, it always implies an agent unstable under reflection.

A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.

But a self-modifying agent would modify to not rather avoid it.

Gowder: If a), then practical reason is meaningless anyway: you'll do what you'll do, so stop stressing about it.

Deterministic != meaningless. Your action is determined by your motivations, and by your decision process, which may include your stressing about it. It makes perfect sense to say: "My future decision is determined, and my stressing about it is determined; but if-counterfactual I didn't stress about it, then-counterfactual my future decision would be different, so it makes perfect sense for me to stress about this, which is why I am deterministically doing it."

The past can't change - does not even have the illusion of potential change - but that doesn't mean that people who, in the past, committed a crime, are not held responsible just because their action and the crime are now "fixed". It works just the same way for the future. That is: a fixed future should present no more problem for theories of moral responsibility than a fixed past.

Comment author: Chris_L 01 February 2008 04:28:17AM 4 points [-]

I don't see why this needs to be so drawn out.

I know the rules of the game. I also know that Omega is super intelligent, namely, Omega will accurately predict my action. Since Omega knows that I know this, and since I know that he knows I know this, I can rationally take box B, content in my knowledge that Omega has predicted my action correctly.

I don't think it's necessary to precommit to any ideas, since Omega knows that I'll be able to rationally deduce the winning action given the premise.

Comment author: Unknown 01 February 2008 04:28:41AM 0 points [-]

We don't even need a superintelligence. We can probably predict on the basis of personality type a person's decision in this problem with an 80% accuracy, which is already sufficient that a rational person would choose only box B.
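
That checks out by a wide margin; a minimal sketch of the break-even accuracy, assuming the predictor is equally accurate on both kinds of chooser:

```python
# One-boxing beats two-boxing in expectation when
#   p * 1_000_000  >  1_000 + (1 - p) * 1_000_000,
# i.e. when the predictor's accuracy p exceeds:
p_star = (1_000_000 + 1_000) / (2 * 1_000_000)
print(p_star)  # 0.5005 -- so 80% accuracy is far more than enough
```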

Comment author: RobinHanson 01 February 2008 05:30:50AM 3 points [-]

The possibility of time inconsistency is very well established among game theorists, and is considered a problem of the game one is playing, rather than a failure to analyze the game well. So it seems you are disagreeing with most all game theorists in economics as well as most decision theorists in philosophy. Perhaps they are right and you are wrong?

Comment author: Nominull3 01 February 2008 06:07:39AM 0 points [-]

The interesting thing about this game is that Omega has magical super-powers that allow him to know whether or not you will back out on your commitment ahead of time, and so you can make your commitment credible by not being going to back out on your commitment. If that makes any sense.

Comment author: Eliezer_Yudkowsky 01 February 2008 06:16:26AM 4 points [-]

Robin, remember I have to build a damn AI out of this theory, at some point. A self-modifying AI that begins anticipating dynamic inconsistency - that is, a conflict of preference with its own future self - will not stay in such a state for very long... did the game theorists and economists work out a standard answer for what happens after that?

If you like, you can think of me as defining the word "rationality" to refer to a different meaning - but I don't really have the option of using the standard theory, here, at least not for longer than 50 milliseconds.

If there's some nonobvious way I could be wrong about this point, which seems to me quite straightforward, do let me know.

Comment author: Unknown 01 February 2008 06:23:24AM 1 point [-]

In reality, either I am going to take one box or two. So when the two-boxer says, "If I take one box, I'll get amount x," and "If I take two boxes, I'll get amount x+1000," one of these statements is objectively counterfactual. Let's suppose he is going to in fact take both boxes. Then his second statement is factual and his first statement counterfactual. Then his two statements are:

1) Although I am not in fact going to take only one box, were I to take only one box, I would get amount x, namely the amount that would be in the box.

2) I am in fact going to take both boxes, and so I will get amount x+1000, namely 1000 more than how much is in fact in the other box.

From this it is obvious that x in the two statements has a different value, and so his conclusion that he will get more if he takes both boxes is false. For x has the value 1,000,000 in the first case, and 0 in the second. He mistakenly assumes it has the same value in the two cases.

Likewise, when the two-boxer says to the one boxer, "If you had taken both boxes, you would have gotten more," his statement is counterfactual and false. For if the one-boxer had been a two boxer, there originally would have been nothing in the other box, and so he would have gotten only $1000 instead of $1,000,000.

Comment author: Paul_Gowder 01 February 2008 06:27:37AM 0 points [-]

Eliezer: whether or not a fixed future poses a problem for *morality* is a hotly disputed question which even I don't want to touch. Fortunately, *this* problem is one that is pretty much wholly orthogonal to morality. :-)

But I feel like in the present problem the fixed future issue is a key to dissolving the problem. So, assume the box decision is fixed. It need not be the case that the stress is fixed too. If the stress isn't fixed, then it can't be relevant to the box decision (the box is fixed regardless of your decision between stress and no-stress). If the stress IS fixed, then there's no decision left to take. (Except possibly whether or not to stress about the stress, call that stress*, and recurse the argument accordingly.)

In general, for any pair of actions X and Y, where X is determined, either X is conditional on Y, in which case Y must also be determined, or not conditional on Y, in which case Y can be either determined or non-determined. So appealing to Y as part of the process that leads to X doesn't mean that something we could do to Y makes a difference if X is determined.

Comment author: Unknown 01 February 2008 07:05:37AM 3 points [-]

Paul, being fixed or not fixed has nothing to do with it. Suppose I program a deterministic AI to play the game (the AI picks a box.)

The deterministic AI knows that it is deterministic, and it knows that I know too, since I programmed it. So I also know whether it will take one or both boxes, and it knows that I know this.

At first, of course, it doesn't know itself whether it will take one or both boxes, since it hasn't completed running its code yet. So it says to itself, "Either I will take only one box or both boxes. If I take only one box, the programmer will have known this, so I will get 1,000,000. If I take both boxes, the programmer will have known this, so I will get 1,000. It is better to get 1,000,000 than 1,000. So I choose to take only one box."

If someone tries to confuse the AI by saying, "if you take both, you can't get less," the AI will respond, "I can't take both without different code, and if I had that code, the programmer would have known that and would have put less in the box, so I would get less."

Or in other words: it is quite possible to make a decision, like the AI above, even if everything is fixed. For you do not yet know in what way everything is fixed, so you must make a choice, even though which one you will make is already determined. Or if you found out that your future is completely determined, would you go and jump off a cliff, since this could not happen unless it were inevitable anyway?

Comment author: Roy_Haddad2 01 February 2008 07:08:26AM 2 points [-]

I practice historical European swordsmanship, and those Musashi quotes have a certain resonance to me*. Here is another (modern) saying common in my group:

If it's stupid, but it works, then it ain't stupid.

* you previously asked why you couldn't find similar quotes from European sources - I believe this is mainly a language barrier: The English were not nearly the swordsmen that the French, Italians, Spanish, and Germans were (though they were pretty mean with their fists). You should be able to find many quotes in those other languages.

Comment author: tcpkac 01 February 2008 07:25:28AM 0 points [-]

Eliezer, I don't read the main thrust of your post as being about Newcomb's problem per se. Having distinguished between 'rationality as means' to whatever end you choose, and 'rationality as a way of discriminating between ends', can we agree that the whole specks / torture debate was something of a red herring ? Red herring, because it was a discussion on using rationality to discriminate between ends, without having first defined one's meta-objectives, or, if one's meta-objectives involved hedonism, establishing the rules for performing math over subjective experiences. To illustrate the distinction using your other example, I could state that I prefer to save 400 lives certainly, simply because the purple fairy in my closet tells me to (my arbitrary preferred objective), and that would be perfectly legitimate. It would only be incoherent if I also declared it to be a strategy which would maximise the number of lives saved if a majority of people adopted it in similar circumstances (a different arbitrary preferred objective). I could in fact have as preferred meta-objective for the universe that all the squilth in flobjuckstooge be globberised, and that would be perfectly legitimate. An FAI (or a BFG, for that matter (Roald Dahl, not Tom Hall)) could scan me and work towards creating the universe in which my proposition is meaningful, and make sure it happens. If now someone else's preferred meta-objective for the universe is ensuring that the princess on page 3 gets a fairy cake, how is the FAI to prioritise ?

Comment author: Paul_Gowder 01 February 2008 07:28:13AM 1 point [-]

Unknown: your last question highlights the problem with your reasoning. It's idle to ask whether I'd go and jump off a cliff if I found my future were determined. What does that question even mean?

Put a different way, why should we ask an "ought" question about events that are determined? If A will do X whether or not it is the case that a rational person will do X, why do we care whether or not it is the case that a rational person will do X? I submit that we care about rationality because we believe it'll give us traction on our problem of deciding what to do. So assuming fatalism (which is what we must do if the AI knows what we're going to do, perfectly, in advance) demotivates rationality.

Here's the ultimate problem: our intuitions about these sorts of questions don't work, because they're fundamentally rooted in our self-understanding as agents. It's really, really hard for us to say sensible things about what it might mean to make a "decision" in a deterministic universe, or to understand what that implies. That's why Newcomb's problem is a problem -- because we have normative principles of rationality that make sense only when we assume that it *matters* whether or not we follow them, and we don't really know what it would mean to *matter* without causal leverage.

(There's a reason free will is one of Kant's antinomies of reason. I've been meaning to write a post about transcendental arguments and the limits of rationality for a while now... it'll happen one of these days. But in a nutshell... I just don't think our brains work when it comes down to comprehending what a deterministic universe looks like on some level other than just solving equations. And note that this might make evolutionary sense -- a creature who gets the best results through a [determined] causal chain that includes rationality is going to be selected for the beliefs that make it easiest to use rationality, including the belief that it makes a difference.)

Comment author: Unknown 01 February 2008 07:58:59AM 2 points [-]

Paul, it sounds like you didn't understand. A chess playing computer program is completely deterministic, and yet it has to consider alternatives in order to make its move. So also we could be deterministic and we would still have to consider all the possibilities and their benefits before making a move.

So it makes sense to ask whether you would jump off a cliff if you found out that the future is determined. You would find out that the future is determined without knowing exactly which future is determined, just like the chess program, and so you would have to consider the benefits of various possibilities, despite the fact that there is only one possibility, just like there is really only one possibility for the chess program.

So when you considered the various "possibilities", would "jumping off a cliff" evaluate as equal to "going on with life", or would the latter evaluate as better? I suspect you would go on with life, just like a chess program moves its queen to avoid being taken by a pawn, despite the fact that it was totally determined to do this.

Comment author: Paul_Gowder 01 February 2008 08:16:42AM 0 points [-]

I do understand. My point is that we ought not to *care* whether we're going to consider all the possibilities and benefits.

Oh, but you say, our caring about our consideration process is a determined part of the causal chain leading to our consideration process, and thus to the outcome.

Oh, but I say, we ought not to care* about that caring. Again, recurse as needed. Nothing you can say about the fact that a cognition is in the causal chain leading to a state of affairs counts as a point against the claim that we ought not to care about whether or not we have that cognition if it's unavoidable.

Comment author: Anonymous23 01 February 2008 08:43:18AM 2 points [-]

The paradox is designed to give your decision the practical effect of causing Box B to contain the money or not, without actually labeling this effect "causation." But I think that if Box B acts as though its contents are caused by your choice, then you should treat it as though they were. So I don't think the puzzle is really something deep; rather, it is a word game about what it means to cause something.

Perhaps it would be useful to think about how Omega might be doing its prediction. For example, it might have the ability to travel into the future and observe your action before it happens. In this case what you do is directly affecting what the box contains, and the problem's statement that whatever you choose won't affect the contents of the box is just wrong.

Or maybe it has a copy of the entire state of your brain, and can simulate you in a software sandbox inside its own mind long enough to see what you will do. In this case it makes sense to think of the box as not being empty or full until you've made your choice, if you are the copy in the sandbox. If you aren't the copy in the sandbox then you'd be better off choosing both boxes, but the way the problem's set up you can't tell this. You can still try to maximize future wealth. My arithmetic says that choosing Box B is the best strategy in this case. (Mixed strategies, where you hope that the sandbox version of yourself will randomly choose Box B alone and the outside one will choose both, are dominated by choosing Box B. Also I assume that if you are in the sandbox, you want to maximize the wealth of the outside agent. I think this is reasonable because it seems like there is nothing else to care about, but perhaps someone will disagree.)

You could interpret Omega differently than in these stories, although I think my first point above that you should think of your choice as causing Omega to put money in the box, or not, is reasonable. I would say that the fact that Omega put the money in the box chronologically before you make the decision is irrelevant. I think uncertainty about an event that has already happened, but that hasn't been revealed to you, is basically the same thing as uncertainty about something that hasn't happened yet, and it should be modeled the same way.

Comment author: kamenin 01 February 2008 09:04:55AM 4 points [-]

I have two arguments for going for Box B. First, for a scientist it's not unusual that every rational argument (=theory) predicts that only two-boxing makes sense. Still, if the experiment again and again refutes that, it's obviously the theory that's wrong and there's obviously something more to reality than that which fueled the theories. Actually, we even see dilemmas like Newcomb's in the contextuality of quantum measurements. Measurement tops rationality or theory, every time. That's why science is successful and philosophy is not.

Second, there's no question I choose box B. Either I get the million dollars -- or I have proven an extragalactic superintelligence wrong. How cool is that? $1000? Have you looked at the exchange rates lately?

Comment author: Unknown 01 February 2008 09:54:41AM 0 points [-]

Paul, if we were determined, what would you mean when you say that "we ought not to care"? Do you mean to say that the outcome would be better if we didn't care? The fact that the caring is part of the causal chain does have something to do with this: the outcome may be determined by whether or not we care. So if you consider one outcome better than another (only one really possible, but both possible as far as you know), then either "caring" or "not caring" might be preferable, depending on which one would lead to each outcome.

Comment author: RobinHanson 01 February 2008 12:20:59PM 0 points [-]

Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?

Comment author: Toby_Ord2 01 February 2008 12:36:26PM 2 points [-]

I think Anonymous, Unknown and Eliezer have been very helpful so far. Following on from them, here is my take:

There are many ways Omega could be doing the prediction/placement, and it may well matter exactly how the problem is set up. For example, you might be deterministic and he is precalculating your choice (much like we might be able to do with an insect or computer program), or he might be using a quantum suicide method: (quantum-)randomizing whether the million goes in and then destroying the world iff you pick the wrong option (this will lead to us observing him being correct 100/100 times, assuming a many-worlds interpretation of QM). Or he could have just got lucky with the last 100 people he tried it on.

If it is the deterministic option, then what do the counterfactuals about choosing the other box even mean? My approach is to say that 'You could choose X' means that if you had desired to choose X, then you would have. This is a standard way of understanding 'could' in a deterministic universe. Then the answer depends on how we suppose the world to be different to give you counterfactual desires. If we do it with a miracle near the moment of choice (history is the same, but then your desires change non-physically), then you ought to two-box, as Omega can't have predicted this. If we do it with an earlier miracle, or with a change to the initial conditions of the universe (the Tannsjo interpretation of counterfactuals), then you ought to one-box, as Omega would have predicted your choice. Thus, if we are understanding Omega as extrapolating your deterministic thinking, then the answer will depend on how we understand the counterfactuals. One-boxers and Two-boxers would be people who interpret the natural counterfactual in the example in different (and equally valid) ways.

If we understand it as Omega using a quantum suicide method, then the objectively right choice depends on his initial probabilities of putting the million in the box. If he does it with a 50% chance, then take just one box. There is a 50% chance the world will end either choice, but this way, in the case where it doesn't, you will have a million rather than a thousand. If, however, he uses a 99% chance of putting nothing in the box, then one-boxing has a 99% chance of destroying the world which dominates the value of the extra money, so instead two-box, take the thousand and live.

If he just got lucky a hundred times, then you are best off two-boxing.

If he time travels, then it depends on the nature of time-travel...

Thus the answer depends on key details not told to us at the outset. Some people accuse all philosophical examples (like the trolley problems) of not giving enough information, but in those cases it is fairly obvious how we are expected to fill in the details. This is not true here. I don't think the Newcomb problem has a single correct answer. The value of it is to show us the different possibilities that could lead to the situation as specified and to see how they give different answers, hopefully illuminating the topic of free-will, counterfactuals and prediction.

Comment author: Ben_Jones 01 February 2008 01:23:49PM 1 point [-]

Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap.

This made me laugh. Well said!

There's only one question about this scenario for me - is it possible for a sufficiently intelligent being to fully, fully model an individual human brain? If so, (and I think it's tough to argue 'no' unless you think there's a serious glass ceiling for intelligence) choose box B. If you try and second-guess (or, hell, googolth-guess) Omega, you're taking the risk that Omega is not smart enough to have modelled your consciousness sufficiently well. How big is this risk? 100 times out of 100 speaks for itself. Omega is cleverer than we can understand. Box B.

(Time travel? No thanks. I find the probability that Omega is simulating people's minds a hell of a lot more likely than that he's time travelling, destroying the universe etc. And even if he were, Box B!)

If you can have your brain modelled exactly - to the point where there is an identical simulation of your entire conscious mind and what it perceives - then a lot of weird stuff can go on. However, none of it will violate causality. (Quantum effects messing up the simulation or changing the original? I guess if the model could be regularly updated based on the original...but I don't know what I'm talking about now ;) )

Comment author: Zubon 01 February 2008 02:08:29PM 2 points [-]

How does the box know? I could open B with the intent of opening only B or I could open B with the intent of then opening A. Perhaps Omega has locked the boxes such that they only open when you shout your choice to the sky. That would beat my preferred strategy of opening B before deciding which to choose. I open boxes without choosing to take them all the time.

Are our common notions about boxes catching us here? In my experience, opening a box rarely makes nearby objects disintegrate. It is physically impossible to "leave $1000 on the table," because it will disintegrate if you do not choose A. I also have no experience with trans-galactic super-intelligences, and its ability to make time-traveling super-boxes is already covered by the discussion above. I think of boxes as things that either are full or are not, independent of my intentions, but I also think of them as things that do not disintegrate based on my intentions.

Taking both is equivalent to just taking A. Restate the problem that way: take A and get $1000 or take B and get $1,000,000. Which would you prefer?

I think the problem becomes more amusing if box A does not disintegrate. They are just two cardboard boxes, one of which is open and visibly has $1000 in it. You don't shout your intention to the sky, you just take whatever boxes you like. The reasonable thing to do is open box B; if it is empty, take box A too; if it is full of money, heck, take box A too. They're boxes, they can't stop you. But that logic makes you a two-boxer, so if Omega anticipates it, and Omega does, B will be empty. You definitely need to pre-commit to taking only B. Assume you have, and you open B, and B has $1,000,000. You win! Now what do you do? A is just sitting there with $1000 in it. You already have your million. You even took it out of the box, in case the box disintegrates. Do you literally walk away from $1000, on the belief that Omega has some hidden trick to retroactively make B empty? The rule was not that the money would go away if you took both, the rule is that B would be empty. B was not empty. A is still there. You already won for being a one-boxer, does anything stop you from being a two-boxer and winning the bonus $1000?

Comment author: Eliezer_Yudkowsky 01 February 2008 02:14:56PM 2 points [-]

Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?

Well, there's a number of answers I could give to this:

*) After you've spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don't Happen - not because there's an extra clause forbidding them, but because the simple foundations just don't give rise to them - then an intertemporal preference reversal starts looking like just another preference reversal.

*) I developed my decision theory using mathematical technology, like Pearl's causal graphs, that wasn't around when causal decision theory was invented. (CDT takes counterfactual distributions as fixed givens, but I have to compute them from observation somehow.) So it's not surprising if I think I can do better.

*) We're not talking about a patchwork of self-modifications. An AI can easily make a single, general, once-and-for-all modification to its future self to do what its past self would have wished on future problems, even ones the past self did not explicitly consider. Why would I bother to consider the general framework of classical causal decision theory when I don't expect the AI to work inside that general framework for longer than 50 milliseconds?

*) I did work out what an initially causal-decision-theorist AI would modify itself to, if it booted up on July 11, 2018, and it looks something like this: "Behave like a nonclassical-decision-theorist if you are confronting a Newcomblike problem that was determined by 'causally' interacting with you after July 11, 2018, and otherwise behave like a classical causal decision theorist." Roughly, self-modifying capability in a classical causal decision theorist doesn't fix the problem that gives rise to the intertemporal preference reversals, it just makes one temporal self win out over all the others.

*) Imagine time spread out before you like a 4D crystal. Now imagine pointing to one point in that crystal, and saying, "The rational decision given information X, and utility function Y, is A", then pointing to another point in the crystal and saying "The rational decision given information X, and utility function Y, is B". Of course you have to be careful that all conditions really are exactly identical - the agent has not learned anything over the course of time that changes X, the agent is not selfish with temporal deixis which changes Y. But if all these conditions are fulfilled, I don't see why an intertemporal inconsistency should be any less disturbing than an interspatial inconsistency. You can't have 2 + 2 = 4 in Dallas and 2 + 2 = 3 in Minneapolis.

*) What happens if I want to use a computation distributed over a large enough volume that there are lightspeed delays and no objective space of simultaneity? Do the pieces of the program start fighting each other?

*) Classical causal decision theory is just not optimized for the purpose I need a decision theory for, any more than a toaster is likely to work well as a lawnmower. They did not have my design requirements in mind.

*) I don't have to put up with dynamic inconsistencies. Why should I?

Comment author: Caledonian2 01 February 2008 02:15:00PM 3 points [-]

So it seems you are disagreeing with most all game theorists in economics as well as most decision theorists in philosophy. Maybe perhaps they are right and you are wrong?

Maybe perhaps we are right and they are wrong?

The issue is to be decided, not by referring to perceived status or expertise, but by looking at who has the better arguments. Only when we cannot evaluate the arguments does making an educated guess based on perceived expertise become appropriate.

Again: how much do we want to bet that Eliezer won't admit that he's wrong in this case? Do we have someone willing to wager another 10 credibility units?

Comment author: Paul_Crowley 01 February 2008 02:31:55PM 4 points [-]

Caledonian: you can stop talking about wagering credibility units now, we all know you don't have funds for the smallest stake.

Ben Jones: if we assume that Omega is perfectly simulating the human mind, then when we are choosing between B and A+B, we don't know whether we are in reality or in the simulation. In reality, our choice does not affect the million, but in the simulation it does. So we should reason "I'd better take only box B, because if this is the simulation then that will change whether or not I get the million in reality".

Comment author: RobinHanson 01 February 2008 02:42:43PM 1 point [-]

There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing. Trying to find a set of preferences that avoids all strategic conflicts between your different actions seems a fool's errand.

Comment author: Caledonian2 01 February 2008 02:48:24PM 2 points [-]

What we have here is an inability to recognize that causality no longer flows only from 'past' to 'future'.

If we're given a box that could contain $1,000 or nothing, we calculate the expected value of the superposition of these two possibilities. We don't actually expect that there's a superposition within the box - we simply adopt a technique to help compensate for what we do not know. From our ignorant perspective, either case could be real, although in actuality either the box has the money or it does not.

This is similar. The amount of money in the box depends on what choice we make. The fact that the placement of money into the box happened in the past is irrelevant, because we've already presumed that the relevant cause-and-effect relationship works backwards in time.

Eliezer states that the past is fixed. Well, it may be fixed in some absolute sense (although that is a complicated topic), but from our ignorant perspective we have to consider what appears to us to be the possible alternatives. That means that we must consider the money in the boxes to be uncertain. Choosing causes Omega to put a particular amount of money in the box. That this happened in the past is irrelevant, because the causal dependence points into the past instead of the future.

Even if we ignore actual time travel, we must consider the amount of money present to be uncertain until we choose, which then determines how much is there - in the sense of our technique, from our limited perspective.

If we accept that Omega is really as accurate as it appears to be - which is not a small thing to accept, certainly - and we want to maximize money, then the correct choice is B.

Comment author: JulianMorrison 01 February 2008 04:55:56PM -1 points [-]

How about simply multiplying? Treat Omega as a fair coin toss. 50% of a million is half-a-million, and that's vastly bigger than a thousand. You can ignore the question of whether omega has filled the box, in deciding that the uncertain box is more important. So much more important, that the chance of gaining an extra 1000 isn't worth the bother of trying to beat the puzzle. You just grab the important box.
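
A minimal sketch of that multiplication (the 50% is just the simplifying assumption above, not something given in the problem):

    # Rough sketch of "treat Omega as a fair coin toss" -- the 50% figure
    # is an assumption made for this argument, not part of the problem.
    p_full = 0.5                        # assumed chance box B holds the million
    ev_box_b_alone = p_full * 1_000_000
    sure_thing_in_a = 1_000
    print(ev_box_b_alone, sure_thing_in_a)   # 500000.0 vs 1000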

Comment author: zzz2 01 February 2008 06:42:48PM 1 point [-]

After you've spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don't Happen - not because there's an extra clause forbidding them, but because the simple foundations just don't give rise to them - then an intertemporal preference reversal starts looking like just another preference reversal.

... Roughly, self-modifying capability in a classical causal decision theorist doesn't fix the problem that gives rise to the intertemporal preference reversals, it just makes one temporal self win out over all the others.

This is a genuine concern. Note that most instances of precommitment arise quite naturally due to reputational concerns: any agent which is complex enough to come up with the concept of reputation will make superficially irrational ("hawkish") choices in order not to be pushed around in the future. Moreover, precommitment is only worthwhile if it can be accurately assessed by the counterparty: an agent will not want to "generally modify its future self ... to do what its past self would have wished" unless it can gain a reputational advantage by doing so.

Comment author: Eliezer_Yudkowsky 01 February 2008 06:48:07PM 1 point [-]

There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing.

I can see why a human would have time-inconsistent strategies - because of inconsistent preferences between their past and future self, hyperbolic discounting functions, that sort of thing. I am quite at a loss to understand why an agent with a constant, external utility function should experience inconsistent strategies under any circumstance, regardless of strategic incentives. Expected utility lets us add up conflicting incentives and reduce to a single preference: a multiplicity of strategic incentives is not an excuse for inconsistency.

I am a Bayesian; I don't believe in probability calculations that come out different ways when you do them using different valid derivations. Why should I believe in decisional calculations that come out in different ways at different times?

I'm not sure that even a causal decision theorist would agree with you about strategic inconsistency being okay - they would just insist that there is an important difference between deciding to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am, because in the former case you cause Omega's action while in the latter case you do not. In other words, they would insist the two situations are importantly different, not that time inconsistency is okay.

And I observe again that a self-modifying AI which finds itself with time-inconsistent preferences, strategies, what-have-you, will not stay in this situation for long - it's not a world I can live in, professionally speaking.

Trying to find a set of preferences that avoids all strategic conflicts between your different actions seems a fool's errand.

I guess I completed the fool's errand, then...

Do you at least agree that self-modifying AIs tend not to contain time-inconsistent strategies for very long?

Comment author: RobinHanson 01 February 2008 06:55:14PM 2 points [-]

The entire issue of causal versus inferential decision theory, and the seemingly magical powers of the chooser in the Newcomb problem, are serious distractions here, as Eliezer has the same issue in an ordinary commitment situation, e.g., punishment. I suggest starting this conversation over from such an ordinary simple example.

Comment author: Tom_Crispin 01 February 2008 06:57:41PM 2 points [-]

Let me restate: Two boxes appear. If you touch box A, the contents of box B are vaporized. If you attempt to open box B, box A and its contents are vaporized. Contents as previously specified. We could probably build these now.

Experimentally, how do we distinguish this from the description in the main thread? Why are we taking Omega seriously when if the discussion dealt with the number of angels dancing on the head of pin the derision would be palpable? The experimental data point to taking box B. Even if Omega is observed delivering the boxes, and making the specified claims regarding their contents, why are these claims taken on faith as being an accurate description of the problem?

Comment author: Hendrik_Boom 01 February 2008 07:03:00PM 4 points [-]

Let's take Bayes seriously.

Some time ago there was a posting about something like: "If all you knew was that the sun rose on each of the past 5 mornings, what probability would you assign to the sun rising next morning?" It came out to something like 5/6 or 4/5.

But of course that's not all we know, and so we'd get different numbers.

Now what's given here is that Omega has been correct on a hundred occasions so far. If that's all we know, we should estimate the probability of him being right next time at about 99%. So if you're a one-boxer your expectation would be $990,000 and a two-boxer would have an expectation of $11,000.
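
A quick sketch of that arithmetic, assuming the ~99% figure and the payoffs from the problem statement:

    # Laplace's rule of succession on 100/100 correct predictions gives
    # 101/102, roughly the 99% used here.
    p_right = 0.99
    ev_one_box = p_right * 1_000_000                          # right -> $1M, wrong -> $0
    ev_two_box = p_right * 1_000 + (1 - p_right) * 1_001_000  # right -> $1k, wrong -> $1,001k
    print(ev_one_box, ev_two_box)                             # ~990000 vs ~11000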

But the whole argument seems to be about what extra knowledge you have; in particular: Can causation work in reverse? Is Omega really superintelligent? Or even: Are the conditions stated in the problem logically inconsistent (which would justify any answer)?

Perhaps someone who enjoys these kinds of odds calculations could investigate the extent to which we know these things and how it affects the outcome?

Comment author: Unknown3 01 February 2008 07:09:00PM 4 points [-]

Eliezer, I have a question about this: "There is no finite amount of life lived N where I would prefer a 80.0001% probability of living N years to an 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded."

I can see that this preference implies an unbounded utility function, given that a longer life has a greater utility. However, simply stated in that way, most people might agree with the preference. But consider this gamble instead:

A: Live 500 years and then die, with certainty.
B: Live forever, with probability 0.000000001%; die within the next ten seconds, with probability 99.999999999%

Do you choose A or B? Is it possible to choose A and have an unbounded utility function with respect to life? It seems to me that an unbounded utility function implies the choice of B. But then what if the probability of living forever becomes one in a googolplex, or whatever? Of course, this is a kind of Pascal's Wager; but it seems to me that your utility function implies that you should accept the Wager.

It also seems to me that the intuitions suggesting to you and others that Pascal's Mugging should be rejected similarly are based on an intuition of a bounded utility function. Emotions can't react infinitely to anything; as one commenter put it, "I can only feel so much horror." So to the degree that people's preferences reflect their emotions, they have bounded utility functions. In the abstract, not emotionally but mentally, it is possible to have an unbounded function. But if you do, and act on it, others will think you a fanatic. For a fanatic cares infinitely for what he perceives to be an infinite good, whereas normal people do not care infinitely about anything.

This isn't necessarily against an unbounded function; I'm simply trying to draw out the implications.

Comment author: zzz 01 February 2008 07:16:00PM 0 points [-]

they would just insist that there is an important difference between deciding to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am

But that's exactly what strategic inconsistency is about. Even if you had decided to take only box B at 7:00am, by 7:06am a rational agent will just change his mind and choose to take both boxes. Omega knows this, hence it will put nothing into box B. The only way out is if the AI self-commits to take only box B in a way that's verifiable by Omega.

Comment author: Jeremy_McKibben 01 February 2008 08:04:00PM 0 points [-]

When the stakes are high enough I one-box, while gritting my teeth. Otherwise, I'm more interested in demonstrating my "rationality" (Eliezer has convinced me to use those quotes).

Perhaps we could just specify an agent that uses reverse causation in only particular situations, as it seems that humans are capable of doing.

Comment author: Ben_Jones 01 February 2008 09:45:00PM 0 points [-]

Paul G, almost certainly, right? Still, as you say, it has little bearing on one's answer to the question.

In fact, not true, it does. Is there anything to stop myself making a mental pact with all my simulation buddies (and 'myself', whoever he be) to go for Box B?

Comment author: Brian_Jaress2 01 February 2008 11:20:00PM 1 point [-]

In arguing for the single box, Yudkowsky has made an assumption that I disagree with: at the very end, he changes the stakes and declares that your choice should still be the same.

My way of looking at it is similar to what Hendrik Boom has said. You have a choice between betting on Omega being right and betting on Omega being wrong.

A = Contents of box A

B = What may be in box B (if it isn't empty)

A is yours, in the sense that you can take it and do whatever you want with it. One thing you can do with A is pay it for a chance to win B if Omega is right. Your other option is to pay nothing for a chance to win B if Omega is wrong.

Then just make your bet based on what you know about Omega. As stated, we only know his track record over 100 attempts, so use that. Don't worry about the nature of causality or whether he might be scanning your brain. We don't know those things.

If you do it that way, you'll probably find that your answer depends on A and B as well as Omega's track record.

I'd probably put Omega at around 99%, as Hendrik did. Keeping A at a thousand dollars, I'd one-box if B were a million dollars or if B were something I needed to save my life. But I'd two-box if B were a thousand dollars and one cent.
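
A minimal sketch of how that dependence works, assuming Omega is right with probability p and treating A as the sure payoff you give up:

    # One-boxing beats two-boxing only when B is large enough relative to A;
    # the break-even point is B > A / (2p - 1).
    def one_box_wins(A, B, p=0.99):
        ev_one = p * B               # bet that Omega is right
        ev_two = A + (1 - p) * B     # keep A, bet that Omega is wrong
        return ev_one > ev_two

    print(one_box_wins(1_000, 1_000_000))   # True  -> one-box
    print(one_box_wins(1_000, 1_000.01))    # False -> two-box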

So I think changing A and B and declaring that your strategy must stay the same is invalid.

Comment author: Tom_Breton 01 February 2008 11:45:00PM 3 points [-]

IMO there's less to Newcomb's paradox than meets the eye. It's basically "A future-predicting being who controls the set of choices could make rational choices look silly by making sure they had bad outcomes". OK, yes, he could. Surprised?

What I think makes it seem paradoxical is that the paradox both assures us that Omega controls the outcome perfectly, and cues us that this isn't so ("He's already left" etc). Once you settle what it's really saying either way, the rest follows.

Comment author: Alex_Rockwell 02 February 2008 12:13:00AM 2 points [-]

Yes, this is really an issue of whether your choice causes Omega's action or not. The only way for Omega to be a perfect predictor is for your choice to actually cause Omega's action. (For example, Omega 'sees the future' and acts based on your choice). If your choice causes Omega's action, then choosing B is the rational decision, as it causes the box to have the million.

If your choice does not cause Omega's action, then choosing both boxes is the winning approach. in this case, Omega is merely giving big awards to some people and small awards to others.

If your choice has some percentage chance of causing Omega's action, then the problem becomes one of risk management: what is your chance of getting the big award if you choose B, compared with the utility of the two choices?


I agree with what Tom posted. The only paradox here is that the problem both states that your choice causes Omega's action (because it supposedly predicts perfectly), and also says that your action does not cause Omega's action (because the decision is already made). Thus, whether you think box B or both boxes is the correct choice depends on which of these two contradictory statements you end up believing.

Comment author: Douglas_Knight2 02 February 2008 12:24:00AM 0 points [-]

the dominant consensus in modern decision theory is that one should two-box...there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes"

Those are contrary positions, right?


Robin Hanson:
Punishment is ordinary, but Newcomb's problem is simple! You can't have both.

The advantage of an ordinary situation like punishment is that game theorists can't deny the fact on the ground that governments exist, but they can claim it's because we're all irrational, which doesn't leave many directions to go in.

Comment author: PK 02 February 2008 04:22:00AM 2 points [-]

I agree that "rationality" should be the thing that makes you win but the Newcomb paradox seems kind of contrived.

If there is a more powerful entity throwing good utilities at normally dumb decisions and bad utilities at normally good decisions then you can make any dumb thing look genius because you are under different rules than the world we live in at present.

I would ask Alpha for help and do what he tells me to do. Alpha is an AI that is also never wrong when it comes to predicting the future, just like Omega. Alpha would examine Omega and me and extrapolate Omega's extrapolated decision. If there is a million in box B I pick both, otherwise just B.

Looks like Omega will be wrong either way, or will I be wrong? Or will the universe crash?

Comment author: Grant 02 February 2008 05:22:00AM 1 point [-]

To me, the decision is very easy. Omega obviously possesses more prescience about my box-taking decision than I do myself. He's been able to guess correctly in the past, so I'd see no reason to doubt him with myself. With that in mind, the obvious choice is to take box B.

If Omega is so nearly always correct, then determinism is shown to exist (at least to some extent). That being the case, causality would be nothing but an illusion. So I'd see no problem with it working in "reverse".

Comment author: Greg_Reimer 07 February 2008 03:47:00AM 7 points [-]

Fascinating. A few days after I read this, it struck me that a form of Newcomb's Problem actually occurs in real life--voting in a large election. Here's what I mean.

Say you're sitting at home pondering whether to vote. If you decide to stay home, you benefit by avoiding the minor inconvenience of driving and standing in line. (Like gaining $1000.) If you decide to vote, you'll fail to avoid the inconvenience, meanwhile you know your individual vote almost certainly won't make a statistical difference in getting your candidate elected. (Which would be like winning $1000000.) So rationally, stay at home and hope your candidate wins, right? And then you'll have avoided the inconvenience too. Take both boxes.

But here's the twist. *If* you muster the will to vote, it stands to reason that those of a similar mind to you (a potentially statistically significant number of people) would also muster the will to vote, because of their similarity to you. So knowing this, why not stay home anyway, avoid the inconvenience, and trust all those others to vote and win the election? They're going to do what they're going to do. Your actions can't change that. The contents of the boxes can't be changed by your actions. Well, if you don't vote, perhaps that means neither will the others, and so it goes. Therein lies the similarity to Newcomb's problem.

Comment author: AndyCossyleon 04 November 2010 09:52:07PM *  5 points [-]

A very good point. I'm the type to stay home from the polls. But I'd also one-box..... hm.

I think it may have to do with the very weak correlation between my choice to vote and the choice of those of a similar mind to me to vote as opposed to the very strong correlation between my choice to one-box and Omega's choice to put $1,000,000 in box B.

Comment author: wedrifid 04 November 2010 10:21:12PM 0 points [-]

Rational agents defect against a bunch of irrational fools who are mostly choosing for signalling purposes and who may well vote for the other guy even if they cooperate.

Comment author: Will_Pearson 04 April 2008 04:59:00PM 0 points [-]

"If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window."

What exactly do you mean by mere decisions? I can construct problems where agents that use few computational resources win. Bayesian agents, by your own admission, have to use energy to get into mutual information with the environment (a state I am still suspicious of), so they have to use energy, meaning they lose.

Comment author: Tim_Freeman 12 April 2008 09:18:00PM 1 point [-]

The premise is that a rational agent would start out convinced that this story about an alien that knows in advance what they'll decide is false.

The Kolmogorov complexity of the story about the alien is very large, because we have to hypothesize some mechanism by which it can extrapolate the contents of minds. Even if I saw the alien land a million times and watched the box-picking connect with the box contents as they're supposed to, it is simpler to assume that the boxes are some stage magic trick, or even that they are an exception to the usual laws of physics.

Once we've done enough experiments that we're forced into the hypothesis that the boxes are an exception to the usual laws of physics, it's pretty clear what to do. The obvious revised laws of physics based on the new observations make it clear that one should choose just one box.

So a rational agent would do the right thing, but only because there's no way to get it to believe the backstory.

Comment author: MattMahoney 12 April 2008 11:56:00PM 1 point [-]

It is not possible for an agent to make a rational choice between 1 or 2 boxes if the agent and Omega can both be simulated by Turing machines. Proof: Omega predicts the agent's decision by simulating it. This requires Omega to have greater algorithmic complexity than the agent (including the nonzero complexity of the compiler or interpreter). But a rational choice by the agent requires that it simulate Omega, which requires that the agent have greater algorithmic complexity instead.

In other words, the agent X, with complexity K(X), must model Omega which has complexity K(X + "put $1 million in box B if X does not take box A"), which is slightly greater than K(X).

In the framework of the ideal rational agent in AIXI, the agent guesses that Omega is the shortest program consistent with the observed interaction so far. But it can never guess Omega, because Omega's complexity is greater than the agent's. Since AIXI is optimal, no other agent can make a rational choice either.

As an aside, this is also a wonderful demonstration of the illusion of free will.

Comment author: 01 27 April 2008 07:44:00PM 0 points [-]

Okay, maybe I am stupid, maybe I am unfamiliar with all the literature on the problem, maybe my English sucks, but I fail to understand the following:
-
Is the agent aware of the fact that one-boxers get $1,000,000 at the moment Omega "scans" him and presents the boxes?

OR

Is the agent told about this after Omega "has left"?

OR

Is the agent unaware of the fact that Omega rewards one-boxers at all?
-
P.S.: Also, as with most "decision paradoxes", this one will have different solutions depending on the context (is the agent a starving child in Africa, or a "megacorp" CEO?).

Comment author: Alan6 12 May 2008 03:12:00AM 1 point [-]

I'm a convinced two-boxer, but I'll try to put my argument without any bias. It seems to me the way this problem has been put has been an attempt to rig it for the one-boxers. When we talk about "precommitment" it is suggested the subject has advance knowledge of Omega and what is to happen. The way I thought the paradox worked was that Omega would scan/analyze a person and make its prediction, all before the person ever heard of the dilemma. Therefore, a person has no way to develop an intention of being a one-boxer or a two-boxer that in any way affects Omega's prediction. For the Irene/Rachel situation, there is no way to ever "precommit"; the subject never gets to play Omega's game again, and Omega scans their brains before they ever heard of him. (So imagine you only had one shot at playing Omega's game, and Omega made its prediction before you ever came to this website or anywhere else and heard about Newcomb's paradox. Then that already decides what it puts in the boxes.)

Secondly, I think a requirement of the problem is that your choice, at the time of actually taking the box(es), cannot affect what's in the box. What we have here are two completely different problems; if in any way Omega or your choice information can travel back in time to change the contents of the box, the choice is trivial. So yes, Omega may have chosen to discriminate against rational people and award irrational ones; the point is, there is absolutely nothing we can do about it (neither in precommitment nor at the actual time to choose).

To clarify why I think two-boxing is the right choice, I would propose a real life experiment. Let's say we developed a survey, which, by asking people various questions about logic or the paranormal etc..., we use to classify them into one-boxers or two-boxers. The crux of the setup is, all the volunteers we take have never heard of the Newcomb Paradox; we make up any reason we want for them to take the survey. THEN, having already placed money or no money in box B, we give them the story about Omega and let them make the choice. Hypothetically, our survey could be 100% accurate; even if not it may be very accurate such that many of our predicted one-boxers will be glad to find their choice rewarded. In essence, they cannot "precommit" and their choice won't magically change the contents of the box (based on a human survey). They also cannot go back and convince themselves to cheat on our survey - it's impossible - and that is how Omega is supposed to operate. The point is, from the experimental point of view, every single person would make more from taking both boxes, because at the time of choice there's always the extra $1000 in box A.

Comment author: AndyWood 19 May 2008 06:56:00PM 1 point [-]

If the alien is able to predict your decision, it follows that your decision is a function of your state at the time the alien analyzes you. Then, there is no meaningful question of "what should you do?" Either you are in a universe in which you are disposed to choose the one box AND the alien has placed the million dollars, or you are in a universe in which you are disposed to take both boxes AND the alien has placed nothing. If the former, you will have the subjective experience of "deciding to take the one box", which is itself a deterministic process that feels like a free choice, and you will find the million. If the latter, you will have the subjective experience of "deciding to take both boxes", and you will find nothing in the opaque box.

In short, the framing of the problem implies that your decision-making process is deterministic (which does not preclude it being a process that you are conscious of participating in), and the figurative notion of "free will" does not include literal degrees of freedom. If you must insist on viewing it as a question of what the correct action is, it's to take the one box. Regardless of your motivation, even if your reason for doing so is this argument, you will find yourself in a universe in which events (including thought events) have led you to take one box, and these are the same universes in which the alien places a million dollars in the box.

Comment author: Lewis_Powell 24 June 2008 08:40:00PM 1 point [-]

Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn't the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that's pretty much what it would take to make me unshelve the project. Otherwise I can't justify the time expenditure, not at the speed I currently write books.

If you have a solution to Newcomb's Problem, but don't have the time to work on it, is there any chance you will post a sketch of your solution for other people to investigate and/or develop?

Comment author: Eneasz 01 July 2008 06:19:00PM 2 points [-]

Isn't this the exact *opposite* argument from the one that was made in Dust Specks vs 50 Years of Torture?

Correct me if I'm wrong, but the argument in this post seems to be "Don't cling to a supposedly-perfect 'causal decision theory' if it would make you lose gracefully, take the action that makes you WIN."

And the argument for preferring 50 Years of Torture over 3^^^3 Dust Specks is that "The moral theory is perfect. It must be clung to, even when the result is a major loss."

How can both of these be true?

(And yes, I am defining "preferring 50 Years of Torture over 3^^^3 Dust Specks" as an unmitigated loss. A moral theory that returns a result that almost every moral person alive would view as abhorrent has at least one flaw if it could produce such a major loss.)

Comment author: HalFinney 01 July 2008 06:58:00PM 0 points [-]

One belated point, some people seem to think that Omega's successful prediction is virtually impossible and that the experiment is a purely fanciful speculation. However it seems to me entirely plausible that having you fill out a questionnaire while being brain scanned might well bring this situation into practicality in the near future. The questions, if filled out correctly, could characterize your personality type with enough accuracy to give a very strong prediction about what you will do. And if you lie, in the future that might be detected with a brain scan. I don't see anything about this scenario which is absurd, impossible, or even particularly low probability. The one problem is that there might well be a certain fraction of people for whom you really can't predict what they'll do, because they're right on the edge and will decide more or less at random. But you could exclude them from the experiment and just give those with solid predictions a shot at the boxes.

Comment author: Dagon 04 September 2008 10:41:00PM 0 points [-]

Somehow I'd never thought of this as a rationalist's dilemma, but rather a determinism vs free will illustration. I still see it that way. You cannot both believe you have a choice AND that Omega has perfect prediction.

The only "rational" (in all senses of the word) response I support is: shut up and multiply. Estimate the chance that he has predicted wrong, and if that gives you +expected value, take both boxes. I phrase this as advice, but in fact I mean it as prediction of rational behavior.

Comment author: Tim_Tyler 27 October 2008 10:57:00AM 1 point [-]

In my motivations and in my decision theory, dynamic inconsistency is Always Wrong. Among other things, it always implies an agent unstable under reflection.

If you really want to impress an inspector who can see your internal state, by altering your utility function to conform to their wishes, then one strategy would be to create a trusted external "brain surgeon" agent with the keys to your utility function to change it back again after your utility function has been inspected - and then forget all about the existence of the surgeon.

The inspector will be able to see the lock on your utility function - but those are pretty standard issue.

Comment author: John_Maxwell 06 November 2008 01:28:00AM 0 points [-]

As a rationalist, it might be worthwhile to take the one box just so those Omega know-it-alls will be wrong for once.

Comment author: anon16 25 December 2008 06:27:00PM 0 points [-]

If random number generators not determinable by Omega exist, generate one bit of entropy. If not, take the million bucks. Quantum randomness anyone?

Comment author: Benja_Fallenstein 02 January 2009 05:57:00AM 0 points [-]

Given how many times Eliezer has linked to it, it's a little surprising that nobody seems to have picked up on this yet, but the paragraph about the utility function not being up for grabs seems to have a pretty serious technical flaw:

There is no finite amount of life lived N where I would prefer a 80.0001% probability of living N years to an 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded.

Let p = 80% and let q be one in a million. I'm pretty sure that what Eliezer has in mind is,

(A) For all n, there is an even larger n' such that (p+q)*u(live n years) < p*u(live n' years) + q*u(live a googolplex years).

This indeed means that {u(live n' years) | n' in N} is not upwards bounded -- I did check the math :-) --, which means that u is not upwards bounded, which means that u is not bounded. But what he actually said was,

(B) For all n, (p+q)*u(live n years) <= p*u(live forever) + q*u(live googolplex years)

That's not only different from A, it contradicts A! It doesn't imply that u needs to be bounded, of course, but it flat out states that {u(live n years) | n in N} is upwards bounded by (p*u(live forever) + q*u(live googolplex years))/(p+q).

(We may perhaps see this as reason enough to extend the domain of our utility function to some superset of the real numbers. In that case it's no longer necessary for the utility function to be unbounded to satisfy (A), though -- although we might invent a new condition like "not bounded by a real number.")

Comment author: Eliezer_Yudkowsky 02 January 2009 06:57:00AM 0 points [-]

Benja, the notion is that "live forever" does not have any finite utility, since it is bounded below by a series of finite lifetimes whose utility increases without bound.

Comment author: Benja_Fallenstein 02 January 2009 03:01:00PM 0 points [-]

*thinks* -- Okay, so if I understand you correctly now, the essential thing I was missing that you meant to imply was that the utility of living forever must necessarily be equal to (cannot be larger than) the limit of the utilities of living a finite number of years. Then, if u(live forever) is finite, p times the difference between u(live forever) and u(live n years) must become arbitrarily small, and thus, eventually smaller than q times the difference between u(live n years) and u(live googolplex years). You then arrive at a contradiction, from which you conclude that u(live forever) = the limit of u(live n years) cannot be finite. Okay. Without the qualification I was missing, the condition wouldn't be inconsistent with a bounded utility function, since the difference wouldn't have to get arbitrarily small, but the qualification certainly seems reasonable.

(I would still prefer for all possibilities considered to have defined utilities, which would mean extending the range of the utility function beyond the real numbers, which would mean that u(live forever) would, technically, be an upper bound for {u(live n years) | n in N} -- that's what I had in mind in my last paragraph above. But you're not required to share my preferences on framing the issue, of course :-))

Comment author: michael_webster2 27 February 2009 05:16:00AM 1 point [-]

There are two ways of thinking about the problem.

1. You see the problem as a decision theorist, and see a conflict between the expected utility recommendation and the dominance principle. People who have seen the problem this way have been led into various forms of causal decision theory.

2. You see the problem as a game theorist, and are trying to figure out the predictor's utility function, what points are focal and why. People who have seen the problem this way have been led into various discussions of tacit coordination.

Newcomb's scenario is a paradox, not meant to be solved, but rather explored in different directions. In its original form, much like the Monty Hall problem, Newcomb's scenario is not stated well enough to give rise to a problem with a calculated solution.

This is not a criticism of the problem, indeed it is an ingenious little puzzle.

And there is much to learn from well defined Newcomb like problems.

Comment author: Tim_Tyler 02 March 2009 07:58:00PM 0 points [-]

Re: First, foremost, fundamentally, above all else: Rational agents should WIN.

When Deep Blue beat Gary Kasparov, did that prove that Gary Kasparov was "irrational"?

It seems as though it would be unreasonable to expect even highly rational agents to win - if pitted against superior competition. Rational agents can lose in other ways as well - e.g. by not having access to useful information.

Since there are plenty of ways in which rational agents can lose, "winning" seems unlikely to be part of a reasonable definition of rationality.

Comment author: WJ 05 March 2009 08:58:00AM 0 points [-]

I think I've solved it.

I'm a little late to this, and given the amount of time people smarter than myself have spent thinking about this it seems naive even to myself to think that I have found a solution to this problem. That being said, try as I might, I can't find a good counter argument to this line of reasoning. Here goes...

The human brain's function is still mostly a black box to us, but the demonstrated predictive power of this alien is strong evidence that this is not the case with him. If he really can predict human decisions, then the mere fact that you are choosing one box is the best way for you to ensure that will be what is predicted.

The standard attack on this line of reasoning seems to be that since his prediction happened in the past, your decision can't influence it. But it already has influenced it. He was aware of the decision before you made it (evidenced by his predictive power). In fact, it is not really a decision in the sense of "freely" choosing one of two options (in the way that most people use "freely" at least). Think of this decision as just extremely complicated and seemingly unpredictable data analysis, where the unpredictability comes from never being able to know intimately every part of the decision process and the inputs. But if one could perfectly crack the "black box" of your decision, as this alien appears to have done (at least this seems by far the most plausible explanation to me), then one could predict decisions with the accuracy the alien possesses. In other words, the gears were already in motion for your decision to be made, and the alien had already witnessed it whether you realized it or not. In that sense you aren't making your decision afterwards when you think you are; you are actually realizing the decision that you were already set up to make at an earlier time.

If you agree with what I have written above, your obvious best decision is to just go ahead and pick one box, and hope that the alien would have predicted this. Based on the evidence, that will probably be enough to make the one million show up. Deciding instead to go for two boxes for any reason whatsoever will probably mean that the million won't be there. The time issue is just an illusion caused by your imperfect knowledge and data processing that takes time.

Comment author: Dave9 03 April 2009 04:02:00PM 1 point [-]

Cross-posting from Less Wrong, I think there's a generalized Russell's Paradox problem with this theory of rationality:


I don't think I buy this for Newcomb-like problems. Consider Omega who says, "There will be $1M in Box B IFF you are irrational."

Rationality as winning is probably subject to a whole family of Russell's-Paradox-type problems like that. I suppose I'm not sure there's a better notion of rationality.

Comment author: Unknown3 03 April 2009 05:49:00PM 0 points [-]

Eliezer, why didn't you answer the question I asked at the beginning of the comment section of this post?

Comment author: Dan_Moore 01 July 2009 08:43:52PM -1 points [-]

The 'delayed choice' experiments of Wheeler & others appear to show a causality that goes backward in time. So, I would take just Box B.

Comment author: gurgeh 17 July 2009 12:33:35PM 0 points [-]

I would use a true quantum random generator. 51% of the time I would take only one box. Otherwise I would take two boxes. Thus Omega has to guess that I will only take one box, but I have a 49% chance of taking home another $1000. My expected winnings will be $1,000,490, and I am, per Eliezer's definition, more rational than he is.
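
A quick check of that figure, on the assumption that Omega fills box B whenever it predicts one-boxing is the more probable outcome (51% > 50%):

    # Box B is assumed full because one-boxing is the more likely choice.
    p_one_box = 0.51
    box_b = 1_000_000
    ev = p_one_box * box_b + (1 - p_one_box) * (box_b + 1_000)
    print(ev)   # 1000490.0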

Comment author: RobinZ 17 July 2009 02:52:51PM 2 points [-]

This is why I restate the problem to exclude the million when people choose randomly.

Comment author: avalot 20 July 2009 06:07:56AM *  0 points [-]

I'm a bit nervous, this is my first comment here, and I feel quite out of my league.

Regarding the "free will" aspect, can one game the system? My rational choice would be to sit right there, arms crossed, and choose no box. Instead, having thus disproved Omega's infallibility, I'd wait for Omega to come back around, and try to weasel some knowledge out of her.

Rationally, the intelligence that could model mine and predict my likely action (yet fail to predict my inaction enough to not bother with me in the first place), is an intelligence I'd like to have a chat with. That chat would be likely to have tremendously more utility for me than $1,000,000.

Is that a valid choice? Does it disprove Omega's infallibility? Is it a rational choice?

If messing with the question is not a constructive addition to the debate, accept my apologies, and flame me lightly, please.

Comment author: CronoDAS 20 July 2009 06:37:57AM 4 points [-]

Hi. This is a rather old post, so you might not get too many replies.

Newcomb's problem often comes with the caveat that, if Omega thinks you're going to game the system, it will leave you with only the $1,000. But yes, we like clever answers here, although we also like to consider, for the purposes of thought experiments, the least convenient possible world in which the loopholes we find have been closed.

Also, may I suggest visiting the welcome thread?

Comment author: joecode 20 July 2009 01:10:26PM 1 point [-]

I've come around to the majority viewpoint on the alien/Omega problem. It seems to be easier to think about when you pin it down a bit more mathematically.

Let's suppose the alien determines the probability of me one-boxing is p. For the sake of simplicity, let's assume he then puts the 1M into one of the boxes with this probability p. (In theory he could do it whenever p exceeded some threshold, but this just complicates the math.)

Therefore, once I encounter the situation, there are two possible states:

a) with probability p there is 1M in one box, and 1k in the other

b) with probability 1-p there is 0 in one box, and 1k in the other

So:

the expected return of two-boxing is p(1M+1k)+(1-p)1k = 1Mp + 1kp + 1k - 1kp = 1Mp + 1k

the expected return of one-boxing is 1Mp

If the act of choosing affects the prior determination p, then the expected return calculation differs depending on my choice:

If I choose to two-box, then p=~0, and I get about 1k on average

If I choose to one-box, then p=~1, and I get about 1M on average

In this case, the expected return is higher by one-boxing.

If choosing the box does not affect p, then p is the same in both expected return calculations. In this case, two boxing clearly has better expected return than one-boxing.
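
A minimal sketch of that comparison in code (same 1M/1k stakes as above; the only assumption is the one just stated, namely whether the choice moves p):

```python
# Expected returns as a function of p, the probability the alien assigns to one-boxing.
# From the derivation above:  E_two = 1M*p + 1k  and  E_one = 1M*p.
def e_two(p): return 1_000_000 * p + 1_000
def e_one(p): return 1_000_000 * p

# Regime 1: the choice does not move p -- two-boxing is ahead by exactly $1000.
for p in (0.0, 0.5, 1.0):
    print(p, e_two(p) - e_one(p))   # 1000.0 every time

# Regime 2: the choice effectively sets p (p ~ 0 if I two-box, p ~ 1 if I one-box).
print(e_two(0.0))                   # ~1k  from two-boxing
print(e_one(1.0))                   # ~1M  from one-boxing
```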

Of course if the determination of p is affected by the choice actually made in the future, you have a situation with reverse-time causality.

If I know that I am going to encounter this kind of problem, and it is somehow possible to pre-commit to one boxing before the alien determines the probability p of me doing so, that certainly makes sense. But it is difficult to see why I would maintain that commitment when the choice actually presents itself, unless I actually believe this choice affects p, which, again, implies reverse-time causality.

It seems the problem has been setup in a deliberately confusing manner. It is as if the alien has just decided to find people who are irrational and pay them 1M for it. The problem seems to encourage irrational thinking, maybe because we want to believe that rational people always win, when of course one can set up a fairly absurd situation so that they do not.

Comment author: Wei_Dai 08 August 2009 11:25:07PM *  9 points [-]

There is no finite amount of life lived N where I would prefer a 80.0001% probability of living N years to an 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded.

Wait a second, the following bounded utility function can explain the quoted preferences:

  • U(live googolplex years) = 99
  • limit as N goes to infinity of U(live N years) = 100
  • U(live forever) = 101

Benja Fallenstein gave an alternative formulation that does imply an unbounded utility function:

For all n, there is an even larger n' such that (p+q)*u(live n years) < p*u(live n' years) + q*u(live a googolplex years).

But these preferences are pretty counter-intuitive to me. If U(live n years) is unbounded, then the above must hold for any nonzero p, q, and with "googolplex" replaced by any finite number. For example, let p = 1/3^^^3, q = .8, n = 3^^^3, and replace "googolplex" with "0". Would you really be willing to give up .8 probability of 3^^^3 years of life for a 1/3^^^3 chance at a longer (but still finite) one? And that's true no matter how many up-arrows we add to these numbers?

Comment author: CarlShulman 21 August 2009 04:32:28AM 3 points [-]

"Would you really be willing to give up .8 probability of 3^^^3 years of life for a 1/3^^^3 chance at a longer (but still finite) one?"

I'd like to hear this too.

Comment author: Eliezer_Yudkowsky 21 August 2009 05:21:41AM 6 points [-]

Okay. There's two intuitive obstacles: my heuristic as a human that my mind is too weak to handle tiny probabilities and that I should try to live my life on the mainline, and the fact that 3^^^3 already extrapolates a mind larger than the sum of every future experience my present self can empathize with.

But I strongly suspect that answering "No" would enable someone to demonstrate circular / inconsistent preferences on my part, and so I very strongly suspect that my reflective equilibrium would answer "Yes". Even in the realm of the computable, there are simple computable functions that grow a heck of a lot faster than up-arrow notation.

Comment author: Wei_Dai 21 August 2009 06:16:27AM *  5 points [-]

Eliezer, would you be willing to bet all of your assets and future earnings against $1 of my money, that we can do an infinite amount of computation before the universe ends or becomes incapable of supporting life?

Your answer ought to be yes, if your preferences are what you state. If it turns out that we can do an infinite amount of computation before the universe ends, then this bet increases your money by $1, which allows you to increase your chance of having an infinite lifetime by some small but non-zero probability. If it turns out that our universe can't do an infinite amount of computation, you lose a lot, but the loss of expected utility is still tiny compared to what you gain.

So, is it a bet?

Also, why do you suspect that answering "No" would enable someone to demonstrate circular / inconsistent preferences on your part?

Comment author: Eliezer_Yudkowsky 21 August 2009 06:55:00AM 1 point [-]

So, is it a bet?

No for two reasons - first, I don't trust human reason including my own when trying to live one's life inside tiny probabilities of huge payoffs; second, I ordinarily consider myself an average utilitarian and I'm not sure this is how my average utilitarianism plays out. It's one matter if you're working within a single universe in which all-but-infinitesimal of the value is to be found within those lives that are infinite, but I'm not sure I would compare two differently-sized possible Realities the same way. I am not sure I am willing to say that a finite life weighs nothing in my utility function if an infinite life seems possible - though if both were known to coexist in the same universe, I might have to bite that bullet. (At the opposite extreme, a Bostromian parliament might assign both cases representative weight proportional to probability and let them negotiate the wise action.)

Also I have severe doubts about infinite ethics, but that's easily fixed using a really large finite number instead (pay everything if time < googolplex, keep $1 if time > TREE(100), return $1 later if time between those two bounds).

Also, why do you suspect that answering "No" would enable someone to demonstrate circular / inconsistent preferences on your part?

Keep growing the lifespan by huge computational factors, keep slicing near-infinitesimally tiny increments off the probability. (Is there an analogous inconsistency to which I expose myself by answering "No" to the bet above, from trying to treat alternative universes differently than side-by-side spatial regions?)

Comment author: Wei_Dai 21 August 2009 07:47:04AM *  0 points [-]

It's one matter if you're working within a single universe in which all of the value is to be found within those lives that are infinite, but I'm not sure I would compare two differently-sized Realities the same way. I am not sure I am willing to say that a finite life weighs nothing in my utility function if an infinite life seems possible.

In that case, it's not that your utility function is unbounded in years lived, but rather your utility for each year lived is a decreasing function of the lifetime of the universe (or perhaps total lifetime of everyone in the universe).

I'll have to think if that makes sense.

Comment author: Eliezer_Yudkowsky 21 August 2009 07:44:28PM 2 points [-]

It's possible that I'm reasoning as if my utility function is over "fractions of total achievable value" within any given universe. I am not sure if there are any problems with this, even if it's true.

Comment author: Wei_Dai 21 August 2009 08:43:24PM 0 points [-]

After thinking about it, that doesn't make sense either. Suppose Omega comes to you and says that among the universes that you live in, there is a small fraction that will end in 5 years. He offers to kill you now in those universes, in exchange for granting you a googolplex years of additional life in a similar fraction of universes with time > TREE(100) and where you would have died in less than a googolplex years without his help (and where others manage to live to TREE(100) years old if that makes any difference). Would you refuse?

Comment author: Eliezer_Yudkowsky 21 August 2009 09:53:22PM 2 points [-]

No. But here, by specification, you're making all the universes real and hence part of a larger Reality, rather than probabilities of which only a single one is real.

If there were only one Reality, and there were small probabilities of it being due to end in 5 years, or in a googolplex years, and the two cases seemed of equal probability, and Omega offered to destroy reality now if it were only fated to last 5 years, in exchange for extending its life to TREE(100) if it were otherwise fated to last a googolplex years... well, this Reality is already known to have lasted a few billion years, and to have run through, say, around 2 trillion life-years, so if it is due to last only another 5 years, the remaining 30 billion life-years are not such a high fraction of its total value to be lost - we aren't likely to do so much more in just another 5 years, if that's our limit; it seems unlikely that we'd get FAI in that time. I'd probably still take the offer. But I wouldn't leap at it.

Comment author: Wei_Dai 21 August 2009 10:33:50PM 0 points [-]

In that case, would you accept my original bet if I rephrase it as making all the universes part of a larger Reality? That is, if in the future we have reason to believe that Tegmark's Level 4 Multiverse is true, and find ourselves living in a universe with time < googolplex, then you'd give me all your assets and future earnings, in return for $1 of my money if we find ourselves living in a universe with time > TREE(100).

Comment author: stedwick 05 October 2009 11:31:04AM 1 point [-]

I really don't see what the problem is. Clearly, the being has "read your mind" and knows what you will do. If you are disposed to take both boxes, he knows that from his mind scan, and you are playing right into his hands.

Obviously, your decision cannot affect the outcome because it's already been decided what's in the box, but your BRAIN affected what he put in the box.

It's like me handing you an opaque box and telling you there is $1 million in it if and only if you go and commit murder. Then, you open the box and find it empty. I then offer Hannibal Lecter the same deal, he commits murder, and then opens the box and finds $1 million. Amazing? I don't think so. I was simply able to create an accurate psychological profile of the two of you.

Comment author: Vladimir_Nesov 05 October 2009 04:00:58PM 0 points [-]

The question is how to create a formal decision algorithm that will be able to understand the problem and give the right answer (without failing on other such tests). Of course you can solve it correctly if you are not yet poisoned by too much presumptuous philosophy.

Comment author: toto 18 November 2009 02:37:33PM 4 points [-]

I guess I'm missing something obvious. The problem seems very simple, even for an AI.

The way the problem is usually defined (Omega really is omniscient, he's not fooling around, etc.) there are only two solutions:

  • You take the two boxes, and Omega had already predicted that, meaning that there is nothing in Box B - you win $1000

  • You take box B only, and Omega had already predicted that, meaning that there is $1M in box B - you win $1M.

That's it. Period. Nothing else. Nada. Rien. Nichts. Sod all. These are the only two possible options (again, assuming the hypotheses are true). The decision to take box B only is a simple outcome comparison. It is a perfectly rational decision (if you accept the premises of the game).

Now the way Eliezer states it is different from the usual formulation. In Eliezer's version, you cannot be sure about Omega's absolute accuracy. All you know is his previous record. That does complicate things, if only because you might be the victim of a scam (e.g. like the well-known trick to convince someone that you can consistently predict the winning horse in a 2-horse race - simply start with 2^N people, always give a different prediction to each half of them, discard those to whom you gave the wrong one, etc.)
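
That 2^N trick is easy to simulate; here is a toy sketch (the specific numbers are arbitrary) of how a predictor with no skill at all can leave one person with a perfect-looking record:

```python
import random

# Start with 2^N marks.  Before each race, tip half of them "horse 0" and the
# other half "horse 1", then quietly drop everyone whose tip turned out wrong.
N = 7
marks = list(range(2 ** N))
for _ in range(N):
    winner = random.randint(0, 1)
    half = len(marks) // 2
    tip = {m: (0 if i < half else 1) for i, m in enumerate(marks)}
    marks = [m for m in marks if tip[m] == winner]

print(len(marks))   # 1 -- the survivor has seen N consecutive correct "predictions"
```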

At any rate, the other two outcomes that were impossible in the previous version (involving mis-prediction by Omega) are now possible, with a certain probability that you need to somehow ascertain. That may be difficult, but I don't see any logical paradox.

For example, if this happened in the real world, you might reason that the probability that you are being scammed is overwhelming in regard to the probability of existence of a truly omniscient predictor. This is a reasonable inference from the fact that we hear about scams every day, but nobody has ever reported such an omniscient predictor. So you would take both boxes and enjoy your expected $1000+epsilon (Omega may have been sincere but deluded, lucky in the previous 100 trials, and wrong in this one).

In the end, the guy who would win most (in expected value!) would not be the "least rational", but simply the one who made the best estimates for the probabilities of each outcome, based on his own knowledge of the universe (if you have a direct phone line to the Angel Gabriel, you will clearly do better).

What is the part that would be conceptually (as opposed to technically/practically) difficult for an algorithm?

Comment author: Geminii 22 December 2009 11:32:31AM 3 points [-]

I one-box, but not because I haven't considered the two-box issue.

I one-box because it's a win-win in the larger context. Either I walk off with a million dollars, OR I become the first person to outthink Omega and provide new data to those who are following Omega's exploits.

Even without thinking outside the problem, Omega is a game-breaker. We do not, in the problem as stated, have any information on Omega other than that they are superintelligent and may be able to act outside of causality. Or else Omega is simply a superduperpredictor, to the point where (quantum interactions and chaos theory aside) all Omega-chosen humans have turned out to be correctly predictable in this one aspect.

Perhaps Omega is deliberately NOT choosing to test humans it can't predict. Or it is able to affect the local spacetime sufficiently to 'lock in' a choice even after it's physically left the area?

We can't tell. It's superintelligent. It's not playing on our field. It's potentially an external source of metalogic. The rules go out the window.

In short, the problem as described is not sufficiently constrained to presume a paradox, because it's not confining itself to a single logic system. It's like asking someone only familiar with non-imaginary numbers what the square root of negative one is. Just because they can't derive an answer doesn't mean you don't have one - you're using different number fields.

Comment author: simplicio 06 March 2010 06:15:53AM 0 points [-]

My solution to the problem of the two boxes:

Flip a coin. If heads, both A & B. If tails, only B. (If the superintelligence can predict a coin flip, make it a radioactive decay or something. Eat quantum, Hal.)

In all seriousness, this is a very odd problem (I love it!). Of course two boxes is the rational solution - it's not as if post-facto cogitation is going to change anything. But the problem statement seems to imply that it is actually impossible for me to choose the choice I don't choose, i.e., choice is actually impossible.

Something is absurd here. I suspect it's the idea that my choice is totally predictable. There can be a random element to my choice if I so choose, which kills Omega's plan.

Comment author: JGWeissman 06 March 2010 07:29:41AM 2 points [-]

It is a common assumption in these sorts of problems that if Omega predicts that you will condition your choice on a quantum event, it will not put the money in Box B.

See The Least Convenient Possible World.

Comment author: Kevin 06 March 2010 08:54:38AM *  2 points [-]

I suspect it's the idea that my choice is totally predictable

At face, that does sound absurd. The problem is that you are underestimating a superintelligence. Imagine that the universe is a computer simulation, so that a set of physical laws plus a very, very long string of random numbers is a complete causal model of reality. The superintelligence knows the laws and all of the random numbers. You still make a choice, even though that choice ultimately depends on everything that preceded it. See http://wiki.lesswrong.com/wiki/Free_will

I think much of the debate about Newcomb's Problem is about the definition of superintelligence.

Comment author: wedrifid 06 March 2010 09:54:26AM 2 points [-]

Of course two boxes is the rational solution - it's not as if post-facto cogitation is going to change anything.

No it isn't. If you like money it is rational to get more money. Take one box.

Comment author: ata 06 March 2010 10:01:25AM *  2 points [-]

What wedrifid said. See also Rationality is Systematized Winning and the section of What Do We Mean By "Rationality"? about "Instrumental Rationality", which is generally what we mean here when we talk about actions being rational or irrational. If you want to get more money, then the instrumentally rational action is the epistemically rational answer to the question "What course of action will cause me to get the most money?".

If you accept the premises of Omega thought experiments, then the right answer is one-boxing, period. If you don't accept the premises, it doesn't make sense for you to be answering it one way or the other.

Comment author: simplicio 06 March 2010 04:34:05PM 0 points [-]

I thought about this last night and also came to the conclusion that randomizing my choice would not "assume the worst" as I ought to.

And I fully accept that this is just a thought experiment & physics is a cheap way out. I will now take the premises or leave them. :)

Comment author: TobyBartels 22 July 2010 06:00:21AM *  1 point [-]

I'm not reading 127 comments, but as a newcomer who's been invited to read this page, along with barely a dozen others, as an introduction, I don't want to leave this unanswered, even though what I have to say has probably already been said.

First of all, the answer to Newcomb's Problem depends a lot on precisely what the problem is. I have seen versions that posit time travel, and therefore backwards causality. In that case, it's quite reasonable to take only one box, because your decision to do so does have a causal effect on the amount in Box B. Presumably causal decision theorists would agree.

However, in any version of the problem where there is no clear evidence of violations of currently known physics and where the money has been placed by Omega before my decisions, I am a two-boxer. Yet I think that your post above must not be talking about the same problem that I am thinking of, especially at the end. Although you never said so, it seems to me that you must be talking about a problem which says "If you choose Box B, then it will have a million dollars; if you choose both boxes, then Box B will be empty.". But that is simply not what the facts will be if Omega has made the decision in the past and currently understood physics applies. In the problem as stated, Omega may make mistakes in the future, and that makes all the difference.

It's presumptuous of me to assume that you're talking about a different problem from the one that you stated, I know. But as I read the psychological states that you suggest that I might have —that I might wish that I considered one-boxing rational, for example—, they seem utterly insane. Why would I wish such a thing? What does it have to do with anything? The only thing that I can wish for is that Omega has predicted that I will be a one-boxer, which has nothing to do with what I consider rational now.

The quotation from Joyce explains it well, up until the end, where poor phrasing may have confused you. The last sentence should read:

When Rachel wishes she was Irene's type she is wishing for Irene's circumstances, not wishing to make Irene's choice.

It is simply not true that Rachel envies Irene's choice. Rachel envies Irene's situation, the situation where there is a million dollars in Box B. And if Rachel were in that situation, then she would still take both boxes! (At least if I understand Joyce correctly.)

Possibly one thing that distinguishes me from one-boxers, and maybe even most two-boxers, is that I understand fundamental physics rather thoroughly and my prior has a very strong presumption against backwards causality. The mere fact that Omega has made successful predictions about Newcomb's Paradox will never be enough to overrule that. Even being superintelligent and coming from another galaxy is not enough, although things change if Omega (known to be superintelligent and honest) claims to be a time-traveller. Perhaps for some one-boxers, and even for some irrational two-boxers, Omega's past success at prediction is good evidence for backwards causality, but not for me.

So suppose that somebody puts two boxes down before me, presents convincing evidence for the situation as you stated it above (but no more), and goes away. Then I will simply take all of the money that this person has given me: both boxes. Before I open them, I will hope that they predicted that I will choose only one. After I open them, if I find Box B empty, then I will wish that they had predicted that I would choose only one. But I will not wish that I had chosen only one. And I certainly will not hope, beforehand, that I will choose only one and yet nevertheless choose two; that would indeed be irrational!

Comment author: Alicorn 22 July 2010 06:08:14AM 10 points [-]

You are disposed to take two boxes. Omega can tell. (Perhaps by reading your comment. Heck, I can tell by reading your comment, and I'm not even a superintelligence.) Omega will therefore not put a million dollars in Box B if it sets you a Newcomb's problem, because its decision to do so depends on whether you are disposed to take both boxes or not, and you are.

I am disposed to take one box. Omega can tell. (Perhaps by reading this comment. I bet you can tell by reading my comment, and I also bet that you're not a superintelligence.) Omega will therefore put a million dollars in Box B if it sets me a Newcomb's problem, because its decision to do so depends on whether I am disposed to take both boxes or not, and I'm not.

If we both get pairs of boxes to choose from, I will get a million dollars. You will get a thousand dollars. I will be monetarily better off than you.

But wait! You can fix this. All you have to do is be disposed to take just Box B. You can do this right now; there's no reason to wait until Omega turns up. Omega does not care why you are so disposed, only that you are so disposed. You can mutter to yourself all you like about how silly the problem is; as long as you wander off with just B under your arm, it will tend to be the case that you end the day a millionaire.

Comment author: cousin_it 22 July 2010 06:58:56AM *  6 points [-]

Sometime ago I figured out a refutation of this kind of reasoning in Counterfactual Mugging, and it seems to apply in Newcomb's Problem too. It goes as follows:

Imagine another god, Upsilon, that offers you a similar two-box setup - except to get the $2M in the box B, you must be a one-boxer with regard to Upsilon and a two-boxer with regard to Omega. (Upsilon predicts your counterfactual behavior if you'd met Omega instead.) Now you must choose your dispositions wisely because you can't win money from both gods. The right disposition depends on your priors for encountering Omega or Upsilon, which is a "bead jar guess" because both gods are very improbable. In other words, to win in such problems, you can't just look at each problem individually as it arises - you need to have the correct prior/predisposition over all possible predictors of your actions, before you actually meet any of them. Obtaining such a prior is difficult, so I don't really know what I'm predisposed to do in Newcomb's Problem if I'm faced with it someday.

Comment author: Alicorn 22 July 2010 07:08:04AM 0 points [-]

Something seems off about this, but I'm not sure what.

Comment author: cousin_it 22 July 2010 07:10:30AM *  0 points [-]

I'm pretty sure the logic is correct. I do make silly math mistakes sometimes, but I've tested this one on Vladimir Nesov and he agrees. No comment from Eliezer yet (this scenario was first posted to decision-theory-workshop).

Comment author: Alicorn 22 July 2010 07:11:52AM 1 point [-]

It reminds me vaguely of Pascal's Wager, but my cached responses thereunto are not translating informatively.

Comment author: cousin_it 22 July 2010 07:14:45AM *  1 point [-]

Then I think the original Newcomb's Problem should remind you of Pascal's Wager just as much, and my scenario should be analogous to the refutation thereof. (Thereunto? :-)

Comment author: Vladimir_Nesov 22 July 2010 07:17:55AM *  4 points [-]

This is not a refutation, because what you describe is not about the thought experiment. In the thought experiment, there are no Upsilons, and so nothing to worry about. It is if you face this scenario in real life, where you can't be given guarantees about the absence of Upsilons, that your reasoning becomes valid. But it doesn't refute the reasoning about the thought experiment where it's postulated that there are no Upsilons.

(Original thread, my discussion.)

Comment author: cousin_it 22 July 2010 07:35:46AM *  0 points [-]

Thanks for dropping the links here. FWIW, I agree with your objection. But at the very least, the people claiming they're "one-boxers" should also make the distinction you make.

Also, user Nisan tried to argue that various Upsilons and other fauna must balance themselves out if we use the universal prior. We eventually took this argument to email, but failed to move each other's positions.

Comment author: Vladimir_Nesov 22 July 2010 07:39:07AM *  0 points [-]

Just didn't want you confusing people or misrepresenting my opinion, so made everything clear. :-)

Comment author: toto 22 July 2010 09:16:49AM 0 points [-]

OK. I assume the usual (Omega and Upsilon are both reliable and sincere, I can reliably distinguish one from the other, etc.)

Then I can't see how the game doesn't reduce to standard Newcomb, modulo a simple probability calculation, mostly based on "when I encounter one of them, what's my probability of meeting the other during my lifetime?" (plus various "actuarial" calculations).

If I have no information about the probability of encountering either, then my decision may be incorrect - but there's nothing paradoxical or surprising about this, it's just a normal, "boring" example of an incomplete information problem.

you need to have the correct prior/predisposition over all possible predictors of your actions, before you actually meet any of them.

I can't see why that is - again, assuming that the full problem is explained to you on encountering either Upsilon or Omega, both are truthful, etc. Why can I not perform the appropriate calculations and make an expectation-maximising decision even after Upsilon-Omega has left? Surely Omega-Upsilon can predict that I'm going to do just that and act accordingly, right?

Comment author: cousin_it 22 July 2010 09:22:20AM *  0 points [-]

Yes, this is a standard incomplete information problem. Yes, you can do the calculations at any convenient time, not necessarily before meeting Omega. (These calculations can't use the information that Omega exists, though.) No, it isn't quite as simple as you state: when you meet Omega, you have to calculate the counterfactual probability of you having met Upsilon instead, and so on.

Comment author: Eliezer_Yudkowsky 23 July 2010 12:16:06AM 9 points [-]

Omega lets me decide to take only one box after meeting Omega, when I have already updated on the fact that Omega exists, and so I have much better knowledge about which sort of god I'm likely to encounter. Upsilon treats me on the basis of a guess I would subjunctively make without knowledge of Upsilon. It is therefore not surprising that I tend to do much better with Omega than with Upsilon, because the relevant choices being made by me are being made with much better knowledge. To put it another way, when Omega offers me a Newcomb's Problem, I will condition my choice on the known existence of Omega, and all the Upsilon-like gods will tend to cancel out into Pascal's Wagers. If I run into an Upsilon-like god, then, I am not overly worried about my poor performance - it's like running into the Christian God, you're screwed, but so what, you won't actually run into one. Even the best rational agents cannot perform well on this sort of subjunctive hypothesis without much better knowledge while making the relevant choices than you are offering them. For every rational agent who performs well with respect to Upsilon there is one who performs poorly with respect to anti-Upsilon.

On the other hand, beating Newcomb's Problem is easy, once you let go of the idea that to be "rational" means performing a strange ritual cognition in which you must only choose on the basis of physical consequences and not on the basis of correct predictions that other agents reliably make about you, so that (if you choose using this bizarre ritual) you go around regretting how terribly "rational" you are because of the correct predictions that others make about you. I simply choose on the basis of the correct predictions that others make about me, and so I do not regret being rational.

And these questions are highly relevant and realistic, unlike Upsilon; in the future we can expect there to be lots of rational agents that make good predictions about each other.

Comment author: cousin_it 23 July 2010 08:49:02AM *  0 points [-]

Pascal's Wagers, huh. So your decision theory requires a specific prior?

Comment author: Vladimir_Nesov 23 July 2010 10:35:24AM 0 points [-]

Omega lets me decide to take only one box after meeting Omega, when I have already updated on the fact that Omega exists, and so I have much better knowledge about which sort of god I'm likely to encounter.

In what sense can you update? Updating is about following a plan, not about deciding on a plan. You already know that it's possible to observe anything; you don't learn anything new about the environment by observing any given thing. There could be a deep connection between updating and logical uncertainty that makes it a good plan to update, but it's not obvious what it is.

Comment author: EStokes 26 July 2010 11:12:37PM 1 point [-]

Huh? Updating is just about updating your map. (?) I didn't follow the reasoning of the next sentence; could you expand?

Comment author: andreas 27 July 2010 02:03:22AM 0 points [-]

Intuitively, the notion of updating a map of fixed reality makes sense, but in the context of decision-making, formalization in full generality proves elusive, even unnecessary, so far.

By making a choice, you control the truth value of certain statements—statements about your decision-making algorithm and about mathematical objects depending on your algorithm. Only some of these mathematical objects are part of the "real world". Observations affect what choices you make ("updating is about following a plan"), but you must have decided beforehand what consequences you want to establish ("[updating is] not about deciding on a plan"). You could have decided beforehand to care only about mathematical structures that are "real", but what characterizes those structures apart from the fact that you care about them?

Vladimir talks more about his crazy idea in this comment.

Comment author: TobyBartels 22 July 2010 08:09:12AM 1 point [-]

But wait! You can fix this. All you have to do is be disposed to take just Box B.

No, that's not what I should do. What I should do is make Omega think that I am disposed to take just Box B. If I can successfully make Omega think that I'll take only Box B but still take both boxes, then I should. But since Omega is superintelligent, let's take it as understood that the only way to make Omega think that I'll take only Box B is to make it so that I'll actually take Box B. Then that is what I should do.

But I have to do it now! (I don't do it now only because I don't believe that this situation will ever happen.) Once Omega has placed the boxes and left, if the known laws of physics apply, then it's too late!

If you take only Box B and get a million dollars, wouldn't you regret having not also taken Box A? Not only would you have gotten a thousand dollars more, you'd also have shown up that know-it-all superintelligent intergalactic traveller too! That's a chance that I'll never have, since Omega will read my comment here and leave my Box B empty, but you might have that chance, and if so then I hope you'll take it.

Comment author: Alicorn 22 July 2010 08:14:43AM *  2 points [-]

It's not really too late then. Omega can predict what you'll do between seeing the boxes, and choosing which to take. If this is going to include a decision to take one box, then Omega will put a million dollars in that box.

I will not regret taking only one box. It strikes me as inconsistent to regret acting as the person I most wish to be, and it seems clear that the person I most wish to be will take only one box; there is no room for approved regret.

Comment author: TobyBartels 22 July 2010 08:29:41AM *  0 points [-]

It's not really too late then.

If you say this, then you believe in backwards causality (or a breakdown of the very notion of causality, as in Kevin's comment below). I agree that if causality doesn't work, then I should take only Box B, but nothing in the problem as I understand it from the original post implies any violation of the known laws of physics.

If known physics applies, then Omega can predict all it likes, but my actions after it has placed the boxes cannot affect that prediction. There is always the chance that it predicts that I will take both boxes but I take only Box B. There is even the chance that it will predict that I will take only Box B but I take both boxes. Nothing in the problem statement rules that out. It would be different if that were actually impossible for some reason.

I will not regret taking only one box.

I knew that you wouldn't, of course, since you're a one-boxer. And we two-boxers will not regret taking both boxes, even if we find Box B empty. Better $1000 than nothing, we will think!

Comment author: Vladimir_Nesov 22 July 2010 08:39:44AM *  3 points [-]

If you say this, then you believe in backwards causality (or a breakdown of the very notion of causality, as in Kevin's comment below). I agree that if causality doesn't work, then I should take only Box B, but nothing in the problem as I understand it from the original post implies any violation of the known laws of physics.

Beware hidden inferences. Taboo causality.

Comment author: TobyBartels 22 July 2010 09:55:16AM *  1 point [-]

I don't see what that link has to do with anything in my comment thread. (I haven't read most of the other threads in reply to this post.)

I should explain what I mean by ‘causality’. I do not mean some metaphysical necessity, whereby every event (called an ‘effect’) is determined (or at least influenced in some asymmetric way) by other events (called its ‘causes’), which must be (or at least so far seem to be) prior to the effect in time, leading to infinite regress (apparently back to the Big Bang, which is somehow an exception). I do not mean anything that Aristotle knew enough physics to understand in any but the vaguest way.

I mean the flow of macroscopic entropy in a physical system.

The best reference that I know on the arrow of time is Huw Price's 1996 book Time's Arrow and Archimedes' Point. But actually I didn't understand how entropy flow leads to a physical concept of causality until several years after I read that, so that might not actually help, and I'm having no luck finding the Internet conversation that made it click for me.

But basically, I'm saying that, if known physics applies, then P(there is money in Box B|all information available on a macroscopic level when Omega placed the boxes) = P(there is money in Box B|all information … placed the boxes & I pick both boxes), even though P(I pick both boxes|all information … placed the boxes) < 1, because macroscopic entropy strictly increases between the placing of the boxes and the time that I finally pick a box.

So I need to be given evidence that known physics does not apply before I pick only Box B, and a successful record of predictions by Omega will not do that for me.

Comment author: FAWS 22 July 2010 12:04:02PM *  6 points [-]

If known physics applies, then Omega can predict all it likes, but my actions after it has placed the boxes cannot affect that prediction. There is always the chance that it predicts that I will take both boxes but I take only Box B. There is even the chance that it will predict that I will take only Box B but I take both boxes. Nothing in the problem statement rules that out. It would be different if that were actually impossible for some reason.

Ah, I see what the problem is. You have a confused notion of free will and what it means to make a choice.

Making a choice between two options doesn't mean there is a real chance that you might take either option (there is always at least an infinitesimal chance, but that is true even of things that are not usefully described as a choice). It just means that the reason for your taking whatever option you take is most usefully attributed to you (and not e.g. gravity, government, the person holding a gun to your head etc.). In the end, though, it is (unless the choice is so close that random noise makes the difference) a fact about you that you will make the choice you will make. And it is in principle possible for others to discover this fact about you.

If it is a fact about you that you will one-box it is not possible that you will two-box. If it is a fact about you that you will two-box it is not possible that you will one-box. If it is a fact about you that you will leave the choice up to chance then Omega probably doesn't offer you to take part in the first place.

Now, when deciding what choice to make, it is usually most useful to pretend there is a real possibility of taking either option, since that generally causes facts about you that are more beneficial to you. And that you do that is just another fact about you, one which influences the fact about which choice you make. Usually the fact of which choice you will make has no consequences before you make your choice, and so when counterfactually considering the consequences of either choice you can model the rest of the world as being the same in either case up to that point. But the fact about which choice you will make is just another fact like any other, and is allowed, even if it usually doesn't, to have consequences before that point in time. If it does, it is best, for the very same reason you pretend that either choice is a real possibility in the first place, to also model the rest of the world as different contingent on your choice. That doesn't mean backwards causality. Modeling the world in this way is just another fact about you that generates good outcomes.

Comment author: RobinZ 22 July 2010 11:52:45PM 4 points [-]

Alicorn:

It's not really too late then. Omega can predict what you'll do between seeing the boxes, and choosing which to take. If this is going to include a decision to take one box, then Omega will put a million dollars in that box.

TobyBartels:

If you say this, then you believe in backwards causality (or a breakdown of the very notion of causality, as in Kevin's comment below). I agree that if causality doesn't work, then I should take only Box B, but nothing in the problem as I understand it from the original post implies any violation of the known laws of physics.

I remember reading an article about someone who sincerely lacked respect for people who were 'soft' (not exact quote) on the death penalty ... before ending up on the jury of a death penalty case, and ultimately supporting life in prison instead. It is not inconceivable that a sufficiently canny analyst (e.g. Omega) could deduce that the process of being picked would motivate you to reconsider your stance. (Or, perhaps more likely, motivate a professed one-boxer like me to reconsider mine.)

Comment author: CarlShulman 23 July 2010 12:22:59AM *  2 points [-]

From Andy Egan.

The Psychopath Button: Paul is debating whether to press the ‘kill all psychopaths’ button. It would, he thinks, be much better to live in a world with no psychopaths. Unfortunately, Paul is quite confident that only a psychopath would press such a button. Paul very strongly prefers living in a world with psychopaths to dying. Should Paul press the button? (Set aside your theoretical commitments and put yourself in Paul’s situation. Would you press the button? Would you take yourself to be irrational for not doing so?)

Newcomb’s Firebomb: There are two boxes before you. Box A definitely contains $1,000,000. Box B definitely contains $1,000. You have two choices: take only box A (call this one-boxing), or take both boxes (call this two-boxing). You will signal your choice by pressing one of two buttons. There is, as usual, an uncannily reliable predictor on the scene. If the predictor has predicted that you will two-box, he has planted an incendiary bomb in box A, wired to the two-box button, so that pressing the two-box button will cause the bomb to detonate, burning up the $1,000,000. If the predictor has predicted that you will one-box, no bomb has been planted – nothing untoward will happen, whichever button you press. The predictor, again, is uncannily accurate.

I would suggest looking at your implicit choice of counterfactuals and their role in your decision theory. Standard causal decision theory involves local violations of the laws of physics (you assign probabilities to the world being such that you'll one-box, or such that you'll two-box, and then ask what miracle magically altering your decision, without any connection to your psychological dispositions, etc, would deliver the highest utility). Standard causal decision theory is a normative principle for action, that says to do the action that would deliver the most utility if a certain kind of miracle happened. But you can get different versions of causal decision theory by substituting different sorts of miracles, e.g. you can say: "if I one-box, then I have a psychology that one-boxes, and likewise for two-boxing" so you select the action such that a miracle giving you the disposition to do so earlier on would have been better. Yet another sort of counterfactual that can be hooked up to the causal decision theory framework would go "there's some mathematical fact about what decision (decisions given Everett) my brain structure leads to in standard physics, and the predictor has access to this mathematical info, so I'll select the action that would be best brought about by a miracle changing that mathematical fact".

Comment author: Kevin 22 July 2010 06:11:22AM 1 point [-]

You underestimate the meaning of superintelligence. One way of defining a superintelligence that wins at Newcomb without violating causality, is to assume that the universe is computer simulation like, such that it can be defined by a set of physical laws and a very long string of random numbers. If Omega knows the laws and random numbers that define the universe, shouldn't Omega be able to predict your actions with 100% accuracy? And then wouldn't you want to choose the action that results in you winning a lot more money?

Comment author: TobyBartels 22 July 2010 08:13:31AM *  1 point [-]

So part of the definition of a superintelligence is that the universe is like that and Omega knows all that? In other words, if I have convincing evidence that Omega is superintelligent, then I must have convincing evidence that the universe is a computer simulation, etc? Then that changes things; just as the Second Law of Thermodynamics doesn't apply to Maxwell's Demon, so the law of forward causality (which is actually a consequence of the Second Law, under the assumption of no time travel) doesn't apply to a superintelligence. So yes, then I would pick only Box B.

This just goes to show how important it is to understand exactly what the problem states.

Comment author: nhamann 22 July 2010 09:03:43AM 3 points [-]

The computer simulation assumption isn't necessary, the only thing that matters is that Omega is transcendentally intelligent, and it has all the technology that you might imagine a post-Singularity intelligence might have (we're talking Shock Level 4). So Omega scans your brain by using some technology that is effectively indistinguishable from magic, and we're left to assume that it can predict, to a very high degree of accuracy, whether you're the type of person who would take one or two boxes.

Omega doesn't have to actually simulate your underlying physics, it just needs a highly accurate model, which seems reasonably easy to achieve for a superintelligence.

Comment author: TobyBartels 22 July 2010 10:07:37AM 1 point [-]

If its model is good enough that it violates the Second Law as we understand it, fine, I'll pick only Box B, but I don't see anything in the problem statement that implies this. The only evidence that I'm given is that it's made a run of perfect predictions (of unknown length!), is smarter than us, and is from very far away. That's not enough for new physics.

And just having a really good simulation of my brain, of the sort that we could imagine doing using known physics but just don't have the technical capacity for, is definitely not good enough. That makes the probability that I'll act as predicted very high, but I'll still come out worse if, after the boxes have been set, I'm unlucky enough to only pick Box B anyway (or come out better if I'm lucky enough to pick both boxes anyway, if Omega pegs me for a one-boxer).

Comment author: FAWS 22 July 2010 11:17:55AM *  7 points [-]

If its model is good enough that it violates the Second Law as we understand it [...]

It doesn't have to be even remotely close to good enough for that for the scenario. I'd bet a sufficiently good human psychologist could take Omega's role and get it 90%+ right if he tests and interviews the people extensively first (without them knowing the purpose) and gets to exclude people he is unsure about. A super intelligent being should be far, far better at this.

You yourself claim to know what you would do in the boxing experiment, and you are an agent limited by conventional physics. There is no physical law that forbids another agent from knowing you as well as (or even better than) you know yourself.

You'll have to explain why you think 99.99% (or whatever) is not good enough; a 0.01% chance to win $1000 shouldn't make up for a 99.99% chance of losing $999,000.

Comment author: TobyBartels 23 July 2010 09:57:26AM *  1 point [-]

Thanks for the replies, everybody!

This is a global response to several replies within my little thread here, so I've put it at nearly the top level. Hopefully that works out OK.

I'm glad that FAWS brought up the probabilistic version. That's because the greater the probability that Omega makes mistakes, the more inclined I am to take two boxes. I once read the claim that 70% of people, when told Newcomb's Paradox in an experiment, claim to choose to take only one box. If this is accurate, then Omega can achieve a 70% level of accuracy by predicting that everybody is a one-boxer. Even if 70% is not accurate, you can still make the paradox work by adjusting the dollar amounts, as long as the bias is great enough that Omega can be confident that it will show up at all in the records of its past predictions. (To be fair, the proportion of two-boxers will probably rise as Omega's accuracy falls, and changing the stakes should also affect people's choices; there may not be a fixed point, although I expect that there is.)

If, in addition to the problem as stated (but with only 70% probability of success), I know that Omega always predicts one-boxing, then (hopefully) everybody agrees that I should take both boxes. There needs to be some correlation between Omega's predictions and the actual outcomes, not just a high proportion of past successes.

FAWS also writes:

You yourself claim to know what you would do in the boxing experiment

Actually, I don't really want to make that claim. Although I've written things like ‘I would take both boxes’, I really should have written ‘I should take both boxes’. I'm stating a correct decision, not making a prediction about my actual actions. Right now, I predict about a 70% chance of two-boxing given the situation as stated in the original post, although I've never tried to calculate my estimates of probabilities, so who knows what that really means. (H'm, 70% again? Nope, I don't trust that calibration at all!)

FAWS writes elsewhere:

Making a choice between two options […] just means that attributing the reason for your taking whatever option you take is most usefully attributed to you (and not e.g. gravity, government, the person holding a gun to you head etc.).

I don't see what the gun has to do with it; this is a perfectly good problem in decision theory:

  • Suppose that you have a button that, if pressed, will trigger a bomb that kills two strangers on the other side of the world. I hold a gun to your head and threaten to shoot you if you don't press the button. Should you press it?

A person who presses the button in that situation can reasonably say afterwards ‘I had no choice! Toby held a gun to my head!’, but that doesn't invalidate the question. Such a person might even panic and make the question irrelevant, but it's still a good question.

If it is a fact about you that you will leave the choice up to chance then Omega probably doesn't offer you to take part in the first place.

So that's how Omega gets such a good record! (^_^)

Understanding the question really is important. I've been interpreting it something along these lines: you interrupt your normal thought processes to go through a complete evaluation of the situation before you, then see what you do. (This is exactly what you cannot do if you panic in the gun problem above.) So perhaps we can predict with certain accuracy that an utter bigot will take one course of action, but that is not what the bigot should do, nor is it what they will do if they discard their prejudices and decide afresh.

Now that I think about it, I see some problems with this interpretation, and also some refinements that might fix it. (The first thing to do is to make it less dependent on the specific person making the decision.) But I'll skip the refinements. It's enough to notice that Omega might very well predict that a person will not take the time to think things through, so there is poor correlation between what one should do and what Omega will predict, even though the decision is based on what the world would be like if one did take the time.

I still think that (modulo refinements) this is a good interpretation of what most people would mean if they tell a story and then ask ‘What should this person do?’. (I can try to defend that claim if anybody still wants me to after they finish this comment.) In that case, I stand by my decision that one should take both boxes, at least if there is no good evidence of new physics.

However, I now realise that there is another interpretation, which is more practical, however much the ordinary person might not interpret things this way. That is: sit down and think through the whole situation now, long before you are ever faced with it in real life, and decide what to do. One obvious benefit of this is that when I hold a gun to your head, you won't panic, because you will be prepared. More generally, this is what we are all actually doing right now! So as we make these idle philosophical musings, let's be practical, and decide what we'll do if Omega ever offers us this deal.

In this case, I agree that I will be better off (given the extremely unlikely but possible assumption that I am ever in this situation) if I have decided now to take only Box B. As RobinZ points out, I might change my mind later, but that can't be helped (and to a certain extent shouldn't be helped, since it's best if I take two boxes after Omega predicts that I'll only take one, but we can't judge that extent if Omega is smarter than us, so really there's no benefit to holding back at all).

If Omega is fallible, then the value of one-boxing falls drastically, and even adjusting the amount of money doesn't help in the end; once Omega's proportion of past success matches the observed proportion in experiments (or whatever our best guess of the actual proportion of real people is), then I'm back to two-boxing, since I expect that Omega simply always predicts one-boxing.

In hindsight, it's obvious that the original post was about decision in this sense, since Eliezer was talking about an AI that modifies its decision procedures in anticipation of facing Omega in the future. Similarly, we humans modify our decision procedures by making commitments and letting ourselves invent rationalisations for them afterwards (although the problem with this is that it makes it hard to change our minds when we receive new information). So obviously Eliezer wants us to decide now (or at least well ahead of time) and use our leet Methods of Rationality to keep the rationalisations in check.

So I hereby decide that I will pick only one box. (You hear that, Omega!?) Since I am honest (and strongly doubt that Omega exists), I'll add that I may very well change my mind if this ever really happens, but that's about what I would do, not what I should do. And in a certain sense, I should change my mind … then. But in another sense, I should (and do!) choose to be a one-boxer now.

(Thanks also to CarlShulman, whom I haven't quoted, but whose comment was a big help in drawing my attention to the different senses of ‘should’, even though I didn't really adopt his analysis of them.)

Comment author: nhamann 26 July 2010 05:54:58PM *  7 points [-]

If Omega is fallible, then the value of one-boxing falls drastically, and even adjusting the amount of money doesn't help in the end;

Assume Omega has a probability X of correctly predicting your decision:

If you choose to two-box:
- X chance of getting $1000
- (1-X) chance of getting $1,001,000

If you choose to take box B only:
- X chance of getting $1,000,000
- (1-X) chance of getting $0

Your expected utilities for two-boxing and one-boxing are (respectively):

E2 = 1000X + (1-X)1001000
E1 = 1000000X

For E2 > E1, we must have 1000X + 1,001,000 - 1,001,000X - 1,000,000X > 0, or 1,001,000 > 2,000,000X, or

X < 0.5005

So as long as Omega can maintain a greater than 50% accuracy, you should expect to earn more money by one-boxing. Since the solution seems so simple, and since I'm a total novice at decision theory, it's possible I'm missing something here, so please let me know.
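
The algebra looks right. For what it's worth, here is the same calculation done numerically, under the assumption stated above that Omega's accuracy X is the same whichever way you choose:

```python
# Expected returns as a function of Omega's accuracy X (same X for both choices).
def e_two_box(x): return 1_000 * x + (1 - x) * 1_001_000
def e_one_box(x): return 1_000_000 * x

for x in (0.50, 0.5005, 0.51, 0.70, 0.9999):
    print(x, e_two_box(x), e_one_box(x))
# The two expectations cross at X = 1_001_000 / 2_000_000 = 0.5005; above that
# accuracy one-boxing wins, below it two-boxing wins.
```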

Comment author: RobinZ 26 July 2010 09:00:46PM *  3 points [-]

Wait - we can't assume that the probability of being correct is the same for two-boxing and one-boxing. Suppose Omega has a probability X of predicting one when you choose one and Y of predicting one when you choose two.

E1 = E($1 000 000) * X
E2 = E($1 000) + E($1 000 000) * Y

The special case you list corresponds to Y = 1 - X, but in the general case, we can derive that E1 > E2 implies

X > Y + E($1 000) / E($1 000 000)

If we assume linear utility in wealth, this corresponds to a difference of 0.001. If, alternately, we choose a median net wealth of $93 100 (the U.S. figure) and use log-wealth as the measure of utility, the required difference increases to 0.004 or so. Either way, unless you're dead broke (e.g. net wealth $1), you had better be extremely confident that you can fool the interrogator before you two-box.
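
For reference, the same numbers recomputed (the $93,100 figure and the log-wealth utility are the comment's own; nothing else is assumed):

```python
import math

# One-boxing beats two-boxing when  X > Y + U($1k) / U($1M),
# where U(d) is the utility of gaining d dollars over current wealth.
def required_gap(utility_of_gain):
    return utility_of_gain(1_000) / utility_of_gain(1_000_000)

print(required_gap(lambda d: d))                  # linear utility: 0.001

w = 93_100                                        # median net wealth used above
log_gain = lambda d: math.log(w + d) - math.log(w)
print(required_gap(log_gain))                     # log-wealth utility: ~0.004
```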

Comment author: TobyBartels 28 July 2010 04:59:07AM 2 points [-]

So as long as Omega can maintain a greater than 50% accuracy, you should expect to earn more money by one-boxing. Since the solution seems so simple, and since I'm a total novice at decision theory, it's possible I'm missing something here, so please let me know.

Your calculation is fine. What you're missing is that Omega has a record of 70% accuracy because Omega always predicts that a person will one-box and 70% of people are one-boxers. So Omega always puts the million dollars in Box B, and I will always get $1,001,000 if I'm one of the 30% of people who two-box.

At least, that is a possibility, which your calculation doesn't take into account. I need evidence of a correlation between Omega's predictions and the participants' actual behaviour, not just evidence of correct predictions. My prior probability distribution for how often people one-box isn't even concentrated very tightly around 70% (which is just a number that I remember reading once as the result of one survey), so anything short of a long run of predictions with very high proportion of correct ones will make me suspect that Omega is pulling a trick like this.

So the problem is much cleaner as Eliezer states it, with a perfect record. (But if even that record is short, I won't buy it.)

Comment author: TobyBartels 28 July 2010 05:08:23AM 1 point [-]

Oops, I see that RobinZ already replied, and with calculations. This shows that I should still remove the word ‘drastically’ from the bit that nhamann quoted.

Comment author: keddaw 03 August 2010 03:31:34PM *  1 point [-]

There is a good chance I am missing something here, but from an economic perspective this seems trivial:

P(Om) is the probability the person assigns to Omega being able to accurately predict their decision ahead of time.

A. P(Om) × $1m is the expected return from opening one box.

B. (1 - P(Om)) × $1m + $1000 is the expected return of opening both boxes (the probability that Omega was wrong, times the million, plus the thousand).

Since P(Om) is dependent on people's individual belief about Omega's ability to predict their actions it is not surprising different people make different decisions and think they are being rational - they are!

If A > B they choose one box, if B > A they choose both boxes.

This also shows why people will change their views if the amount in the visible box is changed (to $990,000 or $10).

Basically, in this instance, if you think the probability of Omega being able to determine your future action is greater than 0.5005, then you select a single box; if less than that, you select both boxes. At P(Om) = 0.5005 the expected return of both strategies is $500,500.

EDIT. I think I oversimplified B, but the point still stands. nhamann - I didn't see your post before writing mine. I think the only difference between them is that I state that it is a personal view of the probability of Omega being able to predict choices and you seem to want to use the actual probability that he can.
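
A two-line check of the break-even claim, using the simplified expression for B given above (so it shares that simplification):

```python
p = 0.5005
one_box = p * 1_000_000                  # A: P(Om) × $1m
two_box = (1 - p) * 1_000_000 + 1_000    # B: (1 - P(Om)) × $1m + $1000
print(one_box, two_box)                  # both come out to $500,500, up to floating-point rounding
```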

Comment author: timtyler 18 August 2010 06:22:47AM 0 points [-]

Re: "Do you take both boxes, or only box B?"

It would sure be nice to get hold of some more data about the "100 observed occasions so far". If Omega only visits two-boxers - or tries to minimise his outgoings - it would be good to know that. Such information might well be accessible - if we have enough information about Omega to be convinced of his existence in the first place.

Comment author: roryokane 22 August 2010 11:00:30PM *  0 points [-]

What this is really saying is “if something impossible (according to your current theory of the world) actually happens, then rather than insisting it’s impossible and ignoring it, you should revise your theory to say it’s possible”. In this case, the impossible thing is reverse causality; since we are told of evidence that reverse causality has happened, in the form of 100 successful previous experiments, we must revise our theory to accept that reverse causality actually can happen. This would lead us to the conclusion that we should take one box. Alternatively, we could decide that our supposed evidence is untrustworthy and that we are being lied to when we are told that Omega made 100 successful predictions – we might think that this problem describes a nonsensical, impossible situation, much as if we were told that there was a barber who shaves everyone who does not shave themself.

Comment author: jschulter 28 August 2010 09:43:54PM *  0 points [-]

The link to that thesis doesn't seem to work for me.

A quick google turned up one that does

Comment author: Eoghanalbar 03 September 2010 03:26:55AM -1 points [-]

You know, I honestly don't even understand why this is a point of debate. One boxing and taking box B (and being the kind of person who will predictably do that) seem so obviously like the rational strategy that it shouldn't even require explanation.

And not obvious in the same way most people think the Monty Hall problem (game show, three doors, goats behind two, sports car behind one, ya know?) seems 'obvious' at first.

In the case of the Monty Hall problem, you play with it, and the cracks start to show up, and you dig down to the surprising truth.

In this case, I don't see how anyone could see any cracks in the first place.

Am I missing something here?

Comment author: wedrifid 03 September 2010 03:49:44AM 0 points [-]

You know, I honestly don't even understand why this is a point of debate. One boxing and taking box B (and being the kind of person who will predictably do that) seem so obviously like the rational strategy that it shouldn't even require explanation.

It is the obvious rational strategy... which is why using a decision theory that doesn't get this wrong is important.

Comment author: Eoghanalbar 03 September 2010 04:01:38AM 0 points [-]

Yup yup, you're right, of course.

What I was trying to say, then, is that I don't understand why there's any debate about the validity of a decision theory that gets this wrong. I'm surprised everyone doesn't just go, "Oh, obviously any decision theory that says two-boxing is 'rational' is an invalid theory."

I'm surprised that this is a point of debate. I'm surprised, so I'm wondering, what am I missing?

Did I manage to make my question clearer like that?

Comment author: Sniffnoy 03 September 2010 04:52:43AM *  2 points [-]

I can say that for me personally, the hard part - that I did not get past till reading about it here - was noticing that there is actually such a variable as "what decision theory to use"; using a naive CDT sort of thing simply seemed rational a priori. Insufficient grasp of the nameless virtue, you could say.

Comment author: Eoghanalbar 03 September 2010 04:56:22AM 0 points [-]

Meaning you're in the same boat as me? Confused as to why this ever became a point of debate in the first place?

Comment author: Sniffnoy 03 September 2010 05:05:01AM 0 points [-]

...no? I didn't realize that the decision theory could be varied, that the obvious decision theory could be invalid, so I hit a point of confusion with little idea what to do about it.

Comment author: Eoghanalbar 03 September 2010 05:13:14AM 0 points [-]

But you're not saying that you would ever have actually decided to two-box rather than take box B if you found yourself in that situation, are you?

I mean, you would always have decided, if you found yourself in that situation, that you were the kind of person Omega would have predicted to choose box B, right?

I am still so majorly confused here. :P

Comment author: Sniffnoy 03 September 2010 05:16:53AM 1 point [-]

I have no idea! IIRC I leaned towards one-boxing, but I was honestly confused about it.

Comment author: Eoghanalbar 03 September 2010 06:12:06AM 0 points [-]

Ahah. So do you remember if you were confused in yourself, for reasons generated by your own brain, or just by your knowledge that some experts were saying two-boxing was the 'rational' strategy?

Comment author: wedrifid 03 September 2010 04:54:01AM *  1 point [-]

It's a good question. You aren't missing anything. And "people are crazy, the world is mad" isn't always sufficient. ;)

Comment author: Eoghanalbar 03 September 2010 05:03:04AM 1 point [-]

Ha! =]

Okay, I DO expect to see lots of 'people are crazy, the world is mad' stuff, yeah, I just wouldn't expect to see it on something like this from the kind of people who work on things like Causal Decision Theory! :P

So I guess what I really want to do first is CHECK which option is really most popular among such people: two-boxing, or predictably choosing box B?

Problem is, I'm not sure how to perform that check. Can anyone help me there?

Comment author: wedrifid 03 September 2010 06:34:44AM *  2 points [-]

It is fairly hard to perform such checks. We don't have many situations which are analogous to Newcomb's problem. We don't have perfect predictors, and most situations humans are in can be considered "iterated". At least, we can consider most people to be using their 'iterated' reasoning by mistake when we put them in one-off situations.

The closest analogy that we can get reliable answers out of is the 'ultimatum game' with high stakes... in which people really do refuse weeks' worth of wages.

By the way, have you considered what you would do if the boxes were transparent? Just sitting there. Omega long gone and you can see piles of cash in front of you... It's tricky. :)

Comment author: Eoghanalbar 11 September 2010 11:38:53PM 0 points [-]

Thanks, but I meant not a check on what these CDT-studying-type people would DO if actually in that situation, but a check on whether they actually say that two-boxing would be the "rational" thing to do in that hypothetical situation.

I haven't considered your transparency question, no. Does that mean Omega did exactly what he would have done if the boxes were opaque, except that they are in fact transparent (a fact that did not figure into the prediction)? Because in that case I'd just see the million in B, and the thousand in A, and of course take 'em both.

Otherwise, Omega can predict as well as I can that, if the rules of the game are that box B contains a million iff I predictably choose to take only box B and leave A alone, and both boxes are transparent (with this transparency figured into the prediction), then I would expect to see a million in box B, take it, and just walk away from the paltry thousand in A.

This make sense?

Comment author: timtyler 03 September 2010 07:32:58AM *  0 points [-]

I think this is the position of classical theorists on self-modifiying agents:

From Rationality, Dispositions, and the Newcomb Paradox:

"I conclude that the rational action for a player in the Newcomb Paradox is taking both boxes, but that rational agents will usually take only one box because they have rationally adopted the disposition to do so."

They agree that agents who can self-modify will take one box. But they call that action "irrational". So, the debate really boils down to the definition of the term "rational" - and is not really concerned with the decision that rational agents who can self-modify will actually take.

If my analysis here is correct, the dispute is really all about terminology.

Comment author: RobinZ 04 September 2010 02:26:15PM 2 points [-]

One factor you may not have considered: the obvious rational metastrategy is causal decision theory, and causal decision theory picks the two-box strategy.

Comment author: Pavitra 04 September 2010 03:00:34PM 1 point [-]

I don't follow. Isn't it precisely on the meta-strategy level that CDT becomes obviously irrational?

Comment author: Alicorn 04 September 2010 03:14:16PM 2 points [-]

I think what RobinZ means is that you want to choose a strategy such that having that strategy will causally yield nice things. Given that criterion, object-level CDT fails; but one uses a causal consideration to reject it.

Comment author: RobinZ 04 September 2010 05:47:58PM *  2 points [-]

Key word is "obvious". If you say, "how should you solve games?", the historical answer is "using game theory", and when you say, "what does game theory imply for Newcomb's dilemma?", the historical answer is "two-box". It takes an additional insight to work out that a better metastrategy is possible, and things which take an additional insight are no longer obvious, true or no.

Edit: Alternatively: When I said "metastrategy", I meant one level higher than "two-boxing" - in other words, the level of decision theory. (I'm not sure which of the two objections you were raising.)

Comment author: Sniffnoy 04 September 2010 09:29:32PM 1 point [-]

This is basically what I was trying to point out. :)

Comment author: Xan 13 October 2010 11:43:49AM 2 points [-]

Mr Eliezer, I think you've missed a few points here. However, I've probably missed more. I apologise for errors in advance.

  1. To start with, I speculate that any system of decision-making consistently gives the wrong results on some specific problem. The whole point of decision theory is finding principles which usually end up with a better result. As such, you can always formulate a situation in which it gives the wrong answer: maybe one of the facts you thought you knew was incorrect, and led you astray. (At the very least, Omega may decide to reward only those who have never heard of a particular brand of decision theory.)

It's like with file compression. In bitmaps, there are frequently large areas with similar colour. With this fact we can design a system that writes those areas using less space. However, if we then try to compress a random bitmap, it will take more space than before the compression (see the toy sketch after this comment). Same thing with human minds. They work simply and relatively efficiently, but there's a whole field dedicated to finding flaws in their methods. If you use causal decision theory, you sacrifice your ability at games against superhuman creatures that can predict the future, in return for better decision-making when that isn't the case. That seems like a reasonably fair trade-off to me. Any theory which gets this one right opens itself to either getting another one wrong, or being more complex and thus harder for a human to use correctly.

  2. The scientific method and what I know of rationality make the initial assumption that your belief does not affect how the world works. "If a phenomenon feels mysterious, that is a fact about our state of knowledge, not a fact about the phenomenon itself." etc. However, this isn't something which we can actually know.

Some Christians believe that if you pray over someone with faith, they will be immediately healed. If that is true, rationalists are at a disadvantage, because they aren't as good at self delusion or doublethink as the untrained. They might never end up finding out that truth. I know that religion is the mind killer too, I'm just using the most common example of the supremely effective standard method being unable to deal with an idea. It's necessarily incomplete.

  3. I don't agree with you that "reason" means "choosing what ends up with the most reward". You're mixing up means and ends. Arguing against a method of decision-making because it comes up with the wrong answer to a specific case is like complaining that mp3 compression does a lousy job of compressing silence. I don't think that reason can be the only tool used, just one of them.

Incidentally, I would totally only take the $1000 box, and claim that Omega told me I had won immortality, to confuse all decision theorists involved.
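
The compression analogy in the first point can be made concrete. Here is a toy run-length encoder of the kind that exploits large same-coloured areas; it collapses a uniform "bitmap" to almost nothing but makes a random one bigger (an illustration of the trade-off only, not a claim about any real image format):

```python
import random

def run_length_encode(bits):
    """Toy bitmap compressor: store (bit value, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

uniform = [0] * 10_000                                  # one big same-colour area
noisy = [random.randint(0, 1) for _ in range(10_000)]   # a "random bitmap"

# The uniform image collapses to a single run; the random one needs roughly
# 5,000 runs, and since each run costs more than one bit to store, the
# "compressed" version ends up larger than the original.
print(len(run_length_encode(uniform)), len(run_length_encode(noisy)))
```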

Comment author: Vladimir_Nesov 13 October 2010 12:04:28PM 2 points [-]

See chapters 1-9 of this document for a more detailed treatment of the argument.

Comment author: [deleted] 04 November 2010 01:32:47AM 3 points [-]

An analogy occurs to me about "regret of rationality."

Sometimes you hear complaints about the Geneva Convention during wartime. "We have to restrain ourselves, but our enemies fight dirty. They're at an advantage because they don't have our scruples!" Now, if you replied, "So are you advocating scrapping the Geneva Convention?" you might get the response "No way. It's a good set of rules, on balance." And I don't think this is an incoherent position: he approves of the rule, but regrets the harm it causes in this particular situation.

Rules, almost by definition, are inconvenient in some situations. Even a rule that's good on balance, a rule you wouldn't want to discard, will sometimes have negative consequences. Otherwise there would be no need to make it a rule! "Don't fool yourself into believing falsehoods" is a good rule. In some situations it may hurt you, when a delusion might have been happier. The hurt is real, even if it's outbalanced in the long run and in expected value. The regret is real. It's just local.

Comment author: manticangel 17 November 2010 04:34:10AM *  1 point [-]

"Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes"

First, the problem needs a couple of ambiguities resolved, so we'll use three assumptions:

A) You are making this decision based on a deterministic, rational philosophy (no randomization, external factors, etc. can be used to make your decision on the box)

B) Omega is in fact infallible

C) Getting more money is the goal (i.e. we are excluding decision-makers who would prefer to get less money, and other such absurdities)

Changing any of these results in a different game (either one that depends on how Omega handles random strategies, or one which depends on how often Omega is wrong - and we lack information on either)

Second, I'm going to reframe the problem a bit: Omega comes to you and has you write a decision-making function. He will evaluate the function, and populate Box B according to his conclusions on what the function will result in. The function can be self-modifying, but must complete in finite time. You are bound to the decision made by the actual execution of this function.

I can't think of any argument as to why this reframing would produce different results, given both Assumptions A and B as true. I feel this is a valid reframing because, if we assume Omega is in fact infallible, I don't see this as being any different from him evaluating the "actual" decision making function that you would use in the situation. Certainly, you're making a decision that can be expressed logically, and presumably you have the ability to think about the problem and modify your decision based on that contemplation (i.e. you have a decision-making function, and it can be self-modifying). If your decision function is somehow impossible to render mathematically, then I'd argue that Assumption A has been violated and we are, once again, playing a different game. If your decision function doesn't halt in finite time, then your payoff is guaranteed to be $0, since you will never actually take either box >.>


Given this situation, the AI simply needs to do two things: Identify that the problem is Newcombian and then identify some function X that produces the maximum expected payoff.

Identifying the problem as Newcombian should be trivial, since "awareness that this is a Newcombian problem" is a requirement of it being a Newcombian problem (if Omega didn't tell you what was in the boxes, it would be a different game, neh?)

Identifying the function X is well beyond my programming ability, but I will assert definitively that there is no function that produces a higher expected payoff than f(Always One-Box). If I am proven wrong, I dare say the person writing that proof will probably be able to cash in on a rather significant payoff :)


Keep in mind that the decision function can self-modify, but Omega can also predict this. The function "commit to One-Box until Omega leaves, then switch to Two-Box because it'll produce a higher gain now that Omega has made his prediction" would, obviously, have Omega conclude you'll be Two-Boxing and leave box B empty, so you'd walk away with only the $1,000 in box A.

I honestly cannot find anything about this that would be overly difficult to program, assuming you already had an AI that could handle game theory problems (I'm assuming said AI is very, very difficult, and is certainly beyond my ability).

Given this reframing, f(Always One-Box) seems like a fairly trivial solution, and neither paradoxical nor terribly difficult to represent mathematically... I'm going to assume I'm missing something, since this doesn't seem to be the consensus conclusion at all, but since neither I nor my friend can figure out any faults, I'll go ahead and make this my first post on LessWrong and hope that it's not buried in obscurity due to this being a 2-year-old thread :)
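
A minimal sketch of this reframing, assuming (as the comment does) that Omega predicts simply by evaluating the submitted decision function in advance; the policy names are made up for illustration:

```python
def play_newcomb(policy):
    """Omega fills box B by evaluating the submitted decision function, which is then binding."""
    prediction = policy()                       # Omega's prediction: just run the function
    box_b = 1_000_000 if prediction == "one-box" else 0
    choice = policy()                           # your actual, binding decision
    return box_b + (1_000 if choice == "two-box" else 0)

def always_one_box():
    return "one-box"

def always_two_box():
    return "two-box"

# "Commit to one-boxing, then switch once Omega has left": in this reframing the
# switch is part of the submitted function, so Omega predicts it too.
def sneaky():
    return "two-box"

for f in (always_one_box, always_two_box, sneaky):
    print(f.__name__, play_newcomb(f))   # 1,000,000 / 1,000 / 1,000
```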

Comment author: Perplexed 17 November 2010 05:29:37AM *  2 points [-]

Rather than transforming the problem in the way you did, transform it so that you move first - Omega doesn't put money in the boxes until you say which one(s) you want.

Given this reframing, f(Always One-Box) seems like a fairly trivial solution, and neither paradoxical nor terribly difficult to represent mathematically...

As a decision problem, Newcomb's problem is rather pointless, IMHO. As a thought experiment helping us to understand the assumptions that are implicit in game theory, it could be rather useful. The thought experiment shows us that when a problem statement specifies a particular order of moves, what is really being specified is a state of knowledge at decision time. When a problem specifies that Omega moves first, that is implicitly in contradiction to the claim that he knows what you will do when you move second. The implicit message is that Omega doesn't know - the explicit message is that he does. If the explicit message is to be believed, then change the move order to make the implicit message match the explicit one.

However, here, many people seem to prefer to pretend that Newcomb problems constitute a decision theory problem which requires clever solution, rather than a bit of deliberate confusion constructed by violating the implicit rules of the problem genre.

Comment author: Normal_Anomaly 23 November 2010 01:10:07PM 0 points [-]

A way of thinking of this "paradox" that I've found helpful is to see the two-boxer as imagining more outcomes than there actually are. For a payoff matrix of this scenario, the two-boxer would draw four possible outcomes: $0, $1,000, $1,000,000, and $1,001,000, and would try for $1,000 or $1,001,000. But if Omega is a perfect predictor, then the two that involve it making a mistake ($0 and $1,001,000) are very unlikely. The one-boxer sees only the two plausible options and goes for $1,000,000.
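
The payoff matrix described here, written out as a quick sketch; with a perfect predictor only the two matching-prediction cells remain live:

```python
# payoff[(your_choice, omega_prediction)] in dollars
payoff = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 0,            # requires a predictor error
    ("two-box", "two-box"): 1_000,
    ("two-box", "one-box"): 1_001_000,    # requires a predictor error
}

# With a (near-)perfect predictor, prediction matches choice, so only the two
# "diagonal" outcomes are live; the one-boxer's cell is the bigger one.
plausible = {k: v for k, v in payoff.items() if k[0] == k[1]}
print(plausible)
```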

Comment author: Polymeron 08 December 2010 02:32:34PM *  4 points [-]

It took me a week to think about it. Then I read all the comments, and thought about it some more. And now I think I have this "problem" well in hand. I also think that, incidentally, I arrived at Eliezer's answer as well, though since he never spelled it out I can't be sure.

To be clear - a lot of people have said that the decision depends on the problem parameters, so I'll explain just what it is I'm solving. See, Eliezer wants our decision theory to WIN. That implies that we have all the relevant information - we can think of a lot of situations where we make the wisest decision possible based on available information and it turns out to be wrong; the universe is not fair, we know this already. So I will assume we have all the relevant information needed to win. We will also assume that Omega does have the capability to accurately predict my actions; and that causality is not violated (rationality cannot be expected to win if causality is violated!).

Assuming this, I can have a conversation with Omega before it leaves. Mind you, it's not a real conversation, but having sufficient information about the problem means I can simulate its part of the conversation even if Omega itself refuses to participate and/or there isn't enough time for such a conversation to take place. So it goes like this...

Me: "I do want to gain as much as possible in this problem. For that effect I will want you to put as much money in the box as possible. How do I do that?"

Omega: "I will put 1M$ in the box if you take only it; and nothing if you take both."

Me: "Ah, but we're not violating causality here, are we? That would be cheating!"

Omega: "True, causality is not violated. To rephrase, my decision on how much money to put in the box will depend on my prediction of what you will do. Since I have this capacity, we can consider these synonymous."

Me: "Suppose I'm not convinced that they are truly synonymous. All right then. I intend to take only the one box".

Omega: "Remember that I have the capability to predict your actions. As such I know if you are sincere or not."

Me: "You got me. Alright, I'll convince myself really hard to take only the one box."

Omega: "Though you are sincere now, in the future you will reconsider this decision. As such, I will still place nothing in the box."

Me: "And you are predicting all this from my current state, right? After all, this is one of the parameters in the problem - that after you've placed money in the boxes, you are gone and can't come back to change it".

Omega: "That is correct; I am predicting a future state from information on your current state".

Me: "Aha! That means I do have a choice here, even before you have left. If I change my state so that I am unable or unwilling to two-box once you've left, then your prediction of my future "decision" will be different. In effect, I will be hardwired to one-box. And since I still want to retain my rationality, I will make sure that this hardwiring is strictly temporary."

fiddling with my own brain a bit

Omega: "I have now determined that you are unwilling to take both boxes. As such, I will put the 1,000,000$ in the box."

Omega departs

I walk unthinkingly toward the boxes and take just the one

Voila. Victory is achieved.

My main conclusion here is that any decision theory that does not allow for changing strategies is a poor decision theory indeed. This IS essentially the Friendly AI problem: You can rationally one-box, but you need to have access to your own source code in order to do so. Not having that would be so inflexible as to be the equivalent of an Iterated Prisoner's Dilemma program that can only defect or only cooperate; that is, a very bad one.

The reason this is not obvious is that the way the problem is phrased is misleading. Omega supposedly leaves "before you make your choice", but in fact there is not a single choice here (one-box or two-box). Rather, there are two decisions to be made, if you can modify your own thinking process: 1. Whether or not to have the ability and inclination to make decision #2 "rationally" once Omega has left, and 2. Whether to one-box or two-box.

...Where decision #1 can and should be made prior to Omega's leaving, and obviously DOES influence what's in the box. Decision #2 does not influence what's in the box, but the state in which I approach that decision does. This is very confusing initially.

Now, I don't really know CDT too well, but it seems to me that presented as these two decisions, even it would be able to correctly one-box on Newcomb's problem. Am I wrong?

Eliezer - if you are still reading these comments so long after the article was published - I don't think it's an inconsistency in the AI's decision making if the AI's decision making is influenced by its internal state. In fact I expect that to be the case. What am I missing here?
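
A minimal sketch of the two-decision framing, under the assumption that Omega reads its prediction off the agent's state immediately after decision #1; the "hardwiring" flag is purely illustrative:

```python
def run(precommit):
    # Decision #1, made before Omega leaves: optionally hardwire the later choice.
    hardwired = "one-box" if precommit else None

    # Omega predicts from the agent's state at this point, then leaves.
    # Assumption for the sketch: an agent with no hardwiring will reason
    # causally later and two-box, and Omega knows this.
    prediction = hardwired or "two-box"
    box_b = 1_000_000 if prediction == "one-box" else 0

    # Decision #2, made after Omega leaves.
    choice = hardwired or "two-box"
    return box_b + (1_000 if choice == "two-box" else 0)

print(run(precommit=True))    # 1,000,000
print(run(precommit=False))   # 1,000
```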

Comment author: MoreOn 11 December 2010 05:21:49AM *  1 point [-]

I wanted to consider some truly silly solution. But taking only box A is out (and I can’t find a good reason for choosing box A, other than a vague argument based in irrationality, along the lines that I’d rather not know if omniscience exists…), so I came up with this instead. I won't apologize for all the math-economics, but it might get dense.

Omega has been correct 100 times before, right? Fully intending to take both boxes, I’ll go to each of the 100 other people. There’re 4 categories of people. Let’s assume they aren’t bound by psychology and they’re risk-neutral, but they are bound by their beliefs.

  1. Two-boxers who defend their decision do so on grounds of “no backwards causality” (uh, what’s the smart-people term for that?). They don’t believe in Omega’s omniscience. There’re Q1 of these.

  2. Two-boxers who regret their decision also concede to Omega’s near-perfect omniscience. There’re Q2 of these.

  3. One-boxers who're happy also concede to Omega’s near-perfect omniscience. There’re Q3 of these.

  4. One-boxers who regret foregoing $1000. They don’t believe in Omega’s omniscience. There’re Q4 of these.

I’ll offer groups 2 and 3 (who believe that I’ll only get $1000) a bet: they split my $1000 between them, in proportion to their stakes, if they’re right. If they believe in Omega’s perfect predictive powers, they think there’s a 0% chance of me winning. Therefore, it’s a good bet for them. Expected profit = 1000/weight − 0 × (all their money) > 0

Groups 1 and 4 are trickier. They think Omega has a P chance of being wrong about me. I’ll ask them to bet X = 1,001,000×P/((1−P)×weight) − eps, where weight is a positive number > 1 that’s a function of how many people bet how much. Explicitly defining weight(Q1, Q4, various money caps) is a medium-difficulty exercise for a beginning calculus student. If you insist, I’ll model it, but it will take me more time than I’d already spent on this. So, for a person in one of these groups, expected profit = −X×(1−P) + 1,001,000×P/weight = eps > 0!

So what do I have now? (Should I pray to Bayes that my intuition be confirmed?) There’re two possible outcomes of taking both boxes.

  1. Both are full. I give the 1,001,000 to groups 1 and 4, and collect Q2×1000 + Q3×1,000,000 from groups 2 and 3, which is at least 1,001,000 if Q2 > 0 and Q3 > 0, and more if Q3 > 1. This outcome has potential for tremendous profit. Call this number PIE >> 1,001,000.

  2. Only A is full. I split my 1000 between groups 2 and 3, and collect X1×Q1 + X4×Q4 from groups 1 and 4. What are X1 and X4 again? X, the amount of money each member of group 1 or group 4 bets, is unique to each group. I called group 1’s X X1, and group 4’s X4.

I need to find the conditions under which X1×Q1 + X4×Q4 > 1000. So suppose I undermaximized my profit, and completely ignored the poor group 1 (their 1000 won’t make much difference either way). Then X = X4 becomes much simpler, X = 1,001,000×P/((1−P)×Q4) − eps, and the payoff I get is −Q4×eps + 1,001,000×P/(1−P). P = 0.001 and Q4×eps < $2 guarantee X1×Q1 + X4×Q4 > X4×Q4 > 1000.

That’s all well and good, but if P is low (under 0.5), I’m getting less than 1,001,000. What can I do? Hedge again! I would actually go to the people of groups 1 and 4 again, except it’s getting too confusing, so let’s introduce a “bank” that has the same mentality as the people of groups 1 and 4 (that there’s a chance P that Omega will be wrong about me). Remember PIE? The bank estimates my chances of getting PIE at P. Let’s say if I don’t get PIE, I get 1000 (which is the lowest possible profit for outcome 2; otherwise it’s not worth making that bet). I ask the following sum from the bank: PIE×P + 1000×(1−P) − eps. The bank makes an expected profit of eps > 0. Since PIE is a large number, my profit at the end is approximately PIE×P + 1000×(1−P) > 1,001,000.

Note that I’d been trying to find the LOWER bound on this gambit. Actually plugging in numbers for P and Q’s easily yielded profits in the 5 mil to 50 mil range.

Comment author: ata 11 December 2010 05:33:53AM *  0 points [-]

One-boxers who regret foregoing $1000. They don’t believe in Omega’s omniscience. There’re Q4 of these.

How is there anybody in this group? Considering that all of them have $1,000,000, what convinced them to one-box in the first place such that they later changed their minds about it and regretted the decision? (Like, I guess a one-boxer could say afterwards "I bet that guy wasn't really omniscient, I should have taken the other box too, then I'd have gotten $1,001,000 instead", but why wouldn't a person who thinks that way two-box to begin with?)

Comment author: nshepperd 11 December 2010 05:57:57AM *  1 point [-]

You're essentially engaging in arbitrage, taking advantage of the difference in the probabilities assigned to both boxes being full by different people. Which is one reason rational people never assign 0 probability to anything.

You could just as well go to some one-boxers (who "believe P(both full) = 0") and offer them a $1 bet 10000000:1 in your favor that both boxes will be full; then offer the two-boxers whatever bet they will take "that only one box is full" that will give you more than $1 profit if you win. Thus, either way, you make a profit, and you can make however much you like just by increasing the stakes.

This still doesn't actually solve Newcomb's problem, though. I'd call it more of a cautionary tale against being absolutely certain.

(Incidentally, since you're going into this "fully intending" to take both boxes, I'd expect both one boxers and two boxers to agree on the extremely low probability Omega is going to have filled both boxes.)
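
A sketch of the arbitrage being described, with made-up stakes and credences: take opposite sides with two people whose probabilities for "both boxes full" differ enough, and you profit whichever way it turns out:

```python
# Made-up credences: a one-boxer thinks P(both boxes full | you two-box) ~ 0;
# a two-boxer thinks it is substantial (say 0.3).
#
# Bet 1: your $1 against the one-boxer's $10,000,000 that both boxes will be full
#        (nearly free money, by their lights).
# Bet 2: your $100 against the two-boxer's $10 that only one box will be full
#        (positive expected value for them whenever they put P(both full) above ~0.1).

def settle(both_full):
    bet_1 = 10_000_000 if both_full else -1
    bet_2 = -100 if both_full else 10
    return bet_1 + bet_2

print(settle(both_full=True))    # 9,999,900
print(settle(both_full=False))   # 9
```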

Comment author: Vaniver 11 December 2010 06:08:15AM 0 points [-]

Which is one reason rational people never assign 0 probability to anything.

I don't know, I feel pretty confident assigning P(A&!A)=0 :P