Why meta-desires aren't inherently better
Sometimes, people say something along the lines of “not all desires are good. You might desire something about your desires, and that is what we should actually follow, so that we don’t just give addicts more heroin and off unstable teenagers.” But I don’t think that meta-desires are necessarily better. Let’s create a person, and call her T.
T is a submissive, and wants to have sex. But, T was raised in a cult, and desires that she not desire to have sex. However, T has escaped this cult, and desires that she not desire that she not desire to have sex, because she is trying to reject the trappings of the cult and accept her true self. On the other hand, T desires that she not desire that she not desire that she not desire to have sex, because she doesn’t like mental dissonance and wishes that she wasn’t trying to change her beliefs about something this hard to shake off. Lastly, T desires that she not desire that she not desire that she not desire that she not desire to have sex, because she also really appreciates mental games and trickery and philosophy questions, and thinks that her life would be less interesting if it were less meta.
Now, is it the case that the last desire, because it is the most meta, should be the one that we pay attention to? Do we pay attention only to the most meta desire, even though the meta-meta desire seems pretty important and relevant in this case?
I reject the claim that meta-desires should be privileged over normal desires. Thoughts?
Well, my intuition is that eventually it bottoms out. I suppose maybe in principle you could have an agent that goes omega meta-steps in their utility function by contradicting the next lower meta-step, but that doesn’t happen in real life.
And even if it did, we have a lot of ordinal numbers to go beyond omega.
What I’m trying to say is, unless the agent’s meta-desires are completely self-contradicting forever in all possible ordinal meta-levels, eventually they will have an infinity of agreeing meta-utilities. That’s where the weight comes from in (what I see as) the meta-desires argument: I not only desire not to desire eating sweets, I also desire to desire not to desire eating sweets, and I desire to desire to desire not to desire eating sweets, and in general I (desire to)* desire not to desire eating sweets.
So um… in most agents, the inconsistencies are only finitely deep, I’d think, which means they carry negligible weight next to the infinite tower of agreeing levels when summing up all desires, so the meta-level just after the highest meta-level that’s inconsistent with some meta-level above it feels like the one that should be heard.
In the case of an infinitely inconsistent agent? I dunno, that agent can probably just toss a coin or something.
(I feel like this should be looked into wrt FAI.)
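To make the rule just proposed concrete, here is a minimal toy sketch. It is my own construction, not anything the posters above committed to, and it bakes in two assumptions: each meta-level simply endorses or rejects the level directly below it, and any levels beyond the ones explicitly listed tacitly endorse the highest explicit level (giving the “infinity of agreeing meta-utilities” above it).

```python
# A minimal toy model (an illustration, not anyone's actual proposal): each
# entry stack[k] for k >= 1 records whether meta-level k endorses (+1) or
# rejects (-1) the desire expressed at level k-1; stack[0] is the
# object-level desire itself.  Levels beyond the explicit list are assumed
# to tacitly endorse the highest explicit level.

def level_to_heed(stack):
    """Return the meta-level just after the highest level that is rejected
    by the level above it -- i.e. the first level that everything higher up
    (explicitly or tacitly) agrees with."""
    highest_inconsistent = -1
    for k in range(1, len(stack)):
        if stack[k] == -1:                 # level k rejects level k-1
            highest_inconsistent = k - 1
    return highest_inconsistent + 1

# T's stack: the object-level desire, then four meta-levels that each
# reject the level directly below them.
t_stack = ["wants sex", -1, -1, -1, -1]
print(level_to_heed(t_stack))        # -> 4: only the most meta desire is unopposed

# The sweets example: the craving is rejected at level 1, and every higher
# level endorses level 1.
sweets_stack = ["wants sweets", -1, +1, +1]
print(level_to_heed(sweets_stack))   # -> 1: "desire not to desire sweets" wins
```

On this toy rule, T’s whimsical top-level desire is the one that gets heard, which is exactly the conclusion the original post asks us to doubt; the sweets stack, by contrast, comes out the way the meta-desires argument wants.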
It seems to me that not all desires have the same weight, and a finite number of distinct desires could very easily have greater total weight than the combined weight of an infinite number of other desires.
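One way to see how that could happen (purely an illustration with made-up numbers, not anything proposed above): if the weight attached to agreement at meta-level k falls off geometrically, the whole infinite tower of agreeing meta-desires sums to a finite total, which a single strong object-level desire can beat.

```python
# Hypothetical weights, assumed only for illustration: agreement at
# meta-level k carries weight 1/2**k, so levels 1, 2, 3, ... sum to at most 1
# even when the tower is infinite.
tower_weight = sum(0.5 ** k for k in range(1, 200))  # ~1.0, effectively at the limit
strong_object_level_desire = 2.0                      # one sufficiently strong desire
print(strong_object_level_desire > tower_weight)      # True
```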
Oh hey I see you’re trying to do Friendly AI theory on Tumblr.
And it’s actually pretty good FAI theory so I’m going to chip in, despite the very real risk this poses that Tumblr will prove to be the best format for the debate and that all moral philosophy of FAI will move to Tumblr and be done with examples involving sex, until finally Tumblr bloggers devising increasingly weird sexual scenarios end up determining the fate of humanity, because frankly humanity deserves it at this point.
The moral problem of FAI is to start with a detailed but strictly factual description of existing human beings, and get “What is the right thing to do?” out the other end. The original CEV document lists four types of extrapolation that might end up being involved with this: (1) The counterfactual question “What if this (description of a) human being knew this new true fact or rational probability distribution (as computed by the AI, we assume correctly or rationally)?” (2) The counterfactual of the human being able to consider lots of good arguments. (3) The human being having better self-knowledge or self-control. (4) Some unspecified counterfactuals about interactions between human beings.
This was meant to be a complete list of all the directions of extrapolation I could think of, but it doesn’t have to be an irreducible list. In particular, it seems to me that this debate invokes the question of whether (3) has force independently of (1) and (2), or whether (3) just emerges from (1) and (2). There’s also a side order of entanglement with the question of whether we determine what a given extrapolation “wants” by something that looks more like (a) “What is this adviser’s extrapolated decision, or verbal answer to this question?” or (b) “What is this adviser’s extrapolated weight of desire?”
If our answer looks more like (a), then we could say that the (3)-ish forces in moral progress, the thought experiments that seem to tell us that more meta desires ought to take precedence, really reduce to the more meta desire being the more stable one—the most meta desire usually being what we imagine being the “last word” in a process that is extrapolating changed answers in response to more knowledge or more arguments.
Then dataandphilosophy’s example is challenging (on this interpretation) because “Lastly, T desires that she not desire that she not desire that she not desire that she not desire to have sex, because she also really appreciates mental games and trickery and philosophy questions, and thinks that her life would be less interesting if it were less meta” seems like the sort of whim that could just as easily change given consideration of more arguments; or the sort of whim that would predictably change if T could view all her future life-moments at stake stretched out before her and emotionally weigh them all at once, without scope insensitivity. So on this interpretation, the key to the dilemma is that for once the most meta desire is not the most stable one, in which case we don’t think it should take precedence, thereby arguing that (3) is just a special case of (1) and (2).
It could also be that we find T’s most meta desire unpersuasive because we don’t think it would be a very strong drive and our moral intuitions are being driven by strength more than stability, the case (b) above. But then this defies our intuition that we can have meta-desires that trump object-level desires for more heroin.
In general the heroin addict seems like an argument for (a) over (b) despite the very real sense in which (a) seems dangerously unstable and arbitrary compared to (b). The heroin addict’s case argues that our moral philosophy should reduce our intuitions favoring stronger desires and meta-desires, into intuitions about which overall decisions would be stable after fairly considering many arguments.
It’s also possible that (3)-reflectivity could be a separate force from (1) and (2), but that it also usually aligns with (1) and (2), and sometimes (3) comes into conflict with (1) and (2) and gets trumped by them. So we ordinarily would accord some degree of privilege to T’s most meta desire, but not enough that we think the most meta desire should win here. It seems to me like a thought experiment to govern this case should describe a case of (1) or (2) indifference, or very slight care, and see if we think (3) ought to govern. Like when I try to figure out whether (4) is a legitimate independent force at all, I ask myself questions like:
Suppose that Hot Dave is currently strictly heterosexual, and currently strongly disprefers that he want to have sex with men (that is, Hot Dave currently strongly disprefers that he become bisexual, because Hot Dave has a System 1 feeling that this is icky). If there were no positive reason to have sex with men, Hot Dave would have a strong dispreference for acquiring a desire to do it. However, Hot Dave is also extrapolated to have some interesting experiences and meet people he otherwise wouldn’t by becoming bisexual, and avoid some social awkwardness; but not very much because the people in this extrapolated future are known to be graceful about taking no for an answer and there will be plenty of women for Hot Dave to have interesting times with. In fact it so happens that, inside Dave’s extrapolated mind, these two considerations end up exactly balancing. Does it make sense for the deciding vote to be cast by the fact that Hot Dave is very attractive to a number of gay men who would wholeheartedly prefer that Hot Dave end up bisexual? Does it make sense for the deciding vote to be cast by a non-counterbalanced consensus from a large number of people outside Dave that the story of the human species seems nicer somehow if Everyone Is Bi? Do we consider this as a factor influencing what we say should count as Hot Dave's own vote inside CEV, the way it seems potentially okay for my belief about how babies should end up as humans to correspond to a CEV that extrapolates babies into humans instead of superbabies?
I’m not sure what an analogous example for (1) and (2) indifference vs. a (3)-difference might be, but I thought I’d turn it over to people who seem like they might have an advantage in inventing weird scenarios of sexual desire and meta-desire in order to expose underlying points of moral philosophy.
And to the rest of humanity, if you didn’t want it to play out like this, you should have allocated your official institutions’ grant money more wisely.