Buggy Expected Utility Maximizer

evolution-is-just-a-theorem

The Tegmark Lifeboat

antisquark

Thinking about Paul Christiano’s informed oversight problem led me to formulate what I dubbed the “Pessimistic Conjecture.” The evidence for this conjecture is weak but AFAICT its probability is non-negligible so it’s worth thinking about this scenario.

The Pessimistic Conjecture is currently informal and can be roughly described as follows:

The easiest path to superintelligence is through a recursively self-improving AI that starts out as something relatively dumb.
Any recursively self-improving AI that can be seeded with an arbitrary utility function will have unstable goals for most utility functions. That is, for most initial utility functions, the AI’s utility function will mutate during early stages of self-improvement and stabilize later on as something significantly different.
The above can be prevented in a strictly Cartesian universe but not in the real universe where the AI is able to use the environment to create external subagents or remove safeguards in its own architecture.

It seems plausible that this conjecture can be formulated in the language of complexity theory as a fundamental computational feasibility bound. If true, it would have the corollary that UFAI will almost inevitably come before FAI due to an enormous computational advantage (since our own utility function is complex and unlikely to be stable by chance). This leads to the question of what can be done in such a pessimistic scenario.

The Tegmark Lifeboat is an approach to tackling this scenario and any other scenario in which saving the world from X-risk is infeasible for some reason. It is a potential solution in the sense that it produces an amount of utilons comparable to a “regular” flourishing post-human supercivilization even though the civilization it creates is outside our physical universe. If feasible, it still has the drawback of being very weird. Tsvi Benson-Tilsen and Jessica Taylor both called this idea “Plan F” when I explained an early version of it to them (yes, they independently converged on the letter F). As an idea, it’s weirder than the MIRI-sphere FAI orthodoxy, probably even weirder than Roko’s Basilisk. People are already talking about a “Singularitan religion” and the Lifeboat seems to pattern-match to that perfectly, offering an intangible and difficult-to-falsify afterlife. However, the philosophical argument for it appears to me very strong and it might be possible to implement using a “small” amount of resources (i.e. by convincing one or two “eccentric” billionaires).

So, what is the Tegmark Lifeboat? Egan’s novel Permutation City (if you didn’t read it, go and read it right now! ok, maybe after finishing the rest of the post…) involves the concept of “launching a universe” based on something called “dust theory.” Well, replace dust theory with Updateless Decision Theory and change some implementation details and you’ve got the Tegmark Lifeboat.

In the Tegmark IV multiverse, every mathematical possibility exists. AFAIK, Tegmark was rather vague as to what counts as a “mathematical possibility” but the emerging formalisation of this idea suggests that a “mathematical possibility” is just a Turing machine producing a (possibly infinite) string of bits. However, not all possibilities “exist” to the same extent. Instead, they have some sort of “probabilities” or “quantities of magical reality fluid” that constitute a continuum between “existence” and “non-existence.” These probabilities should behave like the Solomonoff prior or something similar, i.e. each universe is weighted by 2^{-K} where K is its Kolmogorov complexity: the length of the shortest program producing it. Thus, our universe is part of Tegmark IV and e.g. Tolkien’s legendarium is part of Tegmark IV but the former has a much larger measure than the latter since only the former has a relatively simple mathematical description.
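To make the 2^{-K} weighting concrete, here is a minimal Python sketch. It assumes the complexities are simply given as bit counts (Kolmogorov complexity itself is uncomputable, so the inputs are stand-ins), and the function names `measure_weight` and `measure_ratio` are invented for this illustration. The point it shows is that only the difference in description length matters: every bit saved doubles the measure.

```python
from fractions import Fraction


def measure_weight(description_length_bits: int) -> Fraction:
    """Unnormalized weight 2^-K for a universe whose shortest program is K bits long.

    K is supplied by the caller as a hypothetical number; this function does
    not (and cannot) compute Kolmogorov complexity.
    """
    if description_length_bits < 0:
        raise ValueError("description length must be non-negative")
    return Fraction(1, 2 ** description_length_bits)


def measure_ratio(k_first_bits: int, k_second_bits: int) -> Fraction:
    """How many times more measure the first universe has than the second."""
    return measure_weight(k_first_bits) / measure_weight(k_second_bits)


if __name__ == "__main__":
    # A universe describable in 10 fewer bits has 2^10 = 1024 times the measure.
    print(measure_ratio(100, 110))
```

Exact fractions are used instead of floats because 2^{-K} underflows to zero for even moderately large K.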

Suppose we already knew how to upload human brains. Then, in principle, we could have used the state vectors of specific humans to write a program for a virtual universe where these humans live happy, fulfilling lives (although designing such a virtual universe would still be a hard problem). Fix such a virtual universe and call it Elysium. Now, Elysium already exists in Tegmark IV, whether we implement it or not. However, its Kolmogorov complexity is pretty high and thus its measure is pretty low. On the other hand, we can increase its measure by lowering its Kolmogorov complexity.

Huh? How can we lower the Kolmogorov complexity of something? The Kolmogorov complexity of a fixed string is an immutable mathematical constant like pi, what does it even mean to change it? Well, according to UDT the sort of causation that counts is not physical causation but logical causation. And something is only immutable wrt this sort of causation when you can actually compute its value, given the computing resources at your disposal. For example, if we actually run Elysium on a computer we lower its Kolmogorov complexity since it allows for a relatively parsimonious description of Elysium that relies on describing our universe first and providing a “pointer” to Elysium inside it later.

However, we don’t have to actually run it. It is enough to encode Elysium’s source code in a way that is relatively simple in the natural ontology of the laws of physics in our universe in order to lower its Kolmogorov complexity substantially. In other words, just write the source code (in a simple programming language e.g. lambda calculus) into a storage device, preferably a storage device that represents information using physical degrees of freedom that are as microscopic as possible. The computer you would need to run this program doesn’t have to exist anywhere in our physical universe.

Moreover, it is possible that “uploading” humans to such an Elysium is much easier than practical mind uploading since the “metaphysical computer” running Elysium is capable of much more expensive computation than any practical computer. For example, we can use some sort of bounded Solomonoff induction applied to some sort of recordings of humans in order to deduce the model of the human brain (although care is required to avoid what Paul Christiano calls “simulation warfare”). Indeed, one way to go about this is to use this computational advantage to create a “metaphysical” FAI. However, I suspect that the correct Tegmark IV measure is actually something complexity-theoretic rather than purely information-theoretic like the Solomonoff measure, so there will be some limits on computing power.

Writing Elysium into a storage device doesn’t buy you much by itself. This is because Elysium’s shortest description will still require a “pointer” to the storage device’s location in spacetime which is quite expensive. The utility this will generate is order-of-magnitude the utility of some small number of people living for some small number of centuries, not the order-of-magnitude of a space-colonizing supercivilization. However, we can do better. We can use the same storage to encode a string of low Kolmogorov complexity which is highly unlikely to appear elsewhere in our past light-cone (maybe 65536 binary digits of pi will do the trick, though choosing the precise string requires more thought). This will allow for a description of Elysium of the form “search forward in cosmological time until encountering this particular string, then run the program encoded next to it.” Thus, Elysium becomes order-of-magnitude as real as our physical future light-cone.
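The “search until encountering this particular string, then run the program encoded next to it” description can be sketched as a toy Python routine. Everything here is an assumption made for illustration: the universe is flattened into a single byte sequence `tape`, the beacon is an arbitrary byte string, and the payload layout (a 4-byte big-endian length followed by the program bytes) is invented, as is the name `find_payload`. The sketch only shows why the description is short: the searcher needs the beacon and the decoding rule, not the device’s coordinates.

```python
from typing import Optional


def find_payload(tape: bytes, beacon: bytes) -> Optional[bytes]:
    """Return the payload stored right after the first occurrence of `beacon`.

    Assumed layout (hypothetical): beacon, 4-byte big-endian length, payload.
    Returns None if the beacon is absent or the payload is truncated.
    """
    if not beacon:
        raise ValueError("beacon must be non-empty")

    start = tape.find(beacon)  # scan forward for the first match
    if start == -1:
        return None

    header = start + len(beacon)
    if header + 4 > len(tape):
        return None  # no room for the length field

    length = int.from_bytes(tape[header:header + 4], "big")
    body = header + 4
    if body + length > len(tape):
        return None  # payload cut off

    return tape[body:body + length]


if __name__ == "__main__":
    beacon = b"\x31\x41\x59\x26"  # placeholder, not actual binary digits of pi
    program = b"elysium-source"
    tape = b"noise" + beacon + len(program).to_bytes(4, "big") + program + b"tail"
    print(find_payload(tape, beacon))
```

The beacon must be unlikely to occur by chance before the device does, otherwise the search stops at the wrong place; that is why the choice of string needs care.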

If seed-to-superintelligence FAI is infeasible even in Elysium, the inhabitants of Elysium will probably have to reverse engineer their brains in order to solve their own UFAI problem. However, access to programming interfaces in Elysium might be strongly restricted and a bunch of other safeguards might make postponing superintelligence feasible.

This still leaves a stack of formidable problems including the implementation of “metaphysical uploading,” designing the virtual environment, creating a suitable storage device and testing the software even though we cannot run it (presumably relying a lot on formal verification). That said, it seems not entirely hopeless.

lisp-case-is-why-it-failed

wat

1) What arguments are there for the Pessimistic Conjecture? Do we have any reason to believe this is actually true? It seems quite unlikely to me.

2) Tegmark IV is pretty silly. The current formulations of our universe are not computable. Since this whole idea only works if Tegmark IV happens to be correct (wanna bet?), we should probably save the resources we would spend on it and throw a really nice party instead.

3) The specific version of Tegmark IV that is inexplicably popular among rationalists is also silly. What does it mean to be “more real”? What level of reality is our universe sitting at? By the usual formulation our universe is not maximally real, and yet it is very strange to say that we are less real than anything else. How many hoops do you have to jump through before you just say “Fine, Tegmark is wrong.”?

4) Usual counterargument for multiverse scenarios: the thing you’re proposing has already been done.

5) We cannot describe Elysium by saying “search for this string”. Strings are not a fundamental piece of the universe. Sure, there exists a TM that will search for strings in the way you want, but the complexity of the whole scheme just shot up a lot.

6) Our universe is non-deterministic, you must somehow distinguish our specific universe from all the other almost identical universes. In other words, a pointer to us is extremely complex. There are separate arguments for this for separate interpretations of QM. I expect that a pointer to us will have complexity >= our own, and therefore no gain has occurred.

7) Actually I’m going to bet that the previous response doesn’t depend on non-determinism, and in fact that there is a theorem stating what you’re trying to do can’t be done. A pointer to “our location in the universe” is always going to be at least as complex as our actual location in the universe. There’s no lowering it.

So yeah, it does seem entirely hopeless.

antisquark

If you want to have a conversation then please avoid phrases such as “X is silly.” Give me the benefit of the doubt that I might have already considered your objections (as I indeed have).

1. There are no strong reasons to believe it is true. It seems like there might be a relatively simple mathematical formalisation of the conjecture and no obvious reason that it is false, which is grounds to assign non-negligible probability to it. There is weak evidence towards the conjecture, namely approaches that require strong guarantees during self-modification run into serious problems such as the Loebstacle and Christiano’s informed oversight problem.

2. The current formulations of our universe are not computable? Whatever do you mean? All known laws of physics are computable. Moreover, it is arguable that the laws of physics must be computable since there is no way to test an uncomputable theory. In principle, I do want to bet on T4 correctness but I’m not sure how you would resolve the bet. If we ever build the Tegmark lifeboat and both of us are uploaded, I’m willing to pay you e.g. 100$ in the physical universe in exchange for an equivalent of 200$ in the boat :) A really nice party would have many orders of magnitude less utility than what I’m proposing.

3. The “level of realness” is just the coefficient with which something appears in your updateless utility “function” (it is actually a mathematical constant but you can think of it as a variable using logical uncertainty). As for when I will say Tegmark is wrong: when I see a theory that solves metaphysics, anthropics and decision theory better than UDT and that implies Tegmark is wrong. Currently I see the opposite (UDT is the best theory by a large margin).

4. Everything was done in some universe, but most universes have a tiny measure! The whole idea is increasing the measure of good universes!

5. We don’t search for a string, we search for a physical encoding of a string. The simpler the encoding is wrt the natural ontology of the laws of physics, the lower the complexity. For example, we might use storage that represents bits as electron spins in a crystal lattice. We might also win some complexity back by designing Elysium s.t. decoding human-valuable experiences is easier in Elysium than in the physical world. This penalty is a weak spot but I think that we need a better understanding of UDT in order to quantify how bad it is.

6-7. The complexity of locating our Everett branch and our rough region in spacetime (with the resolution of a cosmological horizon) enters into a constant multiplying everything we do. For example if we build an FAI which creates a fabulous intergalactic supercivilization, it will still carry this penalty. This penalty affects all strategies equally and therefore can be ignored. The penalty that cannot be ignored is locating the device precisely within this rough region, which is why we need the extra trick (I consider calling it the “Tegmark beacon”).
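The claim that a penalty shared by all strategies can be ignored is just the fact that multiplying every option by the same positive constant does not change which option is best. A minimal Python sketch, with placeholder strategy names and utilities that are purely hypothetical and a helper name `best_strategy` invented for this illustration:

```python
from typing import Dict


def best_strategy(utilities: Dict[str, float], common_penalty: float) -> str:
    """Pick the strategy with the highest penalized utility.

    `common_penalty` stands for the shared factor (e.g. the cost of locating
    our Everett branch and rough region of spacetime); it must be positive.
    """
    if common_penalty <= 0:
        raise ValueError("common_penalty must be positive")
    return max(utilities, key=lambda name: common_penalty * utilities[name])


if __name__ == "__main__":
    # Placeholder numbers, chosen only to show that the ranking is unchanged.
    utilities = {"strategy_a": 1.0, "strategy_b": 1000.0}
    print(best_strategy(utilities, 1.0))
    print(best_strategy(utilities, 2.0 ** -50))
```

Only penalties that differ between strategies, such as the cost of pinpointing the device inside the rough region, can change the ranking.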

evolution-is-just-a-theorem Source: antisquark

fnord888

The Tegmark Lifeboat

antisquark

Thinking about Paul Christiano’s informed oversight problem led me to formulate what I dubbed the “Pessimistic Conjecture.” The evidence for this conjecture is weak but AFAICT its probability is non-negligible so it’s worth thinking about this scenario.

The Pessimistic Conjecture is currently informal and can be roughly described as follows:

The easiest path to superintelligence is through a recursively self-improving AI that starts out as something relatively dumb.
Any recursively self-improving AI that can be seeded with an arbitrary utility function will have unstable goals for most utility functions. That is, for most initial utility functions, the AI’s utility function will mutate during early stages of self-improvement and stabilize later on as something significantly different.
The above can be prevented in a strictly Cartesian universe but not in the real universe where the AI is able to use the environment to create external subagents or remove safeguards in its own architecture.