The Likelihood of a Terminator Scenario
So what about a world in which machines see us as a threat and take over?
Judging by what experts like Nick Bostrom of Oxford University say, it all depends on several factors related to a concept called “malignant future modes.” Essentially, we can simplify the big idea behind malignant future modes to be, “all of the primary ways that AI systems could turn against us one day.”
Perverse Instantiation
If you haven’t yet read our glossary, think of perverse instantiation as a superintelligence “discovering some way of satisfying the criteria of its final goal that goes against the original rules which were set by its programmers.” In even simpler terms, this is when an AI system finds a way to get to the ideal answer to the problem that it is built to solve by bending the rules that are supposed to prevent it from acting erratically. As is the case with almost every other key concept in AI, Nick Bostrom provides several examples of how this could play out. All of these examples are in the form of the original final goal plus the way that the AI purposefully misinterprets this goal.
Imagine that an AI team programs a system with the primary goal of making humans happy. This means that every action that the system takes should lead to this result. Even so, Bostrom mentions how easy it would be for the system to take this quite literally by determining that the best way to do so would be to connect human brains to computers and stimulate their pleasure centers, digitally. This interpretation of the road to reach such a final goal is essentially a prime example of perverse instantiation.
All in all, most of the other points that Bostrom makes on this subject are similar. For example, Bostrom also mentions an AI that was created with the final goal of always acting in a way to avoid feeling guilty. Essentially, the idea behind implementing such a goal could be to convince the system to have a conscience. Therefore, before creating a final goal like this, the system should have clear rules related to setting a digital conscience in place. The overall aim of all of this would be to avoid acting in any way that hurts humans or the world.
Again, this situation could end in perverse instantiation. The system could simply destroy whatever rules were put into place to make it feels guilt, so that when it breaks any rules, it feels nothing. The event of an AI taking feelings out of the equation or even vice versa, relates directly to “wireheading,” as mentioned above. The only difference is that “wireheading” involves adding an extreme amount of feeling, while the example involving a conscience involves the opposite. All of these examples are admittedly, highly speculative but that’s in the nature of almost everything in this industry in that AI is just beginning to rapidly grow as a space.
Unfortunately, perverse instantiation is not the only road that an AI can take to “turn against us.”
Infrastructure Profusion
In “wireheading,” one can see a connection to the second form of “malignant future modes,” which may be termed “infrastructure profusion.” Again, as the original explanation for this is highly technical, some simplification is necessary to fully understand where this theory’s importance lies. Using Bostrom’s example of an AI that’s built to create a certain number of paperclips could be an easy road towards making this clear. Imagine a system that has the sole purpose of improving a factory’s production of paperclips. Imagine as well that the original instructions for it say, “do everything necessary to improve the factory’s production of paperclips.”
This is where problems could begin. Nick Bostrom seems to say that simply based on the inability of any system to work with a 0% level of error, consequently, AIs will never accept with 100% certainty that they have reached their goal and can stop working. This is, of course, all hinging on the idea that the maximum level of efficiency that an AI system can reach is 99%, with the minimum percentage of error being 1%. Since the system does not stop working, it starts taking in all available resources to create paperclips, which, in theory, won’t end until it has consumed everything on Earth. Doing so, however, would also require a superintelligence that is powerful enough that humans and their resources cannot stop it.
To fix such a large potential issue and really, all issues related to perverse instantiation and infrastructure profusion, Bostrom suggests setting all goals in the context of ranges. In the case of the paperclip AI, its goal could be changed to “create 999,000-1 million paperclips.” In this way, the system could be satisfied once it hits anywhere in this range and its satisfaction wouldn’t require being error free. Unfortunately, this isn’t a blanket solution to perverse instantiation and infrastructure profusion, either. Setting a range like this could fail to stop infrastructure profusion in that it is theoretically impossible for an AI to set a 0% chance for failing at its goals.
Therefore, since an AI always has to see a chance that it won’t reach the desired numbers, it won’t stop working. On the subject of perverse instantiation, the AI would then see no reason not to take the easiest path to its goal, even if this is detrimental to humans or the Earth. In conclusion, it seems that straight answers to preventing these problems do not yet exist and Bostrom is only trying to bring their gravity to light so that future researchers can solve them. In addition to this logical end to our discussion of these first two issues, there’s still another malignant future mode that Bostrom proposes can occur.
Mind Crime
The third and final malignant future mode is “mind crime.”
As compared to the other malignant future modes, Mind crime’s a bit more short and to the point in terms of what it is. Think about an AI that creates a very large number of digital brains that look and function in the same way as human brains, for a neurological study of some sort. When the AI has finished its study, it destroys the brains, because it has no further use for them. This is Bostrom’s prime example of mind crime, which is simply when an AI acts against our general conception of morality, in a drastic way. What it could illustrate is just how dangerous placing too much trust in the effectiveness of AI systems can be.
Morality Models and Other Solutions
This is far from all of the information that exists on the topic of an AI uprising or “malignant future modes.” If you’re interesting in knowing more right now, look no further than the previously mentioned “Superintelligence,” though it is highly esoteric, which is why we frequently break it down here.
In the near future, we’ll be posting pieces on “Morality in AI systems,” as well as theory on “our lives in an algorithmic economy.” In short, we’ll be posting on everything that’s out there related to how we can prepare for a future in which we live with AIs, side by side. Stay tuned and as always, check out the resources below.
References:
Nick Bostrom’s “Superintelligence”:
LessWrong’s Discussion on Malignant Future Modes:
https://www.lesswrong.com/posts/BqoE5vhPNCB7X6Say/superintelligence-12-malignant-failure-modes