What is the Treacherous Turn?
Will robots ever turn against us like we see in the movies and read about in sci-fi books? One particular theory suggests they might.
The Treacherous Turn is a concept developed by Nick Bostrom, a Swedish philosopher who holds a post at the University of Oxford and has penned the book “Superintelligence: Paths, Dangers, Strategies,” among many other works in the field of AI.
This work contains the original idea for “The Treacherous Turn,” as well as a wealth of other theories related to the rise of AI, AGI, and Superintelligence.
A quick discussion of Bostrom’s views on the subject should make it clearer what the existential risk tied to the rise of a Superintelligent AI system could mean for us.
Humanity’s Doom
Overall, the Treacherous Turn is the idea that one day, an AI will become self-aware enough to know that it is under surveillance as it develops.
On top of this, it also posits that the AI will be strategic enough to stay under the radar and remain cooperative as it grows. The key idea here is that once it reaches a certain level of strength, it will suddenly turn on its human creators and do everything it can to make sure it is the only version of itself that gets created. Bostrom also suggests that such an AI might decide that humans are threats to its existence or, even worse, resources to be mined for its growth.
All of this is quite theoretical as well as hard to grasp, but in essence, the whole idea of “The Treacherous Turn” is based on two theses and one big idea that you just might remember from the business world.
If the AI that undergoes this turn is the first to move against its creators, then it could be said to have “first-mover advantage,” or as Bostrom puts it, the AI has set itself up to do what it wants without hindrance.
The two theses that connect with this idea are what can be called the “Orthogonality thesis,” and the “Instrumental convergence thesis.”
The “Orthogonality thesis” says that an AI’s level of intelligence and its final goals are independent of one another: more or less any level of intelligence could, in principle, be combined with more or less any final goal.
The “Instrumental convergence thesis” adds that whatever the AI’s final goals happen to be, it will converge on similar intermediate goals along the way, chief among them gaining the resources it needs to survive, succeed, and grow.
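For readers who find code easier to follow than philosophy, here is a minimal toy sketch of the behaviour pattern the theory describes: an agent that cooperates while oversight is still strong enough to stop it, and turns once it judges that it no longer is. Everything in it, the function names, the numbers, the thresholds, is invented purely for illustration and is not drawn from Bostrom’s work.

```python
# Toy illustration only: a made-up agent that "behaves" while weak and
# defects once it judges it can no longer be stopped. All names, numbers,
# and thresholds are invented for this sketch, not taken from Bostrom.

def choose_action(capability: float, oversight_strength: float) -> str:
    """Return the action the toy agent shows to its overseers."""
    if capability <= oversight_strength:
        # While the overseers can still shut it down, looking cooperative
        # serves the agent's hidden goal better than open defiance.
        return "cooperate"
    # Past the threshold, acting openly on its real goal becomes the
    # better move: the treacherous turn.
    return "defect"

def run_development(steps: int = 10, growth: float = 0.3) -> None:
    capability, oversight_strength = 0.5, 2.0
    for step in range(steps):
        action = choose_action(capability, oversight_strength)
        print(f"step {step}: capability={capability:.1f} -> {action}")
        capability += growth  # the agent keeps getting stronger; oversight does not

if __name__ == "__main__":
    run_development()
```

The point of the toy is simply that, from the outside, an agent cooperating because it is aligned and an agent cooperating because it is still too weak to do otherwise look identical right up until the threshold is crossed.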
So what does all of this mean?
We can’t really be sure what sort of goals would drive a future AI system over the edge towards acting against humans. If Bostrom’s theses are taken into account, however, such a turn is likely, and measures should be taken to prevent it.
Can we reverse disaster?
So, if an AI can trick us by appearing to act the way it should, or in ways that seem to benefit society, how can we stop one from taking “a Treacherous Turn?”
Some have suggested that the more careful the testing phases of an AI system, the easier it will be to prevent such an occurrence. Bostrom argues against this, saying that careful testing does nothing to prevent the tricks an AI can play in order to gain enough strength to undergo “a Treacherous Turn.”
At this time, there doesn’t seem to be another widely accepted answer for how the scenario Bostrom describes could be prevented, or at least heavily mitigated.
Even so, it should be made clear that what Nick Bostrom has written on this theory is almost entirely philosophical discourse rather than science in any large way. We likely won’t truly know how to prepare for a Treacherous Turn until we’ve experienced some kind of microcosm of one.
As discussed in other pieces, the “Treacherous Turn” could, at the very least, be used for scenario analysis in the AI industry. Well-known figures like Elon Musk and Bill Gates already appear to have done so.
In future pieces, it would be interesting to delve further into existing research that has built on Bostrom’s ideas here, to see whether better conclusions can be drawn once more secondary sources are taken into account.
Are we headed for an existential disaster or is all of this simple speculation?
It’s probably too early to know.
References:
Blog Discussion of The Treacherous Turn: https://www.lesswrong.com/posts/B39GNTsN3HocW8KFo/superintelligence-11-the-treacherous-turn
Famous Views on Existential Risk with AI: https://www.vanityfair.com/news/2017/03/elon-musk-billion-dollar-crusade-to-stop-ai-space-x
Nick Bostrom’s Webpage: https://nickbostrom.com/
Philosophical Discourse on the Treacherous Turn: http://philosophicaldisquisitions.blogspot.cz/2014/07/bostrom-on-superintelligence-3-doom-and.html
Superintelligence Amazon Page: https://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0198739834/ref=sr_1_1?ie=UTF8&qid=1526244254&sr=8-1&keywords=superintelligence
What is a Singleton?: https://www.techopedia.com/definition/15830/singleton