How Artificial Superintelligence Could Ultimately Destroy Humanity

I’m confident that machine intelligence will be our final undoing. Its potential to wipe out humanity is something I’ve been thinking and writing about for the better part of 20 years. I take a lot of flak for this, but the prospect of human civilization getting extinguished by its own tools is not to be ignored.

How Artificial Superintelligence Could Ultimately Destroy Humanity

There is one surprisingly common objection to the idea that an artificial superintelligence might destroy our species, an objection I find ridiculous. It’s not that superintelligence itself is impossible. It’s not that we won’t be able to prevent or stop a rogue machine from ruining us. This naive objection proposes, rather, that a very smart computer simply won’t have the means or motivation to end humanity.

Loss of control and understanding
Imagine systems, whether biological or artificial, with levels of intelligence equal to or far greater than human intelligence. Radically enhanced human brains (or even nonhuman animal brains) could be achievable through the convergence of genetic engineering, nanotechnology, information technology, and cognitive science, while greater-than-human machine intelligence is likely to come about through advances in computer science, cognitive science, and whole brain emulation.

And now imagine if something goes wrong with one of these systems, or if they’re deliberately used as weapons. Regrettably, we probably won’t be able to contain these systems once they emerge, nor will we be able to predict the way these systems will respond to our requests.

“This is what’s known as the control problem,” Susan Schneider, director at the Center for Future Mind and the author of Artificial You: AI and the Future of the Mind, explained in an email. “It is simply the problem of how to control an AI that is vastly smarter than us.”

For analogies, Schneider pointed to the famous paper clip scenario, in which a paper clip manufacturer in possession of a poorly programmed artificial intelligence sets out to maximize efficiency of paper clip production. In turn, it destroys the planet by converting all matter on Earth into paper clips, a category of risk dubbed “perverse instantiation” by Oxford philosopher Nick Bostrom in his 2014 book Superintelligence: Paths, Dangers, Strategies. Or more simply, there’s the old magical genie story, in which the granting of three wishes “never goes well,” said Schneider. The general concern, here, is that we’ll tell a superintelligence to do something, and, because we didn’t get the details just quite right, it will grossly misinterpret our wishes, resulting in something we hadn’t intended.

For example, we could make the request for an efficient means of extracting solar energy, prompting a superintelligence to usurp our entire planet’s resources into constructing one massive solar array. Asking a superintelligence to “maximize human happiness” could compel it to rewire the pleasure centers of our brains or upload human brains into a supercomputer, forcing us to perpetually experience a five-second loop of happiness for eternity, as Bostrom speculates. Once an artificial superintelligence comes around, doom could arrive in some strange and unexpected ways.

Eliezer Yudkowsky, an AI theorist at the Machine Institute for Artificial Intelligence, thinks of artificial superintelligence as optimization processes, or a “system which hits small targets in large search spaces to produce coherent real-world effects,” as he writes in his essay “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” Trouble is, these processes tend to explore a wide space of possibilities, many of which we couldn’t possibly imagine. As Yudkowski wrote:

I am visiting a distant city, and a local friend volunteers to drive me to the airport. I do not know the neighborhood. When my friend comes to a street intersection, I am at a loss to predict my friend’s turns, either individually or in sequence. Yet I can predict the result of my friend’s unpredictable actions: we will arrive at the airport. Even if my friend’s house were located elsewhere in the city, so that my friend made a wholly different sequence of turns, I would just as confidently predict our destination. Is this not a strange situation to be in, scientifically speaking? I can predict the outcome of a process, without being able to predict any of the intermediate steps in the process.

Divorced from human contexts and driven by its goal-based programming, a machine could mete out considerable collateral damage when trying to go from A to B. Grimly, an AI could also use and abuse a pre-existing powerful resource—humans—when trying to achieve its goal, and in ways we cannot predict.