How to Build a Robot That Won’t Take Over the World

Christoph Salge at New York University’s Game Innovation Lab. Sasha Maslov/Quanta Magazine

Isaac Asimov’s famous Three Laws of Robotics—constraints on the behavior of androids and automatons meant to ensure the safety of humans—were also famously incomplete. The laws, which first appeared in his 1942 short story “Runaround” and again in classic works like I, Robot, sound airtight at first:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings, except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Of course, hidden conflicts and loopholes abound (which was Asimov’s point). In our current age of advanced machine-learning software and autonomous robotics, defining and implementing an airtight set of ethics for artificial intelligence has become a pressing concern for organizations like the Machine Intelligence Research Institute and OpenAI.

Christoph Salge, a computer scientist currently at New York University, is taking a different approach. Instead of pursuing top-down philosophical definitions of how artificial agents should or shouldn’t behave, Salge and his colleague Daniel Polani are investigating a bottom-up path, or “what a robot should do in the first place,” as they write in their recent paper, “Empowerment as Replacement for the Three Laws of Robotics.” Empowerment, a concept inspired in part by cybernetics and psychology, describes an agent’s intrinsic motivation to both persist within and operate upon its environment. “Like an organism, it wants to survive. It wants to be able to affect the world,” Salge explained. A Roomba programmed to seek its charging station when its batteries are getting low could be said to have an extremely rudimentary form of empowerment: To continue acting on the world, it must take action to preserve its own survival by maintaining a charge.

Empowerment might sound like a recipe for producing the very outcome that safe-AI thinkers like Nick Bostrom fear: powerful autonomous systems concerned only with maximizing their own interests and running amok as a result. But Salge, who has studied human-machine social interactions, wondered what might happen if an empowered agent “also looked out for the empowerment of another. You don’t just want your robot to stay operational—you also want it to maintain that for the human partner.”

Salge and Polani realized that information theory offers a way to translate this mutual empowerment into a mathematical framework that a non-philosophizing artificial agent could put into action. “One of the shortcomings of the Three Laws of Robotics is that they are language-based, and language has a high degree of ambiguity,” Salge said. “We’re trying to find something that is actually operationalizable.”
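In the formal framework Salge and Polani build on, empowerment is defined information-theoretically: it is the channel capacity between the actions an agent can take and the sensor states those actions lead to, so an agent is more empowered the more distinct futures it can reliably bring about. As a rough illustration (a toy sketch under simplifying assumptions, not the setup from their paper), in a deterministic world this reduces to counting reachable states:

```python
import math
from itertools import product

# Toy deterministic grid world: a 5x5 grid with four moves plus "stay".
# In a deterministic environment, n-step empowerment (the channel capacity
# between action sequences and the states they lead to) reduces to the log
# of the number of distinct states reachable in n steps.
ACTIONS = {"stay": (0, 0), "up": (0, -1), "down": (0, 1),
           "left": (-1, 0), "right": (1, 0)}
WIDTH, HEIGHT = 5, 5

def step(state, action):
    """Apply one action; moves that would leave the grid keep the agent in place."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    return (nx, ny) if 0 <= nx < WIDTH and 0 <= ny < HEIGHT else (x, y)

def empowerment(state, n_steps=3):
    """n-step empowerment in bits: log2 of the distinct end states reachable."""
    reachable = set()
    for seq in product(ACTIONS, repeat=n_steps):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

# A cornered agent has fewer reachable futures, and thus lower empowerment,
# than one standing in the open.
print(empowerment((0, 0)))  # corner: ~3.32 bits
print(empowerment((2, 2)))  # center: ~4.39 bits
```

States from which the agent can no longer act, such as being destroyed or boxed in, collapse this count, which is why an empowerment-maximizing agent steers away from them.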

Quanta spoke with Salge about information theory, nihilist AI and the canine model of human-robot interaction. An edited and condensed version of the conversation follows.

Some technologists believe that AI is a major, even existential threat. Does the prospect of runaway AI worry you?

I’m a bit on the fence. I mean, I do think there are currently genuine concerns with robots and the growing influence of AI. But I think in the short term we’re probably more concerned about job replacement, decision making, possibly a loss of democracy, a loss of privacy. I’m unsure how likely it is that this kind of runaway AI will happen anytime soon. But even with an AI controlling your health care system or the treatment options you’re getting, we should start to be concerned about the kind of ethical questions that arise from this.

How does the concept of empowerment help us deal with these issues?

I think that the idea of empowerment does fill a niche. It keeps an agent from letting a human die, but once you’ve satisfied this very basic bottom line, it still has a continued drive to create additional possibilities and allow the human to express themselves more and have more influence on the world. In one of Asimov’s books, I think the robots just end up putting all the humans in some kind of safe containers. That would be undesirable, whereas having our abilities to affect the world continuously enhanced seems to be a much more interesting end goal to reach.

You tested your ideas on virtual agents in a video game environment. What happened?

An agent motivated by its own empowerment would jump out of the way of a projectile, or keep from falling into a hole, or avoid any number of situations that would result in its losing mobility, dying or being damaged in a way that would reduce its operationality. It just keeps itself running.

When it was paired with a human player that it was supposed to empower as well as itself, we observed that the virtual robot would keep a certain distance so as not to block the human’s movement. It doesn’t block you in; it doesn’t stand in a doorway that’s then impossible for you to pass through. We basically saw that this effect keeps the companion sticking close to you so it can help you out. It led to behavior where it could take the lead or follow.

For example, we created a scenario with a laser barrier that would be harmful to the human but not to the robot. If the human in this game gets closer to the laser, suddenly there is more and more of an empowerment-driven incentive for the robot to block the laser. The incentive gets stronger when the human stands right next to it, implying, “I want to cross this now.” And the robot would actually block the laser by standing in front of it.
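The doorway and laser behaviors described above are what a simple controller that trades off its own empowerment against the human’s would produce. The sketch below is a minimal, hypothetical decision rule along those lines, with stand-in dynamics and empowerment functions rather than the controller used in Salge and Polani’s experiments:

```python
def choose_robot_action(robot_state, human_state, candidate_actions,
                        simulate, joint_empowerment):
    """Greedy one-step rule: pick the robot action whose resulting joint
    state keeps the most options open for both the robot and the human."""
    def score(action):
        next_robot, next_human = simulate(robot_state, human_state, action)
        return joint_empowerment(next_robot, next_human)
    return max(candidate_actions, key=score)

# Toy doorway scenario: the robot can stand in the doorway or in the hall.
# Standing in the doorway traps the human, dropping the human's share of the
# joint empowerment to zero, so the rule picks "hall".
def simulate(robot, human, action):
    return action, human  # the robot simply moves to the chosen cell

def joint_empowerment(robot_cell, human_cell):
    robot_options = 2.0                                   # the robot keeps its options either way
    human_options = 0.0 if robot_cell == "door" else 2.0  # a blocked doorway removes the human's options
    return robot_options + human_options

print(choose_robot_action("hall", "room", ["door", "hall"],
                          simulate, joint_empowerment))  # -> hall
```

With a richer empowerment model, the same sum explains the laser case: standing in front of the barrier is the action that preserves the most options for the approaching human.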

Did the agents engage in any unintended behavior, like the kind that emerges from the Three Laws in Asimov’s fiction?
