MINDWORKS

Mini: It takes two to tango (Julie Shah and Laura Majors)

Daniel Serfaty

In their new book, “What to Expect When You’re Expecting Robots: The Future of Human-Robot Collaboration”, Prof. Julie Shah, associate dean of Social and Ethical Responsibilities of Computing at MIT, and Laura Majors, Chief Technology Officer at Motional, describe the relationship between humans and robots in terms of a dance. Anyone who has danced with a partner knows that, to be successful, each partner must understand not only their own capabilities but also their partner's, and that the two must communicate those capabilities to each other. How does this principle apply to the relationship between humans and robots? You’re going to have to listen to this mini to find out!

Listen to the entire interview in Human-Robot Collaboration with Julie Shah and Laura Majors

Daniel Serfaty: I love that in your book, at some point, you're talking about dance. You use the word dance, which I like a lot in this case because, whatever saying you want, it takes two to tango, but the fact is that in order to be a great tango team, you not only have to be an excellent dancer by yourself. And certainly the two traditional roles of men and women in that dance perhaps are different. However, you need to anticipate the moves of your partner to be a great dancer, especially in tango, where it's particularly difficult.

You write about that and you start a journey to indicate this notion of harmony, of the collaborative aspect of the behavior. Laura, in your world, is that as important, this notion of a robot having almost like an internal mental model of the human behavior, and of the human that is also in the loop having some kind of an internal understanding of what the robot is capable of doing and what it's not capable of doing?

Laura Majors: Yeah, absolutely. We have people who ride in our cars, who will take an autonomous ride; they're passengers. So they have to understand what is the robot doing and why, and how do I change what it's doing if I want it to stop earlier, or I want to know why it got so close to a truck, or does it see that motorcycle up ahead? There are also pedestrians who will need to cross in front of robotaxis and need to know, is that car going to stop or not? So our vehicles have to be able to communicate to pedestrians and other human drivers in ways that they can understand. We have a project we call expressive robotics that's looking at different ways you can communicate to people, and again, using mental models they already have, rather than... You see a lot of research around flashing a bunch of lights or having some display, but is there something that's more intuitive and natural?

In some of our studies, we discovered that people often use the deceleration of the vehicle as one indicator. So should we start the stop a little more abruptly and maybe a little earlier to indicate that we see you and we're stopping for you? Another cue people use is sound, the screeching of the brakes. So when we stop, should we actually amplify the screeching sound? That's something that we work on. And then the third class of users, or of people in our integrated system, that we think about are remote operators. So if a car gets stuck, let's say it comes up to an intersection where the traffic light is out and there's a traffic cop directing traffic, a remote operator needs to take over control and have some ability to interface with the car and with the situation. It's definitely an important part of autonomous vehicles.

Daniel Serfaty: That's interesting because in the first capture you only imagine the proverbial user, but in a large system or a system of systems, the way you describe it, there are all kinds of stakeholders here that you have to take into account in the very design of the vehicle itself.

Laura Majors: That's right. Julie and I, in the book, call this other set of people bystanders. These are people who may not even realize whether a car is a human-driven car or a robot. The car may be far enough away or angled in a way that you can't see if there's a person in the driver's seat or not. And so these people don't necessarily know what the goals of that robot are. Where's it going? What is it doing? How does it work? What are its blind spots? And so I think there's a lot of work there to figure out how you can effectively communicate with those bystanders, again, who know nothing about your system, and be able to interact in a safe way with those bystanders.

Daniel Serfaty: That's fascinating because it's about almost amplifying an interaction that you wouldn't do normally if you're a car, in the sense that because you adapted in certain way, you have to exaggerate your signals somehow. Julie, what do you think are the remaining big scientific or technological hurdles for the next generation of robots, in the sense that, I know you're working with students and you're working in a lab, you have the luxury of slow experimentation and grading semester after semester, maybe a luxury Laura doesn't have in her world? If you had some wishes for the next generation of robots, would they be more socially intelligent, more emotionally intelligent, more culturally aware, more creative? What kind of qualities would you like eventually to be able to design into those robots in the future?

Julie Shah: Well, we definitely need the systems to be more human aware in various ways; starting with humans as more than obstacles is a good starting point. And then once you go down that path, what is the right level at which to model a person? What do you need to know about them? And then in collections of people, the norms, the conventions really do become important. So that's really just at its beginning. So being able to learn norms and conventions from relatively few demonstrations or observations is challenging, or to be able to start with a scaffold and update a model that the system has in a way that doesn't take thousands or hundreds of thousands or even millions of examples.

And so one of the technical challenges is, as machine learning becomes more critically important to deploying these systems in less structured and more dynamic environments, it's relatively easy to lose sight of what's required to make those systems capable. You look at the advances today, systems that are able to play various games like Go, and how they're able to learn. This requires either collecting vast amounts of labeled data, in which we're structuring the knowledge of the world for the system through those labels, or a high-fidelity emulator to practice in. And our encoding of that emulator, it never truly mimics the real world. And so what translates what needs to be fixed up relatively quickly.

Many of our advances in perception, for example, are in fields where it's much easier to collect these vast amounts of data and it's easier to tailor them for different environments. If you look at what's required for deploying these systems in terms of understanding the state of the world and being able to project, we don't have data sets on human behavior. And human behavior changes in ways that are tied to a particular intersection or a particular city when you're driving or when you're walking as a pedestrian, and so that transfer problem becomes very important for a safety-critical system operating in these environments as well.

And so our own lab has a robust research program in what I call the small data problem. Everybody's working in big data and machine learning. If you work with people, you live in a world of small data and you begin to work very hard to gain the most you can out of the type of data it's easy for people to give. And labels are not easy, but there are other forms of high-level input a person can give to guide the behavior of a system or guide its learning or inference process, paired with small amounts of labeled data.

And so we use techniques like that for being able to back out or infer human mental models, human latent states that affect human behavior. And so as a very concrete example of that, for a very simple system, imagine a subway going up and down a line. And whether it goes up and down the line in Boston or New York, the behavior of the subway is the same. But in Boston, we say it's inbound and outbound from some arbitrary point called Park Street in the middle of the line. And in New York, we say uptown and downtown, based on when it gets to the end of the line and switches. It's sort of a two-state latent state that we hold to describe and understand that system.

But as a person that grew up in New Jersey and then moved to Boston, that switch can be very confusing. But if a person is able to give a machine just the change point in their own mental model, even if they can't use words to describe it, I can say the behavior of the subway switches at this point when it moves through Park Street. But the behavior of the subway in my mental model switches at the end of the line at this point. That's actually enough for a machine to lock in the two latent states that actually correspond to the human-held mental model of the behavior of that system. And so these are real technical challenges, but ones that we can formulate and that we can address, and make these systems vastly more capable with relatively little data and with very parsimoniously gathered human input. And so I think there's a really bright future here, but it's framing the problem the right way.
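[Editor's illustration] The subway example can be sketched as a toy program. This is only an illustration of the idea as described in the conversation, not the method used in Prof. Shah's lab. The function names `shuttle` and `latent_states`, the five-station line, the station indices, and the choice of which terminus counts as "uptown" are all assumptions made for the sketch. The single piece of human input is the change point: Park Street in the middle of the line for the Boston rider, the end of the line for the New York rider.

```python
from typing import List, Tuple


def shuttle(num_stations: int, one_way_trips: int) -> List[int]:
    """Station indices visited by a train running end to end and back.

    Hypothetical helper: the physical behavior is identical in every city.
    """
    path = [0]
    step = 1
    for _ in range(one_way_trips):
        for _ in range(num_stations - 1):
            path.append(path[-1] + step)
        step = -step  # turn around at the end of the line
    return path


def latent_states(path: List[int], change_point: int,
                  names: Tuple[str, str]) -> List[str]:
    """Label each hop with one of two latent states.

    The only human input is `change_point`, the station where the rider's
    mental model flips. A hop is labeled names[0] while the train approaches
    that station and names[1] while it moves away from it.
    """
    toward, away = names
    states = []
    for prev, cur in zip(path, path[1:]):
        approaching = abs(cur - change_point) < abs(prev - change_point)
        states.append(toward if approaching else away)
    return states


# Same train, same track: out and back on an assumed 5-station line.
path = shuttle(num_stations=5, one_way_trips=2)

# Boston rider: the state flips at a hub mid-line (station 2 plays Park Street).
boston = latent_states(path, change_point=2, names=("inbound", "outbound"))

# New York rider: the state flips at the end of the line (station 4).
new_york = latent_states(path, change_point=4, names=("uptown", "downtown"))

for hop, (b, n) in enumerate(zip(boston, new_york)):
    print(f"hop {hop}: {path[hop]} -> {path[hop + 1]}  Boston={b}  NewYork={n}")
```

The train's trajectory is the same in both calls; only the change point differs, and that alone yields two different two-state descriptions of the same behavior.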

Daniel Serfaty: Laura, in your world, if you had one wish that would simplify, that would create a leap in the system that you are designing, what particular characteristics, or I'm hesitant to call it intelligence, but let's say social, cultural, creative, emotional components of the robot side of the equation would you wish for?

Laura Majors: One way I think about it is, how do we create intelligence that can be predicted by other humans in the environment? And so I think that's really the leap. We talk about this some in our book. Do you have to change the fundamental decision-making of the robot in order for its behavior to be understood and predicted by the people who encounter it? I think that's a really big gap still today. I think back to some of my early work in autonomous systems, in talking with pilots in the military who flew around some of the early drones like Predator and other ones, and they said the behavior of those systems was just so fundamentally different from a human-piloted vehicle that they would avoid the airspace around those vehicles, give them lots of spacing and just get out of town.

And then Julie described in the manufacturing setting that these industrial robots were safe and could be side-by-side with people, but weren't smart and weren't contributing as well as they could be. So if we have that on our streets and our sidewalks, these systems that behave in ways we don't understand and that aren't able to add value to the tasks that we're doing every day, whether that's delivering food to us or getting us somewhere safely but quickly, I think that's going to be highly disruptive and a nuisance, and it's not going to solve the real problems that these robots are designed or intended to solve. I think there's an element of predictive intelligence.

Daniel Serfaty: I like that, predictive intelligence.