Lewis Hammond - Multi-agent risks from advanced AI


So I'm Lewis. I am a PhD student at the University of Oxford, which I'm finishing up at the moment.


I'm also serving as the Research Director at the Cooperative AI Foundation.


Okay, so we heard earlier that the workshop in February was meant to be all about presenting risks,

and this workshop is meant to be all about coming up with solutions, and I didn't get that memo.

I've come to you with a big list of risks, hopefully kinds of risks that you've thought less about before,

and I'm hoping you guys can come up with some interesting solutions, and I would love to chat about those later in the workshop.

Okay, so this is basically the subject of an ongoing report that we are working on,

and so I'm going to give you an extremely high level, extremely non-technical overview of the content of that report,

and then invite you to come and talk to me more if you're interested in some of these things, and we can dig a little bit more into the details.

So, the overview of the report is basically the following argument:

One: a world of advanced multi-agent systems is coming soon, including in high-stakes situations.

We haven't seen many of these kinds of multi-agent systems yet.

Two: these settings present qualitatively different kinds of risks from the single-agent case that many of us are perhaps more familiar with.

And three: not enough work is being done on this right now, but there are lots of ways to make progress

and we suggest some ways in the report. So as I said, this is work I've been doing at the Cooperative AI Foundation.

These are some of the other people who've been helping organize it. Akbir and Alan are, I think, somewhere in the room.

And yeah, there's a whole host of other people who've been providing input to this as well, so this is a very collaborative effort.

Okay, so what we do in the report then is first to begin to taxonomize these risks.

We break things down into different kinds of failure modes that can emerge.

So the way that we do this is first we ask, well, is cooperation really desirable? And often it is.

And in that case, we can begin to think about the kind of objectives the agents have,

you know, what sort of game are the agents playing? If it's a common interest game,

then the failure mode is basically just miscoordination. Everyone's on the same team,

but for whatever reason they're not able to coordinate properly or work together.

In a kind of mixed-motive setting, the failure mode is cooperation failures, conflicts, tragedies of the commons,

these sorts of things. If you have a constant-sum game, then it makes less sense to think about cooperation,

at least at the overall population level, although of course you can get cooperation in smaller subpopulations.

And then finally, if cooperation is undesirable, then we might have to worry about issues of collusion.
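To make the taxonomy above concrete, here is a minimal sketch (my own illustration, not from the report) that classifies a two-player normal-form game as common-interest, constant-sum, or mixed-motive from its payoff matrices; the function name and example games are illustrative assumptions.

```python
def classify_game(payoffs_a, payoffs_b):
    """Classify a two-player normal-form game by comparing payoff matrices.

    payoffs_a / payoffs_b: 2x2 lists of payoffs for players A and B.
    Returns "common interest", "constant sum", or "mixed motive".
    """
    cells = [(payoffs_a[i][j], payoffs_b[i][j]) for i in range(2) for j in range(2)]
    if all(a == b for a, b in cells):
        return "common interest"   # identical payoffs: only miscoordination can fail
    if len({a + b for a, b in cells}) == 1:
        return "constant sum"      # payoffs always sum to a constant: purely competitive
    return "mixed motive"          # everything else: cooperation failures possible

# Coordination game: both players prefer matching actions.
coordination = [[2, 0], [0, 1]]
print(classify_game(coordination, coordination))  # common interest

# Matching pennies: one player's gain is exactly the other's loss.
pennies_a = [[1, -1], [-1, 1]]
pennies_b = [[-1, 1], [1, -1]]
print(classify_game(pennies_a, pennies_b))        # constant sum

# Prisoner's dilemma: incentives are partially aligned, partially opposed.
pd_a = [[3, 0], [5, 1]]
pd_b = [[3, 5], [0, 1]]
print(classify_game(pd_a, pd_b))                  # mixed motive
```

The mapping to failure modes follows the talk: common-interest games admit miscoordination, mixed-motive games admit cooperation failures like conflict and tragedies of the commons, and in constant-sum games population-level cooperation is not the relevant frame.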

So the next thing is that we want to identify different risk factors

that might contribute to these particular failure modes. There are many different mechanisms

by which these failure modes can arise. Some of the ones that we talk about in the report are information asymmetries,

network effects, selection pressures, destabilizing dynamics, problems of commitment and trust,

emergent behavior that happens when you have large populations of agents,

and security issues that arise in the multi-agent setting, and issues with training data.

So these factors can be problematic independently of the failure mode.

So for instance, information asymmetries can lead to miscoordination,

if everyone's on the same team, but could also lead to conflict, for example.

And they're neither exhaustive nor mutually exclusive.

In fact, often many risk factors will play a role in a given failure mode.

So maybe information asymmetries lead to some conflict,

but then destabilizing dynamics end up escalating that conflict,

and then we kind of fail to reach a truce because we can't credibly commit to one another

to maintain that truce. And also, as you may have guessed from that example,

these risks are not unique to AI systems, but often manifest differently in the AI setting

and that is the subject of this report. And finally, kind of in the final sections of the report,

we outline the implications some of these risks have for existing work in AI safety, AI governance, and AI ethics.

And so, on to the final slide. If any of these things seem interesting, then please scan this QR code to send me an email.

And yeah, I would love to chat about any of these things. So come and say hi; I'll be here throughout NeurIPS,

including the main conference. If you want to give any feedback on parts of the report

that are particularly relevant to you, we'll be sending that out probably before the Christmas holidays,

so if you want some nice, light holiday reading, then that could be good.

We are also working on, and about to propose, some new multi-agent safety evals

for the new AI Safety Institute, which is being set up by the UK government at the moment.

So if you have thoughts on what those should look like, please contact me.

And we also just launched a new grant-making round.

So if you are interested in working on any of this stuff and you want funding to do it,

then you should also come and talk to me. And other things that we'll be doing later this year:

launching some postdoc fellowships; we also hosted an inaugural retreat, workshop,

and summer school this summer, which will be happening again;

and we're running a large language model negotiation contest, and more.

So that's my kind of obligatory plug for the Cooperative AI Foundation and the things that we're doing.

Thank you for your time.