Aleksander Madry - Preparedness @ OpenAI

Transcript

I'm Aleksander Madry.

And what I wanted to tell you about is about a new team I'm building at OpenAI

and you will see in a moment why this is happening in the governance session.

The most exciting one, by the way.

So I think it doesn't come as a surprise that we at OpenAI think a lot about AI

or AGI, whichever you prefer and in particular,

a big focus of what we are trying to do is to try to think you know how to

make sure that we indeed get to this you know, AI slash AGI upside.

We are excited about some things that AI can bring, but you know,

it is not pre-ordained. We actually need to work hard to get there, right?

In particular, even if we do not have AGI, yet. I don't know what you read in the newspapers -

we don't have AGI, but our systems are already very capable and increasingly so. Yes,

we are working hard on this and the additional catalyst here is the fact that

these systems are being integrated with whole socio-technical systems.

Like we are releasing GPTs.

There are other ways in which people integrate and leverage these models.

So things become more complicated here.

And also, the question that in particular, we are asking here...

- we are asking ourselves many questions -

But one of them is: how does this change the safety picture here?

And by safety I mean safety broadly; it is not just AI safety,

but internal safety... all the things that we worry about in the world going the wrong way.

How does the progress in AI being built by OpenAI or someone else -

how does it impact anything. And, what is particularly important to me,

how do we really know that this picture has changed or has not changed?

When I talk about safety, it's not only AI safety, it's also not only existential risks.

It's essentially anything that is merely catastrophic, whatever catastrophic means for you.

So yeah, so that's the question: How do we prepare ourselves for that?

How do we prepare the world for that - what this technology is bringing.

Well, what do you do when you need to solve some problem? In academia,

you start a committee; in industry, you start a team.

So that's essentially what the role of Preparedness is, and a very important thing...

when you start any team is branding. So this is essentially our mascot that was generated using,

you know, DALL-E 3. Thank you Kevin for doing that.

The important artefact here is the hat which actually spells OpenAI, except it doesn't.

But we realise that this is how AI wants to spell OpenAI, so that's how we do it.

What will this team, Preparedness, do?

Essentially, the way I like to explain it is by saying you can think of safety as a kind of spectrum.

And on one end of the spectrum,

we have essentially worrying about bad people exploiting our systems,

"our" meaning OpenAI's systems, to do bad things, and

we have a team called Safety Systems, an amazing team that really focuses on making sure that this does not happen.

Then there is the other end of the spectrum in which we worry about bad AIs doing bad things to people.

And that's another great team called Superalignment, that is working at OpenAI. So what is left?

Well, what is left is thinking about bad people doing bad things with AI, OK?

And that's exactly what Preparedness tries to worry about.

Like how do these things that the bad actors can do with AI change over time,

and what do we do to contain that?

In particular, I view the role of preparedness as corresponding to three things.

First of all, we want to figure out a way to evaluate what the level of risk of AI is right now. We want to,

of course not just do it once; we actually want to continuously track it,

but also we not only want to track what is happening right now,

we want to try to forecast how we expect these risks to evolve in the future. And importantly...

that's one of the reasons why this talk is happening in the governance session is...

it's not just about doing the science and getting the measurement.

It's also about figuring out the concrete procedures, infrastructures and partnerships to protect against these risks.

Because in the end, it is not only about knowing what are the problems,

but also making sure that again both the company and the world actually prepares

and properly knows how to navigate these risks.

OK, so what are the guiding principles of what we are trying to accomplish?

Well, essentially, first of all, we really want to be driven by facts and science.

So what I usually tell people on my team is again, we can come with all this,

come up with all these elaborate,

elaborate scenarios and very compelling things of how things can go wrong,

and that's extremely useful.

That's how we expand the spectrum of the risks we are looking at.

But in the end, we need to be able to say that either is this risk real, or more importantly,

is this risk not real yet, and again, especially when you say that this is not a risk yet,

you better be very confident that you really, really have a good basis to make that statement.

So that's exactly where facts and science are really important.

In general, what we realise is that we all can change.

There is not that much science in this space yet,

OK. The other one is being proactive about risk mitigation.

Again, we can do all the great science. We can do all the great monitoring.

But if one day we just wake up and say oops, this model is actually doing some very bad stuff.

Well, that might be a bit too late to do anything about this.

Think about it: first of all, this harm might already have happened or be happening.

Also, you probably would want to make sure that no one steals your model.

And maybe if the model gets some additional capabilities that you might not have realised in time,

the stakes of people wanting to get these models

might rise. So your security has to be commensurate with that and again this has to happen proactively and not as

an afterthought. And finally we want to think holistically about the risks and benefits.

Meaning OK, yes. Developing AI leads to risks.

That's something we need to be very clear-eyed about.

But we need to also be able to do this calculus.

Is it more beneficial to do it in a careful manner, or not? And kind of figuring out

what is the right way to strike the balance? That is something that is very important to the core of the mission of the team.

OK, and by the way, the other guiding principle here is paranoia.

So essentially be always on the lookout for unknown-unknowns.

Just because you evaluate carefully,

things that are natural for you to check does not mean that you are not missing something important.

So kind of always having this process running in your background of looking for unknown

unknowns is very, very important and also learning from others.

What they might be finding is important as well.

In particular, one principle that OpenAI is espousing is something called iterative deployment,

in which exactly, we want to deploy the models in a careful manner to learn from what's

going on when people start using these models, because again,

some of these unknown-unknowns are something we will never come up with when just sitting in the room and thinking deeply.

You know, even though we have some very smart people thinking about that.

It's only by seeing the creativity of the whole of humanity

that we can realise that, oops, there might be things we need to pay more attention to.

OK. So in particular, again in the theme of the governance session, we

realise this is not just a matter of setting up a team and starting to do the work.

It's also about the process, about setting up a framework.

And that's something called the preparedness framework.

That's essentially what we are putting finishing touches to.

So what is the preparedness framework? It's a document that operationalizes much of

the preparedness mission. And I underlined operationalization because this is really important.

Just wanting to do something might not be enough when we talk about safety.

It's about again, you have to have processes.

You have to have ways to make sure that things that need to happen happen, and happen repeatedly and reliably.

OK, so in particular, we are explicit about the risks that we are tracking through our evals.

And how do we grade these risks? For now,

we focus on these four categories: individual persuasion, cybersecurity,

the CBRN threats, so Chemical, Biological, Radiological and Nuclear threats, and model autonomy.

But of course, we also have an explicit process for trying to identify unknown-unknowns

and add them to the risks tracked. Also, we are establishing what we call safety baselines,

like some guidance, you know, on when we will stop the deployment.

When will we stop the development of the model? And essentially,

how do we amp up security as we are learning about the risk picture over there,

OK? And importantly, again, this is not just about these baselines.

It's about the governance. It's about the question of how these baselines are enforced.

How do we make a decision to kind of activate the baseline and all of this?

And that is actually an important part of the document, which,

by the way, the whole company took part in shaping.

So in particular, one of these aspects of the governance will be creating a cross functional advisory body,

which we call the Safety Advisory Group, that kind of brings together

essentially a representation of all of the company.

This includes not only people from Superalignment or Safety Systems or Preparedness,

but also people from product, from research, so essentially like everyone that really kind of

is integral to the mission of the company to make these decisions. We make recommendations here together, OK?

And there is a bunch of other stuff; we hope relatively soon to make this public.

So you will see it for yourself. But it's essentially the case that

we need to have things like a fast track: what if you learn about something that is very time sensitive?

You need to think about safety drills, because again, safety

is something that needs to be trained. It's something that you actually need to learn. So let's do some safety drills

so everyone knows what they are supposed to be doing in this space and think about audits because

again: First of all, you might not be able to catch everything, even if you think very hard about this.

And also there is some question of the public accountability that we care about as well.

That's all I have to say today.

If you want to learn more about preparedness and also our thinking about the frontier risk,

we pushed out a blog post a while ago. There is also something called the Preparedness Challenge,

which is a good way to figure out: how do we think

about engaging the technical aspects of this work?

And guess what? We are hiring. You know, we are.

We hired some amazing people. Todor and Kevin are there.

We need more. We need more of you. So let us know if you're interested.

Like everyone would want to work with this bear, right?

So, you have an opportunity right now. Thank you.

[audience question] So the advisory board trying to determine whether

the AI is safe or prepared consists of only people from OpenAI,

which I think has a pretty clear selection bias insofar as who ends up working at OpenAI.

What are your thoughts on integrating in some sort of external auditing source,

whether it's from academia or existing AI safety researchers?

So, that's a good point. The role of this body is to guide our internal governance.

We also have ways of getting input externally. But this is really just to operationalize our internal thinking.

I never said that the only input that comes into the deliberation of that body is coming from inside.

But yes, we are thinking about this quite a bit.

As you know, OpenAI is really actively working to build up the community of

red teaming and third party auditors. So this is still a nascent field. Some of you are working with it and we want more.

So yes, definitely that's something that we are very open to and very mindful of. Again,

I really don't like groupthink. And of course, if you just have people,

no matter how brilliant, but from a limited pool, where both

there is a limited number of them and they have similar experiences, that will lead to groupthink. So we are aware of that.

[audience question] Have you thought about simple reporting mechanisms?

Maybe that's the preparedness challenge, but maybe just the box, or...

So this is actually in the preparedness framework.

We definitely do monthly reports internally: Preparedness,

which is the technical muscle of this, is reporting to this advisory body and to leadership

monthly about updates and so on. We are also thinking about kind of making,

at least some versions of these reports available publicly. As you can imagine, there is some sensitivity about

capabilities and other things that we might be a little bit careful about.

But yes, we also plan to do that. And that's --

[audience clarification] I meant external people reporting in.

Oh, interesting. Definitely that's something that is part of

- like, when we talk about unknown-unknowns -

part of where we will be sourcing our insights will be from externally,

but yeah, so that's definitely that. Unless you have something else in mind as well?

So we definitely are planning to gather input from others because at the very least it will

give us ideas of what to look for. But that might be one way to operationalize it.

[audience question] Thank you. Not exactly directly related to your talk, Alek

and you know, I like Alek a lot. But it's the second time we have teams hiring and so on.

I think it's good if this community has a peer reviewed conference that we kind of submit

papers, have academic, you know, discussions and stuff so not directly related to you.

Maybe if you wanna - do you think that we need that kind of conference?

So very nice Elad.

Yes, very much putting me on the spot, yes.

So I do think that, by the way, this is the thing

when I made this comment that I just discovered there's not that much science in this space.

It's just such a nascent field.

And one way that I know, as an academic, that brings more science is actually having conferences where people can publish

and build on each other's work. So very supportive of that.

Alignment Workshop