Notes on Rao on Ooze
Some notes on “Fear of Oozification”.
I accept the evolutionary picture. So do Yudkowsky, Shulman, Hanson, Carlsmith and Bostrom.
Key disputes:
- How capable are we of influencing evolutionary change (both speed and direction)?
- Would the results of such efforts be desirable (see e.g. here, here, here and here)?
We definitely have some capability to influence (at the social level it’s called governance; at the biological level, homeostasis).
What kinds of governance (a.k.a. holding onto things we care about) we should go for is both a value question and an empirical question.
On the value question: yes, valuing requires some attachment to the status quo. Yudkowsky, Shulman, Carlsmith and Bostrom (and, I’m fairly sure, Altman, Hassabis, Musk) understand this, and are more into conservative humanism than Rao and Hanson.
Most people prefer conservative humanism—even the transhumanists—so these values will keep winning until the accelerationist minority get an overwhelming power advantage (or civilisation collapses).
The accelerationists see humanism as parochial and speciesist. They love Spinoza’s God, and more fully submit to its “will”.
Metaphysical moral realists like Parfit and Singer may find themselves siding with the accelerationists; it depends how well human-ish values track what objectively matters (what God “wants”…).
Joe Carlsmith’s series, “Otherness and Control in the Age of AGI”, is one of the best things I’ve read on this stuff.
Empirically, these values lie on a spectrum, and neither extreme is sustainable. Max(conservative) means self-defeating fragility. Max(accelerationist) means ooze, because complexity requires local stability, and ooze eventually becomes primordial soup.
New, for me, was the idea that more powerful technology means selection dynamics at lower levels. E.g. when we train AIs we select over matrices, and with nanotech we’ll select over configurations of atoms. And yes, once that ball gets rolling, there’s an explosion of possibilities. It sounds like Rao thinks that this means a Singleton is unlikely, but I don’t understand why. Our attempts at scaffolding might well lead us there.
Rao doesn’t like the Shoggoth meme:
The shoggoth meme, in my opinion, is an entirely inappropriate imagining of AIs and our relationship to them. It represents AI in a form factor better suited to embodying culturally specific fears than either the potentialities of AI or its true nature. There is also a category error to boot: AIs are better thought of as a large swamp rather than a particular large creature in that swamp wearing a smiley mask.
To defend it: the swamp is the pretrained model. The Shoggoth is the fine-tuned and RLHF’d creature we interact with. The key thing is that the tuned and RLHF’d creature still has many heads; on occasion we’ll be surprised by heads we don’t like.
Safe, Unsafe and Universal Singletons
Bostrom defines a Singleton as follows:
A world order in which there is a single decision-making agency at the highest level. Among its powers would be (1) the ability to prevent any threats (internal or external) to its own existence and supremacy, and (2) the ability to exert effective control over major features of its domain (including taxation and territorial allocation).
Many singletons could co-exist in the universe if they were dispersed at sufficient distances to be out of causal contact with one another. But a terrestrial world government would not count as a singleton if there were independent space colonies or alien civilizations within reach of Earth.
The key thing that’s interesting about Singletons is their effective internal control and ability to prevent internal threats.
Therefore I think we should distinguish three kinds of Singleton:
- Safe Singleton: effective internal control; prevents internal and external threats.
- Unsafe Singleton: effective internal control; prevents internal threats.
- Universal Singleton: effective internal control; prevents internal threats; occupies the entire Universe.
When a Safe Singleton encounters an Unsafe Singleton, it destroys or absorbs it.
A Universal Singleton can’t, by definition, be subject to external threats.
Conservative progressivism
Peter Singer famously argues that it’s difficult to come up with criteria to explain why killing babies is wrong without those criteria also entailing that killing many kinds of animals is also wrong.
A friend expressed scepticism about this argument by saying:
- (1) I just think that killing newborn babies is wrong. It’s obvious.
- (2) Saying this is no more dogmatic than Singer’s choice of criteria for justifying moral concern.
I somewhat bungled my reply, so I’m writing a better version here.
My friend claims that proposition (1) is self-justifying, i.e. it needs no further justification. He did not explain how he thinks he knows this.
One might think that nearly everyone has a strong intuition that killing babies is wrong, so the need to supply further justification is weak, or null. That’s true now in the West, but historically false.1
When we say a belief is self-justifying, we run into trouble if others disagree. Maybe we can persuade them by illustration or example or something, but often we’ll reach an impasse where the only option is a biff on the nose.
Consider a more controversial claim:
- Killing pigs is wrong.
There, it seems to me, we do want to ask “why?”
And then we get into all the regular questions about criteria for moral consideration.
And then we might think: ok, do those criteria apply to babies?
And then we get to the thing that Singer noticed: that it’s hard to explain why we should not kill babies without citing criteria that also apply to many animals.
A natural thought, of course, is just: “well, babies are humans!” But what, exactly, makes humans worthy of special treatment? And: how special, exactly?
Impartial utilitarians deny that humans should get special treatment just because they’re humans. Instead they’ll appeal to things like consciousness and sentience and self-awareness and richness of experience and social relations and future potential and preferences and so on (and they’ll usually claim these are most developed in humans compared to other animals). They usually conclude that humans often deserve priority over other animals, but they deserve it because of these traits, not just because they are human. To privilege humans just because they are human is “speciesism”, a vice akin to racism.
I take the impartial utilitarian view seriously, but moderate it with two commitments 2:
- (a) Conservatism: I give greater value to things that already exist (over potential replacements), simply because they already exist.
- (b) Loyalty: you owe allegiance to the groups of which you’re a part.
I can say some things in support of these claims, but with Singer I would probably reach an impasse. He would probably agree that (a) and (b) have pragmatic value, but deny that the world is made better by having more of (a) and (b), assuming all else equal. Our disagreements might come down to metaethics, specifically to moral epistemology. The impasse is deeper down.
So that’s the sense in which my friend is right: ultimately these things come down to principles we judge as more plausible than others, and our ability to justify those plausibility judgements to others may be limited. The basis of our moral judgements is never entirely selfless, but partly an expression of who and what we are. And we are not all the same. So sometimes we biff each other on the nose.
I’d guess that >10 billion people have lived in societies where infanticide was acceptable.↩︎
I don’t think these commitments are strong enough to avoid the view that a technologically mature society should convert most matter into utilitronium. But they may be strong enough to say that humans or human descendants should be granted at least a small fraction of the cosmic endowment to flourish by their own lights, however inefficiently…↩︎
“Today’s AI is the worst you’ll ever use.”
I don’t know who first said this. But Nathan @labenz repeats it often. He’s right to do so.
On a similar path, but.
Over the past few years, Joe Carlsmith has published several blog posts that nicely articulate views that I’d also arrived at, for similar reasons, before he published the posts 1. My own thinking has certainly been influenced by him, but on non-naturalist realism, deep atheism and AI existential risk, and a few other topics in AI and metaethics, I was definitely there-ish before he published. But: I had not written up these views in anything approaching the quality of his blog posts. I’d have found it hard to do so, even with great effort.
What should I make of the fact that one of the best contemporary philosophers is on a similar path on some topics? On the one hand, this is gratifying and encouraging: this is some evidence that (a) my views are correct and (b) that I “have what it takes” to develop my own, somewhat novel views on important topics at the vanguard.
On the other hand, it makes me think “Joe has it covered, and will do a better job than me”. This pushes on my long-running concern that spending time on moral philosophy and futurism—which I am constantly drawn to—is mostly self-indulgence on my part; that going “all in” on this stuff would mean falling short of my “be useful” aspiration. If I went “all in”, I think 90%+ that I’d top out as “good”, but not “world class”. And: on the face of it, the returns to being merely “good” are pretty low.
Much better, plausibly, to keep the philosophy as a passionate side-project. It feeds into my work as an “ethical influencer”, which is one way of thinking about the main impact of my career so far. Plausibly this role—perhaps mixed with some more “actually do the thing” periods—is my sweet spot in the global portfolio.
To be clear: Joe also has a lot of fantastic posts which have contained many many “fresh to me” ideas and insights. I read everything he writes.↩︎
Holden Karnofsky: the fast takeoff scenario is a key motivation for preemptive AI safety measures
Holden Karnofsky: One of the reasons I’m so interested in AI safety standards is because kind of no matter what risk you’re worried about, I think you hopefully should be able to get on board with the idea that you should measure the risk, and not unwittingly deploy AI systems that are carrying a tonne of the risk, before you’ve at least made a deliberate informed decision to do so. And I think if we do that, we can anticipate a lot of different risks and stop them from coming at us too fast. “Too fast” is the central theme for me.
You know, a common story in some corners of this discourse is this idea of an AI that’s this kind of simple computer program, and it rewrites its own source code, and that’s where all the action is. I don’t think that’s exactly the picture I have in mind, although there’s some similarities.
The kind of thing I’m picturing is maybe more like a months or years time period from getting sort of near-human-level AI systems — and what that means is definitely debatable and gets messy — but near-human-level AI systems to just very powerful ones that are advancing science and technology really fast. And then in science and technology — at least on certain fronts that are the less bottlenecked fronts — you get a huge jump. So I think my view is at least somewhat more moderate than Eliezer’s, and at least has somewhat different dynamics.
But I think both points of view are talking about this rapid change. I think without the rapid change, a) things are a lot less scary generally, and b) I think it is harder to justify a lot of the stuff that AI-concerned people do to try and get out ahead of the problem and think about things in advance. Because I think a lot of people sort of complain with this discourse that it’s really hard to know the future, and all this stuff we’re talking about, about what future AI systems are going to do and what we have to do about it today, it’s very hard to get that right. It’s very hard to anticipate what things will be like in an unfamiliar future.
When people complain about that stuff, I’m just very sympathetic. I think that’s right. And if I thought that we had the option to adapt to everything as it happens, I think I would in many ways be tempted to just work on other problems, and in fact adapt to things as they happen and we see what’s happening and see what’s most needed. And so I think a lot of the case for planning things out in advance — trying to tell stories of what might happen, trying to figure out what kind of regime we’re going to want and put the pieces in place today, trying to figure out what kind of research challenges are going to be hard and do them today — I think a lot of the case for that stuff being so important does rely on this theory that things could move a lot faster than anyone is expecting.
I am in fact very sympathetic to people who would rather just adapt to things as they go. I think that’s usually the right way to do things. And I think many attempts to anticipate future problems are things I’m just not that interested in, because of this issue. But I think AI is a place where we have to take the explosive progress thing seriously enough that we should be doing our best to prepare for it.
Rob Wiblin: Yeah. I guess if you have this explosive growth, then the very strange things that we might be trying to prepare for might be happening in 2027, or incredibly soon.
Holden Karnofsky: Something like that, yeah. It’s imaginable, right? And it’s all extremely uncertain because we don’t know. In my head, a lot of it is like there’s a set of properties that an AI system could have: roughly being able to do roughly everything humans are able to do to advance science and technology, or at least able to advance AI research. We don’t know when we’ll have that. One possibility is we’re like 30 years away from that. But once we get near that, things will move incredibly fast. And that’s a world we could be in. We could also be in a world where we’re only a few years from that, and then everything’s going to get much crazier than anyone thinks, much faster than anyone thinks.
https://80000hours.org/podcast/episodes/holden-karnofsky-how-ai-could-take-over-the-world/
See also: HK on PASTA.