Conservative progressivism
Peter Singer famously argues that it’s difficult to come up with criteria that explain why killing babies is wrong without those criteria also entailing that killing many kinds of animals is wrong.
A friend expressed scepticism about this argument by saying:
- (1) I just think that killing newborn babies is wrong. It’s obvious.
- (2) Saying this is no more dogmatic than Singer’s choice of criteria for justifying moral concern.
I somewhat bungled my reply, so I’m writing a better version here.
My friend claims that proposition (1) is self-justifying, i.e. it needs no further justification. He did not explain how he thinks he knows this.
One might think that nearly everyone has a strong intuition that killing babies is wrong, so the need to supply further justification is weak, or null. That’s true now in the West, but historically false.[1]
When we say a belief is self-justifying, we run into trouble if others disagree. Maybe we can persuade them by illustration or example or something, but often we’ll reach an impasse where the only option is a biff on the nose.
Consider a more controversial claim:
- Killing pigs is wrong.
There, it seems to me, we do want to ask “why?”
And then we get into all the regular questions about criteria for moral consideration.
And then we might think: ok, do those criteria apply to babies?
And then we get to the thing that Singer noticed: that it’s hard to explain why we should not kill babies without citing criteria that also apply to many animals.
A natural thought, of course, is just: “well, babies are humans!” But what, exactly, makes humans worthy of special treatment? And: how special, exactly?
Impartial utilitarians deny that humans should get special treatment just because they’re humans. Instead they’ll appeal to things like consciousness and sentience and self-awareness and richness of experience and social relations and future potential and preferences and so on (and they’ll usually claim these are most developed in humans compared to other animals). They usually conclude that humans often deserve priority over other animals, but they deserve it because of these traits, not just because they are human. To privilege humans just because they are human is “speciesism”, a vice akin to racism.
I take the impartial utilitarian view seriously, but moderate it with two commitments [2]:
- (a) Conservatism: I give greater value to things that already exist (over potential replacements), simply because they already exist.
- (b) Loyalty: you owe allegiance to the groups of which you’re a part.
I can say some things in support of these claims, but with Singer I would probably reach an impasse. He would probably agree that (a) and (b) have pragmatic value, but deny that the world is made better by having more of (a) and (b), assuming all else equal. Our disagreements might come down to metaethics, specifically to moral epistemology. The impasse is deeper down.
So that’s the sense in which my friend is right, that ultimately these things come down to principles we judge as more plausible than others, and your ability to justify your plausibility judgements to others may be limited. The basis of our moral judgments is never entirely selfless, but partly an expression of who and what we are. And we are not all the same. So sometimes we biff each other on the nose.
[1] I’d guess that >10 billion people have lived in societies where infanticide was acceptable.
[2] I don’t think these commitments are strong enough to avoid the view that a technologically mature society should convert most matter into utilitronium. But they may be strong enough to say that humans or human-descendants should be granted at least a small fraction of the cosmic endowment to flourish by their own lights, however inefficiently…
“Today’s AI is the worst you’ll ever use.”
I don’t know who first said this. But Nathan @labenz repeats it often. He’s right to do so.
On a similar path, but.
Over the past few years, Joe Carlsmith has published several blog posts that nicely articulate views that I’ve also arrived at, for similar reasons, before he published the posts.[1] My own thinking has certainly been influenced by him, but on non-naturalist realism, deep atheism and AI existential risk, and a few other topics in AI and metaethics, I was definitely there-ish before he published. But: I had not written up these views in anything approaching the quality of his blog posts. I’d have found it hard to do so, even with great effort.
What should I make of the fact that one of the best contemporary philosophers is on a similar path on some topics? On the one hand, this is gratifying and encouraging: this is some evidence (a) that my views are correct and (b) that I “have what it takes” to develop my own, somewhat novel views on important topics at the vanguard.
On the other hand, it makes me think “Joe has it covered, and will do a better job than me”. This pushes on my long-running concern that spending time on moral philosophy and futurism—which I am constantly drawn to—is mostly self-indulgence on my part; that going “all in” on this stuff would mean falling short of my “be useful” aspiration. If I went “all in”, I think 90%+ that I’d top out as “good”, but not “world class”. And: on the face of it, the returns to being merely “good” are pretty low.
Much better, plausibly, to keep the philosophy as a passionate side-project. It feeds into my work as an “ethical influencer”, which is one way of thinking about the main impact of my career so far. Plausibly this role—perhaps mixed with some more “actually do the thing” periods—is my sweet spot in the global portfolio.
[1] To be clear: Joe also has a lot of fantastic posts which have contained many many “fresh to me” ideas and insights. I read everything he writes.
Holden Karnofsky: the fast takeoff scenario is a key motivation for preemptive AI safety measures
Holden Karnofsky: One of the reasons I’m so interested in AI safety standards is because kind of no matter what risk you’re worried about, I think you hopefully should be able to get on board with the idea that you should measure the risk, and not unwittingly deploy AI systems that are carrying a tonne of the risk, before you’ve at least made a deliberate informed decision to do so. And I think if we do that, we can anticipate a lot of different risks and stop them from coming at us too fast. “Too fast” is the central theme for me.
You know, a common story in some corners of this discourse is this idea of an AI that’s this kind of simple computer program, and it rewrites its own source code, and that’s where all the action is. I don’t think that’s exactly the picture I have in mind, although there’s some similarities.
The kind of thing I’m picturing is maybe more like a months or years time period from getting sort of near-human-level AI systems — and what that means is definitely debatable and gets messy — but near-human-level AI systems to just very powerful ones that are advancing science and technology really fast. And then in science and technology — at least on certain fronts that are the less bottlenecked fronts — you get a huge jump. So I think my view is at least somewhat more moderate than Eliezer’s, and at least has somewhat different dynamics.
But I think both points of view are talking about this rapid change. I think without the rapid change, a) things are a lot less scary generally, and b) I think it is harder to justify a lot of the stuff that AI-concerned people do to try and get out ahead of the problem and think about things in advance. Because I think a lot of people sort of complain with this discourse that it’s really hard to know the future, and all this stuff we’re talking about about what future AI systems are going to do and what we have to do about it today, it’s very hard to get that right. It’s very hard to anticipate what things will be like in an unfamiliar future.
When people complain about that stuff, I’m just very sympathetic. I think that’s right. And if I thought that we had the option to adapt to everything as it happens, I think I would in many ways be tempted to just work on other problems, and in fact adapt to things as they happen and we see what’s happening and see what’s most needed. And so I think a lot of the case for planning things out in advance — trying to tell stories of what might happen, trying to figure out what kind of regime we’re going to want and put the pieces in place today, trying to figure out what kind of research challenges are going to be hard and do them today — I think a lot of the case for that stuff being so important does rely on this theory that things could move a lot faster than anyone is expecting.
I am in fact very sympathetic to people who would rather just adapt to things as they go. I think that’s usually the right way to do things. And I think many attempts to anticipate future problems are things I’m just not that interested in, because of this issue. But I think AI is a place where we have to take the explosive progress thing seriously enough that we should be doing our best to prepare for it.
Rob Wiblin: Yeah. I guess if you have this explosive growth, then the very strange things that we might be trying to prepare for might be happening in 2027, or incredibly soon.
Holden Karnofsky: Something like that, yeah. It’s imaginable, right? And it’s all extremely uncertain because we don’t know. In my head, a lot of it is like there’s a set of properties that an AI system could have: roughly being able to do roughly everything humans are able to do to advance science and technology, or at least able to advance AI research. We don’t know when we’ll have that. One possibility is we’re like 30 years away from that. But once we get near that, things will move incredibly fast. And that’s a world we could be in. We could also be in a world where we’re only a few years from that, and then everything’s going to get much crazier than anyone thinks, much faster than anyone thinks.
https://80000hours.org/podcast/episodes/holden-karnofsky-how-ai-could-take-over-the-world/
See also: HK on PASTA.
Are LLMs reasoning or reciting?
The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specialized to specific tasks seen during pretraining? To disentangle these effects, we propose an evaluation framework based on “counterfactual” task variants that deviate from the default assumptions underlying standard tasks. Across a suite of 11 tasks, we observe nontrivial performance on the counterfactual variants, but nevertheless find that performance substantially and consistently degrades compared to the default conditions. This suggests that while current LMs may possess abstract task-solving skills to a degree, they often also rely on narrow, non-transferable procedures for task-solving. These results motivate a more careful interpretation of language model performance that teases apart these aspects of behavior.