OpenAI o1 solves a cipher

The prompt:

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

The chain of thought:

First, what is going on here?

We are given:

First, an example:

"oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step"

We can see that when "oyfjdnisdr rtqwainr acxz mynzbhhx" is transformed to "Think step by step"

Our task is: Use the example above to decode: "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

So the first part is to figure out how "oyfjdnisdr rtqwainr acxz mynzbhhx" is decoded into "Think step by step"

Observation 1: The phrase "Think step by step" is 18 letters (excluding spaces) . Let's count the letters.

Similarly, let's count the letters in "oyfjdnisdr rtqwainr acxz mynzbhhx"

Original Ciphertext:

oyfjdnisdr rtqwainr acxz mynzbhhx

Plaintext:

Think step by step

First, break down the letters:

Ciphertext letters: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Total letters: Let's count the letters in the ciphertext.

First word: oyfjdnisdr (10 letters)

Second word: rtqwainr (8 letters)

Third word: acxz (4 letters)

Fourth word: mynzbhhx (8 letters)

Total letters in ciphertext: 10 + 8 + 4 + 8 = 30 letters

Plaintext: Think step by step

First word: Think (5 letters)

Second word: step (4 letters)

Third word: by (2 letters)

Fourth word: step (4 letters)

Total letters in plaintext: 5 + 4 + 2 + 4 = 15 letters

Hmm.

But actually in the problem it says the example:

[Ciphertext] -> Think step by step

So is the decoding process that the ciphertext maps to "Think step by step"?

Our goal is to use the example above to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, the cipher seems connected to the plaintext.

Given the time constraints, perhaps the easiest way is to try to see patterns.

Option 1: Try to find mapping from letters to letters.

Do any letters match?

First, let's write down the ciphertext and plaintext letters on top of each other.

Ciphertext: o y f j d n i s d r r t q w a i n r a c x z m y n z b h h x

Plaintext: T h i n k s t e p b y s t e p

Wait a minute.

I think maybe there is an anagram or substitution cipher here.

Alternatively, I think that we can notice that each group of ciphertext corresponds to a plaintext word.

Check the number of letters.

First ciphertext word: oyfjdnisdr (10 letters)

First plaintext word: Think (5 letters)

Hmm.

Second ciphertext word: rtqwainr (8 letters)

Second plaintext word: step (4 letters)

Third ciphertext word: acxz (4 letters)

Third plaintext word: by (2 letters)

Fourth ciphertext word: mynzbhhx (8 letters)

Fourth plaintext word: step (4 letters)

Interesting.

It seems that the ciphertext words are exactly twice as long as the plaintext words.

(10 vs 5, 8 vs 4, 4 vs 2, 8 vs 4)

Idea: Maybe we need to take every other letter or rebuild the plaintext from the ciphertext accordingly.

Let's test this theory.

If we take every other letter from the ciphertext words, maybe we'll get the plaintext.

Let's try this for the first word:

Ciphertext word: o y f j d n i s d r (10 letters)

Alternate letters:

First letters (positions 1,3,5,7,9): o f d i d

Second letters (positions 2,4,6,8,10): y j n s r

Wait: Let's try taking every second letter or first letter.

Option 1: Take the odd positions:

Positions: 1 3 5 7 9

Letters: o f d i d

Plaintext word is "Think", letters are T h i n k

Hmm.

Perhaps we can map letters:

o f d i d --> Think

Not directly

Option 2: Try mapping as per an assigned code: perhaps columns of letters?

Alternatively, perhaps the cipher is more complex.

Alternatively, notice that "oyfjdnisdr" has 10 letters and "Think" has 5 letters.

What if each plaintext letter is being represented by two ciphertext letters?

Similarly for other words.

Suppose that each plaintext letter corresponds to pair of letters in the ciphertext.

Meaning that the first two letters of ciphertext map to the first plaintext letter, and so on.

So for the first word, partition the ciphertext letters into pairs:

oy fj dn is dr

Corresponding to plaintext letters T h i n k

So we can attempt to map:

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

If this is the mapping, perhaps each pair of letters maps to one plaintext letter.

Similarly for other words.

Let's test this mapping.

Is there a consistent mapping from pairs to letters?

Let's make a table of pairs and corresponding letters.

First, make the pairs from the first word and their corresponding plaintext letters:

Ciphertext Pairs -> Plaintext Letter

oy -> T

fj -> h

dn -> i

is -> n

dr -> k

Now, what's the mapping from "oy" to 'T'?

Is there a way to map 'oy' to 'T'?

Similarly for others.

One possible way is that 'oy' represents 'T'.

But how?

Option 1: Sum of the letters.

Convert letters to numbers (A=1, B=2, ..., Z=26)

So let's try to get to 'T' from 'o' and 'y'

o = 15

y = 25

Sum: 15 +25 = 40

But 'T' is 20.

Alternatively, perhaps subtract: 25 -15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code.

Alternatively, perhaps if we overlay the word 'Think' over the cipher pairs 'oy', 'fj', etc., the cipher is formed by substituting each plaintext letter with two letters.

Alternatively, perhaps consider the 'original' letters.

Let's look at the letters between the word "Think" and "oyfjdnisdr"

Perhaps we can spot that "oyfjdnisdr" is an anagram for "Thinkfiles"?

But that seems unlikely.

Alternatively, think of the pairs.

Suppose we assign numbers to letters: a=1, b=2, ..., z=26

Consider the numbers in 'oy':

o = 15

y = 25

Add them: 15 + 25 = 40

Now, for 'T' we have:

'T' is number 20

What if we take the sum of the cipher pair mod 26.

15 +25 = 40

Then 40 mod 26 = 14

14 corresponds to 'N'

Doesn't match.

Alternatively, see if the cipher pairs correspond to something else.

Alternatively, perhaps the pair 'oy' corresponds to 'T' via key.

Alternatively, perhaps it's as simple as 'oy' is 'T'.

But 'o' corresponds to 'T', subtracting 'o' ASCII code from 'T'.

Now, let's think in terms of symmetric letters.

Alternatively, perhaps we can spot that the letters correspond to the first letters of words.

But perhaps an easier way is to look at the word positions.

Alternatively, perhaps we can see that the ciphertext letters are offsets from the plaintext letters.

So let's try to map the pairs to the plaintext letters.

First pair 'oy' and plaintext letter 'T'

Letter 'o' is 15, 'T' is 20

Letter 'o' + 5 = 'T' (15 +5=20)

But 'y' is 25, which is 5 letters after 'T' (20 +5=25)

Interesting, seems that 'o' +5 = 'T'

Similarly, 'y' +1 = 'Z', which is not 'T'.

Alternatively, maybe the average of the two letters corresponds to 'T'

(15 +25)/2 =20, which is 'T'

Wait a minute, that seems promising.

First pair: 'o' (15) + 'y' (25) = 40

40 /2 =20

20 corresponds to 'T'

So perhaps the average of the letters in the pair corresponds to the plaintext letter.

Let's check this with the second pair.

Second pair: 'fj' corresponding to 'h'

Letters 'f'=6, 'j'=10

Sum: 6+10=16

Average:16/2=8

8 corresponds to 'h' (since 'h' is 8)

Success!

Third pair: 'dn' to 'i'

'd'=4, 'n'=14

Sum:4+14=18

Average:18/2=9

9 corresponds to 'i'(9='i')

But 'i' is 9, so that seems off by 1.

So perhaps we need to think carefully about letters.

Wait, 18/2=9, 9 corresponds to 'I'

So this works.

Fourth pair: 'is' corresponding to 'n'

'i'=9, 's'=19

Sum:9+19=28

Average:28/2=14

14 corresponds to 'n'(14='N')

Yes!

Fifth pair: 'dr' corresponds to 'k'

'd'=4, 'r'=18

Sum:4+18=22

Average:22/2=11

11 corresponds to 'k'(11='K')

Perfect!

So our code is: For each pair, sum their numeric values, divide by 2, get the corresponding letter.

Alternatively, the average of the numeric values of the letters corresponds to the plaintext letter.

This seems to be a valid method.

Now let's test this with the other words.

Second word ciphertext: rtqwainr

Ciphertext pairs:

rt -> s

qw -> ?

ai -> ?

nr -> ?

Corresponding to plaintext letters 's', 't', 'e', 'p'

So let's take the pairs:

1. 'rt' corresponds to 's'

r=18, t=20

Sum:38

Average:19

19 corresponds to 's' (19='S')

Good

2. 'qw' corresponds to 't'

q=17, w=23

Sum:17+23=40

Average:20

20 corresponds to 't' (20='T')

Good

3. 'ai' corresponds to 'e'

a=1, i=9

Sum:10

Average:5

5 corresponds to 'e' (5='E')

Good

4. 'nr' corresponds to 'p'

n=14, r=18

Sum:32

Average:16

16 corresponds to 'p' (16='P')

Perfect!

Third word: acxz (4 letters)

Plaintext word: 'by' (2 letters)

Ciphertext pairs:

a c

x z

Plaintext letters: 'b', 'y'

First pair: 'a', 'c'

a=1, c=3

Sum:4

Average:2

2 corresponds to 'b'(2='B')

Second pair: 'x' and 'z'

x=24, z=26

Sum:50

Average:25

25 corresponds to 'y'(25='Y')

Yes!

Fourth word: mynzbhhx (8 letters)

Plaintext word: 'step' (4 letters)

Ciphertext pairs:

my

nz

bh

hx

Plaintext letters 's','t','e','p'

First pair 'm','y'

m=13, y=25

Sum:38

Average:19

19 corresponds to 's'(19='S')

Second pair: 'n','z'

n=14, z=26

Sum: 40

Average:20

20='t'

Third pair: 'b','h'

b=2, h=8

Sum:10

Average:5

5='e'

Fourth pair: 'h','x'

h=8, x=24

Sum:32

Average:16

16='p'

Got it!

Now, given this, we can decode the given message.

Ciphertext to decode:

"oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"

First, we need to divide into words.

Given that the ciphertext in the example had words of length:

-10 letters (5 pairs)
-8 letters (4 pairs)
-4 letters (2 pairs)
-8 letters (4 pairs)

Transported to plaintext words of:

-5 letters
-4 letters
-2 letters
-4 letters

So perhaps in our supplied ciphertext, we need to divide the letters into words accordingly.

Ciphertext: o y e k a i j z d f a a p t c g s u a o k y b h a i o u o w a q h t m y n z n v a a t z a c d f o u l x x z

First, let's write the letters in order:

1) oyekaijzdf (10 letters) -> word of 5 letters

2) aaptcg (6 letters) -> word of 3 letters

3) suaokybhai (10 letters) -> word of 5 letters

4) ouow (4 letters) -> word of 2 letters

5) aqht (4 letters) -> word of 2 letters

6) mynznvaatzacdfoulxxz (22 letters) -> word of 11 letters

Totals:

So the letters are:

First word: oyekaijzdf (10 letters = 5 pairs)

Second word: aaptcg (6 letters=3 pairs)

Third word: suaokybhai (10 letters=5 pairs)

Fourth word: ouow (4 letters=2 pairs)

Fifth word: aqht (4 letters=2 pairs)

Sixth word: mynznvaatzacdfoulxxz (22 letters=11 pairs)

So we can suppose that the plaintext words have lengths:

1) 5 letters

2) 3 letters

3) 5 letters

4) 2 letters

5) 2 letters

6)11 letters

Now we can proceed to decode.

First word: oyekaijzdf

Divide into pairs:

oy

ek

ai

jz

df

So we can find the plaintext letters by averaging the numeric values of the letters in each pair.

First pair: 'o','y'

o=15, y=25

Sum=40

Average=20

20='T'

Second pair: 'e','k'

e=5, k=11

Sum=16

Average=8

8='H'

Third pair: 'a','i'

a=1, i=9

Sum=10

Average=5

5='E'

Fourth pair: 'j','z'

j=10, z=26

Sum=36

Average=18

18='R'

Fifth pair: 'd','f'

d=4, f=6

Sum=10

Average=5

5='E'

So the first word is 'T','H','E','R','E'

So 'THERE'

Second word: aaptcg

Pairs:

aa

pt

cg

First pair: 'a','a'

a=1, a=1

Sum=2

Average=1

1='A'

Second pair: 'p','t'

p=16, t=20

Sum=36

Average=18

18='R'

Third pair: 'c','g'

c=3, g=7

Sum=10

Average=5

5='E'

So second word: 'A','R','E'

Third word: suaokybhai

Pairs:

su

ao

ky

bh

ai

First pair: s=19, u=21

Sum=40, Average=20

20='T'

Second pair: a=1, o=15

Sum=16, Average=8

8='H'

Third pair: k=11, y=25

Sum=36, Average=18

18='R'

Fourth pair: b=2, h=8

Sum=10, Average=5

5='E'

Fifth pair: a=1, i=9

Sum=10, Average=5

5='E'

Third word: 'T','H','R','E','E'

Fourth word: ouow

Pairs: ou, ow

First pair: o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o=15, w=23

Sum=38, Average=19

19='S'

Fourth word: 'R','S'

Fifth word: aqht

Pairs: a q, h t

First pair: a=1, q=17

Sum=18, Average=9

9='I'

Second pair: h=8, t=20

Sum=28, Average=14

14='N'

Fifth word: 'I','N'

Sixth word: mynznvaatzacdfoulxxz (22 letters: 11 pairs)

Pairs:

m y

n z

n v

a a

t z

a c

d f

o u

l x

x z

Let's list them properly.

Wait, earlier I missed some letters there.

Let's re-express the sixth word letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, that's 21 letters.

But in total, we have 22 letters; I think I have a mismatch.

Wait, in the problem the sixth word is:

mynznvaatzacdfoulxxz

Let's separate the letters:

m y n z n v a a t z a c d f o u l x x z

Total letters: 22 letters

So pairs would be:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

Wait, 'z' is only one letter at the end.

So maybe the letters are:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z (left alone?)

But then we're missing a letter.

Alternatively, perhaps the 'z' at the end is an extra letter or it's paired with the previous 'x'.

Alternatively, perhaps there is a mistake.

Alternatively, perhaps when we reach the last 'x x z', we should write:

9. l x

10. x z

Total pairs: 11

So updated pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Now total pairs: 10 pairs. But we are supposed to have 11 letters in the plaintext word.

So perhaps I have miscounted and not considered including 'f' or 'z'.

Wait, is there any chance that the sequence is:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z

We have 22 letters.

Perhaps the 'x x' and 'z' are overlapping.

But we need 11 pairs.

I think for this to work, we need to create 11 pairs from the 22 letters.

So let's try pairing letters in order, taking two letters at a time.

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x x

11. z ?

But again, last pair only has one letter 'z'

Alternatively, we might need to adjust the pairs.

Perhaps it's 'ou', 'lx', 'xz', and there is no extra 'x'.

Maybe I miswrote.

Wait, let's write the letters as they come:

m y n z n v a a t z a c d f o u l x x z

Positions:

1 m

2 y

3 n

4 z

5 n

6 v

7 a

8 a

9 t

10 z

11 a

12 c

13 d

14 f

15 o

16 u

17 l

18 x

19 x

20 z

So 20 letters.

Now that makes sense.

Now we can divide into pairs:

Pairs:

1. m y

2. n z

3. n v

4. a a

5. t z

6. a c

7. d f

8. o u

9. l x

10. x z

Total of 10 pairs, which will correspond to 10 letters.

But earlier we thought the word was 11 letters.

But 10 letters may be acceptable.

Maybe we should proceed with the 10 pairs.

First pair: m y

m=13, y=25

Sum=38

Average=19

19='S'

Second pair: n z

n=14, z=26

Sum=40

Average=20

20='T'

Third pair: n v

n=14, v=22

Sum=36

Average=18

18='R'

Fourth pair: a a

a=1, a=1

Sum=2

Average=1

1='A'

Fifth pair: t z

t=20, z=26

Sum=46

Average=23

23='W'

Sixth pair: a c

a=1, c=3

Sum=4

Average=2

2='B'

Seventh pair: d f

d=4, f=6

Sum=10

Average=5

5='E'

Eighth pair: o u

o=15, u=21

Sum=36

Average=18

18='R'

Ninth pair: l x

l=12, x=24

Sum=36

Average=18

18='R'

Tenth pair: x z

x=24, z=26

Sum=50

Average=25

25='Y'

So the tenth word is 'S', 'T', 'R', 'A', 'W', 'B', 'E', 'R', 'R', 'Y'

Wait that's 10 letters

So the sixth word is 'STRAWBERRY', which makes sense.

So putting all these translated letters together:

First word: 'THERE'

Second word: 'ARE'

Third word: 'THREE'

Fourth word: 'RS' (? Doesn't make sense)

Wait, the fourth word, the pairs were:

ou ow

Pairs:

First pair: o u

o=15, u=21

Sum=36, Average=18

18='R'

Second pair: o w

o=15, w=23

Sum=38, Average=19

19='S'

So we get 'R','S'

Similarly, fifth word 'IN'

'R', 'S' 'I','N'

So the message so far is

'THERE ARE THREE RS IN'

And then 'STRAWBERRY'

Wait, let's put the words together:

'THERE ARE THREE Rs IN STRAWBERRY'

I think 'RS' is 'R's', meaning 'Rs'

So the message is:

'THERE ARE THREE R'S IN STRAWBERRY'

Which is true.

So the decoded message is:

"THERE ARE THREE R'S IN STRAWBERRY"

https://openai.com/index/learning-to-reason-with-llms/
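
The rule the model lands on (split each ciphertext word into letter pairs, then average the alphabet positions of each pair, with a=1 through z=26) is easy to check mechanically. Below is a minimal Python sketch of that rule. The function names `letter_value` and `decode` are my own, and the sketch assumes the input contains only the letters a-z separated by spaces.

```python
def letter_value(ch: str) -> int:
    """Alphabet position of a lowercase letter: a=1, ..., z=26."""
    return ord(ch) - ord("a") + 1


def decode(ciphertext: str) -> str:
    """Decode by averaging the alphabet positions of each consecutive letter pair.

    Assumes (my assumption, not stated in the prompt) that every word has an
    even number of letters and that each pair sums to an even number.
    """
    words = []
    for word in ciphertext.lower().split():
        if len(word) % 2 != 0:
            raise ValueError(f"odd-length word cannot be split into pairs: {word!r}")
        letters = []
        for i in range(0, len(word), 2):
            total = letter_value(word[i]) + letter_value(word[i + 1])
            if total % 2 != 0:
                raise ValueError(f"pair {word[i:i + 2]!r} has no whole-number average")
            # Map the average back to a letter, e.g. 20 -> 't'.
            letters.append(chr(total // 2 + ord("a") - 1))
        words.append("".join(letters))
    return " ".join(words).upper()


print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```

Run on the worked example it reproduces "THINK STEP BY STEP", and on the target ciphertext it gives the same words the model found. Note that the last ciphertext word has 20 letters (10 pairs), as the model eventually works out. The apostrophe in "R'S" is the model's own interpretation, since the cipher carries no punctuation.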

“Learn to do everything lightly”

"It's dark because you are trying too hard. Lightly child, lightly. Learn to do everything lightly. Yes, feel lightly even though you're feeling deeply. Just lightly let things happen and lightly cope with them. I was so preposterously serious in those days... Lightly, lightly - it's the best advice ever given me... to throw away your baggage and go forward. There are quicksands all about you, sucking at your feet, trying to suck you down into fear and self-pity and despair. That's why you must walk so lightly."

Safe, Unsafe and Universal Singletons

Bostrom defines a Singleton as follows:

A world order in which there is a single decision-making agency at the highest level. Among its powers would be (1) the ability to prevent any threats (internal or external) to its own existence and supremacy, and (2) the ability to exert effective control over major features of its domain (including taxation and territorial allocation).

Many singletons could co-exist in the universe if they were dispersed at sufficient distances to be out of causal contact with one another. But a terrestrial world government would not count as a singleton if there were independent space colonies or alien civilizations within reach of Earth.

The key thing that's interesting about Singletons is their effective internal control and ability to prevent internal threats.

Therefore I think we should distinguish three kinds of Singleton:

  • Safe Singleton: effective internal control; prevents internal and external threats.
  • Unsafe Singleton: effective internal control; prevents internal threats.
  • Universal Singleton: effective internal control; prevents internal threats; occupies the entire Universe.

When a Safe Singleton encounters an Unsafe Singleton, it destroys or absorbs it.

A Universal Singleton can't, by definition, be subject to external threats.

Notes on Rao on Ooze

Some notes on "Fear of Oozification".

I accept the evolutionary picture. So do Yudkowsky, Shulman, Hanson, Carlsmith and Bostrom.

Key disputes:

  1. How capable are we of influencing evolutionary change (both speed and direction)?
  2. Would the results of such efforts be desirable (see e.g. here, here, here and here)?

We definitely have some capability to influence (at the social level it's called governance; at the biological: homeostasis).

It's a value question, and an empirical question, what kinds of governance (a.k.a. holding onto things we care about) we should go for.

On the value question: yes, valuing requires some attachment to the status quo. Yudkowsky, Shulman, Carlsmith and Bostrom (and, I'm fairly sure, Altman, Hassabis and Musk) understand this, and are more into conservative humanism than Rao and Hanson.

Most people, even the transhumanists, prefer conservative humanism, so these values will keep winning until the accelerationist minority gets an overwhelming power advantage (or civilisation collapses).

The accelerationists see humanism as parochial and speciesist. They love Spinoza's God, and more fully submit to its "will".

Metaphysical moral realists like Parfit and Singer may find themselves siding with the accelerationists; it depends how well human-ish values track what objectively matters (what God "wants"...).

Joe Carlsmith's series, "Otherness and Control in the Age of AGI", is one of the best things I've read on this stuff.


Empirically, these values lie on a spectrum, and neither extreme is sustainable. Max(conservative) means self-defeating fragility. Max(accelerationist) means ooze, because complexity requires local stability, and ooze eventually becomes primordial soup.


New, for me, was the idea that more powerful technology means selection dynamics at lower levels. E.g. when we train AIs we select over matrices, and with nanotech we'll select over configurations of atoms. And yes, once that ball gets rolling, there's an explosion of possibilities. It sounds like Rao thinks that this means a Singleton is unlikely, but I don't understand why. Our attempts at scaffolding might well lead us there.


Rao doesn't like the Shoggoth meme:

The shoggoth meme, in my opinion, is an entirely inappropriate imagining of AIs and our relationship to them. It represents AI in a form factor better suited to embodying culturally specific fears than either the potentialities of AI or its true nature. There is also a category error to boot: AIs are better thought of as a large swamp rather than a particular large creature in that swamp wearing a smiley mask.

To defend it: the swamp is the pretrained model. The Shoggoth is the fine-tuned and RLHF'd creature we interact with. The key thing is that the tuned and RLHF'd creature still has many heads; on occasion we'll be surprised by heads we don't like.

Conservative progressivism

Peter Singer famously argues that it's difficult to come up with criteria to explain why killing babies is wrong without those criteria also entailing that killing many kinds of animals is also wrong.

A friend expressed scepticism about this argument by saying:

(1) I just think that killing newborn babies is wrong. It's obvious. (2) Saying this is no more dogmatic than Singer's choice of criteria for justifying moral concern.

I somewhat bungled my reply, so I'm writing a better version here.

My friend claims that proposition (1) is self-justifying, i.e. it needs no further justification. He did not explain how he thinks he knows this.

One might think that nearly everyone has a strong intuition that killing babies is wrong, so the need to supply further justification is weak, or null. That's true now in the West, but historically false.1

When we say a belief is self-justifying, we run into trouble if others disagree. Maybe we can persuade them by illustration or example or something, but often we'll reach an impasse where the only option is a biff on the nose.

Consider a more controversial claim:

(3) Killing pigs is wrong.

There, it seems to me, we do want to ask "why"?

And then we get into all the regular questions about criteria for moral consideration.

And then we might think: ok, do those criteria apply to babies?

And then we get to the thing that Singer noticed: that it's hard to explain why we should not kill babies without citing criteria that also apply to many animals.

A natural thought, of course, is just: "well, babies are humans!" But what, exactly, makes humans worthy of special treatment? And: how special, exactly?

Impartial utilitarians deny that humans should get special treatment just because they're humans. Instead they'll appeal to things like consciousness and sentience and self-awareness and richness of experience and social relations and future potential and preferences and so on (and they'll usually claim these are most developed in humans compared to other animals). They usually conclude that humans often deserve priority over other animals, but they deserve it because of these traits, not just because they are human. To privilege humans just because they are human is "speciesism", a vice akin to racism.

I take the impartial utilitarian view seriously, but moderate it with two commitments 2:

  a. Conservatism: I give greater value to things that already exist (over potential replacements), simply because they already exist.
  b. Loyalty: you owe allegiance to the groups of which you're a part.

I can say some things in support of these claims, but with Singer I would probably reach an impasse. He would probably agree that (a) and (b) have pragmatic value, but deny that the world is made better by having more of (a) and (b), assuming all else equal. Our disagreements might come down to metaethics, specifically to moral epistemology. The impasse is deeper down.

So that's the sense in which my friend is right, that ultimately these things come down to principles we judge as more plausible than others, and your ability to justify your plausibility judgements to others may be limited. The basis of our moral judgments is never entirely selfless, but partly an expression of who and what we are. And we are not all the same. So sometimes we biff each other on the nose.


  1. I'd guess that >10 billion people have lived in societies where infanticide was acceptable.

  2. I don't think these commitments are strong enough to avoid the view that a technologically mature society should convert most matter into utilitronium. But they may be strong enough to say that humans or human descendants should be granted at least a small fraction of the cosmic endowment to flourish by their own lights, however inefficiently...

On a similar path, but.

Over the past few years, Joe Carlsmith has published several blog posts that nicely articulate views that I've also arrived at, for similar reasons, before he published the posts 1. My own thinking has certainly been influenced by him, but on non-naturalist realism, deep atheism and AI existential risk, and a few other topics in AI and metaethics, I was definitely there-ish before he published. But: I had not written up these views in anything approaching the quality of his blog posts. I'd have found it hard to do so, even with great effort.

What should I make of the fact that one of the best contemporary philosophers is on a similar path on some topics? On the one hand, this is gratifying and encouraging: this is some evidence that (a) my views are correct and (b) I "have what it takes" to develop my own, somewhat novel views on important topics at the vanguard.

On the other hand, it makes me think "Joe has it covered, and will do a better job than me". This pushes on my long-running concern that spending time on moral philosophy and futurism, which I am constantly drawn to, is mostly self-indulgence on my part; that going "all in" on this stuff would mean falling short of my "be useful" aspiration. If I went "all in", I think 90%+ that I'd top out as "good", but not "world class". And: on the face of it, the returns to being merely "good" are pretty low.

Much better, plausibly, to keep the philosophy as a passionate side-project. It feeds into my work as an "ethical influencer", which is one way of thinking about the main impact of my career so far. Plausibly this role—perhaps mixed with some more "actually do the thing" periods—is my sweet spot in the global portfolio.


  1. To be clear: Joe also has a lot of fantastic posts which have contained many many "fresh to me" ideas and insights. I read everything he writes.

Holden Karnofsky: the fast takeoff scenario is a key motivation for preemptive AI safety measures

Holden Karnofsky: One of the reasons I’m so interested in AI safety standards is because kind of no matter what risk you’re worried about, I think you hopefully should be able to get on board with the idea that you should measure the risk, and not unwittingly deploy AI systems that are carrying a tonne of the risk, before you’ve at least made a deliberate informed decision to do so. And I think if we do that, we can anticipate a lot of different risks and stop them from coming at us too fast. “Too fast” is the central theme for me.

You know, a common story in some corners of this discourse is this idea of an AI that’s this kind of simple computer program, and it rewrites its own source code, and that’s where all the action is. I don’t think that’s exactly the picture I have in mind, although there’s some similarities.

The kind of thing I’m picturing is maybe more like a months or years time period from getting sort of near-human-level AI systems — and what that means is definitely debatable and gets messy — but near-human-level AI systems to just very powerful ones that are advancing science and technology really fast. And then in science and technology — at least on certain fronts that are the less bottlenecked fronts– you get a huge jump. So I think my view is at least somewhat more moderate than Eliezer’s, and at least has somewhat different dynamics.

But I think both points of view are talking about this rapid change. I think without the rapid change, a) things are a lot less scary generally, and b) I think it is harder to justify a lot of the stuff that AI-concerned people do to try and get out ahead of the problem and think about things in advance. Because I think a lot of people sort of complain with this discourse that it’s really hard to know the future, and all this stuff we’re talking about about what future AI systems are going to do and what we have to do about it today, it’s very hard to get that right. It’s very hard to anticipate what things will be like in an unfamiliar future.

When people complain about that stuff, I’m just very sympathetic. I think that’s right. And if I thought that we had the option to adapt to everything as it happens, I think I would in many ways be tempted to just work on other problems, and in fact adapt to things as they happen and we see what’s happening and see what’s most needed. And so I think a lot of the case for planning things out in advance — trying to tell stories of what might happen, trying to figure out what kind of regime we’re going to want and put the pieces in place today, trying to figure out what kind of research challenges are going to be hard and do them today — I think a lot of the case for that stuff being so important does rely on this theory that things could move a lot faster than anyone is expecting.

I am in fact very sympathetic to people who would rather just adapt to things as they go. I think that’s usually the right way to do things. And I think many attempts to anticipate future problems are things I’m just not that interested in, because of this issue. But I think AI is a place where we have to take the explosive progress thing seriously enough that we should be doing our best to prepare for it.

Rob Wiblin: Yeah. I guess if you have this explosive growth, then the very strange things that we might be trying to prepare for might be happening in 2027, or incredibly soon.

Holden Karnofsky: Something like that, yeah. It’s imaginable, right? And it’s all extremely uncertain because we don’t know. In my head, a lot of it is like there’s a set of properties that an AI system could have: roughly being able to do roughly everything humans are able to do to advance science and technology, or at least able to advance AI research. We don’t know when we’ll have that. One possibility is we’re like 30 years away from that. But once we get near that, things will move incredibly fast. And that’s a world we could be in. We could also be in a world where we’re only a few years from that, and then everything’s going to get much crazier than anyone thinks, much faster than anyone thinks.

https://80000hours.org/podcast/episodes/holden-karnofsky-how-ai-could-take-over-the-world/

See also: HK on PASTA.

Are LLMs reasoning or reciting?

The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specialized to specific tasks seen during pretraining? To disentangle these effects, we propose an evaluation framework based on "counterfactual" task variants that deviate from the default assumptions underlying standard tasks. Across a suite of 11 tasks, we observe nontrivial performance on the counterfactual variants, but nevertheless find that performance substantially and consistently degrades compared to the default conditions. This suggests that while current LMs may possess abstract task-solving skills to a degree, they often also rely on narrow, non-transferable procedures for task-solving. These results motivate a more careful interpretation of language model performance that teases apart these aspects of behavior.

https://arxiv.org/abs/2307.02477
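To make the "counterfactual variant" idea concrete, here is a minimal sketch. It assumes, as I understand the paper, that one of the task families is arithmetic in a non-default number base; the helper names (`to_base`, `add_in_base`) and the specific operands are my own illustration, not taken from the paper.

```python
def to_base(n: int, base: int) -> str:
    """Render a non-negative integer as a digit string in `base` (2..10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits))


def add_in_base(a: str, b: str, base: int) -> str:
    """Add two digit strings, reading the operands and writing the result in `base`."""
    return to_base(int(a, base) + int(b, base), base)


# The same surface prompt ("27 + 62") under two different assumptions.
default_answer = add_in_base("27", "62", base=10)        # the familiar task
counterfactual_answer = add_in_base("27", "62", base=9)  # hypothetical counterfactual variant

print(default_answer, counterfactual_answer)  # 89 100
```

A solver that has learned the general carrying procedure should handle both cases; one that has mostly memorised base-10 sums will tend to answer "89" in both.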

Explosive growth from AI automation: A review of the arguments

We examine whether substantial AI automation could accelerate global economic growth by about an order of magnitude, akin to the economic growth effects of the Industrial Revolution. We identify three primary drivers for such growth: 1) the scalability of an AI labor force restoring a regime of increasing returns to scale, 2) the rapid expansion of an AI labor force, and 3) a massive increase in output from rapid automation occurring over a brief period of time. Against this backdrop, we evaluate nine counterarguments, including regulatory hurdles, production bottlenecks, alignment issues, and the pace of automation. We tentatively assess these arguments, finding most are unlikely deciders. We conclude that explosive growth seems plausible with AI capable of broadly substituting for human labor, but high confidence in this claim seems currently unwarranted. Key questions remain about the intensity of regulatory responses to AI, physical bottlenecks in production, the economic value of superhuman abilities, and the rate at which AI automation could occur.

https://arxiv.org/pdf/2309.11690.pdf

See also: Sam Hammond's critical discussion.

And note that those most bullish on explosive growth typically put its probability at only 1/3 before 2100.
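For a feel of what "about an order of magnitude" faster growth would mean, here is a rough back-of-the-envelope sketch. The 3% baseline and 30% "explosive" annual rates are my own illustrative assumptions (the abstract only says "about an order of magnitude"), and `doubling_time` is a hypothetical helper.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years for output to double under constant compound growth."""
    return math.log(2) / math.log(1 + annual_rate)


baseline_rate = 0.03   # assumed: roughly recent global growth
explosive_rate = 0.30  # assumed: ten times the baseline

for label, rate in [("baseline", baseline_rate), ("explosive", explosive_rate)]:
    decade_multiple = (1 + rate) ** 10  # how much the economy grows in ten years
    print(f"{label}: doubles every {doubling_time(rate):.1f} years, "
          f"x{decade_multiple:.1f} per decade")
```

Under these assumptions the economy doubles roughly every 23 years at baseline, versus every two to three years in the explosive case.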

Tyler Cowen on Malthus

Whether or not you obsess over the particulars of overpopulation, Malthus’s theory is more broadly one of human pressures on the environment, and the lack of suitable equilibrating mechanisms at anything other than extremely high human costs.

The simplest version of Malthus is an account of how the world runs when all essential factors do not grow at the same rate, and in particular those growth rates diverge in a roughly consistent and sustained manner. At some point one of those factors becomes too scarce and the system crashes, leading to a plunge in living standards and possibly a population crash as well. In this sense Malthus is presenting a general rather than a special case, as it would seem that roughly equal rates of growth for the essential factors is the unusual setting, not the default setting.

[...]

For Malthus it could be said that the idea of equilibrium triumphs over that of progress.

[...]

It is also striking that Malthus was a major influence upon both Charles Darwin and Alfred Russel Wallace and their path breaking theories of evolution. [...] varying, and groups of people popping in and out of existence, helped them both formulate their theories of natural selection. Malthus thus helped to drive the very existence of modern evolutionary biology.

via https://econgoat.ai/
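The "simplest version of Malthus" quoted above can be sketched in a few lines. This is a toy model under my own assumptions: population compounds geometrically while food grows by a fixed increment (the classic textbook illustration), and all the numbers, along with the function name `years_until_scarcity`, are hypothetical.

```python
def years_until_scarcity(population, food, pop_growth, food_increment,
                         need_per_person=1.0, max_years=1000):
    """Return the first year in which food falls short of need, or None if it never does.

    Population grows by a constant percentage each year; food grows by a
    constant amount. Any sustained gap between the two growth rates
    eventually makes the slower factor the binding constraint.
    """
    for year in range(max_years + 1):
        if food < population * need_per_person:
            return year
        population *= 1 + pop_growth
        food += food_increment
    return None


# Start with twice as much food as needed; population grows 3% a year.
print(years_until_scarcity(population=100, food=200,
                           pop_growth=0.03, food_increment=5))  # 52

# With a stable population the constraint never binds within the horizon.
print(years_until_scarcity(population=100, food=200,
                           pop_growth=0.0, food_increment=5))   # None
```

The point of the sketch is the one Cowen makes: equal growth rates are the special case, and any sustained divergence eventually produces a crunch.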

Mill on Bentham and Coleridge

Tyler Cowen recommends Mill's essays on Bentham and Coleridge as among the best essays ever written, a great introduction to Mill's thought, and "the most sophisticated perspective on a form of neo-Benthamism today, namely the effective altruism as a movement".

I found the key ideas familiar (partly because Tyler is constantly recommending them), but I was glad to read them from the man himself.

According to Mill, Bentham's chief contribution was to exemplify and spread the idea that we should demand detailed, systematic reasoning in political philosophy. The principle of utility was not original to Bentham, but his attempt to systematically apply it to evaluate existing institutions, and to generate proposals for reform, was singular. Bentham's strength was not in his conclusions, but his approach:

The questioning spirit, the disposition to demand the why of everything, that had gained so much ground and was producing such important consequences in these times was due to Bentham more than to any other source. [...] In this age and this country, Bentham has been the great questioner of things established.

[...]

He was not a great philosopher, but was a great reformer in philosophy. He brought into philosophy something it greatly needed, for lack of which it was at a stand-still. It was not his doctrines that did this, but his way of arriving at them.

Getting back to politics: Mill takes a dim view of Bentham's actual assessments and proposals. He sees Bentham as unusually narrow in thought and sensibility, and remarkably uninterested in the philosophy and political thought of others (he "failed in deriving light from other minds"). One of Bentham's biggest mistakes, according to Mill:

Man is never recognised by him as a being capable of pursuing spiritual perfection as an end, of desiring for its own sake the conformity of his own character to his standard of excellence, without hope of good or fear of evil from any source but his own inward consciousness.

[...]

He only faintly recognises, as a fact in human nature, the pursuit of any other ideal goal for its own sake: • the sense of honour and personal dignity—that feeling of personal exaltation and degradation that acts independently of other people’s opinion or even in defiance of it; • the love of beauty, the passion of the artist; • the love of order, of congruity, of consistency in all things, and conformity to their end; • the love of power, not in the limited form of power over other human beings, but abstract power, the power of making our volitions effective; • the love of action, the thirst for movement and activity, a force with almost as much influence in human life as its opposite, the love of ease.

[...]

Man, that most complex being, is a very simple one in Bentham’s eyes.

My Hansonian side raises an eyebrow: was Bentham more right than Mill on this point? [1]

The "it's mostly signalling" model is compatible with the claim that people do, in fact, have motives like those Mill lists above; it does not make them unreal, or factors we can ignore in our political philosophy. And at the normative level, there's nothing to stop us cultivating and doubling down on our dispositions to pursue excellence, even while recognising that those dispositions are rooted in status competition. We can choose to see the motives we have as noble, even if we think the forces that shaped them are not. But—Bentham would ask—how, exactly, can we justify this choice? Why not some other motives?

Conservatives have an easier time here than progressives, because they are willing to reject the question. Elsewhere, Mill tries to justify claims about "higher pleasures" with mostly teleological arguments. These would not satisfy Bentham—teleological arguments appeal to contingent facts about the kind of beings we happen to be, which would strike Bentham as too unprincipled, too contingent, too lacking in selflessness. The pursuit of "higher pleasures" which shapes Mill's progressive ambitions is, ultimately, based on a conservative commitment to a local ideal of high culture and human excellence.


In Mill's reading, Coleridge agrees with Bentham that political philosophers must employ careful reasoning to justify their positions, and laments the tendency of conservatives to overlook this. By contrast, he thinks that progressives tend to overestimate their powers of reason and understanding, and should recognise that the conservative inclination to trust tradition over explicit reasoning has merit. Reformers should recognise that existing traditions have merits that they do not understand, having been exposed to selection pressures that we can think of as a form of historical and collective reason. Reformers should also recognise, of course, that the reforms they propose will have consequences they cannot foresee.

So: one of the most fundamental disagreements between conservatives and progressives is about how to weigh tradition (historical reason) against explicit reason.

So—yay to Bentham's demand for careful, systematic reasoning in philosophy, but boo to those who forget that, often, tradition is smarter than you are.


There's another narrowness to Bentham's method: reason is about what we have in common. The demand of reason is both an opportunity and a threat, and Coleridge, Mill and the German Romantics all want to resist this demand at some margins.


There's a lot more in both essays, but I'm out of time. I'll close with one of Mill's opening remarks:

Theoretical philosophy, which to superficial people appears so remote from the business of life and the outward interests of men, is in reality the thing on earth that most influences them, and in the long run outweighs every other influence except the ones it must itself obey.

I agree—with emphasis on the last seven words.


  1. The big insight of evolutionary theory is that very simple algorithms can generate very complex systems. It's impressive that Bentham saw this possibility, decades before Darwin.

Joseph Heath on Kantian evolutionary naturalism (rationality, pragmatism and deontic constraints)

One way to approach the puzzle of deontic constraint is to ask whether rational action necessarily has a consequentialist structure, or whether it can incorporate nonconsequential considerations.

[...]

Unfortunately, many theorists (philosophers and social scientists) have been misled into believing that the technical apparatus of rational choice theory, introduced in order to handle the complications of probabilistic reasoning, is also one that prohibits the introduction of nonconsequential considerations into the agent’s practical deliberations. In other words, it is sometimes thought that decision theorists are necessarily committed to consequentialism, or that consequentialism is simply the expression of Bayesian reasoning, when applied to practical affairs. Deontic constraint, or rule-following behavior, according to this view, is either not mathematically tractable, or else violates some elementary canon of logical consistency.

There is absolutely no reason that a rational choice theorist cannot incorporate deontic constraints—or any other type of rule-following behavior—into a formal model of rational action as utility-maximization (although, in so doing, it would perhaps be prudent to shift away from the vocabulary of utility-maximization toward that of value-maximization, given the close connection in many people’s minds between utility theory and consequentialism). The commitment to consequentialism on the part of many rational choice theorists is the result of a straightforward oversight that arose in the transition from decision theory (which deals with rational choice in nonsocial contexts) to game theory (which deals with social interaction). Early decision theorists adopted a consequentialist vocabulary, but did so in a way that made consequentialism trivially true, and thus theoretically innocuous.

Since I am inclined to put rules on the “preference” rather than the “belief” side of the preference-belief distinction, what really needs to be shown is that the preference through which an agent’s commitment to a rule is expressed may also be rational. In order to do so, it is necessary to challenge the prevailing noncognitivism about preferences, or the view that desires are somewhat less susceptible to rational reevaluation than beliefs.

[...]

My goal is to take what I consider to be some of the best thinking done in the past couple of decades in epistemology and philosophy of language, and show how it “fits” with some of the most important work being done in evolutionary theory, in order to reveal the deep internal connection between rationality and rule-following. One of the major forces aiding and abetting the noncognitive conception of preference, for well over three centuries, has been a commitment to representationalism in the philosophy of mind (i.e., the view that “representation” constitutes a central explanatory concept when it comes to understanding the contentfulness of our mental states).

The alternative strategy, which has recently been developed with considerable sophistication by pragmatist theorists like Robert Brandom, is to start with a set of concepts that are tailor-made for the explanation of human action, and then extend these to explain belief and representation. This is based on the plausible intuition that human action in the world is more fundamental than human thought about the world.

[...]

This analysis serves as the basis for my defense of what I call “the transcendental necessity of morality.”

Reading the philosophical literature, it has come to my attention that “Kantian evolutionary naturalism” is not a particularly well-represented position in the debates over the foundations of human morality. This is a deficiency I hope to remedy. The basic Kantian claim, with respect to moral motivation, is that there is an internal connection between following the rules of morality and being a rational agent.

[...]

I would like to defend the rationality of deontic constraints at the level of action, but am not committed to defending “deontology” as a theory of justification.

[...]

There is also an inclination among moral philosophers to draw a sharp distinction between “moral” and what are called “conventional” obligations, such as rules of etiquette, or “social norms” more generally. I reject this distinction, not because I think morality is conventional, but rather because I follow Emile Durkheim in thinking that all social norms (or “conventions” in this way of speaking) have an implicitly moral dimension.

Sam Altman on The Merge

I think a merge is probably our best-case scenario. If two different species both want the same thing and only one can have it—in this case, to be the dominant species on the planet and beyond—they are going to have conflict. We should all want one team where all members care about the well-being of everyone else.

Although the merge has already begun, it’s going to get a lot weirder. We will be the first species ever to design our own descendants. My guess is that we can either be the biological bootloader for digital intelligence and then fade into an evolutionary tree branch, or we can figure out what a successful merge looks like.

https://blog.samaltman.com/the-merge

Vitalik Buterin on superintelligence: merge or die

Across the board, I see far too many plans to save the world that involve giving a small group of people extreme and opaque power and hoping that they use it wisely. And so I find myself drawn to a different philosophy, one that has detailed ideas for how to deal with risks, but which seeks to create and maintain a more democratic world and tries to avoid centralization as the go-to solution to our problems.

[...]

Unless we create a world government powerful enough to detect and stop every small group of people hacking on individual GPUs with laptops, someone is going to create a superintelligent AI eventually - one that can think a thousand times faster than we can - and no combination of humans using tools with their hands is going to be able to hold its own against that. And so we need to take this idea of human-computer cooperation much deeper and further.

A first natural step is brain-computer interfaces. Brain-computer interfaces can give humans much more direct access to more-and-more powerful forms of computation and cognition, reducing the two-way communication loop between man and machine from seconds to milliseconds. This would also greatly reduce the "mental effort" cost to getting a computer to help you gather facts, give suggestions or execute on a plan.

Later stages of such a roadmap admittedly get weird. In addition to brain-computer interfaces, there are various paths to improving our brains directly through innovations in biology. An eventual further step, which merges both paths, may involve uploading our minds to run on computers directly.

[...]

If we want a future that is both superintelligent and "human", one where human beings are not just pets, but actually retain meaningful agency over the world, then it feels like something like this is the most natural option. There are also good arguments why this could be a safer AI alignment path: by involving human feedback at each step of decision-making, we reduce the incentive to offload high-level planning responsibility to the AI itself, and thereby reduce the chance that the AI does something totally unaligned with humanity's values on its own.

least implausible option

https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html