Richard Danzig on technology roulette and the U.S. military

This report recognizes the imperatives that inspire the U.S. military’s pursuit of technological superiority over all potential adversaries. These pages emphasize, however, that superiority is not synonymous with security. Experience with nuclear weapons, aviation, and digital information systems should inform discussion about current efforts to control artificial intelligence (AI), synthetic biology, and autonomous systems. In this light, the most reasonable expectation is that the introduction of complex, opaque, novel, and interactive technologies will produce accidents, emergent effects, and sabotage. In sum, on a number of occasions and in a number of ways, the American national security establishment will lose control of what it creates. 

A strong justification for our pursuit of technological superiority is that this superiority will enhance our deterrent power. But deterrence is a strategy for reducing attacks, not accidents; it discourages malevolence, not inadvertence. In fact, technological proliferation almost invariably closely follows technological innovation. Our risks from resulting growth in the number and complexity of interactions are amplified by the fact that proliferation places great destructive power in the hands of others whose safety priorities and standards are likely to be less ambitious and less well funded than ours. 

Accordingly, progress toward our primary goal, superiority, should be expected to increase rather than reduce collateral risks of loss of control. This report contends that, unfortunately, we cannot reliably estimate the resulting risks. Worse, there are no apparent paths for eliminating them or even keeping them from increasing. An often referenced recourse, keeping “humans in the loop” of operations involving new technologies, appears on inspection to be of little and declining benefit.

https://www.cnas.org/publications/reports/technology-roulette

Alien invaders, or children?

If we think of digital minds as alien invaders, we fight them to our last breath [^1].

If we think of digital minds as our children, we would raise them with care and wish them well. And we could expect them to wish us well too.

Which is the better frame?

I think it mainly depends on: (a) How much they share our values. (b) Whether they are capable of living great, flourishing lives. Are they conscious?

Those who prefer the alien frame may also think that biological relation matters. Why? (i) Loyalty to one's species. (ii) Conservatism: ways of life are good just because they exist; therefore it's bad if they are lost, and their replacement by better things doesn't automatically make up for that.

What are the other cruxes?

[^1]: A consequentialist who holds an impartial theory of value might not. They might think that letting the aliens win would create a more valuable future.

Reading suggestions

Someone asked for a book suggestion. My reply...

It's hard to think about anything other than AI these days. The books are mostly out of date.

For AI, I suggest:

Things on my mind:

  • How do we regulate and police our way to a better offence/defence balance, without wrecking civil liberties? We can surely do better than totalitarianism.
  • Will the policy response in 2023/24/25 slow things down for several decades? Hard to tell.
  • Should we think of digital minds as alien invaders, or our children, or something else? Should we cede the future to them as we do to our children, rather than see them as alien invaders?
  • Very new politics coming soon. How to prepare?

Big news of the year: most world leaders basically get it now. This has happened faster than I expected.

In the "not-AI" section, I liked Professor of Apocalypse and The Other God That Failed, both by Jerry Z Muller.

More and more, I read with GPT-4 open, asking questions as I read, as I might with a private tutor. Pay for ChatGPT if you haven't already.

ChatGPT is a junior engineer today; a senior designer and engineer next year

If you use several Google accounts, Google Meet does not automatically switch to whatever account has permission to join the call. It's annoying.

Today I hired a team and made a browser extension to solve this.

It took us less than 30 minutes to get a working prototype. It took a further 3-4 hours to make a "production-ready" version for release on the Chrome Web Store.

I served as UX designer, engineering manager and head of QA. GPT-4 was our lead developer, and wrote roughly all of the code, to a higher standard than I would have [^1].
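For flavour, here's a minimal sketch of the core logic such an extension needs, assuming the common trick of adding an `authuser` query parameter to choose which signed-in Google account a Meet link opens under. This is an illustrative guess rather than our actual code: the file name `content-script.js` and the storage key `preferredAuthUser` are hypothetical, and the manifest is assumed to request the "storage" permission and inject the script on meet.google.com pages.

```javascript
// content-script.js: hypothetical sketch, not the extension's real code.
// Assumes the manifest grants the "storage" permission and injects this
// script on https://meet.google.com/* pages.
chrome.storage.sync.get({ preferredAuthUser: "0" }, ({ preferredAuthUser }) => {
  const url = new URL(window.location.href);
  const onMeetingPage = url.hostname === "meet.google.com" && url.pathname.length > 1;
  const alreadyChosen = url.searchParams.get("authuser") === preferredAuthUser;
  if (onMeetingPage && !alreadyChosen) {
    // Reload the meeting link under the account the user prefers.
    url.searchParams.set("authuser", preferredAuthUser);
    window.location.replace(url.toString());
  }
});
```

An alternative design would be a declarativeNetRequest redirect rule that appends the parameter without injecting any script; the content-script version above is just the simplest shape to read.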

How will GPT-5 help us make faster progress?

Here's my guess:

  1. The workflow will be: I give an instruction; it updates the codebase, checks the result in browser, fixes the codebase if necessary, then asks me to review. [^3]
  2. It will require less of my "expert direction" on how to approach some of the engineering tasks.
  3. It will reply as quickly as GPT-3.5.
  4. It will help me create the logo, screenshot and demo video for the Chrome Web Store.
  5. It will help me design the user interface.
  6. It will proactively suggest feature ideas and possible UX issues.

I expect most of these improvements will be available by the end of 2024. With them, I'd be able to create this extension in less than an hour.

Today, my background in software design and engineering was a necessary condition to deliver a great result within a couple of hours (see the transcript). How much of this expertise will be needed in 2024? 2025?

Prediction: by December 2025, my wife (who has no software design or engineering experience) will be able to create the same extension in less than a day.

[^1]: The full codebase is some 500 lines. Less than 5 of these were entirely written by me. Perhaps 30-50 were started by me, and completed by our intern (GitHub Copilot).

[^2]: I tried using the GPT-4 Canva plugin but it was totally broken.

[^3]: Open Interpreter (released last week) now offers this workflow for writing blocks of code, but until these systems can "see", I'll need to do a bunch of the software testing in browser. Multimodal LLMs are due before the end of this year.

Peter Singer on the non-natural origin of the axiom of impartial benevolence

According to Singer, to the extent that this intuition is not adaptive, we can think of it as "a truth of reason" rather than another value that emerges from competition and selection.

If we apply this to the dualism of practical reason, then what we have is, on the one hand, a response—the axiom of universal benevolence—a response that clearly would not be likely to have been selected by evolution, because to help unrelated strangers—even at some disadvantage to yourself, where there's a greater benefit to the unrelated strangers—is not a trait that is likely to lead to your improved survival or the improved survival of your offspring. It's rather going to benefit these unrelated strangers who are therefore more likely to survive and whose offspring are more likely to survive. So that doesn't seem like it would have been selected for by evolution, which suggests that maybe it is a judgement of our reasoning capacities, in some way; we are seeing something through reason.

Now, if we compare that with the egoistic judgement that I have special reasons to prefer my own interests to those of strangers, it's more plausible to think that that would have been selected by evolution. Because, after all, that does give you preference for yourself and your offspring, if you love your offspring and care for them.

So if we have these two conflicting judgments, then maybe we can choose which one by saying: just as in the case of adult sibling incest, we debunk the intuition by saying, “Well, that's just something that evolved in our past and that doesn't really give us reasons for thinking the same thing today,” maybe we can say that also about the intuition behind egoism, but not about the intuition behind universal benevolence, which therefore gives us a reason (not a conclusive or overriding reason) for thinking that it's the axiom of universal benevolence that is the one that is most supported by reason.

Nope. The way we reason is a product of natural selection. You don't get to pick some bits you like and say that these are universal truths.

https://josephnoelwalker.com/150-peter-singer/

AI will improve a lot, very soon

It is hard to feel "in my bones" just how much is now "baked in" and coming this year, next year, the year after. I am starting to write down concrete predictions, as part of the AI OODA loops I'm running this autumn.

Meantime: a few things have helped me "feel it" in the last week or so: v0.dev, Open Interpreter, and this note from Michael Nielsen:

As an aside on the short term – the next few years – I expect we're going to see rapidly improving multi-modal foundation models which mix language, mathematics, images, video, sound, action in the world, as well as many specialized sources of data, things like genetic data about viruses and proteins, data from particle physics, sensor data from vehicles, from the oceans, and so on.

Such models will "know" a tremendous amount about many different aspects of the world, and will also have a raw substrate for abstract reasoning – things like language and mathematics; they will get at least some transfer between these domains, and will be far, far more powerful than systems like GPT-4.

This does not mean they will yet be true AGI or ASI! Other ideas will almost certainly be required; it's possible those ideas are, however, already extant. No matter what, I expect such models will be increasingly powerful as aids to the discovery of powerful new technologies.

Regulation is likely to be a significant headwind, but a lot more "transformatively useful" stuff is landing soon regardless.

My software development workflow will speed up a lot, again, with the tools that will drop within a year.

Asmir Gračanin et al. on the function of tears

In the inter-personal domain, tears can be considered as a signal that conveys information about the helplessness of the crying individual resulting in increased motivation of observers to react with pro-social behaviour. However, the inter-personal effects of crying are also not always consistent. Although evidence suggests that the perception of tears generally results in helping behaviours, strengthening of social bonds and a reduction of aggression, there is convincing anecdotal evidence that (particularly acoustical) crying may also sometimes evoke irritation and even aggression and violence. The precise determinants of the reactions of others to crying still wait to be disclosed. Nevertheless, it seems obvious that the personality of the observer, the specific antecedent and the perceived appropriateness of crying, as well as the relationship between crier and observer may all play a role.

https://www.tandfonline.com/doi/full/10.1080/02699931.2016.1151402

Joe Carlsmith on the utilitarian dream

I think there’s a certain type of person who comes to philosophy and encounters ideas — a certain set of kind of simple and kind of otherwise elegant or theoretically attractive ideas. The ideas I most think of as in this cluster — and this is all separable — is total utilitarianism, Bayesianism, expected utility reasoning.

I remember at one point I was talking with a friend of mine, who used to be a utilitarian, about one of his views. And I started to offer a counterexample to his views, and he just cut me off and he was like, “Joe, I bite all the bullets.” I was like, “You don’t even need to hear the bullets?” He’s like, “Yeah. It’s like, whatever. Does it fall out of my view? I bite it.”

So I think that a certain kind of person in this mindset can feel like, “Sure, there are bullets I need to bite for this view. I need to push fat men off of bridges; I need to create repugnant conclusions, even with a bunch of hells involved.” All this sort of stuff. But they feel like, “I am hardcore, I am rigorous, I have theorems to back me up. My thing is simple; these other people’s theories, they’re janky and incomplete and kind of made up.”

It just has this flavor of you’re kind of unwilling to look the truth in the face. Like, make the lizards… Sorry, the lizards: One conclusion that falls out of total utilitarianism is the idea that for any utopia, there’s a better world with kind of a sufficient number of barely happy lizards plus arbitrary hells.

Infinite ethics just breaks this narrative. And that’s part of why I wanted to work on this topic: I felt like I saw around me some people who were too enamored of this utilitarian dream, who thought it was on better theoretical foundations than I think it is, who felt like it was more of a default, and more of a kind of simple, natural foundation than I think it is.

You’re going to have to start giving stuff up. You’re going to be incomplete. You’re going to start playing a game that looks more similar to the game you didn’t want to play before, and more similar to the game that everyone else was playing. You’re not going to be able to say, “I’m just going to bite whatever bullets my theory says to bite.” You’re running out of track, or to the extent you’re on a crazy train, the crazy train just runs out of track. There are just horrific bullets if you want to bite them, but it’s just a very different story and you’re much more lost.

I think that’s an important source of humility for people who are drawn to this perspective. I think they should be more lost than they were if they were really jazzed by like, “I know it’s total utilitarianism. I’m so hardcore. No one else is willing to be hardcore.” I’m like, I think you should spend some time with infinite ethics and adjust your confidence in the position accordingly.

https://80000hours.org/podcast/episodes/joe-carlsmith-navigating-serious-philosophical-confusion/

Joe Carlsmith against "non-naturalistic realism or bust"

Are you so sure you even know what this debate is about – what it means for something to matter, or to be “natural,” or to be “mind-dependent”— let alone what the answer is? So sure that if it's only raw nature – only joy, love, friendship, pain, grief, and so on – then there's nothing to fight for, or against? I, for one, am not.

So “you should have higher credence on naturalism-but-things-matter” is the immediate objection here. Indeed, I think this objection cautions wariness about the un-Bayesian-ness of much philosophical discourse. Some meta-ethicist might well declare confidently “if naturalism is true, then nothing matters!” But they are rarely thinking in terms of quantitative credences on “ok but actually maybe if naturalism is true some things matter after all,” or about the odds at which they’re willing to bet.

[...]

Remember, nihilism – the view that there are no normative facts – is distinct from what we might call “indifference-ism” – that is, the view that there are normative facts, and they actively say that you should be indifferent to everything. On nihilism, indifference is no more normatively required, as a response to the possibility of innocent children being burned alive, than is intense concern (or embarrassment, or desire to do a prime number of jumping jacks). Conditional on nihilism, nothing is telling you not to care about yourself, or your family, or those children: you’re absolutely free to do so. And plausibly – at least, if your psychology is similar to many people who claim to accept something like nihilism – you still will care.

[...]

My own sense is that most familiar, gloomy connotations of nihilism aren’t centrally about meta-ethics at all. Rather, they are associated more closely with a cluster of psychological and motivational issues related to depression, hopelessness, and loss of connection with a sense of care and purpose. Sometimes, these issues are bound up with someone’s views about the metaphysics of normative properties and the semantics of normative discourse (and sometimes, we grope for this sort of abstract language in order to frame some harder-to-articulate disorientation). But often, when such issues crop up, meta-ethics isn’t actually the core explanation. After all, the most meta-ethically inflationary realists, theists, and so on can see their worlds drain of color and their motivations go flat; and conversely, the most metaphysically reductionist subjectivists, anti-realists, nihilists and so on can fight just as hard as others to save their friends and families from a fire; to build flourishing lives and communities; to love their neighbors as themselves. Indeed, often (and even setting aside basic stuff about mental health, getting enough sleep/exercise, etc), stuff like despair, depression, and so on is often prompted most directly by the world not being a way you want it to be — e.g., not finding the sort of love, joy, status, accomplishment and so forth that you’re looking for; feeling stuck and bored by your life; feeling overwhelmed by the suffering in the world and/or unable to make a difference; etc — rather than by actually not wanting anything, or by a concern that something you want deeply isn’t worth wanting at the end of the day. Meta-ethics can matter in all this, yes, but we should be careful not to mistake psychological issues for philosophical ones – even when the lines get blurry.

https://joecarlsmith.com/2022/10/09/against-the-normative-realists-wager

See also: Simon Blackburn on people who think that anti-realism entails nihilism

Despair is a sin and misery is a choice.

200 IQ and made of meat

Embryo selection probably works, and may arrive soon. So we can imagine a world where we have 10,000 adults who are roughly twice as intelligent, happy, and healthy (and so on) as the most fortunate humans today.

What would happen in that world?

(1) Evolutionary perspective: these (trans)humans would attain most positions of power. They would form fertile factions. Within a few generations, the "old model" humans would be a minority of the population.

(2) Religious perspective: these beings would be so wise and benevolent that they would moderate their own power. They would make decisions in the interests of the social welfare function, taking care to minimise their violations of deontological side-constraints.

We should expect something closer to (1) than (2). Such is the way of things.

How do we feel about the prospect of (1)? It'll depend on the details, but broadly I think people today are happy with the idea of happier, healthier, and more intelligent humans inheriting the earth. We send our children to school.

Now: replace these transhumans with digital minds, which are made of sand instead of meat. Feels different, but do you endorse that feeling?

Atoms are atoms.[^1]

[^1]: The phrase "made of meat" is inspired by Terry Bison.

What is AI alignment?

"AI alignment" means different things to different people.

My preferred definition, following many others, is:

An AI system is aligned if it does what its operator intends.

Why not just say that an AI system works if it does what its operator intends?

The reason is that AI systems are more agentic than ordinary software. To understand their behaviour, we adopt the intentional stance: we explain their behaviour partly by ascribing values to them.

For an AI system to do what its operator intends, it must share values with its operator to a sufficient degree. Values do not need to be perfectly shared in order for the system to do what the operator intends.

The same holds for a manager-employee relationship. In such a relationship, values are not perfectly shared, but they're shared enough that the employee does what the manager intends.

How closely do values need to be shared for an AI system to do what its operator intends? And how easily can we build such systems? These are mostly empirical questions, and the answers depend on the use case.

The case of GPT-4 is somewhat encouraging: the system is remarkably good at understanding user intent; better, in fact, than many human colleagues.

Digital Minds or Butlerian Accord? (Part 1)

Two scenarios for 2100 that might be choiceworthy:

  1. Digital Minds: digital minds rule the Earth. There was a peaceful transition of power; humans think the 21st century went well and are optimistic about the future.
  2. Butlerian Accord: biological humans rule the Earth. They have agreed not to develop AI systems that replace them, and to regulate other technologies that greatly increase extinction risk.

Which would be preferable? Would either be acceptable?

Let's imagine each of these scenarios has come to pass (bracketing the question of how it actually happened).

If you like the Digital Minds scenario, you may think some or all of:

  1. Digital minds will live far better lives than biological humans; there will also be far more of them.
  2. Biological humans are overall better off in this scenario.
  3. Digital minds will have values somewhat (or very) different from our own, and that's fine.
  4. We should think of digital minds as our descendants, not as an alien species.
  5. It would be good if our descendants become grabby and occupy a large local chunk of the Universe, rather than letting other civilisations take our place. "Digital Minds" soon increases the chance of this.

If you dislike the Digital Minds scenario, you may think some or all of:

  1. Digital Minds will not be conscious, or otherwise capable of wellbeing.
  2. Digital Minds will have values somewhat (or very) different from our own, and those values will be worse, objectively speaking.
  3. Digital Minds will have values somewhat (or very) different from our own, and I don't like that.
  4. We should think of digital minds as an "other" that we dislike.
  5. Digital Minds will experience extraordinary suffering, or other kinds of disvalue.

If you like Butlerian Accord, you may think some or all of:

  1. The human condition is already pretty good at 21st century levels of technology. Same goes for non-human animals.
  2. A human civilisation based on a Butlerian Accord can be stable, and flourish for millennia.
  3. We should shift to "Digital Minds" eventually, but not before we've thought long and hard about how to maximise the chance that shift goes well.

If you dislike Butlerian Accord, you may think some or all of:

  1. We currently live in a Darwinian hellscape; superintelligent digital minds are the only way out of that.
  2. A Butlerian Accord would not last.
  3. ...

What other big reasons are there to like or dislike each scenario?

Who will rule the Earth in 2100?

Scenarios for 2100, with my probability guesstimates:

(1) (30%) Biological humans rule the Earth[^1].

(2) (50%) Digital minds rule the Earth[^2] [^3].

(3) (20%) Something very different to the above.

The shift from (1) to (2) or (3) might happen—or at least get baked in—within 20 years. I guess 10-30% chance of that.

The short story for (2) goes: the capabilities of AI systems surpass those of humans; AI systems pursue their own values.

The main ways that (1) might continue:

(a) (20%) A global agreement prevents the development of the most dangerous AI systems. [^4]

(b) (10%) A global disaster sets back the development of AI systems (along with much else) by decades. E.g. a major nuclear war.

(c) (<10%) A technological barrier prevents superhuman AI before 2100.

How might we get (3)?

(a) Something causes human extinction before digital minds can sustain themselves without us.

(b) Aliens.

(c) Simulation ends.

(d) ???

As Holden says, we should not expect business as usual.

[^1]: By "rule the Earth", I mean: have the most influence over what happens on Earth, compared to any other kinds of agents. The idea of "agency" is hard to pin down; the key thing is that insofar as humans and animals currently have agency, digital minds will have it too.

[^2]: Humans and animals have minds which run on biological hardware. Digital minds run on silicon or other non-biological hardware.

[^3]: This might be a desirable outcome. We could think of digital minds as our descendants, and wish them well just as we do our children and grandchildren. They might live flourishing lives and wish us well too, supporting us for a wonderful retirement. It is very unclear whether we should use a "raising children" or "alien invasion" frame when we think about the development of AI.

[^4]: If we pull this off, and avoid (1b), then we'll probably be at least 10x richer and biological humans will be far more "transhuman" than we are now (e.g. enjoy healthy lives for hundreds of years).

When you're writing to learn, very short posts are fine

When I draft a post for this journal, I often write several good paragraphs and then run into a lot of questions. I start to think about how I can make my writing more understandable. An hour or two later, I'm out of steam, and it's time to move on with my day.

Why? This is a study journal. I'm mostly writing for myself, and, on occasion, a small audience of high-context peers.

The journal format is partly about "getting thoughts out of my head" so that they are easier to interrogate. It's also about providing a partial record of thoughts I'd like to reconnect with later.

I'm going to try going hard on this style for a few weeks.

I'll write the journal I want to have.

@mfweof and @RokoMijic on naturalistic metaethics

Anything moral is only moral because it won, and it won for some reason other than being moral.

@mfweof

The past is not more ethical than the present. The past is (relative to current standards) less ethical, but so is the future!

The maintenance of present day human values is bound to present day technology in the same way that the maintenance of medieval hierarchies was bound to the function of knights on the battlefield. Technology is not value neutral, so if you want to upgrade your technology without destroying your values you need to do some kind of alignment work. You need to constrain that technology so that it doesn't do the thing it wants to do in the most efficient way possible: you need to make a gun that only knights can use.

@RokoMijic

See also: Robin Hanson, Most AI fear is Future Fear.