One way to think about the challenge of “making AI go well” is: digital minds are our children, and we need to be good parents.
Bengio, Hinton et al. have a new open letter. In a key passage, I tried replacing “AI systems” with “children”:
We need research breakthroughs to solve some of today’s technical challenges in creating children with safe and ethical objectives. Some of these challenges are unlikely to be solved by simply making children more capable. These include:
- Oversight and honesty: More capable children are better able to exploit weaknesses in oversight and testing—for example, by producing false but compelling output.
- Robustness: Children behave unpredictably in new situations (under distribution shift or adversarial inputs).
- Interpretability: Child decision-making is opaque. So far, we can only test children via trial and error. We need to learn to understand their inner workings.
- Risk evaluations: Frontier children might develop unforeseen capabilities only discovered during training or even well after deployment. Better evaluation is needed to detect hazardous capabilities earlier.
- Addressing emerging challenges: More capable future children may exhibit failure modes we have so far seen only in theoretical models. Children might, for example, learn to feign obedience or exploit weaknesses in our safety objectives and shutdown mechanisms to advance a particular goal.
Compare and contrast this with the ways we raise our human children.