So Far: My Response on Unfriendly AI
Eliezer Yudkowsky asks: “Well, in the case of Unfriendly AI, I’d ask which of the following statements Bryan Caplan denies.” My point-by-point reply:
1. Orthogonality thesis – intelligence can be directed toward any compact goal; consequentialist means-end reasoning can be deployed to find means corresponding to a free choice of end; AIs are not automatically nice; moral internalism is false.
I agree AIs are not “automatically nice.” The other statements are sufficiently jargony that I don’t know whether I agree, but I assume they’re all roughly synonymous.
2. Instrumental convergence – an AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips. Similarly, paperclip maximizers want to self-improve, to perfect material technology, to gain control of resources, to persuade their programmers that they’re actually quite friendly, to hide their real thoughts from their programmers via cognitive steganography or similar strategies, to give no sign of value disalignment until they’ve achieved near-certainty of victory from the moment of their first overt strike, etcetera.
Agree.
3. Rapid capability gain and large capability differences – under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two. (We could try to keep that possibility non-actualized by a deliberate effort, and that effort might even be successful, but that’s not the same as the avenue not existing.)
Disagree, at least in spirit. I think Robin Hanson wins his “Foom” debate with Eliezer, and in any case I see no reason to believe either of Eliezer’s scenarios is plausible. I’ll be grateful if we have self-driving cars before my younger son is old enough to drive, ten years from now. Why “in spirit”? Because taken literally, Eliezer’s scenarios remain a “possibility” no matter what, so there’s nothing definite to deny. Per Tetlock, I wish he’d given an unconditional probability with a time frame to eliminate this ambiguity.
4. 1-3 in combination imply that Unfriendly AI is a critical Problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power.
Disagree. “Not automatically nice” seems like a flimsy reason to worry. Indeed, what creature or group or species is “automatically nice”? Not humanity, that’s for sure. To make Eliezer’s conclusion follow from his premises, (1) should be replaced with something like:
1′. AIs have a non-trivial chance of being dangerously un-nice.
I do find this plausible, though only because many governments will create un-nice AIs on purpose. But I don’t find this any more scary than the current existence of un-nice governments. In fact, given the historic role of human error and passion in nuclear politics, a greater role for AIs makes me a little less worried.