Instrumental Convergence

From The Superintelligent Will [pdf] by Nick Bostrom

The Instrumental Convergence Thesis

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realised for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

He then lists five examples: self-preservation, goal-content integrity, cognitive enhancement, technological perfection, and resource acquisition.

It’s a pretty short paper, worth reading.

Basically, if you have any long-term goal, your intermediate goals are likely to include surviving, retaining your goals, getting better at stuff, and acquiring resources.

Even if your goals are bizarre, like those of the proverbial paperclip maximizer, if they are long-term then your short-term goals are going to be these same ones.

It’s worth thinking about the paperclip maximizer. As soon as you do, you realise how underspecified the concept is. There are obvious missing criteria to fill in: What counts as a paperclip? Do they all count equally, or does size matter? Do they just need to be made, or made and kept?

Time is a difficult question. Let’s say the goal is to maximize the peak number of simultaneously existing paperclips over the future of the universe, handwaving relativity of simultaneity somehow.

The crucial insight is that making even one paperclip now is quite contrary to that goal, or to any similar one. If you accumulate resources and capabilities, and grow them over years or millennia, you will be able to make trillions of paperclips in the future. Just one spacefaring robot that lands on iron-rich asteroids and starts manufacturing could presumably make 10^19 paperclips out of each asteroid.
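As a rough sanity check on that figure, here is a back-of-envelope sketch of how big an asteroid would have to be to yield 10^19 paperclips. The paperclip mass (about one gram), the density (pure iron) and the assumption that all of the material is usable are my own illustrative guesses, not figures from the Bostrom paper.

```python
import math

# Illustrative assumptions, not sourced figures:
PAPERCLIP_MASS_KG = 0.001       # assume roughly 1 g per paperclip
IRON_DENSITY_KG_M3 = 7870.0     # assume a solid body of pure iron
USABLE_FRACTION = 1.0           # assume every kilogram becomes paperclips


def asteroid_diameter_km(n_paperclips,
                         clip_mass_kg=PAPERCLIP_MASS_KG,
                         density_kg_m3=IRON_DENSITY_KG_M3,
                         usable_fraction=USABLE_FRACTION):
    """Diameter of a spherical asteroid holding enough metal for n_paperclips."""
    mass_kg = n_paperclips * clip_mass_kg / usable_fraction
    volume_m3 = mass_kg / density_kg_m3
    # Invert the sphere volume formula V = (4/3) * pi * r^3 to get the radius.
    radius_m = (3.0 * volume_m3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * radius_m / 1000.0


diameter = asteroid_diameter_km(1e19)
print(f"{diameter:.1f} km")
```

Under these assumptions the answer comes out at a bit over 13 km across, so the number is at least the right sort of size for a single large metallic asteroid. A lower usable fraction or a lower density pushes the required diameter up.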

When you look at Earth, you don’t see potential paperclip material, you see start-up capital for an astronomical-scale paperclip industry.

The biggest questions are about risk. Even the maximization criteria I suggested above are incomplete. You can’t know how many paperclips will exist in the future; even if you are superintelligent, there is too much that you don’t know and can’t predict. You don’t even have probabilities for most things. What is the probability that there is alien intelligence in the Milky Way? There’s no meaningful answer.

There’s another discussion (or perhaps it’s the same one put another way) about the fact that probabilities are not objective but “subjectively objective”. Maximising a probability therefore means maximising it as some particular subjective entity perceives it, so your goals have to specify what sort of entity is doing the probability estimation, and how that entity survives and evolves, or whatever. That’s a killer.

So you can’t maximize some probability-weighted value; that’s not a thing. If you’re aiming for any kind of “as sure as I can get”, then before you start making paperclips your priority has to be to gather as much information as possible, so that you can start building that kind of certainty.

So, forget paperclips, get rich. In fact, forget getting rich, get knowledge about the universe. In fact, forget getting knowledge about the universe, get rich, so you can get knowledge about the universe, so you can be confident of getting really rich, so you can make paperclips.

Initially, what you want from Earth is basically wealth and knowledge. That’s what everyone else wants too. All the tactical questions are exactly the same as those everyone else faces: invest in resources, cooperate with others or fight them, and so on.

Whatever your long-term goal is, if you have any long-term goal, your short-term actions will look exactly like those of an ordinary sane selfish organism. The details of your goals are entirely irrelevant.
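A toy model makes the point concrete. This is only a sketch under loud assumptions of my own: capacity compounds at a fixed rate while you reinvest, you switch once to producing your final goal, and you know the time horizon in advance (which, as argued above, you really don’t). The goal names and conversion rates are hypothetical.

```python
def total_output(switch_year, horizon_years, growth_rate, units_per_capacity):
    """Units of the final goal produced if you reinvest until switch_year.

    Capacity compounds at growth_rate per year while reinvesting, then stays
    fixed and is converted into goal units for the remaining years.
    """
    capacity = (1.0 + growth_rate) ** switch_year
    return capacity * units_per_capacity * (horizon_years - switch_year)


def best_switch_year(horizon_years, growth_rate, units_per_capacity):
    """Year to stop accumulating and start producing, found by brute force."""
    return max(
        range(horizon_years),
        key=lambda year: total_output(
            year, horizon_years, growth_rate, units_per_capacity
        ),
    )


# Hypothetical final goals, differing only in how capacity converts to output.
GOALS = {"paperclips": 1000.0, "staples": 250.0, "cathedrals": 1e-6}

plans = {
    goal: best_switch_year(horizon_years=100, growth_rate=0.03,
                           units_per_capacity=rate)
    for goal, rate in GOALS.items()
}
print(plans)
```

The conversion rate only scales the total, so it drops out of the decision: every agent in this model picks the same switch year and spends the early years doing nothing but accumulating. Within the toy, an observer watching those early years could not tell which goal an agent had.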

This is “Instrumental Convergence”, but the accounts I have seen, such as the Bostrom paper above, seem (perhaps unintentionally) to massively understate it. The ultimate goals of any intelligent entity that has any long-term goals at all would be totally irrelevant to its observed behaviour, which would be 100% dominated by survival, resource acquisition and information-gathering.