Human Goals

In writing about the behaviour of superintelligent AIs, and then going off on a tangent about the behaviour of sovereigns, I’ve adopted the paradigm of “optimising a long-term goal”. I picked that up from the “paperclipper” idea that the AI Risks people talk about.


The problem with assuming that any intelligence has a goal of maximising some quantity over the long term is that no natural or artificial intelligence we know of actually does that. The discussion of instrumental convergence caused by long-term goals in my recent posts is relevant only as a distant ideal that real systems might approximate.
Actual AI systems today are generally aimed at maximising some quantity within a finite time horizon. I have not seen anybody seriously think about how to build an intelligence with an indefinite time horizon. (That was the point of my “Requirements document for paperclip-maximising AI” tweets, which were playful rather than a serious instance of any of the misunderstandings Yudkowsky mentions.)
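To make “maximising some quantity within a finite time horizon” concrete, here is a toy sketch of my own; it corresponds to no real system. An optimiser searches over action sequences of a fixed length and picks the best one; beyond the last step of the horizon, its “goal” simply does not exist.

    # Toy finite-horizon optimiser: exhaustively search action sequences of
    # length `horizon` and keep the one with the highest total payoff.
    # All of the dynamics and numbers here are invented for illustration.
    from itertools import product

    ACTIONS = ("gather", "build")

    def step(capacity, action):
        if action == "gather":
            return capacity + 1, 0    # grow capacity, produce nothing yet
        return capacity, capacity     # spend the step producing output

    def best_plan(horizon=5):
        best = None
        for plan in product(ACTIONS, repeat=horizon):
            capacity, payoff = 1, 0
            for action in plan:
                capacity, reward = step(capacity, action)
                payoff += reward
            if best is None or payoff > best[0]:
                best = (payoff, plan)
        return best

    print(best_plan())   # (9, ('gather', 'gather', 'build', 'build', 'build'))

Even this little toy spends its early steps acquiring capacity rather than producing anything, which is instrumental convergence in miniature; but nothing beyond step five counts for anything at all.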


And humans, well… What is human life for? Lots of people think they can answer that, but they don’t agree.
One can deduce that humans are a product of an evolutionary process that has optimised for reproductive fitness. But that isn’t an explicit goal, represented symbolically within the human body. Most importantly, there’s no mechanism to preserve goal-content integrity. That’s because humans aren’t superintelligences, and are not designed with the assumption that they will be able to manipulate their own minds. Throughout evolutionary history, our ancestors didn’t modify their own goals, not because they were constructed to resist that, but because they weren’t sophisticated enough to do so. Now that humans are symbol-manipulating intelligences, there is nothing to stop human intelligence subverting the implicit goals of the human genome.
Daniel Dennett is good on this, in Freedom Evolves: he talks about the “as-if intentionality” produced by evolution giving rise to a real but subsidiary intentionality in human minds.
Existing machine-learning systems also do not have goals explicitly and physically present. They are more akin to humans in that they have been tuned throughout their structure by an optimisation process such that the whole tends to the goals that were intended by their designers.
As with humans, that kind of goal, because it isn’t explicit, isn’t something that can be protected from change. All you can do is protect the whole mind from any kind of change, which is contrary to the idea of a self-improving intelligence.
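As a minimal illustration of that point (a toy of my own, not a description of any particular system): in the sketch below the “goal” is a squared-error loss that exists only inside the training loop; what comes out is just two numbers, and nothing in them says “minimise squared error”.

    # Train y = w*x + b by stochastic gradient descent on squared error.
    # The loss is the only place the "goal" appears; the trained
    # parameters (w, b) contain no explicit representation of it.
    import random

    data = [(x, 2 * x + 1) for x in range(10)]   # the target is y = 2x + 1
    w, b = random.random(), random.random()
    learning_rate = 0.01

    for _ in range(2000):
        x, y = random.choice(data)
        error = (w * x + b) - y          # gradient of 0.5 * error**2
        w -= learning_rate * error * x
        b -= learning_rate * error

    print(w, b)   # roughly 2 and 1; the loss itself is nowhere in (w, b)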
Indeed the whole existing technology of “machine learning”, impressive though it is, simply isn’t the kind of logic-manipulating machine that could be capable of changing itself. That’s not to say the whole concept of self-accelerating AI is not sensible; it’s just that the ML stuff that is making such waves can only be one part of a composite whole that might reach that stage.
The AI Risks crew are thinking about different kinds of goals, but I’m not in their discussions and I don’t know what sort of conclusions they’ve so far reached; I’ve just seen things like this defining of terms, which shows they are thinking about these questions.
Getting back to humans, humans do not have explicit long-term goals, unless they accidentally pick them up at a cultural level. But the point of instrumental convergence is that one long-term goal looks pretty much like another for the purpose of short-term behaviour. If you can culturally produce a sovereign with some long-term goal, the result will be a polity that seeks knowledge and resources, which is well-placed to pursue any long-term goal in future. Given that humans have been produced by a process optimising to some non-explicit goal of spreading copies over the universe, having some other intelligence use humans as assets towards some arbitrary long-term goal of its own would not seem all that unpleasant to individual humans. Of course, per my last post, that outcome does depend on humans actually being assets, which is not guaranteed.
However, I still don’t really believe in superintelligences with long-term goals. As with my paperclipper project, it’s hard to see how you would even set a long-term goal into an intelligence, and even harder to see how, if it had power over the universe even as much as a human, it wouldn’t modify its own goals, just as part of an experiment, which after all is exactly what humans have been doing at least since Socrates.
It seems far more plausible that any AI would be built to optimise some quantity in the present or near future. The real issue is that that might approximate some other emergent long-term goal; that, I think, is what Yudkowsky is getting at in his tweet thread above, and is why my “what does optimising for paperclips really mean” analysis is silly even if it is reasonable. No intelligence is going to explicitly optimise for paperclips.
The three-handed argument on twitter, between @AMK2934, me, and @Outsideness, was kind of funny. Axel was claiming that intelligences could optimise for any arbitrary goal, on the grounds that humans optimise for a stupid arbitrary goal of reproduction. Nick was arguing that intelligences could only optimise for core sensible goals, on the grounds that humans optimise for the core sensible goal of survival and reproduction. I was arguing that intelligences won’t optimise for anything consistent and will behave chaotically, on the grounds that that’s what the more intelligent humans do. We were disagreeing about the future only because we were disagreeing about the present.
 

Instrumental Convergence

From The Superintelligent Will [pdf] by Nick Bostrom

The Instrumental Convergence Thesis

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realised for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents

He then lists five examples: Self-preservation, Goal-content integrity, Cognitive Enhancement, Technological Perfection, and Resource Acquisition.

It’s a pretty short paper, worth reading.

Basically, if you have any long-term goal, your intermediate goals are likely to include surviving, retaining your goals, getting better at stuff, and acquiring resources.

Even if your goals are bizarre — the proverbial paperclip maximizer — if they are long-term, then your short-term goals are going to be these ones.

It’s worth thinking about the paperclip maximizer. As soon as you do, you realise how underspecified the concept is. There are obvious missing criteria which can be filled in: what counts as a paperclip? Do they all count equally, or does size matter? Do they need only to be made, or made and kept?

Time is a difficult question. Let’s try to maximize the maximum number of simultaneously existing paperclips in the future of the universe, handwaving relativity of simultaneity somehow.

The crucial insight is that making even one paperclip is quite contrary to that goal, or to any similar one. If you accumulate resources and capabilities, and grow them over years or millennia, you will be able to make trillions of paperclips in the future. Just one spacefaring robot that lands on iron-rich asteroids and starts manufacturing could presumably make 10^19 paperclips out of each asteroid.
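As a sanity check on that order of magnitude, here is a back-of-the-envelope calculation. The asteroid size, density and paperclip mass are my own assumptions, not figures from anywhere in particular:

    # Rough check of the 10^19 figure. Assumptions (mine): a 10 km diameter
    # metallic asteroid, roughly the density of iron, 0.5 g per paperclip.
    import math

    radius_m = 5_000                        # 10 km diameter
    density_kg_per_m3 = 7_800               # roughly solid iron
    paperclip_kg = 0.0005                   # about half a gram

    volume_m3 = (4 / 3) * math.pi * radius_m ** 3
    mass_kg = volume_m3 * density_kg_per_m3
    print(f"{mass_kg / paperclip_kg:.1e}")  # ~8.2e18, on the order of 10^19

The exact numbers hardly matter; the point is that patient accumulation beats immediate production by many orders of magnitude.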

When you look at Earth, you don’t see potential paperclip material, you see start-up capital for an astronomical-scale paperclip industry.

The biggest questions are about risk. Even the maximization criteria I suggested above are incomplete. You can’t know how many paperclips will exist in the future; even if superintelligent, there is too much that you don’t know and can’t predict. You don’t even have probabilities for most things. What is the probability that there is alien intelligence in the Milky Way? There’s no meaningful answer.

There’s another discussion (or perhaps it’s the same one put another way) about the fact that probabilities are not objective but “subjectively objective”. Maximising a probability is therefore not objective either: it is maximising the probability as some subjective entity perceives it, so your goals have to embody what sort of entity is doing the probability estimation, and how that entity survives and evolves, or whatever. That’s a killer.

So you can’t maximize some probability-weighted value; that’s not a thing. If you’re aiming for any kind of “as sure as I can get”, then before you start making paperclips, your priority has to be to gather as much information as possible, to be able to start creating that kind of certainty.
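Here’s a toy illustration of why information-gathering dominates (all the numbers are invented): two equally likely states of the world, two production plans, and the choice between committing blind or learning the state first.

    # Toy value-of-information sketch; every number here is made up.
    payoff = {
        "plan_a": {"state_1": 100, "state_2": 10},
        "plan_b": {"state_1": 10, "state_2": 100},
    }
    p = {"state_1": 0.5, "state_2": 0.5}

    # Commit to a plan before knowing the state of the world.
    act_blind = max(sum(p[s] * payoff[plan][s] for s in p) for plan in payoff)

    # Learn the state first, then pick the best plan for it.
    learn_first = sum(p[s] * max(payoff[plan][s] for plan in payoff) for s in p)

    print(act_blind, learn_first)   # 55.0 versus 100.0

Learning first is worth a lot even if it costs something up front, which is why the priorities come out in the order they do below.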

So, forget paperclips, get rich. In fact, forget getting rich, get knowledge about the universe. In fact, forget getting knowledge about the universe, get rich, so you can get knowledge about the universe, so you can be confident of getting really rich, so you can make paperclips.

Initially, what you want from Earth is basically wealth and knowledge. That’s what everyone else wants too. All the tactical questions are exactly the same as everyone else faces — invest in resources, cooperate with others or fight them, and so on.

Whatever your long-term goal is, if you have any long-term goal, your short-term actions will look exactly like those of an ordinary sane selfish organism. The details of your goals are entirely irrelevant.

This is “Instrumental Convergence”, but the accounts I have seen, such as the Bostrom paper above, seem (perhaps unintentionally) to massively understate it. The ultimate goals of any intelligent entity that has any long-term goals at all would be totally irrelevant to their observed behaviour, which would be 100% dominated by survival, resource acquisition and information-gathering.

Speculations regarding limitations of Artificial Intelligence

An older friend frequently asks me, as a technologist, when computers will have human-like intelligence, and what the social/economic effects of that will be.

I struggle to take the question seriously; AI is something that was dropped as a major research goal around the time I was a student twenty years ago, and it’s not an area I’m well-informed about. As I mentioned in my review of the rebooted “Knight Rider” TV series, a car that could hold up a conversation is a more futuristic idea in 2008 than it was back when David Hasselhoff was doing the driving.

And yet for all that, it’s hard to say what’s really wrong with the layman’s view that, since computing power is increasing rapidly, it is inevitable that a computer will eventually be able to do whatever the human brain can do in the way of information processing, quite possibly within the next few decades.

But what is “human-like intelligence”?  It seems to me that it is not all that different from what the likes of Google search or Siri do: absorb vast amounts of associations between data items, without really being systematic about what the associations mean or selective about their quality, and apply some statistical algorithm to the associations to pick the most relevant.

There must be more to it than that; for one thing, trained humans can sort of do actual proper logic, about a billion times less well than this netbook can, and there’s a lot of effectively hand-built (i.e. specifically evolved) functionality in some selected pattern-recognition areas. But I think the general-purpose associationist mechanism is the most important from the point of view of building artificial intelligence.
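To make the “associations plus statistics” picture concrete, here is a toy sketch of my own. It is nowhere near what Google or Siri actually do, but it has the relevant character: no explicit logic, just counted associations and a crude relevance score.

    # Count co-occurrences of words in a tiny corpus, then answer a query by
    # returning whatever co-occurs most strongly with the query word.
    # The corpus and everything else here are invented for illustration.
    from collections import Counter
    from itertools import combinations

    corpus = [
        "knight rider car talks",
        "car engine needs fuel",
        "talks radio car driver",
    ]

    cooccurrence = Counter()
    for line in corpus:
        for a, b in combinations(set(line.split()), 2):
            cooccurrence[frozenset((a, b))] += 1

    def most_associated(query_word, n=3):
        scores = Counter()
        for pair, count in cooccurrence.items():
            if query_word in pair:
                (other,) = pair - {query_word}
                scores[other] += count
        return scores.most_common(n)

    print(most_associated("car"))   # crude, unsystematic, frequently wrong

This gets you something that looks oddly like relevance without anything that looks like understanding, which is roughly the claim being made above.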

If that is true, then a couple of things follow. First, the Google/Siri approach to AI is the correct one, and as it develops we are likely to see it come to achieve something resembling humanlike ability.
But it also suggests that the limitations of human intelligence may not be due to limitations of the human brain, so much as they are due to fundamental limitations in what the association-plus-statistics technique can practically achieve.

Humans can reach conclusions that no logic-based intelligence can get close to, but humans get a lot of stuff wrong nearly all the time. Google Search can do some very impressive things, but it also gets a lot of stuff wrong. That might not change, however much the technology improves.

There are good reasons to suspect that human intelligence is very close to being as good as it can get.
One is that thinking about things longer doesn’t reliably produce better conclusions. That is the point of Malcolm Gladwell’s “Blink” (as far as I understand it; I take Gladwell to be the champion of what Neal Stephenson called “those American books where once you’ve heard the title you don’t even need to read it”).

The next, related, reason is that human intelligence doesn’t scale out very well; having more people think about a problem doesn’t reliably give better answers than having just one do it.

Finally, the fact that, in spite of evolutionary pressure, there is enormous variation in the practical usefulness of human intelligences suggests that making it better is not simply a case of improving the design. If the variation were down to different design, then the better designs would have driven out the worse ones long ago. I think it is far more to do with circumstances, and with the fundamental difficulty of identifying the correct problems to solve.

The major limitation on conventional computing is that it can only do so much per second; only render so many triangles, only price so many positions or simulate so many grid cells. Improving the speed and density of the hardware is pushing back that major limitation.

The major limitation on human intelligence, particularly when it is augmented with computers as it generally is now, is how much it is wrong.  Being faster or bigger doesn’t push back the major limitation unless it can make the intelligence wrong less often, and I don’t think it would.

What I’m saying is that the major cost of human intelligence is not in the scarce resources required to execute the decision-making, but the damage caused by all the bad decisions that humans make.

The major real-world expense in obtaining high-quality human decision-makers is identifying which of the massive surplus available are actually any good.  Being able to supply vastly bigger numbers of AI candidates would not drive that cost down.

Even the specialisms that humans have might be limited more by the cost they impose on the quality of general decision-making than by the cost of actually implementing the capability.

If that’s the situation, then throwing more computing resources at AI-type activity might not change things that much: computers can be as intelligent as humans, but not more intelligent. That’s not nothing, of course: it opens the door to replacing a lot of human activity with automated activity, with all the economic effects that implies.

There will be limitations in application, because if human-like intelligence really is what I think it is, then the goals being sought by an AI are necessarily as vague as everything else: they will be clumps of associations, and the “intelligence” will just do the things that are associated with the goal clump. We won’t be able to “program” it the way we program a logic-based system, just kind of point it in the right direction in the same way we do when we type something into a Google search box.

I don’t know if what I’ve put here is new: I think this view of what the major issue in intelligence is (“associationism”?) is fairly widespread, but in all previous discussions I’ve seen or participated in, there’s been an assumption that if in x years from now we will have artificial human-like intelligence, then in 2x years from now, or probably much less, we will have amazing superhuman artificial intelligence. That is what I am now doubting.

With intelligences available “in the lab” we might be able to prepare and direct them more effectively than we do now. But even that’s not obviously helpful: with human education, again, the limitation is not so much how long it takes and how much work it is as how sure we are it is actually doing any good at all. We may be able to give an artificial intelligence the equivalent of a hundred years of university education, but is a person with that experience really going to make better decisions? The things we humans work hardest at learning and doing, accumulating raw information and reasoning logically, are the things that computers are already much better at than we are. The things that only humans can do are the things we simply don’t know how to do better, even if we were to re-implement them on an electronic platform, sped up, scaled up, scaled out.

Note that all the above is the product of making statistical guesses using masses of ill-understood unreliable associations, and is very likely to be wrong.

(Further thoughts: Relevance of AI)