richard_ngo

Former AI safety research engineer, now PhD student in philosophy of ML at Cambridge. I'm originally from New Zealand but have lived in the UK for 6 years, where I did my undergrad and masters degrees (in Computer Science, Philosophy, and Machine Learning). Blog: thinkingcomplete.blogspot.com


AGI safety from first principles

Ah, I like the multiagent example. So to summarise: I agree that we have some intuitive notion of what cognitive processes we think of as intelligent, and it would be useful to have a definition of intelligence phrased in terms of those. I also agree that Legg's behavioural definition might diverge from our implicit cognitive definition in non-trivial ways.

I guess the reason why I've been pushing back on your point is that I think that possible divergences between the two aren't the main thing going on here. Even if it turned out that the behavioural definition and the cognitive definition ranked all possible agents the same, I think the latter would be much more insightful and much more valuable for helping us think about AGI.

But this is probably not an important disagreement.

AGI safety from first principles

Ah, I see. I thought you meant "situations" as in "individual environments", but it seems like you meant "situations" as in "possible ways that all environments could be".

In that case, I think you're right, but I don't consider it a problem. Why might it be the case that adding more compute, or more memory, or something like that, would be net negative across all environments? It seems like either we'd have to define the set of environments in a very gerrymandered way, or else there's something about the change we made that lands us in a valley of bad thinking. In the former case, we should use a wider set of environments; in the latter case, it seems easier to bite the bullet and say "Yeah, turns out that adding more of this usually-valuable trait makes agents less intelligent."

AGI safety from first principles

One thing I'm confused about is whether Legg's definition (or your rephrasing) allows for situations where it's in principle possible that being smarter is ex ante worse for an agent (obviously ex post it's possible to follow the correct decision procedure and be unlucky).

There definitely are such cases - e.g. Omega penalises all smart agents. Or environments where there are several crucial considerations which you're able to identify at different levels of intelligence, so that your success rises and falls as intelligence increases.

But in general I agree with your complaint about Legg's definition being defined in behavioural terms, and how it'd be better to have a good definition of intelligence in terms of the cognitive processes involved (e.g. planning, abstraction, etc). I do think that starting off in behaviourist terms was a good move, back when people were much more allergic to talking about AGI/superintelligence. But now that we're past that point, I think we can do better. (I don't think I've written about this yet in much detail, but it's quite high on my list of priorities.)

AGI safety from first principles

I intended mine to be a slight rephrasing of Legg and Hutter's definition to make it more accessible to people without RL backgrounds. One thing that's not obvious from the way they use "environments" is that the goal is actually built into the environment via a reward function, so describing each environment as a "task" seems accurate.

A second non-obvious thing is that the body the agent uses is also defined as part of the environment, so that the agent only performs the abstract task of sending instructions to that body. A naive reading of Legg and Hutter's definition would interpret a stronger agent as being more intelligent. Adding "cognitive" I think rules this out, while also remaining true to the spirit of the original definition.
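For readers without the RL background, Legg and Hutter's measure (from their paper "Universal Intelligence: A Definition of Machine Intelligence") can be sketched roughly as follows: the intelligence of a policy $\pi$ is its expected total reward summed over all computable environments, weighted by each environment's simplicity. Here $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward $\pi$ earns in $\mu$:

```latex
% Rough sketch of the Legg-Hutter universal intelligence measure:
% simplicity-weighted expected reward across all computable environments.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Since both the reward function (the "task") and the agent's body are folded into $\mu$, describing each environment as a cognitive task stays faithful to this formalism.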

Curious if you still disagree, and if so why - I don't really see what you're pointing at with the Raven's Matrices example.

On the limits of idealized values

Fantastic post. A few scattered thoughts inspired by it:

If you aren’t trying to conform to some standard, then how can you truly, and non-arbitrarily, choose?

Why does our choice need to be non-arbitrary? If we take certain intuitions/desires/instincts as primitives, they may be fundamentally arbitrary, but that's because we are unavoidably arbitrary. Yet this arbitrary initial state is all we have to work from.

What’s needed, here, is a type of choice that is creating, rather than trying to conform — and which hence, in a sense, is “infallible.”

It feels like "infallible" is the wrong type of description here, for the same reason that it would be odd to say that my taste in food is infallible. At a certain level the predicate "correct" will stop making sense. (Maybe that level isn't the level of choices, though; maybe it's instincts, or desires, or intuitions, or tastes - things that we don't see ourselves as having control over.)

richard_ngo's Shortform

There's an old EA forum post called Effective Altruism is a question (not an ideology) by Helen Toner, which I think has been pretty influential.*

But I was recently thinking about how the post rings false for me personally. I know that many people in EA are strongly motivated by the idea of doing the most good. But I was personally first attracted to an underlying worldview composed of stories about humanity's origins, the rapid progress we've made, the potential for the world to be much better, and the power of individuals to contribute to that; from there, given potentially astronomical stakes, altruism is a natural corollary.

I think that leaders in EA organisations are more likely to belong to the former category, of people inspired by EA as a question. But as I discussed in this post, there can be a tradeoff between interest in EA itself versus interest in the things EA deems important. Personally I prioritise making others care about the worldview more than making them care about the question: caring about the question pushes you to do the right thing in the abstract, but caring about the worldview seems better at pushing you towards its most productive frontiers. This seems analogous to how the best scientists are more obsessed with the thing they're studying than the downstream effects of their research.

Anyway, take all this with a grain of salt; it's not a particularly firm opinion, just one personal perspective. But one longstanding member of the EA community I was talking to recently found it surprising, so I thought it'd be worth sharing in case others do too.


* As one datapoint: since the EA forum has been getting more users over time, a given karma score is more impressive the older a post is. Helen's post is twice as old as any other post with comparable or higher karma, making it a strong outlier.

Why should we *not* put effort into AI safety research?

Drexler's CAIS framework attacks several of the premises underlying standard AI risk arguments (although iirc he also argues that CAIS-specific safety work would be valuable). Since his original report is rather long, here are two summaries.

Should you do a PhD in science?

I suspect 1/3 is a significant overestimate since US universities attract people who did their PhDs all across the world.

Why AI is Harder Than We Think - Melanie Mitchell

I was pleasantly surprised by this paper (given how much dross has been written on this topic). My thoughts on the four fallacies Mitchell identifies:

Fallacy 1: Narrow intelligence is on a continuum with general intelligence

This is hard to evaluate, since Mitchell only discusses it very briefly. I do think that people underestimate the gap between solving tasks with near-infinite data (like Starcraft) vs low-data tasks. But saying that GPT-3 isn't a step towards general intelligence also seems misguided, given the importance of few-shot learning.

Fallacy 2: Easy things are easy and hard things are hard

I agree that Moravec's paradox is important and underrated. But this also cuts the other way: if chess and Go were easy, then we should be open to the possibility that maths and physics are too.

Fallacy 3: The lure of wishful mnemonics

This is true and important. My favourite example is artificial planning. Tree search algorithms are radically different from human planning, which operates over abstractions. Yet this is hard to see because we use the same word for both.

Fallacy 4: Intelligence is all in the brain

This is the one I disagree with most, because "embodied cognition" is a very slippery concept. What does it mean? "The representation of conceptual knowledge is ... multimodal" - okay, but CLIP is multimodal.

"Thoughts are inextricably associated with perception, action, and emotion." Okay, but RL agents have perceptions and actions. And even if the body plays a crucial role in human emotions, it's a big leap to claim that disembodied agents therefore can't develop emotions.

Under this fallacy, Mitchell also discusses AI safety arguments by Bostrom and Russell. I agree that early characterisations of AIs as "purely rational" were misguided. Mitchell argues that AIs will likely also have emotions, cultural biases, a strong sense of selfhood and autonomy, and a commonsense understanding of the world. This seems plausible! But note that none of these directly solves the problem of misaligned goals. Sociopaths have all these traits, but we wouldn't want them to have superhuman intelligence.

This does raise the question: can early arguments for AI risk be reformulated to rely less on this "purely rational" characterisation? I think so - in fact, that's what I tried to do in this report.

Some quick notes on "effective altruism"

Well, my default opinion is that we should keep things as they are; I don't find the arguments against "effective altruism" particularly persuasive, and name changes at this scale are pretty costly.

Insofar as people want to keep their identities small, there are already a bunch of other terms they can use - like longtermist, or environmentalist, or animal rights advocate. So it seems like the point of having a term like EA on top of that is to identify a community. And saying "I'm part of the effective altruism community" softens the term a bit.

around half of the participants (including key figures in EA) said that they don’t self-identify as "effective altruists"

This seems like the most important point to think about; relatedly, I remember being surprised when I interned at FHI and learned how many people there don't identify as effective altruists. It seems indicative of some problem, which seems worth pursuing directly. As a first step, it'd be good to hear more from people who have reservations about identifying as an effective altruist. I've just made a top-level question about it, plus an anonymous version - if that describes you, I'd be interested to see your responses!
