The Differential Cognitive Ability hypothesis

Posted on 2024-04-22

  1. GPTs are Predictors, not Imitators (EA Forum)

    1. It seems like Paul Christiano’s core bet is about differential cognitive abilities

      To illustrate, we can imagine asking the model to either (i) predict the outcome of a news story, (ii) predict a human thinking step-by-step about what will happen next in a news story. To the extent that (ii) is smarter than (i), it indicates that some significant part of the model’s cognitive ability is causally downstream of “predict what a human would say next,” rather than being causally upstream of it. The model has learned to copy useful cognitive steps performed by humans, which produce correct conclusions when executed by the model for the same reasons they produce correct conclusions when executed by humans.

      (In fact (i) is smarter than (ii) in some ways, because the model has a lot of tacit knowledge about news stories that humans lack, but (ii) is smarter than (i) in other ways, and in general having models imitate human cognitive steps seems like the most useful way to apply them to most economically relevant tasks.)

      Of course in the limit it’s overdetermined that the model will be smart in order to predict what a human would say, and will have no use for copying along with the human’s steps except insofar as this gives it (a tiny bit of) additional compute. But I would expect AI to be transformative well before approaching that limit, so that this will remain an empirical question.

      1. And this makes a lot of sense: if a model can have strong cognitive abilities in one domain (ML research) while staying weak in another (deception), there is more reason to be optimistic about approaches like AI control

      2. And this is where Paul Christiano seems to disagree with Eliezer, whose usual claim centres on the ‘sharp left turn’: a convergent attractor for capabilities

      3. Essentially, Paul expects the model’s useful cognition to come from simulating a human’s reasoning steps well, i.e. to be downstream of “predict what a human would say next” rather than upstream of it (a toy sketch of the (i)/(ii) probe follows below)
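      To make Paul’s (i)/(ii) probe concrete, here is a minimal sketch (my own illustration, not code from the post): `complete` stands in for whatever text-completion call you have available, and the two prompts are one assumed way to operationalise (i) direct outcome prediction and (ii) simulating a human reasoning step-by-step.

      ```python
      from typing import Callable

      def probe_differential_ability(
          complete: Callable[[str], str],  # any text-completion function (stand-in for a real LLM API)
          news_story: str,
      ) -> tuple[str, str]:
          """Run both probes from the thought experiment.

          (i)  ask the model for the outcome directly;
          (ii) ask it to predict a human thinking step-by-step about the outcome.
          If (ii) is reliably more accurate than (i), that suggests the model's
          useful cognition is downstream of "predict what a human would say next".
          """
          direct_prompt = (
              f"{news_story}\n\n"
              "What happens next? Answer in one sentence."
          )
          simulated_human_prompt = (
              f"{news_story}\n\n"
              "A careful analyst thinks step by step about what will happen next:\n"
              "Step 1:"
          )
          return complete(direct_prompt), complete(simulated_human_prompt)

      if __name__ == "__main__":
          def dummy(prompt: str) -> str:
              # Toy stand-in so the sketch runs without an API key; swap in a real model call.
              return f"[completion for a {len(prompt)}-character prompt]"

          direct, simulated = probe_differential_ability(dummy, "Central bank hints at a rate cut.")
          print("(i)  direct prediction:  ", direct)
          print("(ii) simulated reasoning:", simulated)
      ```

      Scoring the two kinds of completions against what actually happened is then an empirical question, which is exactly the point of the quoted paragraph.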

    2. Jan points at cognitive trade-offs

      Boundedness is a central concept here: neither humans nor GPTs are trying to solve ‘how to predict stuff with unlimited resources’; both face a problem of cognitive economy, i.e. how to allocate limited computational resources to minimise prediction error.
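      Jan’s point can be restated as a small constrained-optimisation toy (my illustration, with an assumed diminishing-returns error curve, not anything from his comment): a bounded predictor has a fixed compute budget and must split it across prediction sub-problems so as to minimise total error.

      ```python
      def error(compute: float, difficulty: float) -> float:
          # Assumed error curve with diminishing returns: more compute, lower error.
          return difficulty / (1.0 + compute)

      def allocate(budget: int, difficulties: list[float], step: float = 1.0) -> list[float]:
          """Greedy cognitive economy: give each successive unit of compute to the
          sub-problem whose error it reduces the most."""
          alloc = [0.0] * len(difficulties)
          for _ in range(budget):
              gains = [error(a, d) - error(a + step, d) for a, d in zip(alloc, difficulties)]
              best = max(range(len(gains)), key=gains.__getitem__)
              alloc[best] += step
          return alloc

      if __name__ == "__main__":
          # Three sub-problems of increasing difficulty, twelve units of compute:
          # harder problems soak up more of the budget, but none gets unlimited resources.
          print(allocate(12, [1.0, 4.0, 9.0]))
      ```

      Both humans and GPTs are solving this kind of allocation problem, just with different budgets and different error curves.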