Posted on 2024-05-06

For when I haven’t yet figured out where to move the notes.

[2024-04-30 Tue]

  1. There’s an interesting trap I’ve noticed that one can fall into when it comes to alignment research. It involves being somewhat nerdsniped.

    1. So my last question was something like “is agent foundations a nerdsnipe?” While I don’t think this is usually the case (having a better understanding of what is going on in general is good), I’ve noticed in myself a tendency to go down rabbit holes

    2. Specifically, I seem to devote an inordinate amount of time to asking “what if?” questions and, uh, to being vulnerable to arguments that try to mess with foundational assumptions

      1. So for example, there’s one strain of thought a few acquaintances of mine are exploring, that essentially involves some form of moral realism

        1. I sympathize with the sentiment, and I do think there’s some merit to the research agenda (I do believe they are onto something), but the number of concepts they leverage and attempt to build upon, to help one understand what is going on, is huge

          1. On one hand, it seems important to not be confused about fundamental things, since they seem to be crucial considerations that have a massive impact on the foundational parts of your strategy

          2. On the other hand, it seems likely that you can get lost forever down a rabbit hole of such things – there can exist memeplexes that are optimized to be sinkholes of human attention and computation, and that somehow replicate

            1. SDr’s “Age of Attention” is relevant here

            2. The point is that in a world where information and attention are as abundant as they are right now, it does make sense that the evolutionary pressure on memeplexes is massively increased

            3. I don’t really have a good current strategy to deal with this

              1. A good stopgap, given that your memetic immune system is compromised, would be to minimize your use of the internet.

              2. A more useful strategy would probably be to notice memeplexes and their threads and their tendrils as they try to manipulate you, and then figure out what to do about it

                1. Valentine has talked about this before, although the way he orients to dealing with dangerous memeplexes is to create a fundamental shift in the foundational substrate of his mind, such that the memes that are selected for (because he pays attention to them) are the ones that are beneficial to him and to his values

                  1. But for this you need reflection: you need your mind to pay attention to itself, to reflect upon its experiences, and to gauge how things are and were going. Without that, one cannot get enough feedback for this to work

                  2. So this is one strong argument for meditating daily for a tiny amount of time. Or reflecting on paper. Or just taking a walk alone. The point is to have the space to think.

    3. The core point I want to emphasize here is that there’s a sort of cognitive strategy that involves cutting through to the core of the thing that matters. This would likely make it easier for you to maintain some sort of coherence even when dealing with complicated belief clusters that try to convince you of psyop-like contradictions (such as “suffering is good actually!”, as Malcolm Ocean put it in some tweet)

      1. I think cutting partially involves some level of letting go, of forgetting, of rebuilding things in your mind from scratch, of trying to reason ‘from first principles’

      2. And math! Math, at the very least, seems to be the most solid set of abstractions one can rest their belief clusters upon

      3. I assume that if one sat down and spent half an hour ‘meditating’, their brain would automatically be drawn to certain things, and an optimistic assumption is that processing these things would lead to good outcomes as predicted by the brain

        1. On the other hand, one can find themselves mostly thinking about stuff unrelated to their work; if so, one could set aside a time-box in which they allow themselves to explore the things their brain cares about, in a certain context

        2. Although, you know, you could also do it while thinking ‘on paper’ – though if you are doing it on a device connected to the internet, I anticipate that you’ll be distracted quite easily

    4. I think that another thing that may be involved is a sort of mistrust of your own epistemics

      1. Yeah I think this may play a huge role in being vulnerable to this class of memeplexes

      2. I think there’s some sense in feeling this way though. As Yud once said, most people seem to be unable to derive all the AGI risk arguments “from the empty string”.

        1. And of course, it is quite impressive how supposedly smart people such as LeCun have such interesting beliefs about AGI risk (well, ASI risk).

      3. I call this the “But what if I’m wrong?” issue.

        1. I think someone even explicitly mentioned this to me during MATS as part of a joke about how I relate to research projects.

        2. I think this may partially be a sort of intense loss aversion? An aversion to waste.

          1. A good reminder to oneself when one notices they are in such a situation would be: “The optimal amount of waste isn’t zero.” Although you can use the desire to not have this happen again to figure out systematic root cause fixes, not patch fixes.

          2. Okay yeah, this is also a very good point. The “But what if I’m wrong?” seems to be some sort of generalized patch fix, with worry as the impetus for better decision making. A systematic fix would have been, in every situation where I felt intense regret / guilt / shame / self-hate, to do a sort of root-cause analysis. Hard to do that when your parents are screaming at you, though.

          3. A better way to orient to the “But what if I’m wrong?” question would be to notice that you got information you didn’t have before, and that this is valuable.

            1. It makes sense to optimize for maximizing the value of the information you gain from your actions or projects, although you would have to balance that with other priorities such as gaining resources and maintaining the system that is you.

[2024-05-02 Thu]

  1. Let’s talk takeoff speeds

    1. Takeoff speeds seem like they obscure more than they enlighten

    2. Especially due to the inherent proxy-ness of them

      1. They are in essence a sort of proxy of what we think constitutes takeoff, which itself is a rough proxy for what we think indicates progress towards the building of an ASI

      2. Given this, it seems clear that while takeoff speeds seem to have policy implications (since they are supposedly easy to communicate, in terms of real-world impact now and in the near future, to people who have political power), they can be very dissociated from the mechanisms underlying why takeoff is happening

        1. gwern posits, for example, a sequence of sigmoidal breakthroughs, each of which is discontinuous, but contributes overall to a seemingly smooth curve

          1. So the question is: what is this takeoff speed variable useful for? What are you using it for?

        2. Continuity in the aggregate is not necessarily caused by continuity in its constituent parts

        3. Heck, even discontinuity in the aggregate is not necessarily caused by discontinuity in its constituent parts

        4. Even more important, there are variables that are downstream of cybernetic feedback loops that seek to maintain their value, and not investigating this causal factor can lead you to an incorrect understanding of what the variable is telling you

          1. See Yud’s quote on this

            Physics is continuous but it doesn’t always yield things that “look smooth to a human brain”. Some kinds of processes converge to continuity in strong ways where you can throw discontinuous things in them and they still end up continuous, which is among the reasons why I expect world GDP to stay on trend up until the world ends abruptly; because world GDP is one of those things that wants to stay on a track, and an AGI building a nanosystem can go off that track without being pushed back onto it.

          2. Also Nick Land’s model of capital and technological advancement as a runaway positive feedback loop
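gwern’s sequence-of-sigmoids picture above can be sketched numerically. The toy Python snippet below (breakthrough times and steepness are invented purely for illustration) sums a few steep sigmoids and measures how evenly the aggregate rises, even though each constituent makes most of its jump almost at once:

```python
import math

def sigmoid(t, center, steepness=8.0):
    # One "breakthrough": capability jumps from ~0 to ~1 around `center`.
    return 1.0 / (1.0 + math.exp(-steepness * (t - center)))

def aggregate(t, centers):
    # Total capability: the sum of independent sigmoidal breakthroughs.
    return sum(sigmoid(t, c) for c in centers)

centers = [1.0, 2.0, 3.0, 4.0]        # hypothetical breakthrough times
ts = [i / 10 for i in range(51)]      # observation grid, t = 0.0 .. 5.0
values = [aggregate(t, centers) for t in ts]

# Step-to-step changes of the aggregate stay small relative to its total
# rise, even though each sigmoid alone is nearly a step function.
deltas = [b - a for a, b in zip(values, values[1:])]
total_rise = values[-1] - values[0]
print(max(deltas), total_rise)
```

Whether the aggregate “looks smooth to a human brain” then depends on how the individual breakthrough sizes compare to the total trend, which is exactly why the aggregate curve alone tells you little about the discontinuities underneath.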

  2. Note that I haven’t really been tracking the discontinuities inherent to capabilities as well as I would want

    1. I don’t think I had an implicit smooth curve in my head for capabilities, sure, but my explicit model of capabilities seemed to rely on some level of ‘scale up and you’ll get better performance’. That seems correct at the outside / behavior level, but it doesn’t track what is going on inside, and Eliezer’s model seems to care a lot about the discontinuities inherent in how those capabilities unfurl

  3. I wonder what underlies the intuition some people have that models technological progress as a conjunctive endeavour instead of a sequence of disjunctive breakthroughs, such that they feel satisfied claiming that because one concrete scenario for a technology is impossible, the technology will not exist

  4. The notion of ‘local coherence’ seems to be quite valuable in modeling what is ‘selected for’ by evolution and learning, and consequently what we lose when we have mind-shattering or myopia

    1. Integration is a process that has a consequence of increasing coherence across inputs / environments / time / scenarios

    2. Similarly, natural selection selects for more and more (locally) coherent patterns

      1. And it seems like there’s a phase shift at a point where this coherent pattern thing can increase the breadth of its coherence by itself (eg. in-lifetime learning), which seems to be built on top of selection at a finer level (eg. selection over world models, selection over cells)

      2. And it seems like there’s another phase shift at a point where this coherent pattern can generally increase the breadth of its coherence by itself (eg. humans versus chimps)

  5. I’d like to write about the concept of the locality of coherence

    1. The intention is to provide a more useful handle – an abstraction people can use instead of autistically reaching for utility functions for every goddamn thing that involves systems exhibiting agent-like behavior

      1. This is useful in the same way machines with lower expressive power are useful: having finer-grained distinctions between kinds of ‘coherence’ is valuable, and the notion of ‘local coherence’ tracks a very important point about how systems with agent-like behavior empirically come about.

[2024-05-03 Fri]

  1. If coherence is always local, then alignment can also only be local, in domains where the parties involved are coherent (or act as if they have coherent preferences)

[2024-05-06 Mon]

  1. You know, there’s a move that some people make when it comes to ‘bayesian updating on evidence’, that I think is pretty sub-optimal, especially in our current situation

    1. I recall talking to M. and she said that what we are seeing today seems like some evidence for ‘slow takeoff’, whatever that means

      1. I assume she was talking about Paul Christiano’s models of how things go

    2. The problem here is that people take some evidence, bundle up massive causal models, black-box them, and then upweight or downweight those black boxes based on the evidence

      1. This seems like a bad move, especially since you are throwing away valuable information you already have about the world, and throwing away causal information seems particularly dangerous to me

      2. Given one simple causal model describing reality and another convoluted causal model (perhaps with ‘God’ or something) describing reality, in general, even if evidence seems to prefer the latter over the former, it does seem like a bad idea to ‘update’ towards a God-based world
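One way to cash out this point with toy numbers (all of them invented for illustration): a modest likelihood ratio in favor of the convoluted model shouldn’t swamp a strong structural prior against it.

```python
# Toy Bayesian bookkeeping with invented numbers: a simple causal model
# versus a convoluted one that happens to fit the latest datum better.
prior_simple = 0.999          # structural prior strongly favors simplicity
prior_convoluted = 0.001

lik_simple = 0.3              # P(evidence | simple model)
lik_convoluted = 0.6          # P(evidence | convoluted model): a 2x better fit

# Posterior odds of convoluted vs simple after conditioning on the evidence.
posterior_odds = (prior_convoluted * lik_convoluted) / (prior_simple * lik_simple)
print(posterior_odds)
```

The 2x likelihood ratio moves the odds by a factor of two, but the posterior still favors the simple model by roughly 500:1. Updating wholesale on black-boxed models throws away exactly this structure.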

      3. Worse, the causal model described by Eliezer may have been incorrect about how much mundane utility we’ll see pulled out of AI models, but people aren’t asking “Why?”, they are just updating

        1. ooooh, I’m uuuupdatinggggg

          1. updaters

        2. If they asked “Why?” they’d at least be able to construct sensible causal models of what is going on and then ‘update’ appropriately

          1. Note that the arguments in IEM are extremely solid, and the concrete things we shall see leading up to the creation of a consequentialist cognition taking over the world are not really talked about

          2. And we haven’t had any evidence (that I have in mind as of writing, that is) countering IEM’s arguments

          3. Exponential curves really mess with people’s intuitions, I think

            1. You don’t really get much evidence as to whether you are on a linear curve or an exponential curve or a double exponential curve – in general you have to extrapolate based on abstract reasoning

            2. You can make concrete predictions, but it still seems pretty difficult
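The curve-identification point above can be made concrete with a small sketch (the growth rate, noise level, and observation window are all made up): fit a straight line to the early portion of a noisy exponential, and the fit looks excellent right up until extrapolation fails.

```python
import math
import random

random.seed(0)

# Hypothetical "capability metric": truly exponential, observed early with noise.
def observe(t):
    return math.exp(0.1 * t) * (1 + random.gauss(0, 0.02))

ts = list(range(0, 11))                 # only the first few time steps are visible
ys = [observe(t) for t in ts]

# Ordinary least-squares line fit, in closed form.
n = len(ts)
mean_t = sum(ts) / n
mean_y = sum(ys) / n
slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
         / sum((t - mean_t) ** 2 for t in ts))
intercept = mean_y - slope * mean_t

# R^2 of the linear fit: close to 1, i.e. the early data "looks linear".
ss_res = sum((y - (slope * t + intercept)) ** 2 for t, y in zip(ts, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

# Extrapolate both models to t = 40: the predictions diverge wildly.
linear_pred = slope * 40 + intercept
true_val = math.exp(0.1 * 40)
print(r2, linear_pred, true_val)
```

So an excellent in-sample fit is weak evidence about which curve family you are on; distinguishing them is mostly a job for abstract reasoning about the generating mechanism, not curve-fitting.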

    3. I now think that the intuition that P. and T. have and share makes some sense

    4. Paul Christiano seems to think that if companies are incentivized to reduce ‘overhang’, the curve will be less and less exponential

      1. This actually does make sense to me

      2. However I believe that the exponential curve comes out of this drive to use capital and intelligence and technology to get more of these, and reducing ‘overhang’ seems to play right into this dynamic

        1. Essentially, the ‘reduce overhang’ strategy seems to be priced in

    5. Let’s consider Quintin Pope / AI optimists arguments

      1. I like that the “evolution provides no evidence […]” post attempted to model the causal dynamics of SGD and evolution, it makes a lot of difference to analyze things at that level

      2. The core argument here seems to be “Hey it seems like you would have more optimistic priors about how likely it is for us to create aligned near-term-AGI models, given the level of fine control we have and can leverage using SGD, if you considered the disanalogies between evolution and SGD”

        1. Which is a pretty good argument!

        2. In fact, I expect this is one source of the intuition behind Alex Turner’s model of shard theory (specifically Alex because I don’t have a good enough model of Quintin’s thoughts on these things)

        3. But I think this does not extend to models that have consequentialist cognition, those that can do abstract reasoning and can hide their cognition and make it less and less interpretable (see gwern’s steganography comment)

          1. on top of that, for AGI models that are doing tasks that involve them working in new domains entirely (see the Gillen and Barnett post), and learning things, they’d very likely have alien ontologies that we wouldn’t have good enough understanding of, and therefore won’t have as much fine control as this intuition seems to provide

    6. I think most people probably don’t have a very good causal model of what is going on, and therefore end up assigning evidence weights to black-box-like abstractions

      1. Or perhaps people ‘defer’ (see Tsvi’s post) to others and don’t attempt to build the causal models involved to update effectively, which is understandable, although it seems to systematically lead people to worse models of reality in certain domains such as ones with exponential growth

        1. Perhaps neurosis is an effective adaptation in response to being unable to systematically have a model of such things

        2. Although I think that, on the whole, it is better to make systematic updates to causal models of reality than to have neuroses, if you want to understand and steer reality the way you desire

    7. Note: add link to the Karnofsky post on exponential curves?

      1. Well, maybe don’t just dump links onto people, maybe don’t do it

    8. Sidenote: perhaps the point of what Yud, MIRI, and Connor are doing is to attempt to influence the elites – you don’t have to influence a broad spectrum of the public. If you can build up some momentum and influence, and talk in public so the ideas become palatable, while mainly focusing on convincing the elites, then I assume that a shift in elite ideology, combined with some level of strategy-to-convince-the-masses, could result in a ‘win’

    9. Sidenote

      1. here’s a quote or two highlighting the extent to which ‘insights’ seem to be cheap at the cutting edge of a research field, while empirical evidence is what provides real value

        I think it’s often challenging to just understand where the frontier is, because it’s so far and so many things are secret. And if you’re not at a scaling lab and then also don’t keep up with the frontier of the literature, it’s natural to overestimate the novelty of your insights. And then, if you’re too scared to investigate your insights, you might continue to think that your ideas are better than they are. Meanwhile, as an AI Safety researcher, not only is there a lot less distance to the frontier of whatever subfield you’re in, you’ll probably spend most of your time doing work that keeps you on the frontier.

        Random insights can be valuable, but the history of deep learning is full of random insights that were right but for arguably the wrong reasons (batch/layernorm, Adam, arguably the algorithm that would later be rebranded as PPO), as well as brilliant insights that turned out to be basically useless (e.g. consider a lot of the Bayesian neural network stuff, but there’s really too many examples to list) if not harmful in the long run (e.g. lots of “clever” or not-so-clever ways of adding inductive bias). Part of the reason is that people don’t get taught the history of the field, and see all the oh-so-clever ideas that didn’t work, or how a lot of the “insights” were invented post-hoc. So if you’re new to deep learning you might get the impression that insights were more causally responsible for the capabilities advancements, than they actually are. Insofar as good alignment requires deconfusion and rationality to generate good insights, and capabilities does not, then you should expect that the insights you get from improving rationality/doing deconfusion are more impactful for alignment than capabilities.

        I mean, if you actually do come up with a better initialization scheme, a trick that improves GPU utilization, or some other sort of cheap algorithmic trick to improve training AND check it’s correct through some small/medium-scale empirical experiments, then sure, please reconsider publishing that. But it’s hard to incidentally do that—even if you do come up with some insight while doing say, mech interp, it feels like going out of your way to test your capability ideas should be a really obvious “you’re basically doing capabilities” sign? And maybe, you should be doing the safety work you claim to want to do instead? – Please stop publishing ideas/insights/research about AI - LessWrong 2.0 viewer

        1. This is mainly why gwern is more ‘empirical’ than insights-focused: he has seen, many times over, some supposed improvement get publicized and then turn out to be garbage after empirical testing

          How new are you to AI?

          Not very.

          Literally every area you just mentioned has had a new replacement that allowed all around improvements to the previous methods in the past 3-6 years.

          I didn’t say that improvements never happened. I said that in these three areas in particular, people pursue the white whale of improving over them, only to be shocked at how hard it turns out to beat them (ie. much harder than fooling yourself or a lot of other people), and that for this reason people should not care about them until they are much better validated than they usually are in the proposed paper.

          most networks used to use tanh or sigmoid functions until eventually Relu was presented and was noticed to provide better all around results across domains

          Which took something like 20-30 years, and it took a similar period to go from the earliest connectionist work which insisted on non-differentiable NNs with 0/1 activations to imitate spiking networks in the 1950s to using tanh/sigmoid in the 1980s PDP revolution.

          and now the state of the art general methods being used commonly by the most established labs is cosine learning rate schedule

          Way I recall it from back then when I was still bothering to read LR scheduling papers, cosine was 8 years ago in 2016, not 3-6 years ago. (I also consider cosine to just be a specific kind of cyclical learning rate schedule, 2015.)

          And then another method was found to later have even better results than Relu and that’s SwiGLU, and now much of the state of the art industry uses swiglu for general improvements such as Llama models. SwiGLU is now widely adopted over the past 12 months by many of the largest AI organizations.

          You mean SwiGLU (4 years ago in 2020, not 12 months ago) had better results than GELU (8 years ago in 2016); but yes, it has been widely adopted… So that is why it is worth discussing, whereas the original SwiGLU paper was not (especially given how minimal its benchmarking was, resting mostly on Shazeer’s well-earned reputation as a DL genius - although it did give us the unusually honest & immortal Conclusions section: “We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence”).

          Flash attention proposed by Tri dao is a great example too of “free lunch” actually working, he proposed a method to optimize transformer operations that claimed to make GPUs train transformers twice as fast, and many people said the same thing “that’s not possible, free lunch isn’t possible”

          I don’t know who those people were. It wasn’t anyone I recall - I only recall immediate wild excitement, which was subsequently fully justified. The whole point of FlashAttention was that it computed the same dense quadratic attention but in a cache-friendly and otherwise superior manner which was so efficient it computed long after the default OOMed; it’s hard to be wrong about that. And it wasn’t. I believed in FlashAttention the day I read it, and expected it to be adopted widely. Nor was it all that improbable, because when real programmers optimize memory access, it is common to observe FlashAttention-like speedups: a lot of algorithms are orders of magnitude slower than they have to be. (And so you would not be skeptical about FlashAttention in the way you would be skeptical about all of the many, many linear or other efficient attentions, all of which have disappointed in much the same way and could be added to my list as ‘#4. someone proposes a new sub-quadratic attention and claims it’s much better; they omit LRA or other benchmarks in favor of a new measurement they made up.’)

          Yes ofcourse there is going to be many papers that propose methods such as lion that dont actually provide general all around improvements and aren’t good enough to replace the state of the art, there is dozens or even hundreds of papers on different optimizer proposals that aren’t better than the current methods

          I’m glad we agree. I would just delete ‘dozens’ - it’s definitely at least hundreds, and probably more like thousands. And you would agree that we shouldn’t have thousands of random ML submissions to /r/slatestarcodex. (Maybe to a ML subreddit, but not /r/SSC.)

          KAN is especially unique since its not like optimizers where a new similar thing is proposed all the time

          It seems like a new learnable activation function, and those are proposed all the time. (Or at least, used to be pre-2020. I’ll admit, there seems to have been a lot fewer activations proposed since then. Even SwiGLU was 2020.) – Kolmogorov-Arnold Networks Paper - r/slatestarcodex

        2. Although note that this is mainly downstream of misaligned incentives between researchers and the community as a whole

        3. Even so, it does make sense to consider empirical evidence quite valuable

        4. Note that the problem is that we don’t yet have many viable ideas about alignment, not as many as we have for capabilities

          1. Although I guess the idea here is that funding people doing interp (for example) is both creating new insights and validating them empirically

          2. On the other hand, this incrementalist approach seems unlikely to make much difference, but maybe I am missing something about the use of this

            1. IIRC the point made by Neel Nanda and Rohin Shah is that they use these methods to build up insights that would eventually lead people to stumble upon theories, and I assume they think that trying to do things from the agent foundations route is significantly harder to converge to insights, or perhaps they just aren’t specialized in that route

  2. Reading 2021 MIRI Conversations

    1. Discussion with Eliezer Yudkowsky on AGI interventions

      1. Why is alignment not subject to the same ‘garden of forking paths’ argument that makes sense for capabilities increase?

        1. Oh, we don’t even have a theoretical understanding of how to do this. Capabilities increase on the other hand seems quite plausible.

          1. No, this doesn’t feel like an actual argument

          2. Yes, we don’t have a theoretical understanding of alignment. Do we have a theoretical understanding of increase of capabilities?

            1. Scaling hypothesis? Number go up?

            2. Algorithmic advancements?

            3. Oh, you have a feedback loop with reality, there’s a gigantic amount of information reality provides as feedback that you can use to increase capabilities

            4. On the other hand, how do we even create a feedback loop for alignment? RLHF makes sense at some levels but doesn’t really hold up for more advanced models

          3. Still doesn’t feel very much like it engages with the ‘garden of forking paths’ argument

      2. I think we still have time, actually

        1. I think I still could spend two years and complete my masters, focused on machine learning, and then focus next on working on alignment research at a frontier lab

          1. This doesn’t necessarily mean that the labs are doing good work, but it does mean that I could be investing in a strategy that leads to contributing to good outcomes later on

          2. But… there are lots of people who can do an ML masters, and who will have the machine learning skills to ‘contribute to good outcomes later on’ – the market is already building up that reserve

          3. Therefore this doesn’t seem like a very useful strategy for actually making a difference. What would differentially make an impact is to focus on doing things that would make an impact – and in general, this seems to mean actually thinking about what makes sense to do and work on, which has an opportunity cost. Look at J. – he has likely had a pretty good time and a standard track through his master’s program. The question is whether that has been effective in getting him to what and where he wanted

            1. I think the most sensible option is to go gwernmode, as linear put it, until you figure out what you want to do next