Miscellaneous

When I haven't figured out where to move the notes to yet.

[2024-04-30 Tue]

There's an interesting trap I've noticed that one can fall into, when it comes to alignment research. It involves being somewhat nerdsniped
1. So my last question was something like "is agent foundations a nerdsnipe?" and while I don't think this is the case usually (having a better understanding of what is going on in general is good), there's a tendency to go down rabbitholes that I've noticed in me
2. Specifically, I seem to devote an inordinate amount of time asking "what if?" questions and uh, being vulnerable to arguments that try to mess with foundational assumptions
  1. So for example, there's one strain of thought a few acquaintances of mine are exploring, that essentially involves some form of moral realism
    1. I sympathize with the sentiment, and I do think there's some merit to the research agenda (and do believe they are onto something), but the number of concepts they leverage and attempt to build upon, to help one understand what is going on, is huge
      1. On one hand, it seems important to not be confused about fundamental things, since they seem to be crucial considerations that have a massive impact on the foundational parts of your strategy
      2. On the other hand, it seems likely that you can get lost forever down a rabbit hole of such things – there can exist memeplexes that are optimized to be sinkholes of human attention and computation, and replicate somehow
        
        Age of Attention - SDr is relevant here
        
        The point is that in a world where information and attention is as abundant as we see it is right now, it does make sense that the evolutionary pressure on memeplexes is massively increased
        
        I don't really have a good current strategy to deal with things
        
        A good stopgap, given that your memetic immune system is compromised, would be to minimize your use of the internet.
        
        A more useful strategy would probably be to notice memeplexes and their threads and their tendrils as they try to manipulate you, and then figure out what to do about it
        
        Valentine has talked about this before, although the way he orients to dealing with dangerous memeplexes is to create a fundamental shift in the foundational substrate of his mind, such that the memes that are selected for (because he pays attention to them) are the ones that are beneficial to him and to his values
        
        But for this, you need to have reflection, you need your mind to pay attention to itself and to reflect upon its exepriences, and to gauge how things are and were going. Without that, one cannot get enough feedback for this to work
        
        So this is one strong argument for meditating daily for a tiny amount of time. Or reflecting on paper. Or just taking a walk alone. The point is to have the space to think.
3. The core point I want to emphasize here is that there's a sort of cognitive strategy that involves cutting through to the core of the thing that matters, and this would likely make it easier for you to have some sort of coherence even when dealing with complicated belief clusters that try to convince you of psyop-like contractions (such as "suffering is good actually!", as Malcolm Ocean put it in some tweet)
  1. I think cutting partially involves some level of letting go, of forgetting, of rebuilding things in your mind from scratch, of trying to reason 'from first principles'
  2. And math! Math, at the very least, seems to be the most solid set of abstractions one can rest their belief clusters upon
  3. I assume that if one sat down and spent half an hour 'meditating', their brain would automatically be drawn to certain things, and an optimistic assumption is that processing these things would lead to good outcomes as predicted by the brain
    1. On the other hand, one can find themselves mostly thinking about stuff unrelated to their work, and if so, one could put a time-box where they allow themselves to explore the things their brain cares about, in a certain context
    2. Although, you know, you could also do it while thinking 'on paper', although if you are doing it on a device connected to the internet, I anticipate that you'll be distracted quite easily
4. I think that another thing that may be involved is a sort of mistrust of your own epistemics
  1. Yeah I think this may play a huge role in being vulnerable to this class of memeplexes
  2. I think there's some sense in feeling this way though. As Yud once said, most people seem to be unable to derive all the AGI risk arguments "from the empty string".
    1. And of course, it is quite impressive how supposedly smart people such as LeCun have such interesting beliefs about AGI risk (well, ASI risk).
  3. I call this the "But what if I'm wrong?" issue.
    1. I think someone even explicitly mentioned this to me during MATS as part of a joke about how I relate to research projects.
    2. I think this may partially be a sort of intense loss aversion? An aversion to waste.
      1. A good reminder to oneself when one notices they are in such a situation would be: "The optimal amount of waste isn't zero." Although you can use the desire to not have this happen again to figure out systematic root cause fixes, not patch fixes.
      2. Okay yeah, this is also a very good point. The "But what if I'm wrong?" seems to be some sort of generalized patch fix of worry as the impetus for better decision making. A systematic fix would have been to, in every situation where I felt intense regret / guilt / shame / self-hate, would have been to do a sort of root-cause analysis. Hard to do that when your parents are screaming at you though.
      3. A better way to orient to the "But what if I'm wrong?" question would be to notice that you got information you didn't have before, and that this is valuable.
        
        It makes sense to optimize for maximizing the value of the information you gain from your actions or projects, although you would have to balance that with other priorities such as gaining resources and maintaining the system that is you.

[2024-05-02 Thu]

Let's talk takeoff speeds
1. Takeoff speeds seem like they obscure more than they enlighten
2. Especially due to the inherent proxy-ness of them
  1. They are in essence a sort of proxy of what we think constitutes takeoff, which itself is a rough proxy for what we think indicates progress towards the building of an ASI
  2. Given this, it seems clear that takeoff speeds, while they seem to have policy implications (since they are supposedly easily communicable in terms of real-world impact in the moment and in the near future, to people who have political power), they can be very dissociated from tracking the mechanisms underlying why there is takeoff happening
    1. gwern posits, for example, a sequence of sigmoidal breakthroughs, each of which is discontinuous, but contributes overall to a seemingly smooth curve
      1. So the question is: what is this takeoff speed variable useful for? What are you using it for?
    2. Continuity in the aggregate is not necessarily caused by continuity in its constituent parts
    3. Heck, even discontinuity in the aggregate is not necessarily caused by discontinuity in its constituent parts
    4. Even more important, there are variables that are downstream of cybernetic feedback loops that seek to maintain its value, and not investigating this causal factor can lead you to have an incorrect understanding of what the variable is telling you
      1. See Yud's quote on this
        
        Physics is continuous but it doesn’t always yield things that “look smooth to a human brain”. Some kinds of processes converge to continuity in strong ways where you can throw discontinuous things in them and they still end up continuous, which is among the reasons why I expect world GDP to stay on trend up until the world ends abruptly; because world GDP is one of those things that wants to stay on a track, and an AGI building a nanosystem can go off that track without being pushed back onto it.
      2. Also Nick Land's model of capital and technological advancements as a runway positive feedback loop
Note that I haven't really been tracking the discontinuities inherent to capabilities as well as I would want
1. I don't think I had an implicit smooth curve in my head for capabilities, sure, but my explicit model of capabilities seemed to rely on some level of 'scale up and you'll get better performance', which seems correct on the outside / behavior level, but that doesn't track what is going on inside, and Eliezer's model seems to care a lot about the discontinuities inherent to how those capabilities unfurl
I wonder what underlies the intuition that people have that involves modeling technological progress as a conjunctive endeavour instead of a sequence of disjunctive breakthroughs, such that they feel satisfied by claiming that one concrete scenario for a technology to exist is impossible therefore that technology will not exist
The notion of 'local coherence' seems to be quite valuable in modeling what is 'selected for' by evolution and learning, and consequently what we lose when we have mind-shattering or myopia
1. Integration is a process that has a consequence of increasing coherence across inputs / environments / time / scenarios
2. Similarly, natural selection selects for more and more (locally) coherent patterns
  1. And it seems like there's a phase shift at a point where this coherent pattern thing can increase the breadth of its coherence by itself (eg. in-lifetime learning), which seems to be built on top of selection at a finer level (eg. selection over world models, selection over cells)
  2. And it seems like there's another phase shift at a point where this coherent pattern can generally increase the breadth of its coherence by itself (eg. humans versus chimps)
I'd like to write about the concept of the locality of coherence
1. The intention is to provide a more useful handle for the sort of abstraction that people can use instead of autistically constantly using utility functions for every goddamn thing that involves systems that exhibit agent-like behavior
  1. This is useful in the same way machines with lower expressive power is useful, having finer grained distinctions between 'coherence' is valuable, and the notion of 'local coherence' track a very important point about how systems with agent-like behavior empirically come about.

[2024-05-03 Fri]

If coherence is always local, then alignment can also only be local, in domains where the parties involved are coherent (or act as if they have coherent preferences)

[2024-05-06 Mon]

You know, there's a move that some people make when it comes to 'bayesian updating on evidence', that I think is pretty sub-optimal, especially in our current situation
1. I recall talking to M. and she said that what we are seeing today seems like some evidence for 'slow takeoff', whatever that means
  1. I assume she was talking about Paul Christiano's models of how things go
2. The problem here is that people are taking some evidence, and then they are bundling up massive causal models and then upweighing or downweighing evidence on them via black-boxing them
  1. This seems like a bad move, especially since you are throwing away valuable information you already have about the world, and throwing away causal information seems particularly dangerous to me
  2. Given one simple causal model describing reality and another convoluted causal model (perhaps with 'God' or something) describing reality, in general, even if evidence seems to prefer the latter over the former, it does seem like a bad idea to 'update' towards a God-based world
  3. Worse, the causal model described by Eliezer may have been incorrect about how much mundane utility we'll see pulled out of AI models, but people aren't asking "Why?", they are just updating
    1. ooooh, I'm uuuupdatinggggg
      1. updaters
    2. If they asked "Why?" they'd at least be able to construct sensible causal models of what is going on and then 'update' appropriately
      1. Note that the arguments in IEM are extremely solid, and the concrete things we shall see leading up to the creation of a consequentialist cognition taking over the world is not really talked about
      2. And we haven't had any evidence (that I have in mind as of writing, that is) countering IEM's arguments
      3. Exponential curves really mess with people's intuitions, I think
        
        You don't really get much evidence as to whether you are on a linear curve or an exponential curve or a double exponential curve – in general you have to extrapolate based on abstract reasoning
        
        You can make concrete predictions, but it still seems pretty difficult
3. I now think that the intuition that P. and T. have and share makes some sense
4. Paul Christiano seems to think that if companies are incentivized to reduce 'overhang', the curve will be less and less exponential
  1. This actually does make sense to me
  2. However I believe that the exponential curve comes out of this drive to use capital and intelligence and technology to get more of these, and reducing 'overhang' seems to play right into this dynamic
    1. Essentially, the 'reduce overhang' strategy seems to be priced in
5. Let's consider Quintin Pope / AI optimists arguments
  1. I like that the "evolution provides no evidence […]" post attempted to model the causal dynamics of SGD and evolution, it makes a lot of difference to analyze things at that level
  2. The core argument here seems to be "Hey it seems like you would have more optimistic priors about how likely it is for us to create aligned near-term-AGI models, given the level of fine control we have and can leverage using SGD, if you considered the disanalogies between evolution and SGD"
    1. Which is a pretty good argument!
    2. In fact, I expect this is one source of the intuition behind Alex Turner's model of shard theory (specifically Alex because I don't have a good enough model of Quintin's thoughts on these things)
    3. But I think this does not extend to models that have consequentialist cognition, those that can do abstract reasoning and can hide their cognition and make it less and less interpretable (see gwern's steganography comment)
      1. on top of that, for AGI models that are doing tasks that involve them working in new domains entirely (see the Gillen and Barnett post), and learning things, they'd very likely have alien ontologies that we wouldn't have good enough understanding of, and therefore won't have as much fine control as this intuition seems to provide
6. I think maybe most people probably don't have a very good causal model of what is going on and therefore end up using black-box-like abstractions to assign evidence weights to
  1. Or perhaps people 'defer' (see Tsvi's post) to others and don't attempt to build the causal models involved to update effectively, which is understandable, although it seems to systematically lead people to worse models of reality in certain domains such as ones with exponential growth
    1. Perhaps neurosis is an effective adaptation in response to being unable to systematically have a model of such things
    2. Although I think that in total it is better to have systematic updates to causal models of reality instead of having neuroses, for being able to understand and steer reality the way you desire
7. Note add link to Karnofsky post on exponential curve?
  1. Well, maybe don't just dump links onto people, maybe don't do it
8. Sidenote: perhaps the point of what Yud, MIRI, and Connor are doing is to attempt to influence the elites – you don't have to influence a broad spectrum of the public, but if you can build up some level of momentum and influence and talk in public so you make the ideas palatable, and then mainly focus on convincing the elites, I assume that if the elite ideology shifts, then maybe that combined with some level of strategy-to-convince-the-masses will result in a 'win'
9. Sidenote
  1. here's a quote or two highlighting the extent to which 'insights' seem to be cheap in the cutting-edge of a research field while empirical evidence is what provides something real value
    
    I think it’s often challenging to just understand where the frontier is, because it’s so far and so many things are secret. And if you’re not at a scaling lab and then also don’t keep up with the frontier of the literature, it’s natural to overestimate the novelty of your insights. And then, if you’re too scared to investigate your insights, you might continue to think that your ideas are better than they are. Meanwhile, as an AI Safety researcher, not only is there a lot less distance to the frontier of whatever subfield you’re in, you’ll probably spend most of your time doing work that keeps you on the frontier.
    
    Random insights can be valuable, but the history of deep learning is full of random insights that were right but for arguably the wrong reasons (batch/layernorm, Adam, arguably the algorithm that would later be rebranded as PPO), as well as brilliant insights that turned out to be basically useless (e.g. consider a lot of the Bayesian neural network stuff, but there’s really too many examples to list) if not harmful in the long run (e.g. lots of “clever” or not-so-clever ways of adding inductive bias). Part of the reason is that people don’t get taught the history of the field, and see all the oh-so-clever ideas that didn’t work, or how a lot of the “insights” were invented post-hoc. So if you’re new to deep learning you might get the impression that insights were more causally responsible for the capabilities advancements, than they actually are. Insofar as good alignment requires deconfusion and rationality to generate good insights, and capabilities does not, then you should expect that the insights you get from improving rationality/doing deconfusion are more impactful for alignment than capabilities.
    
    I mean, if you actually do come up with a better initialization scheme, a trick that improves GPU utilization, or some other sort of cheap algorithmic trick to improve training AND check it’s correct through some small/medium-scale empirical experiments, then sure, please reconsider publishing that. But it’s hard to incidentally do that—even if you do come up with some insight while doing say, mech interp, it feels like going out of your way to test your capability ideas should be a really obvious “you’re basically doing capabilities” sign? And maybe, you should be doing the safety work you claim to want to do instead? – Please stop publishing ideas/insights/research about AI - LessWrong 2.0 viewer
    1. This is mainly why gwern is more 'empirical' than insights-focused: his experience seeing how many times some supposed improvement is publicized and it turns out to be garbage after empirical testing
      
      How new are you to AI?
      
      Not very.
      
      Literally every area you just mentioned has had a new replacement that allowed all around improvements to the previous methods in the past 3-6 years.
      
      I didn't say that improvements never happened. I said that in these three areas in particular, people pursue the white whale of improving over them, only to be shocked at how hard it turns out to beat them (ie. much harder than fooling yourself or a lot of other people), and that for this reason people should not care about them until they are much better validated than they usually are in the proposed paper.
      
      most networks used to use tanh or sigmoid functions until eventually Relu was presented and was noticed to provide better all around results across domains
      
      Which took something like 20-30 years, and it took a similar period to go from the earliest connectionist work which insisted on non-differentiable NNs with 0/1 activations to imitate spiking networks in the 1950s to using tanh/sigmoid in the 1980s PDP revolution.
      
      and now the state of the art general methods being used commonly by the most established labs is cosine learning rate schedule
      
      Way I recall it from back then when I was still bothering to read LR scheduling papers, cosine was 8 years ago in 2016, not 3-6 years ago. (I also consider cosine to just be a specific kind of cyclical learning rate schedule, 2015.)
      
      And then another method was found to later have even better results than Relu and that’s SwiGLU, and now much of the state of the art industry uses swiglu for general improvements such as Llama models. SwiGLU is now widely adopted over the past 12 months by many of the largest AI organizations.
      
      You mean SwiGLU (4 years ago in 2020, not 12 months ago) had better results than GELU (8 years ago in 2016); but yes, it has been widely adopted… So that is why it is worth discussing, whereas the original SwiGLU paper was not (especially given how minimal its benchmarking was, resting mostly on Shazeer's well-earned reputation as a DL genius - although it did give us the unusually honest & immortal Conclusions section: "We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence").
      
      Flash attention proposed by Tri dao is a great example too of “free lunch” actually working, he proposed a method to optimize transformer operations that claimed to make GPUs train transformers twice as fast, and many people said the same thing “that’s not possible, free lunch isn’t possible”
      
      I don't know who those people were. It wasn't anyone I recall - I only recall immediate wild excitement, which was subsequently fully justified. The whole point of FlashAttention was that it computed the same dense quadratic attention but in a cache-friendly and otherwise superior manner which was so efficient it computed long after the default OOMed; it's hard to be wrong about that. And it wasn't. I believed in FlashAttention the day I read it, and expected it to be adopted widely. Nor was it all that improbable, because when real programmers optimize memory access, it is common to observe FlashAttention-like speedups: a lot of algorithms are orders of magnitude slower than they have to be. (And so you would not be skeptical about FlashAttention in the way you would be skeptical about all of the many, many linear or other efficient attentions, all of which have disappointed in much the same way and could be added to my list as '#4. someone proposes a new sub-quadratic attention and claims it's much better; they omit LRA or other benchmarks in favor of a new measurement they made up.')
      
      Yes ofcourse there is going to be many papers that propose methods such as lion that dont actually provide general all around improvements and aren’t good enough to replace the state of the art, there is dozens or even hundreds of papers on different optimizer proposals that aren’t better than the current methods
      
      I'm glad we agree. I would just delete 'dozens' - it's definitely at least hundreds, and probably more like thousands. And you would agree that we shouldn't have thousands of random ML submissions to /r/slatestarcodex. (Maybe to a ML subreddit, but not /r/SSC.)
      
      KAN is especially unique since its not like optimizers where a new similar thing is proposed all the time
      
      It seems like a new learnable activation function, and those are proposed all the time. (Or at least, used to be pre-2020. I'll admit, there seems to have been a lot fewer activations proposed since then. Even SwiGLU was 2020.) – Kolmogorov-Arnold Networks Paper - r/slatestarcodex
    2. Although note that this is mainly downstream of misaligned incentives between researchers and the community as a whole
    3. Even so, it does make sense to consider empirical evidence quite valuable
    4. Note that the problem is that we don't yet have many viable ideas about alignment, not as much as we have for capabilities
      1. Although I guess the idea here is that funding people doing interp (for example) is both creating new insights and validating them empirically
      2. On the other hand, this incrementalist approach seems unlikely to make much difference, but maybe I am missing something about the use of this
        
        IIRC the point made by Neel Nanda and Rohin Shah is that they use these methods to build up insights that would eventually lead people to stumble upon theories, and I assume they think that trying to do things from the agent foundations route is significantly harder to converge to insights, or perhaps they just aren't specialized in that route
Reading 2021 MIRI Conversations
1. Discussion with Eliezer Yudkowsky on AGI interventions
  1. Why is alignment not subject to the same 'garden of forking paths' argument that makes sense for capabilities increase?
    1. Oh, we don't even have a theoretical understanding of how to do this. Capabilities increase on the other hand seems quite plausible.
      1. No, this feels not an actual argument
      2. Yes, we don't have a theoretical understanding of alignment. Do we have a theoretical understanding of increase of capabilities?
        
        Scaling hypothesis? Number go up?
        
        Algorithmic advancements?
        
        Oh, you have a feedback loop with reality, there's a gigantic amount of information reality provides as feedback that you can use to increase capabilities
        
        On the other hand, how do we even create a feedback loop for alignment? RLHF makes sense at some levels but doesn't really hold sway with more advanced models
      3. Still doesn't feel very much like it engages with the 'garden of forking paths' argument
  2. I think we still have time, actually
    1. I think I still could spend two years and complete my masters, focused on machine learning, and then focus next on working on alignment research at a frontier lab
      1. This doesn't necessarily mean that the labs are doing good work, but it does mean that I could be investing in a strategy that leads to contributing to good outcomes later on
      2. But… there are lots of people who can do an ML masters, and who will have the machine learning skills to 'contribute to good outcomes later on' – the market is already building up that reserve
      3. Therefore this doesn't seem like a very useful strategy to actually make a difference – what would differentially make an impact would be to focus on doing things that would make an impact, and in general, this seems to mean actually thinking about what makes sense to do, and work on – and this stuff has an opportunity cost. Look at J. – he likely has had a pretty good time and a standard track for completing his master's program. The question is whether that has been effective in getting him to what and where he wanted
        
        I think the most sensible option is to go gwernmode, as linear put it, until you figure out what you want to do next

[2024-05-25 Sat]

Some high-level thoughts on incubating conceptual alignment researchers
1. One of the biggest causes of the dearth of conceptual alignment researchers, as far as I can tell, was MIRI's decision to go [closed research](https://intelligence.org/2018/11/22/2018-update-our-new-research-directions/#section3).
  1. Sure, there are a number of reasons why this may seem sensible ex-ante – but as far as I can tell, it had gigantic costs in terms of future of conceptual alignment research
  2. You can't really learn how to do conceptual research without having some form of a feedback loop with other people (researchers) who are more experienced than you and have put significantly more thought into things compared to you
    1. At the very least, having regular research artifacts (even lower quality than academic requirement ones like LW posts) would likely have led to some thread that one could follow as they learn conceptual alignment research
  3. Another thing is that I don't think MIRI tried all that much to identify and cultivate conceptual alignment researchers the way MATS is doing
    1. One reason you could give is that there wasn't enough funding
    2. I'd say that CFAR was supposed to play a role of being a MIRI hiring pipeline, except it failed miserably at both increasing the sanity waterline and at being a MIRI pipeline
  4. There seems to be a psychological mechanism here, I think, where experienced conceptual alignment researchers fall into an epistemic trap
    1. One is this notion that we have very short timelines and therefore they don't feel any motivation to do 'field building'
    2. Another is this (very strange) notion of trying to find someone who is just really fucking smart and will solve the problem singlehandedly
      1. There's a lot of emphasis on individual heroism here, which makes sense: it seems empirically very difficult to stack conceptual alignment researchers (see Nate's post), and also, there are compounding returns to the skill of thinking clearly about confusing problems.
    3. Another could be that it is just very hard to help people learn how to do conceptual alignment research – to the point that most MIRI researchers have given up on this attempt after a few tries
      1. This seems pretty plausible to me – I continue to find it difficult how confused most people who were a part of the 2021 MIRI conversations sound, in comparison to Yudkowsky's arguments
  5. Another thing is that it seems like it is quite off-putting to work with Eliezer and Nate.
    1. That would play a huge role in making it difficult to build a pipeline that incubates alignment researchers.
2. Let's talk about the time it takes
  1. It takes a ridiculous amount of time to incubate a capable conceptual alignment researcher, as far as I can tell.
  2. John Wentworth estimates that takes at least 3 years before they start not making dumb mistakes in their thinking
  3. I think it might take less time than that, depending on whether they have mentorship / feedback loops, and whether they are just really good at thinking skills
  4. It seems like a significant variable is some sort of epistemological maturity
    1. This also leads to a sort of 'survival of the fittest' mentality, I expect, where someone thinks, "If they could ever figure it out, they'd have figured it out by now."
    2. See the fire alarm post
3. Oh yeah, Arbital is a big deal as a thing you'd want to read if you want to learn conceptual alignment research
4. This notion of 'epistemological maturity' would likely lead Eliezer (for example), to look for people who seem to be abnormally good at these things such that they'd make immense headway in solving the alignment problem
  1. Again, a sort of individual heroism
  2. Even the infohazard notion is downstream of individual heroism
5. The only exception, I think, was Vivek Hebbar and team, who were hired by Nate. Nate didn't want to mentor either, it seems – a lot of it was implicit tests where they had to figure out the answers
6. Also the whole idea that the way that Eliezer and Nate are thinking is infohazardous since that could be used to promote capabilities
  1. Such a conclusion is a big reason why there's no pipeline for conceptual alignment researchers today
  2. I mean, maybe Eliezer and Nate made bad bets once, one of them being Paul
7. John Wentworth is the main and only reason why there's a non-trivial amount of newcomers in the conceptual alignment research community
8. Let's be honest: there's a sort of chain here
  1. Mark Xu? Paul Christiano, and I think Evan Hubinger
  2. Evan Hubinger? Paul Christiano
  3. Paul Christiano? MIRI, probably Eliezer and/or Nate
  4. Nate? Eliezer
  5. John Wentworth? Somewhat independent, but it seems he's been interacting with the LW and MIRI's literature since 2013, even if he focused on alignment research fulltime since 2018 or so
9. Let's talk about the time it takes, again
  1. It seems likely that it takes at least two years, full-time, to be able to think coherently about the alignment problem in a way that allows you to make progress on the theoretical problems (ontology identification, corrigibility)
  2. Maybe another year to learn the skill to track causality better and be able to think about details relevant to timelines, for example
  3. This is not the sort of thing that we have any communal infrastructure built to promote
    1. Refine lasted 3 months! I'm not sure what Adam Shimi was thinking, but his model of what conceptual alignment research is was clearly mistaken.
    2. Randomness for the sake of randomness is not very useful in general
    3. Imagine thinking that within three months you'll have people who ship new agendas
10. Communication seems very hard for existing conceptual alignment researchers (large inferential distances)
  1. See Nate's posts and how they are received
  2. See Eliezer's posts and how they are received
  3. See Tsvi's posts and how they are not easy to understand
  4. It is a sensible decision for some conceptual alignment researchers to give up here
11. Back to the infohazard things
  1. It somewhat is understandable – many places have NDAs, I assume MIRI's decision seemed somewhat similar
12. Legibility and Signalling
  1. OpenPhil and SFF seem to be quite unenthusiastic about conceptual alignment research.
    1. OpenPhil due to their model of things (which is similar to Anthropic and Paul Christiano)
  2. LTFF funds independent alignment researchers, and as far as I can tell, they are also very focused on empirical researchers
    1. It can be quite stressful to get funding for half an year, three or four months after you sent a request, and either made an expensive bet (and got rejected) or took a safe strategy and did work (and therefore cannot show the things you've done as you described in your application in time)
    2. This is, of course, exacerbated for organizations of people and I empathize
  3. For conceptual alignment researchers, it is somewhat / significantly harder to invest in signalling and legibility since the things they thought made sense or were important in the last month, or the last week, may have shifted enough that any drafts they wrote back then would be invalidated
    1. On top of that, lots of thinking that conceptual alignment researchers do is inchoate and involves trying to be less confused about things that seem irrelevant to the object level but are very strongly upstream (for example, why did people systematically disbelieve connectionism?)
    2. The results don't make for very clean and legible outputs
  4. Combine that with long incubation times and this becomes a trap
    1. If you had long incubation times and easier signalling, then it would at least be equal difficulty to doing empirical stuff
13. Systematic risks
  1. Newcomers doing conceptual alignment research usually have short timelines, and/or have a learned aversion to doing things that don't seem focused on 'solving the problem'
  2. This usually results in them echewing traditional prestige options in exchange for the slack to be able to try to 'solve the problem'
    1. I'm considering dropping out of TUM for example
    2. Johannes stays at his parent's place
    3. Emrik runs on welfare
    4. Another conceptual alignment researcher I know claimed that if she stopped being funded, she'll live on welfare money and continue to try to solve alignment by following her own agenda
  3. This seems quite dangerous for most people, for they are foregoing their options to be able to 'solve the problem'
  4. This also creates a sort of self-selection mechanism where people who care less about alignment research, or are more confused about alignment research, gain more prestige and influence, while those do care a lot more keep losing prestige or getting sidelined
    1. This would be fine if I thought there was a sensible path from solving the problem to it being implemented, but the recent OpenAI debacle has made me a bit more cynical.
    2. Not that I think the 'inside game' strategy has any benefit or meaning of course – I think most people who thought they would use that strategy found themselves unable to keep their composure when OpenAI started playing hardball. (Has Paul Christiano told us about whether he signed the non-disparagement contract or not?)
  5. I worry that without a real pipeline for conceptual alignment researchers, what is happening is that people are asked to sink or swim, and most will sink, not swim.
    1. That is, you'd lose out on potentially great conceptual alignment researchers as they systematically shift focus, or their potential to do conceptual alignment research is wasted due to their bad decision making
14. Vivek and team are probably the only real direct field-building successes of MIRI, I think.
  1. Nate, of course, considers it a failure but he considers everything that hasn't solved alignment within an year to be a failure.
15. Timelines
  1. Some conceptual alignment researchers aim to make a difference before their mainline prediction of AI takeover (a friend and mentor says he's working on solving ontology identification, for example), but for most conceptual alignment researchers, it makes sense to expect that even if you figure out a conceptual breakthrough before AI takeover, translating that into implementations is something that would be unlikely for labs to do instead of whatever empirical strategy they are using to box the AI or whatever, since they'd be in a pretty tight race.
  2. Also note that OpenAI debacle means it is unlikely they'd cooperate with the other labs without strong incentives, and government regulation is (unfortunately) our best bet to make OpenAI heel
  3. Effectively speaking, everyone including me who is doing conceptual alignment research is hoping to make a difference before their mainline prediction for AI takeover but, also implicitly models that only given more time, will their efforts make an impact
    1. I think
  4. Tsvi is right (link timeline post) that if you are stressing out about not being able to contribute within the next five years, then it is better to make bets that make a difference in at least the timelines where we have more than five years (although I don't think it is very likely to pass)
16. Anyway. Recommendations for newcomers
  1. If you have to choose between trying to do conceptual alignment research and a lower QOL, and just having fun and not doing alignment research, I recommend choosing the latter
  2. It is correct that if you are one of the few people not confused about things, your efforts have a significantly greater impact on solving the problem at a theoretical level, but it seems quite likely that without regulations a theoretical solution would not shift the needle much on when we have AI takeover.
  3. Consider maybe making a lot of money and having fun, and using that to fund other conceptual alignment researchers
    1. Consider starting your own org perhaps to incubate such researchers

[2024-06-02 Sun]

Concretization as a mental move to deal with inchoate intuitions
1. Very related: Methodology of unbounded analysis - Arbital viewer
2. Concretization also seems very helpful in general, such as with anxieties
  1. Concretization seems expensive, and feelings seem like cached summaries used for decision making
Lean into uncertainty and anxiety – concretize it

[2024-06-14 Fri]

Essay idea: explain why all probability measures are conditional probability measures
1. deceptively simple and fundamental question, and deconfusing it is a big deal

English words I've historically mispronounced so badly that they were unintelligible to others

word	correct (BrE) IPA	my mispronounced IPA
hazard	[ˈhæz əd]	[ˈhə zɑːd]
magnanimity	[ˌmæɡ nə ˈnɪm ət i]	[məɡ ˈnæ ni mɪt i]
manacle	[ˈmæn ək (ə)l]	[mə ˈnæk (ə)l]
mandala		[mən 'dɑ: lə]
Maginot		['mægi: nɒt]
famine		['feɪ mɑi:n]