Deconfusion
[2024-04-30 Tue]
Curiosity as the fundamental tool / orientation / weapon that underlies deconfusion
Not a desperation-based seeking, and not a blind attention-flow-based consumption, but a sort of grounded and reflective state from which one is curious and seeks to understand
Succinct thought experiments as fundamental tools for deconfusion
Concrete thought experiments to separate out conflicting intuitions
Reading The Sense Of Physical Necessity: A Naturalism Demo (Introduction) - LessWrong…
Yes, I consider Naturalism a skill intertwined with deconfusion, or a core facet of it
Logan talks about the idea of making something 'truly part of you'.
It seems pretty unlikely that most cognitive tools we have and use can be efficiently made 'truly part of us'. Alternately, it seems likely to me that the amount of cognitive scaffolding necessary for us to orient and steer reality is immense, and trying to make things truly part of ourselves in general seems massively inefficient, much more so than using spaced repetition in general
But yes. For a few things? It makes a lot of sense. I'd count my investigation into moral cognition, and my desire to investigate surrogation, and my time spent in 2023 investigating deconfusion, as instances of me attempting this
Not that I succeeded to an extent that I feel satisfied about
This is actually an interesting field report on attempting to investigate (but not necessarily deconfuse) an inchoate concept
[2024-05-01 Wed]
Deconfusion is hard
You have access to very little information, and are working on the edges of human knowledge, dealing with things that people may have been confused about for millennia
One huge common bottleneck for deconfusion likely is math and theoretical computer science understanding. A lot of times, understanding a concept (why does an apple fall down to the ground?) relies on a vast edifice of observations and mathematical scaffolding such that you can build a theory on top of it that makes sense and gives you predictions
Until then, it is likely that you'll be absorbed in a form of investigation (of the sort that Logan points at in their Naturalism sequence)
And even then, this investigation is not enough without the mathematical scaffolding
Just because you have observed all these disparate pieces of evidence for 'a thing called gravity', you won't be able to coherently describe it in understandable terms without some mathematical scaffolding
You can have non-formalist theoretical scaffolding (like Newton's three 'laws'), and that does constitute progress in our understanding of the situation, as long as you have some causal explanation for why the heck things are the way they are (otherwise you are just doing a linear regression and summarizing the resulting model)
So to a certain extent you do make incremental progress in understanding, but progressing to the point that mathematical scaffolding is involved results in a massive increase in the level of understanding and in the quality and precision of the predictions you can make based on your theory
See Faraday's notebooks that some dude read and turned into Maxwell's equations
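As a concrete reminder of what that mathematical scaffolding looks like once it exists, here is the standard differential (SI) form of Maxwell's equations – a textbook statement included for reference, not something taken from this note's sources:

```latex
\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad
\nabla \cdot \mathbf{B} = 0, \qquad
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad
\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
```

Four short equations compress decades of Faraday-style observations, and jointly predict electromagnetic waves propagating at $c = 1/\sqrt{\mu_0 \varepsilon_0}$ – exactly the jump in precision and predictive power the lines above are pointing at.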
In general distillation is incredibly valuable and also incredibly undervalued
Maybe I could carve out a niche where what I do is essentially distillation work
Holdable thought experiments in general serve as an advancement in deconfusion research, because they are communicable to others, who can then see what you are pointing at
Also, they serve as powerful waypoints as you try to build up scaffolding and understanding about things
In decision theory, for example, Newcomb's problem served as one such tool
Other such tools may be the blegg-rube question in Yud's Words sequence, the "Do I look fat in these jeans?" question (for surrogation), maybe the questions in Thinking Physics (although I'm not sure and would like to check), and the mirror asymmetry question at the beginning of Good and Real
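As a minimal sketch of why something like Newcomb's problem is 'holdable': the whole setup fits in a few lines of expected-value arithmetic (the payoffs and the predictor-accuracy treatment below are the conventional illustrative ones, not anything specific from these notes):

```python
# Toy expected-value table for Newcomb's problem (illustrative numbers).
# Box A is transparent and holds $1,000; box B holds $1,000,000 iff the
# predictor predicted that you would take only box B.

def expected_value(one_box: bool, predictor_accuracy: float) -> float:
    """Expected payoff if we treat the predictor as correct with probability predictor_accuracy."""
    p = predictor_accuracy
    if one_box:
        # Predictor correct -> box B is full; predictor wrong -> box B is empty.
        return p * 1_000_000 + (1 - p) * 0
    # Predictor correct -> box B is empty; predictor wrong -> box B is full.
    return p * 1_000 + (1 - p) * (1_000_000 + 1_000)

for accuracy in (0.5, 0.9, 0.99):
    print(accuracy, expected_value(True, accuracy), expected_value(False, accuracy))
```

The conflicting intuitions live in whether conditioning on your own choice like this is even legitimate, which is exactly the confusion the thought experiment lets everyone point at together.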
The mentorship issues faced by the people usually attracted to this kind of work can exacerbate this difficulty too
Butterfly Ideas posits that the source of your curiosity – your new idea, your inspiration, your confusion – is fragile: exposing it to other people's opinions will usually crush the butterfly, and you then lose the ability to follow the butterfly idea until you have a more coherent, communicable, and robust-to-scrutiny version of it
I think it may be better for individuals trying to learn and master deconfusion to build more resilience in the face of outside scrutiny of their intuitions and feelings
This is important because having such an ability seems like a proxy for being able to notice such intuitions, to return to them again and again because you find yourself continually interested in them, to allow them the space to unfurl in your mind
A fundamental orientation of curiosity instead of rejection (when it comes to claims or beliefs) seems useful – although instead of setting yourself up to be psyopped by memeplexes that eat up massive amounts of your time and attention, you can attempt to use whatever strategy you want to investigate what is going on
But even then, overall, an orientation of curiosity in general is very much the antithesis of a sort of rushed or hurried attempt at investigation
After all, it is kind of like introspection, or Naturalism. You are trying to understand, and sense-making cannot be rushed.
Building gears-based models is expensive and involves what is effectively a capital investment
So, there’s this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then peoples’ brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream gears first, then people won’t immediately see how they’re relevant to alignment or timelines or whatever, and then lots of people just wander off. Then you go try to explain something about alignment or timelines or whatever, using an argument which relies on those upstream gears, and it goes right over a bunch of peoples’ heads because they don’t have that upstream gear in their world-models.
– Yudkowsky and Christiano discuss "Takeoff Speeds" - LessWrong 2.0 viewer (quoted from a gwern comment)
This matters a lot for deconfusion-style research
I couldn't make headway into decision theory or even understand what the hell it is all about and why would one care about all this, until I had enough "upstream gears" to be able to make some headway into this
This was also the case for deconfusion – I read some adam shimi essays but actually understanding and valuing deconfusion came about after I had some mentorship that built up the foundational experiences I would use to have the intuition to bootstrap my own understanding and vision for deconfusion
So, what could we conclude from this?
Tentatively, I'd believe that the most valuable work one can do as a part of deconfusion research, aside from actually continuing to track the confusions they are trying to understand and work towards doing so, is to spend time learning critical gears-ey models that seem core or relevant to the things they are trying to do
You usually get evidence that some field is relevant, and therefore actually care about understanding what is going on – if you find yourself using 'shoulds' to motivate yourself, it is an indication that perhaps this isn't really relevant (and that you should investigate why you feel this way, and perhaps figure out how you ended up in this situation in the first place)
For example, one could want to significantly improve their understanding of computability theory or mathematical logic, because they feel like it is relevant to the things they are thinking about and investing in improved understanding here would be quite valuable
This can be modeled as a form of investment, because it is, given opportunity costs and the almost discrete nature of the investment (badly learning, or learning half of basic computability theory, for example, is not as useful as learning a 'discrete' chunk of it that is useful in its entirety)
In general I'd recommend spending some time every day maintaining your gears, some time every day investing in learning fundamental gears, and some time every day trying to do deconfusion (assuming you are attempting to do deconfusion full time)
[2024-05-06 Mon]
There's an interesting dichotomy here in the MIRI research style, which seems to involve, on one hand, going as concrete as possible with your proposals for solutions, and on the other hand, a focus on mathematical abstractions
Fundamentally, the whole problem here is, “You’re allowed to look at floating-point numbers and Python code, but how do you get from there to trustworthy nanosystem designs?” So saying “Well, we’ll look at some thoughts we can understand, and then from out of a much bigger system will come a trustworthy output” doesn’t answer the hard core at the center of the question. Saying that the humans will have AI support doesn’t answer it either. – 2021 miri conversations
Why is that?
I get that concreteness helps with proposals and being able to grasp onto things, and think about them, and especially when it comes to deconfusion
And I guess the mathematical abstractions are another set of thinking tools intended to help think about this, that have high value
This style of research seems not very amenable to the sort of incremental strategy that academia tends to foster, I think
I want to read arbital – it likely has some writing on concreteness I want to reread
I didn't find anything relevant, which is interesting and strange
Okay I got it – Executable philosophy
• Many academic philosophers haven’t learned the programmers’ discipline of distinguishing concepts that might compile. If we imagine rewinding the state of understanding of computer chess to what obtained in the days when Edgar Allen Poe proved that no mere automaton could play chess, then the modern style of philosophy would produce, among other papers, a lot of papers considering the ‘goodness’ of a chess move as a primitive property and arguing about the relation of goodness to reducible properties like controlling the center of a chessboard.
There’s a particular mindset that programmers have for realizing which of their own thoughts are going to compile and run, and which of their thoughts are not getting any closer to compiling. A good programmer knows, e.g., that if they offer a 20-page paper analyzing the ‘goodness’ of a chess move in terms of which chess moves are ‘better’ than other chess moves, they haven’t actually come any closer to writing a program that plays chess. (This principle is not to be confused with greedy reductionism, wherein you find one thing you understand how to compute a bit better, like ‘center control’, and then take this to be the entirety of ‘goodness’ in chess. Avoiding greedy reductionism is part of the skill that programmers acquire of thinking in effective concepts.)
Many academic philosophers don’t have this mindset of ‘effective concepts’, nor have they taken as a goal that the terms in their theories need to compile, nor do they know how to check whether a theory compiles. This, again, is one of the foundational reasons why despite there being a very large edifice of academic philosophy, the products of that philosophy tend to be unuseful in AGI.
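A hedged illustration (mine, not Yudkowsky's) of the 'compiles' distinction in the quote: a primitive 'goodness' property never runs, while even a crude material count does run – with the quote's own caveat that mistaking the crude computable thing for all of chess 'goodness' would be greedy reductionism:

```python
# A concept that does not compile: 'goodness' of a move taken as a primitive.
def goodness(move):
    raise NotImplementedError("analyzed at length in the paper, never computed")

# An effective concept that does compile: crude material counting.
# (Deliberately NOT claiming this is what chess 'goodness' reduces to.)
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_balance(board: dict) -> int:
    """board maps squares to piece letters; uppercase = ours, lowercase = theirs."""
    score = 0
    for piece in board.values():
        value = PIECE_VALUES[piece.upper()]
        score += value if piece.isupper() else -value
    return score

print(material_balance({"e1": "K", "d1": "Q", "e8": "k", "a8": "r"}))  # 9 - 5 = 4
```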
Here's another, from Methodology of unbounded analysis
The pitfall of residual terms.
Besides “simplifying away the confusing part of the problem”, another way that unbounded thinking can “bounce off” a confusing problem is by creating a residual term that encapsulates the confusion. Currently, there are good unbounded specifications for Cartesian non-self-modifying agents: if we allow the agent to use unlimited computing power, don’t allow the environment to have unlimited computing power, don’t ask the agent to modify itself, separate the agent from its environment by an impermeable barrier through which only sensory information and motor outputs can pass, and then ask the agent to maximize a sensory reward signal, there’s AIXI. If we then introduce permeability into the Cartesian boundary and allow for the possibility that the agent can take drugs or drop an anvil on its own head, nobody has an unbounded solution to that problem any more.
So one way of bouncing off that problem is to say, “Oh, well, my agent calculates the effect of its motor actions on the environment and the expected effect on sensory information and the reward signal, plus a residual term γ which stands for the expected utility of all effects of the agent’s actions that change the agent’s processing or destroys its hardware”. How is γ to be computed? This is left unsaid.
In this case you haven’t omitted the confusing part of the problem, but you’ve packed it into a residual term you can’t give an effective specification for calculating. So you no longer have an unbounded solution—you can’t write down the Python program that runs given unlimited computing power—and you’ve probably failed to shed any important light on the confusing part of the problem. Again, one of the warning signs here is that the paper is very easy to write, and reading it does not make the key problem feel less like a hard opaque ball.
So unbounded analysis allows for concreteness, which allows for making it easier to think about things
This is interesting, and I think I would like to dive deeper into this theme of concreteness as it relates to deconfusion, but perhaps not immediately
I think an important realization is that math is a thinking tool, an aid for clear thought, similar to other thinking tools such as unbounded analysis or concreteness
or thought experiments
Basically, Daniel Dennett's thoughts on thinking tools seems quite appropriate here
[2024-05-07 Tue]
Dropped Dennett's Intuition Pumps book after reading this quote
A young child is asked what her father does, and she answers, “Daddy is a doctor.” Does she believe what she says? In one sense, of course, but what would she have to know to really believe it? (What if she’d said, “Daddy is an arbitrager” or “Daddy is an actuary”?) Suppose we suspected that she was speaking without understanding, and decided to test her. Must she be able to produce paraphrases or to expand on her claim by saying her father cures sick people? Is it enough if she knows that Daddy’s being a doctor precludes his being a butcher, a baker, a candlestick maker? Does she know what a doctor is if she lacks the concept of a fake doctor, a quack, an unlicensed practitioner? For that matter, how much does she need to understand to know that Daddy is her father? (Her adoptive father? Her “biological” father?) Clearly her understanding of what it is to be a doctor, as well as what it is to be a father, will grow over the years, and hence her understanding of her own sentence, “Daddy is a doctor,” will grow. Can we specify—in any nonarbitrary way—how much she must know in order to understand this proposition “completely”? If understanding comes in degrees, as this example shows, then belief, which depends on understanding, must come in degrees as well, even for such mundane propositions as this. She “sorta” believes her father is a doctor—which is not to say she has reservations or doubts, but that she falls short of the understanding that is an important precondition for any useful concept of belief.
The issue here is that it seems pretty obvious to me that the words uttered are not necessarily the things communicated. The child is stating something to the effect of "the person I relate to in these ways, who I call Daddy, said that he is a 'doctor', although I don't know what that means".
There's something coherent being communicated here, but Dennett's thought experiment implies that we use an arbitrary standard for what the sentence is supposed to communicate or intend to communicate.
The words are not the thoughts behind them. It is a surrogation error.
[2024-05-14 Tue]
Deconfusion: concreteness, unbounded analysis / toy models, thought experiments, sub-problems and factorization
deconfusion seems more a grab bag of thinking tools than I had thought before
Although on the other hand it also seems significantly more… integrating? There's possibly a pattern underlying this
Deconfusion: paradoxes, conflicting intuitions, confusions
deconfusion: philosophy, math, logic, epistemology
[2024-05-15 Wed]
Is deconfusion bottlenecked on object-level theoretical and empirical work?
That is, the sort of research that involves Darwinian collection and cataloguing of specimens?
Recall that it took astronomy and physics advances for Laplace to posit a mechanical (clockwork) universe, and since then we've had more and more evidence that this is the case
quantum mechanics would have gone the way of 'superposition' / wave collapse without Everett's proposal – and even then few people really believed him
On that note, let's say you succeed at deconfusion work. You figured out an answer.
What's the likelihood that you can steer the world to actually rely on your answer?
Not that this question matters since MIRI hasn't solved corrigibility or a limited pivotal act level alignment, but it still is impressive just how difficult it has been for people to understand and wrap their head around MIRI's model of things
Lots of deconfusion progress seems to have been downstream of philosophical progress, regardless of the field: Everett, Laplace, Daniel Dennett (consciousness), Judea Pearl (causality), Nick Bostrom (existential risk), Eliezer Yudkowsky (most of ASI alignment stuff)
Although they also seemed to have been doing a lot of engineering stuff beforehand?
Judea Pearl (B.S. electrical engineering, M.S. electrical engineering, M.S. physics, Ph.D. electrical engineering)
Eric Drexler (B.S. interdisciplinary sciences, M.S. Astro/aerospace engineering, Ph.D. MIT Media Lab (nanotech stuff))
Nick Bostrom (B.A. ?, M.A. philosophy and physics, M.Sc. computational neuroscience, Ph.D. philosophy)
Daniel Dennett (B.A. philosophy, D.Phil. philosophy)
Aside from Daniel Dennett, most of the other people seem to have been collecting academic credentials
Anyway, the point is that maybe deconfusion progress is bottlenecked on more object level work?
On one hand, Judea Pearl and Eric Drexler seemed to have needed the object level work to coalesce their philosophical insights, to support them
On the other hand, how much of deconfusion really is bottlenecked on such empirical evidence? The point of deconfusion really is that you understand, and understanding usually seems to involve having causal models that don't rely on concrete pieces of evidence you see in reality
And yet, it must have been really hard for people to grok and argue for heliocentrism before the mathematician and astronomer Kepler started finding evidence to support heliocentrism
The discovery of the phases of Venus was one of the more influential reasons for the transition from geocentrism to heliocentrism.[10] Sir Isaac Newton's Philosophiæ Naturalis Principia Mathematica concluded the Copernican Revolution. The development of his laws of planetary motion and universal gravitation explained the presumed motion related to the heavens by asserting a gravitational force of attraction between two objects
Copernicus proposed a heliocentric model, which was rough, and most likely incorrect in many ways, and people built on top of it, refining it.
The evidence that Kepler saw that convinced him of heliocentrism over geocentrism worked because he had competing theories in his mind
So to a certain extent I think it still is important to have philosophical progress made farther than empirical progress such that empirical evidence is used to select between plausible hypotheses
On the other hand, it is probably stupidly hard to make progress on the philosophical models you propose without some valuable pieces of empirical data backing things up
I'm tempted to let other people do the mathematical logic and deconfusion work (main body) and perhaps switch to a more aggressive strategy of being generally capable of making things happen in ML and software engineering, given the rather fucked status quo of the frontier labs
We seem to lack people who are generally competent and are not confused and are relatively aligned with the notion of trying to stay alive
To be fair, we seem to have an even greater lack of people doing deconfusion and mathematical logic style theoretical research.
[2024-05-16 Thu]
Book recommendation: Physics Avoidance, Mark Wilson
Rec by Adam Shimi
[2024-05-21 Tue]
Had a shower thought that I unravelled to the notion of writing an entire post, describing "top-down deconfusion" and "bottom-up deconfusion"
I could first write it in /notes/deconfusion and then make a post, and I think this will be pretty fast and easy given that I have mostly a clear idea of the distinction between these two concepts and what is going on there
It is somewhat depressing just how fucking difficult top-down deconfusion research is, given the amount of theoretical CS / math that you seem to need to know
Let's look at some arbital stuff to get a better sense of deconfusion
Oh. Drescher's deconfusion examples involve using mathematical models downstream of inchoate 'bottom-up' confusions
While I guess you could use these two distinctions for the top level problem, I don't think they carve reality well enough to serve as useful enough distinctions, especially given Drescher's style
Eliezer Yudkowsky's Essays | I continue to be worried about damage done to my…
Yud points at the notion of people not tracking causal mechanisms
I think a part of it is that the sequences unfortunately did not emphasize causal models of reality, as much as Bayesian probability theory, and that may have been a part of why people seemed to be okay with using 'outside views' or non-causal models, and using prediction markets in a sort of 'small updates, hedgehogging' style
Also I think there's a thing where VCs mistrust explanations, even if causal, because they are in an adversarial situation with respect to founders, and their reliance on blind empirical evidence and mistrust of explanations percolates down to people who are dependent on their funding.
This can also be one explanation for why Greg and Sam and similar people (Demis? Dario?) don't really seem to get the difficulty of the AI alignment problem (in the way Yud does) – because they've optimized their cognition and actions such that they can best convince stakeholders who don't understand the context of the problem to continue to trust them and give them funding and power.
I now think that understanding how VC funding works may be significantly more important than I thought before
Of course. Eliezer already wrote about this
See Eliezer Yudkowsky's Essays | Ascending to Angelhood, Version 1
Essentially his model is similar to mine in that VCs say stuff like "Ideas don't matter." mainly because they are not able to distinguish between different ideas. Also this is why they seem to aim to invest in people instead of ideas.
What is really interesting about this essay though, is the way Eliezer breaks down the VC ecosystem and its flaws and notices systematic inefficiencies
AND the importance of money to making things happen
Money is society's unit of actually caring about things, and it has increasingly dawned upon me how impossible it is to get anything done without large money flows. Even reputation largely follows money except under certain very odd circumstances. If you want prestige, if you want to be taken seriously, if you want to be listened to in the moment of crisis, I'm starting to think it's more important to have a big heap of money than to have good ideas - or at least that the ideas will not be listened-to without the money. Status follows money, respect follows status.
Of course, we are now on the clock, with less than a decade, and it seems like not a very good decision to focus on making money right now
I think that computability theory (and the notion of effective computability) is a pretty good case study on deconfusion, and how it works, its relation between informal things and formal things, and its impact on thought
The halting problem and the notion of formalizing effective computability as extremely good examples of the sort of things that MIRI aims to do
The habit of going from an inchoate set of intuitions and trying to build concrete (math, programming) models to try to resolve these sets of intuitions seems uniformly better as a strategy than continuing to play around in inchoate states
I mean, you could continue to play around in inchoate states but I assume that you need to be pretty skilled at moving from one inchoate state to a less inchoate state
Usually that might involve a series of thought experiments, I think
See Newcomb's problem for a central example of this
And the series of succeeding thought experiments that spawned
In that perspective, I guess one could even say that the diamond maximizer problem and the strawberry problem are instances of 'thought experiment-likes'
"Effective philosophy" as an equivalent term to "effective computability"
That is, the fact that you can never really claim that "effective computability" is what the TM / lambda calculus models express, and can only provide increasing evidence for this being the case, is the point and is to be expected
You can't really prove that a certain formalization captures the entirety of the inchoate notions in your head – you can only find yourself achieving greater and greater confidence for this being the case
"You cannot proceed from the informal to the formal by formal means" – Alan J. Perlis
Even so, the Turing machine was the biggest advancement in computability theory, especially as it brought immense clarity to thinking about a 'more concrete version of' effective computability: assuming effective computability is equivalent to Turing computability, the properties proven for TMs carry over, and those properties are very important and have a wide-ranging impact on our decision-making and understanding of computability
Specifically, the halting problem. It serves as a form of fundamental limitation to computability, similar to the notion of perpetual motion machines for physics
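A minimal sketch of that fundamental limitation, written as the usual diagonalization argument; the `halts` oracle below is hypothetical and exists only to be contradicted:

```python
# Sketch of the standard argument that no total halts(program, argument) decider exists.

def halts(program, argument) -> bool:
    """Hypothetical oracle: True iff program(argument) eventually halts."""
    raise NotImplementedError("assumed for contradiction; cannot actually exist")

def diagonal(program):
    # Do the opposite of whatever the oracle predicts about running program on itself.
    if halts(program, program):
        while True:
            pass  # loop forever
    else:
        return    # halt immediately

# Does diagonal(diagonal) halt?
#   If halts(diagonal, diagonal) returns True, then diagonal loops forever -- contradiction.
#   If it returns False, then diagonal halts immediately -- contradiction.
# So no such halts() can exist, which is the perpetual-motion-machine-style limit noted above.
```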
Or take cryptography! Cryptography also serves as a really good model of the sort of thinking deconfusion is intended to involve
Given a certain set of assumptions about reality, the cryptographic model (Alice and Bob and Mallory and a specific communication pattern and obstacle) reflects the sort of real world outcomes we shall see and face, and given such properties, we can use math to investigate possible ways to achieve secure communication
The math properties make sense, and the more a physical situation approximates the model described, the better those properties predict what we will see in reality
However, it's not likely that you can fully specify any physical situation in as much detail as you see in reality, and then try to do mathematical modeling
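A tiny concrete instance of 'the properties hold insofar as reality matches the model': a one-time pad, whose perfect-secrecy guarantee assumes a uniformly random, never-reused, secret key and says nothing about side channels or rubber hoses:

```python
import secrets

def otp_xor(data: bytes, key: bytes) -> bytes:
    """One-time pad: XOR with a random key of equal length. The perfect-secrecy
    property holds only under the model's assumptions (key is uniformly random,
    used once, kept secret); side channels and coercion live outside the model."""
    assert len(key) == len(data)
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at dawn"
key = secrets.token_bytes(len(message))
ciphertext = otp_xor(message, key)
assert otp_xor(ciphertext, key) == message  # XOR is its own inverse
```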
Perhaps davidad / Steve Omohundro / Miyazono will find that their hopes for formal strategies will remain just dreams
I dearly hope that at least some useful work will come of this, but it seems like throwing category theorists at the problem is uh, simply enabling people to be nerdsniped
Honestly I think they just took on a problem that is harder than just intellectually solving alignment
And their main hope is that they can outsource the work of building formalized world models to mostly-aligned AGIs
How is that very different from trying to get useful work out of AGIs via controlling them, which is basically what OpenAI and Anthropic and Redwood Research and ARC believed was the most viable plan?
The notion is something like "we don't need to know that the AIs we generate are aligned, as long as they meet our formal criteria"
Also I think that they'll use AIs to create formal proofs to verify that the domain-specific AIs are aligned
So all of this basically hinges on making a scaffolding for a human domain expert, with the help of a GPT-5 (or equivalent model) to build a formal specification of the domain that we want our AI to interact in
Also they might try to write their own formal specifications for very small pieces of base hardware such as simple circuits (recall Steve Omohundro's 2024 talk and thoughts).
Compared to the RR model of using a boxed GPT-5 to make it do basic programmer drudgery such as automatic experiments (wow, so grad student descent), in essence being able to outsource the mundane parts of research such that you can keep getting high-quality data to have a better idea of what works and what doesn't, how would davidad's model compare in terms of viability?
It seems like in both cases you would expect models to be boxed, and that you are pulling useful work out of them both. One way to differentiate the cases would be to see which capability would require more 'abstract reasoning' stuff. It seems likely that the ability to generate formal specifications would strongly correlate with having the ability to abstractly reason about real world phenomena (for obvious reasons – if you can reason about some domain specific thing and help with building a formal world-model there, then you can abstractly reason (to some extent) about that domain at the very least)
I guess then the key question would be whether the AI in question has long term goals / is non-myopic, given that it has the ability to do abstract reasoning
I guess even doing programmer drudgery would also rely on the ability to do abstract reasoning, but would also need to have relatively non-myopic goals
I don't really have a good enough model about these two things right now
Davidad's plan involves spending a lot of compute on formal stuff, while the RR-style plan involves leveraging all that compute for both capabilities research (and supposedly safety research but I doubt it, so we can just model capabilities research for now)
Which means that what you do is build formalizations that you believe capture effectively all of the inchoate concept involved, and then think about properties and other questions within that formalization
Alex Zhu describes a similar set of mental moves in his way of doing theoretical research
- Figuring out what you want to happen in real-world cases
- Translating what you want in real-world cases into desiderata for simple cases
- Articulating an algorithm for solving simple cases
- Finding cases where your algorithm doesn’t do what you want
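A toy illustration (mine, not Alex Zhu's) of the last two steps: articulate an algorithm for a simple case, then mechanically search for cases where it fails the desideratum. Greedy coin change versus a 'use the fewest coins' desideratum is a classic miniature:

```python
# Desideratum: return the minimum number of coins summing to the amount.
# Candidate algorithm: greedily take the largest coin that still fits.

def greedy_coins(amount: int, denoms: list[int]) -> int:
    coins = 0
    for d in sorted(denoms, reverse=True):
        coins += amount // d
        amount %= d
    return coins

def optimal_coins(amount: int, denoms: list[int]) -> int:
    # Brute-force dynamic programming (fine for small amounts).
    best = [0] + [None] * amount
    for a in range(1, amount + 1):
        best[a] = 1 + min(best[a - d] for d in denoms if d <= a)
    return best[amount]

denoms = [1, 3, 4]
for amount in range(1, 20):
    if greedy_coins(amount, denoms) != optimal_coins(amount, denoms):
        print("counterexample:", amount)  # first failure: 6 (greedy 4+1+1 vs optimal 3+3)
        break
```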
Quotes from a conversation (all sentences are my messages)
my goals feel less like philosophy and more like the sort of stuff you see from Alan Turing or Shannon, for example. the creation of the Turing thesis and information theory are pretty damn good examples of 'deconfusion'.
they took inchoate concepts that we found slippery and therefore confusing because of multiple clashing intuitions, and proposed a concrete mathematical scaffolding that we believe fully specifies the inchoate concept involved
that's what I expect deconfusion research to look like
that's what Judea Pearl did to causality, IIRC
The point is that it's significantly easier to grok what the fuck is going on when you have fully specified models that we believe capture the essence of the thing we have in mind
look at cryptography. All those threat models do not take into account side-channel attacks or rubber-hose attacks. Does that mean it's all useless? Well, no. The more our physical situation approximates the model we have built, the better the properties downstream of that model predict what we shall see
cryptography is not a particularly clean example compared to the Church-Turing thesis, or the creation of information theory
My point was that while you can go from more confused to less confused and still stay in the realm of philosophy, what i expect most people doing MIRI-style deconfusion research aim at, is using math models to pin down ideas well enough that they can coherently communicate and think about them
and that's why I use the Church-Turing thesis (and TMs and lambda calculus) and information theory as examples here
It seems useful to keep a mental distinction between the inchoate philosophical concept and the mathematical / fully-specified model, just as Boolos does in his textbook for effective computability and Turing computability
And while you can try to unravel an inchoate philosophical concept and explore its implications and properties, and get a sense for the central examples involved here, and the desiderata that you believe you expect from a model, this is separate from the attempt to find a good mathematical model that captures the essence of all these things you consider desiderata and are downstream of the central concepts you had in mind
This also reminds me of how Jaynes went from qualitative examples of probabilistic reasoning, to listing a set of desiderata that he believed captured the essence of what is going on (and served to mostly pin down a model), and then from that to a fully-specified model for probabilistic reasoning
Very impressive work, pedagogically speaking
Even more impressive that I just noticed that this is an example of… not deconfusion per se, but something in the realm of it
Reduction! Yeah, reduction is probably a good word for this
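For reference, the fully-specified model that Jaynes's desiderata pin down (up to rescaling) is just the familiar product and sum rules – this is the standard Cox-style result his book builds on, stated from memory rather than quoted:

```latex
P(AB \mid C) = P(A \mid BC)\,P(B \mid C), \qquad P(A \mid C) + P(\bar{A} \mid C) = 1
```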
Seems like Bayesian updating in general is a dumb way to think about rationality and instantiate rational algorithms, and in general the notion of modeling causality seems just as important, if not more important
That is, blind Bayesian updates without causal models involved anywhere, seems pretty difficult and unlikely for humans to do very well (at least consciously)
And Bayesian updates that involve updating causal models seems like the most sensible way to think about epistemology
Essentially, I'd only make Bayesian updates that involve some causal / systematic reason to update – that is, some level of surprise or change in what I expect moving forward – and ignore slight surprises that seem likely to be downstream of noise, given my current causal model of things; though of course if I find myself consistently surprised by what I had attributed to noise, that itself warrants an update
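One way to make 'only update on surprises that aren't plausibly noise' slightly more precise (my gloss, not anything from the notes): compare how strongly the observation favors the causal hypothesis over the noise hypothesis via the odds form of Bayes,

```latex
\log \frac{P(H_{\text{causal}} \mid o)}{P(H_{\text{noise}} \mid o)}
= \log \frac{P(H_{\text{causal}})}{P(H_{\text{noise}})}
+ \log \frac{P(o \mid H_{\text{causal}})}{P(o \mid H_{\text{noise}})}
```

where slight surprises contribute a log-likelihood-ratio term near zero and can be safely ignored, while consistent surprise accumulates into a real update.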
Track reality as closely as possible, and use every surprise or update to track reality even better. Bind yourself to reality, or reality to you, and wield it like the weapon it is
It seems likely that the reason conversation and rubber ducking are very effective at helping with problem solving is that they enable you to more fully specify the concept / problem you are trying to solve, which in general significantly helps with grasping it and solving it compared to leaving it inchoate and trying to solve it via implicit context
There's a cost to communicating the thing / rubber ducking, but that's part of the cost you pay to fully specify the problem (and / or communicate the surrounding context), such that you can load it all in your head and then grasp at potential inconsistencies or stuff you missed, et cetera
[2024-06-01 Sat]
{2405.19832} AI Safety: A Climb To Armageddon?
Surprisingly good attempt at delineating arguments
[2024-06-07 Fri]
Fancy math doesn't make simple math stop being true
Prefer simpler (K-complexity) arguments over complex arguments
A counter-argument that does not attempt to find central flaws in the proposed argument, or isn't simpler, is a discourse-destroying tool
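One standard way to cash out the K-complexity preference (offered as a reference point, not as what this note necessarily intends) is the Solomonoff-style prior that weights a hypothesis by the length of its shortest description,

```latex
P(h) \propto 2^{-K(h)}
```

so that, all else equal, an argument whose generating model is shorter carries more prior weight than a baroque one.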
[2024-06-15 Sat]
abstraction agnosticism: when you are okay with using whatever abstractions fit the problem at hand such that you can solve the problem
[2024-06-17 Mon]
Building a deep understanding of 'waypoints' in a long chain of inference you want to walk someone through seems quite important
Say you want to help someone understand the logic of AI existential risk, and they are a normal individual, and you expect you don't have to push against social reality based resistances to understanding
If a chain of inference is long enough, you will be unable to get them to understand the thing within one conversation
For example, trying to explain reductionism, Lawfulness, evolutionary theory (such that people avoid anthropomorphism and have an intuition of why things might go wrong), and then consequentialism, and then the orthogonality thesis
Such a chain of inference is intractable for someone to grasp within a conversation due to a few human limitations:
working memory limitations (since they are unfamiliar with the ideas and cannot rely on chunking)
cognitive energy limitations (empirically, it is hard to continue to pay attention to novel abstract reasoning and arguments such as theoretical math past a few hours)
under-specified arguments / concepts (when explaining things to other people, math professors usually use shorthand instead of fully explicitly writing the most precise interpretation of their ideas down. They implicitly expect their listeners to 'error correct' the thing they see or hear, such that they grasp the idea that the professor is trying to communicate, without them getting bogged down in 'formalisms')
See Andrej Bauer's talk, where he calls this "elaboration"
All three issues I have identified (there may be more) seem to rely on someone spending the time to grasp each significant 'waypoint' such that they can then move forward on the chain of inference
For the three points
WM limitations are minimized due to chunking after learning / understanding
Cognitive energy limitations are bypassed due to you spending time across days or weeks as you try to grasp each waypoint
Under-specified arguments / concepts are less of an issue for you, the listener, since you can 'error correct' or 'fill in the gaps' of the argument, such that
your interlocutor / explainer does not get bogged down in efforts to continually fix your understanding because every piece of ambiguity becomes a "but what if…?" or "Okay but why not…?"
This can and does become untenable, enough that many people do give up on explaining things, or get extremely exhausted at the efforts to do the elaboration process on-the-fly (since they don't think of things in the formalized manner)
You can, by yourself, evaluate potential ways the argument is incorrect, and therefore get the sense as to whether that part of the argument makes sense or not
Also note that this also matters when trying to understand something yourself: the better you grasp the waypoint, the more your ability to error-correct as you attempt to reach forward to figure out some hypothetical solution to your problem, or a possible answer to your question
What are 'waypoints', then?
Off the top of my head: probability theory, linear algebra, calculus / real analysis, evolutionary theory, evolutionary game theory (people usually only understand game theory well enough to shoot themselves in the foot – also paranoid hawkish people like von Neumann really make the board game worse for everyone), epistemology (this one is really important – epistemological maturity gives you the ability to 'error correct' in a wide variety of domains of abstract reasoning, especially ones that are devoid of feedback loops such as AI risk arguments), probably certain physics bits such as thermodynamics
Other things more specific to AI risk arguments probably are Lawfulness, materialistic reductionism, the orthogonality thesis, and consequentialism
The deeper your understanding of a waypoint, the fewer bits of information you need from your interlocutor to grok the corresponding bit of their argument, for you can do the inference extremely quickly, yourself
This implies that, contrary to some people's arguments that you can do conceptual alignment research by just learning as little as possible or learning whatever you need just-in-time, people might benefit from doing an isolated deep dive / focus on one waypoint, and then another, and then another.
There's a real trade-off here, of course, but this is a qualitative shift
Your focus here is not to only learn the things that are on the specific path of your research / argument, but also things densely connected to it (in that waypoint), such that you have the ability to error correct
From that perspective, the sort of isolated and relatively arbitrary courses you find in a university is somewhat more palatable
I assume that the entire point of 'going deep' is simply to make a maximally robust foundation via stress testing, and that most people don't expect the advanced stuff to matter most of the time
This also means that these people would be vulnerable to counter-arguments that leverage weaknesses in their understanding of the waypoints
Say if someone thinks there is only one inferential path to concluding that AI risk is existential. Then all you have to do is make someone uncertain about one of the links in the chain of inference, such that they discount the AI x-risk argument or think that it is invalid.
People grokking 'waypoints' means that they can do the 'error correction' themselves and be immune to the class of counter-arguments that involve leveraging their confusion or lack of knowledge
[2024-06-25 Tue]
Native ways of understanding take time to build up
There's translation cost when it comes to learning new things
This is yet another thing that has a cost for bridging inferential distance, aside from waypoints
They are somewhat related though
[2024-06-28 Fri]
What would a deconfusion reading list look like?
Gwern's Unseeing post
Yud's Reductionism 101
Drescher's Good and Real
None of the sources listed will help you learn some of the tacit knowledge involved in deconfusion thinking, though
I assume that the best way to learn it is by doing, and getting feedback from a mentor about how you are doing it
[2024-07-10 Wed]
Okay. Maybe first articulation-focused writing?
The essay
Let's put a deadline to ship it? Maybe before I leave for HAISS?
Let's maybe start off with describing examples?
Effective computability
Information theory
William Thurston's paper?
Doesn't seem relevant since it involves empirical support for math, which is different from deconfusing inchoate concepts
Cryptography!
Judea Pearl, causality (look into it?)
Lockhart!
Math objects are not physical objects. Math objects are approximations of physical objects. Physical objects are approximations of math objects.
The different way you relate to this would lead to different ways of understanding reality, I think
Game theory?!
Games and Information starting chapter?
An idealized model! Or wait, does this make sense for effective computability?
Oh, they are like platonic objects! And the more reality approximates it, the more we see its properties fulfilled
Like cryptographic models
Like Lockhart's stuff. My circle example
Economics
Game theory
Idealization is different from toy models. Reality rarely approximates toy models
The point is that the core of deconfusion is this process of trying to find better and better models that tell us more about reality or capture the essence of that domain of reality accurately
And a sub-point that evidence for a model being correct is empirical, and not mathematical or formal, and this will always be the case. (Alan Perlis quote)
So deconfusion is a process of finding better formalizations (or fully specified models, or better specified models) for concepts we inchoately understand
Note that having more concrete examples is data that makes your 'definition' of the concept better – the deconfusion process is simply an attempt at compression
So you could divide deconfusion into two parts – the data collection, and the compression.
The data collection is incremental
The compression process probably isn't incremental
Similar to how a neural network can have moments where it 'groks' a domain it is being trained on a bit, and then gets a lower loss, you can assume that there is a discontinuity in the effectiveness and power of one model versus another, and slight perturbations to one model will not get you another
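One standard formalization of 'deconfusion / models as compression' is the minimum description length picture (included as a reference point, not as a claim about what is intended here): choose the model that minimizes total code length,

```latex
M^{*} = \arg\min_{M} \bigl[\, L(M) + L(D \mid M) \,\bigr]
```

where $L(M)$ is the cost of describing the model and $L(D \mid M)$ the cost of describing the data given the model; the grokking-style discontinuity described above would then be a jump between qualitatively different $M$, not a smooth perturbation of one.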
The confusion is an indication that you have data contradicting your current understanding / model, whether explicit or implicit
Noticing your confusion is basically paying attention to the data. Not throwing away the data.
So deconfusion is also capabilities? If so then I expect that AGIs would be not confused about most of the things we are confused about – including alignment (or at least, it would try to not be confused about alignment, and also would probably have the ability to not be confused about alignment)
You don't need to understand, though, if you have good enough tests (evals). In both cases you have a probability of being incorrect, and it seems incorrect to privilege the model over the tests, especially when the world runs due to tests and error-correction
Note that some people think that you can have some level of continuity when it comes to compression, by making more and more fully-specified / formalized models (similar to how you can go from naive set theory to more rigorously defined set theory)
This… makes sense, but I find it hard to imagine anything that is 'partially formalized'. It would be more like cryptographic models where you use full specification for certain scenarios in mind, or isolate certain key facets of the problem (see corrigibility paper for example)
Cognitively speaking, it seems like even deconfusion is basically a sort of brute-force trying of different models you come up with, testing them against your data / test cases in mind, seeing the fit or lack thereof, and then searching for another one while using the feedback you got
This is kind of brute force and empirical, except it happens in the brain
I think I felt like deconfusion / modeling was something qualitatively special in some sense, while it seems more likely that deconfusion is more like empirical fuckery than I thought
People doing empirical fuckery probably also have their own models of what is going on, and are also doing brute-force searches, and then updating based off of that
What if 'alignment' is like 'cryptography', in that you will not have a simple core concept / formal object that contains it?
Especially when preferences of different people vary wildly
It seems more likely that 'corrigibility' may be a coherent formal object, compared to 'alignment'
Intuitions are pointers to data
Sidenote, reading deconfusion posts, and stumbled upon something interesting
"In research, we don’t have that guardrail, and we especially don’t have that guardrail when finding the right definitions is part of the problem. I have literally spent months pushing symbols around without getting anywhere at all. Math is a high-dimensional space; brute force search does not work." – What's So Bad About Ad-Hoc Mathematical Definitions? - LessWrong 2.0 viewer
This is really interesting
This is probably why John tried to inculcate the notion of focusing on getting feedback / data from reality – and I assume this is instead of trying to brute force things
It seems likely that John is correct, that the most efficient way of moving forward is always to run surgically targeted experiments that gain the maximum information possible, so you can update your model as much as you can and move forward
Structure is compression, deconfusion is compression, models are compression?
[2024-07-11 Thu]
Continued focus on this thing