The penultimate book of rationality

This is part 5 of 6 in my series of summaries. See this post for an introduction.




Part V

Mere Goodness



This part asks how we can justify, revise, and naturalize our values and desires. What makes something valuable in the moral, aesthetic or prudential sense? How can we understand our goals without compromising our efforts to actually achieve them? When should we trust our impulses?

Value theory is the study of what people care about: goals, tastes, pleasures, pains, fears, and ambitions. This includes morality, but also everyday things like art, food, sex, friendship, and going to the movies with Sam. It includes not only things we already care about, but also things we wish we would care about if we were wiser and better people. How we act is not always how we wish we’d act. Our preferences and desires may conflict with each other, or we may lack the will or insight needed to act the way we’d like to. Hence, humans are not instrumentally rational.

Philosophers, psychologists and politicians disagree wildly about what we want and about what we ought to want; they even disagree about what it means to “ought” to want something. There is a gulf between how we think we wish we’d act, and how we actually wish we’d act. Humanity’s history with moral theory does not bode well for you if you’re trying to come up with a reliable and pragmatically useful specification of your goals – whether you want to build functional institutions, decide which charity to donate to, or design safe autonomous adaptive AI.

But a deeper understanding of your values should make you better at actually fulfilling them. And it helps to ask, what outcomes are actually valuable? Yudkowsky calls his attempt to figure out what our ideal vision of the future would look like “Fun Theory”. Questions of fun theory intersect with questions of transhumanism, like hedonism vs. eudaimonia, cryonics, mind uploading, and large-scale space colonization. Yet creating a valuable future takes work, so we also have to ask: how shall we get there? It’s not just about the destination, but also the journey.



21

Fake Preferences

This chapter collects a sequence of blog posts on failed attempts at theories of human value.

It is a misconception that “rational” preferences must reduce to selfish hedonism (i.e. caring strictly about personally experienced pleasure). Our values can’t be reduced to happiness alone, because happiness is not the only important consequence of our decisions. We treat people as ends in themselves, and care not only about subjective states, but also about objective accomplishments. We have preferences for how the world is, not just for how we think the world is. An ideal Bayesian agent can have a utility function that ranges over anything (including art, science, love, freedom, etc.), not just internal subjective experiences.

Some people think they ought to be selfish, and they find ways to rationalize altruistic behavior (e.g. contributing to society) in selfish terms. But they probably started with the bottom line of espousing this idea, rather than truly trying to be selfish. Thus, they aren’t really selfish; if they were, there would be a lot more productive things to do with their time than espouse the philosophy of selfishness. Instead, these people do whatever it is they actually want, including being nice to others, and then find some sort of self-interest rationalization for it.

Many people provide fake reasons for their own moral reasoning, such as following divine command ethics. Religious fundamentalists who fear the prospect of God withdrawing His threat to punish murder reveal an internal moral compass: a revulsion to murder that is independent of whether God punishes murder or not (a compass that points against murder, but not against eating pork or sneezing). They steer their decision system by that moral compass. If you fear that humans would lack morality without an external threat, it means you like morality, rather than merely being forced to abide by it. Other examples of fake morality include selfish-ists who provide altruistic justifications for selfishness, and altruists who provide selfish justifications for altruism. If you want to know how moral someone is, don’t look at their reasons; look at what they actually do.

The only process that reliably regenerates all the local decisions you would make given your morality, is your morality. For very tough problems, like Friendly AI, people propose solutions very fast: they suggest an “amazingly simple utility function”. These folks should learn from the tragedy of group selectionism, our thousand shards of desire, the hidden complexity of planning, and the importance of keeping to your original purpose. Yudkowsky had to write the sequences on evolution, fragile purposes, and affective death spirals so he could make this point.

Many people seem fascinated with trying to compress morality down to a single principle. They think they know the amazingly simple utility function that is all we need to program into an artificial superintelligence, and then everything will turn out fine. But a utility function doesn’t have to be simple. We try to steer the future according to our terminal values, which are complicated, and leaving one value out of an AI’s utility function could lead to existential catastrophe. Yet the “One Great Moral Principle” makes people go off in a happy death spiral.

The detached lever fallacy is thinking that you can pry a control lever (e.g. words, loving parents) from a human context and use it to program an AI, without knowing how the underlying machinery works. The lever may be visible and variable, but there is a lot of constant machinery hidden beneath the words (and rationalist’s Taboo is one way to start exposing it). If the AI doesn’t have the internal machinery, then prying the lever off the ship’s bridge won’t do anything. People (and AIs) aren’t blank slates. For example, even when human culture genuinely contains a whole bunch of complexity, it is still acquired as a conditional genetic response (which isn’t always “mimic the input” – which is why you can’t raise children to be perfectly selfless). In general, the world is deeper by far than it appears.

Aristotle talked about telos, the “final cause” (or purpose) of events. But there are three fallacies of teleological reasoning. The first is backward causality: making the future a literal cause of the past. The second is anthropomorphism: to attribute goal-directed behavior to things that are not goal-directed, because you used your own brain as a black box to predict external events. And the third is teleological capture: to treat telos as if it were an inherent property of an object or system, thus committing the mind projection fallacy. These contribute to fake reductions of mental properties.

It can feel as though you understand how to build an AI when really, you’re still making all your predictions based on empathy. AI designs made of empathic human parts (detached levers) are only dreams; you need genuine reduction, not mysterious black boxes, to arrive at a non-mental causal model that explains where intelligence comes from and what it will do. Your AI design will not work until you figure out a way to reduce the mental to the non-mental. Otherwise it can exist in the imagination but not translate into transistors. And this is why AI is so hard.

AIs don’t form a natural class like humans do; they are part of the vast space of minds-in-general, which is part of the space of optimization processes. So don’t generalize over mind design space (e.g. by claiming that all or no minds do X)!

There is an incredibly wide range of possibilities. Somewhere in mind design space is at least one possible mind with almost any kind of logically consistent property you care to imagine. Having one word for “AI” is like having a word for everything which isn’t a duck. This is also a reason why predicting what the future will be like after the Singularity is pretty hard.


22

Value Theory

This chapter is about obstacles to developing a new theory of value, and some intuitively desirable features of such a theory.

If every belief must be justified, and those justifications in turn must be justified, how do you terminate the infinite recursion of justification? Ultimately, when you reflect on how your mind operates and consider questions like “why does Occam’s Razor work?” and “why do I expect the future to be like the past?” (the problem of induction), you have no other option but to use your own mind. Always question everything and play to win, but note that you can only do so using your current intelligence and rationality. Reflect on your mind’s degree of trustworthiness, using your current mind – it’s not like you can use anything else. There is no way to jump to an ideal state of pure emptiness and evaluate these claims without using your existing mind (just as there is no argument that you can explain to a rock). This reflective loop has a meta-level character and is not circular logic.

Yudkowsky’s ideas on reflection differ from those of other philosophers; his method is massively influenced by his work in AI. If you think induction works, then you should use it in order to use your maximum power. It’s okay to use induction and Occam’s Razor to think about and inspect induction and Occam’s Razor, because that is how a self-modifying AI would improve its source code with the aim of winning. Reflective coherence is just a side-effect. (And since this loop of justifications goes through the meta-level, it’s not the same thing as circular logic on the object level.)

There is no irreducible central ghost inside a mind who looks over the neurons or source code. Minds are causal physical processes, so it is theoretically possible to specify a mind which draws any conclusion in response to any argument. For every system processing information, there is a system with inverted output which makes the opposite conclusion. This applies to moral conclusions, and regardless of the intelligence of the system. For every argument, however convincing it may seem to us, there exists at least one possible mind that doesn’t buy it. There is no argument that will convince every possible mind. There are no universally compelling arguments because compulsion is a property of minds, not a property of arguments! Thus there is no “one true universal morality that must persuade any AI”.

When confronted with a difficult question, don’t try to point backwards to a misunderstood black box. When you find a black box, look inside it, and resist mysterious answers or postulating another black box inside it (which is called passing the recursive buck). For example, you don’t need “meta free will” to explain why your brain chooses to follow empathy over fear. The buck stops here: you did not choose for heroic duty to overpower selfishness – that overpowering was the choice. Similarly, in Lewis Carroll’s story “What the Tortoise Said to Achilles”, when Achilles says “If you have [(A&B)→Z] and you also have (A&B), then surely you have Z,” and the Tortoise replies “Oh! You mean <{(A&B)&[(A&B)→Z]}→Z>, don’t you?”, the Tortoise is passing the recursive buck. To stop the buck immediately, the Tortoise needs the dynamic of adding Z to the belief pool.

A mind, in order to be a mind, needs some sort of dynamic rules of inference or action. Your mind is created “already in motion”, because it dynamically implements modus ponens by translating it into action. But you can’t give dynamic processes to a static thing like a rock; thus it won’t add Y to its belief pool when it already has beliefs X and (X→Y). There is no computer program so persuasive that you can run it on a rock. A “dynamic” is something that happens inside a cognitive system over time; so if you try to write a dynamic on a piece of paper, the paper will just lie there.
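As a minimal illustration (my own sketch, using a made-up belief-pool representation, not anything from the original essay), here is modus ponens treated as a dynamic: a rule that actually runs over a belief pool and adds a conclusion, which no static list of premises written on paper will ever do by itself.

```python
# Toy sketch: modus ponens as a *dynamic* operating on a belief pool.
# The representation ("implies", X, Y) is invented for illustration.

def modus_ponens_step(beliefs):
    """Return the belief pool with Y added wherever X and ('implies', X, Y) are both present."""
    new_beliefs = set(beliefs)
    for belief in beliefs:
        if isinstance(belief, tuple) and belief[0] == "implies":
            _, x, y = belief
            if x in beliefs:
                new_beliefs.add(y)
    return new_beliefs

pool = {"A", "B", ("implies", "A", "C")}
print("C" in modus_ponens_step(pool))  # True -- the running system adds C
# A rock "holding" the same symbols never executes this step; the Tortoise can
# demand ever more written premises, but only a system that runs the rule
# ever adds the conclusion to its beliefs.
```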

This is a parable about anthropomorphic optimism. There is an imaginary society with arbitrary, alien values. The species has a passion for sorting pebbles into correct heaps. The Pebblesorting People want to create correct heaps, but don’t know what makes a heap “correct” (although their heaps tend to be prime-number sized). Their disagreements have caused wars. They now want to build a self-improving AI and they assume that its intelligence would inevitably result in it creating reasonable heap sizes. Why wouldn’t smarter minds equal smarter heaps?

It is possible to talk about “sexiness” as a property of an observer and a subject: Sexiness as a 2-place function depends on both the admirer and the entity being admired. This function takes two parameters and gives a result, for example:

Sexiness: Admirer, Entity → [0, ∞).

However, it is also possible to talk about “sexiness” as a property of the subject alone: Sexiness as a 1-place function depends only on the entity, provided each observer uses their own such function to determine how sexy someone is (so different speakers may be using different 1-place words!). This function could look like:

Sexiness: Entity → [0, ∞).

In this case, Fred and Bloogah are different admirers and Fred::Sexiness is different from Bloogah::Sexiness. Through the mathematical technique of currying, a two-parameter function (the uncurried form) is equivalent to a one-parameter function returning another function (the curried form). We can curry x=plus(2, 3) and write y=plus(2); x=y(3) instead, thus turning a 2-place function plus into a 1-place function y. Failing to keep track of this distinction can cause you trouble. Treating a function of two arguments as though it were a function of one argument is an instance of the Mind Projection Fallacy or Variable Question Fallacy. This is relevant to confusing words like “objective”, “subjective” and “arbitrary”.
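A brief sketch of that currying move (the plus example is from the text; the sexiness stand-in and its weights are invented purely for illustration):

```python
def plus(a, b):
    return a + b

def curry_first(f, first):
    """Fix the first argument of a 2-place function, yielding a 1-place function."""
    return lambda second: f(first, second)

y = curry_first(plus, 2)           # the curried form of plus(2, _)
assert plus(2, 3) == y(3) == 5

# The same move for a 2-place Sexiness(admirer, entity): fixing the admirer
# yields Fred::Sexiness and Bloogah::Sexiness, two *different* 1-place functions.
def sexiness(admirer, entity):
    weights = {"Fred": 1.0, "Bloogah": 0.1}   # toy stand-in for each admirer's process
    return weights[admirer] * len(entity)

fred_sexiness = curry_first(sexiness, "Fred")
bloogah_sexiness = curry_first(sexiness, "Bloogah")
print(fred_sexiness("Carol"), bloogah_sexiness("Carol"))  # 5.0 vs 0.5
```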

To those who say “nothing is real”, Yudkowsky replies, “that’s great, but how does the nothing work?” Imagine being persuaded that there was no morality and that everything was permissible. Suppose that nothing is moral and that all utilities equal zero. What would you do? Would you still tip cabdrivers? Would you still eat the same kinds of foods, and would you still drag a child off the train tracks? When you cannot be innocent or praiseworthy, what will you choose anyway? And if some “external objective morality” (some great stone tablet upon which Morality was written) tells you to kill people, what then? Why should you even listen, rather than just do what you’d have wished the stone tablet had said in the best case scenario? If you could write the stone tablet yourself, what would it say? Maybe you should just do that.

There is opposition to rationality from people who think it drains meaning from the universe. But if life seems painful, reductionism may not be the real source of your problem; if living in a world of mere particles seems too unbearable, maybe your life isn’t exciting enough right now? A lot of existential angst comes from people trying to solve the wrong problem; so check and see if you’re not just feeling unhappy because of something else going on in your life. Don’t blame the universe for being a “meaningless dance of particles”, but try to solve problems like unsatisfying relationships, poor health, boredom, and so on. It is a general phenomenon that poor metaethics (e.g. misunderstanding where your morality comes from) messes people up, because they end up joining a cult or conclude that love is a delusion or that real morality arises only from selfishness, etc.

It is easier to agree on morality (e.g. “killing is wrong”) than metaethics – what it means for something to be bad or what makes it bad. Yet to make philosophical progress, you might need to shift metaethics at some point in your life. To shift your metaethics, you need a line of retreat to hold your will in place; for example, by taking responsibility and deciding to save lives anyway. If your current source of moral authority stops telling you to save lives, you could just drag the child off the train tracks anyway – and it was your own choice to follow this morality in the first place.

Causation is distinct from justification; so while our emotions and morals ultimately come from evolution, that’s no reason to accept or reject them. Just go on morally reflecting. You can’t jump out of the system, for even rebelling against your evolved goal system must take place within it! When we rebel against our own nature, we act in accordance with our own nature. There is no ghost of perfect emptiness by which you can judge your brain from outside your brain. So can you trust your moral intuitions at all, when they are the product of mere evolution?

You do know quite a bit about morality… not perfect or reliable information, but you have some place to start. Otherwise you’d have a much harder time thinking about morality than you do. If you know nothing about morality, how could you recognize it when you discover it? Discarding all your intuitions is like discarding your brain. There must be a starting point baked into the concept of morality – a moral frame of reference. And we shouldn’t just be content with ignorance about moral facts. Why not accept that, ceteris paribus, joy is preferable to sorrow?

When you think about what you could do, your brain generates a forward-chaining search tree of states that are primitively reachable from your current state. Should-ness flows backwards in time and collides with the reachability algorithm to produce a plan that your brain labels “right”. This makes rightness a derived property of an abstract computation capable of being true (like counterfactuals), subjunctively objective (like mathematics), and subjectively objective (like probability). So morality is a 1-place function that quotes a huge number of values and arguments.
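A rough sketch of that picture (my own illustration; the states, actions, and “desirable” test are invented, and the real criterion is of course vastly more complicated): forward-chaining search enumerates what could happen, and should-ness flows backwards from the desirable outcomes to label a plan.

```python
from collections import deque

# Hypothetical toy world: state -> {action: next_state}
actions = {
    "start":      {"walk_to_tracks": "at_tracks", "stay_home": "home"},
    "at_tracks":  {"drag_child_off": "child_safe", "do_nothing": "child_hurt"},
    "home":       {},
    "child_safe": {},
    "child_hurt": {},
}
desirable = {"child_safe"}   # stand-in for the hugely complicated human criterion

def plan(start):
    # Forward chaining: breadth-first search over primitively reachable states.
    parents, frontier = {start: None}, deque([start])
    while frontier:
        state = frontier.popleft()
        for action, nxt in actions[state].items():
            if nxt not in parents:
                parents[nxt] = (state, action)
                frontier.append(nxt)
    # Backward pass: should-ness flows from a desirable reachable state to the start.
    for goal in desirable:
        if goal not in parents:
            continue
        steps, state = [], goal
        while parents[state] is not None:
            prev, action = parents[state]
            steps.append(action)
            state = prev
        return list(reversed(steps))   # the plan the brain labels "right"
    return None

print(plan("start"))  # ['walk_to_tracks', 'drag_child_off']
```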

Imagine a calculator which, instead of computing the answer to the question, computes “what do I output?” This calculator could output anything and be correct. If it were an AI trying to maximize expected utility, it would have a motive to modify the programmer to want something easily obtainable. Analogously in moral philosophy, if what is “right” is a mere preference, then anything that anyone wants is “right”. But we ourselves are like a calculator that computes “what is 2+3?” except we don’t know our own fixed question, which is extremely complicated and includes a thousand shards of desire. So saying “I should X” means that X answers the question, “what will save my people, and how can we all have more fun, and how can we get more control over our own lives, and what’s the funniest joke we can tell, etc.?”

Some mental categories are tricky because they are drawn primarily so that whether something falls into the category is important information for our utility function (so they are relevant to decision-making). Unnatural categories are formed not just by empirical structures, but also by our values themselves. Moral arguments are about redrawing the boundaries, e.g. of a concept like “personhood”. This issue is partly why technology creates new moral dilemmas, and why teaching morality to a computer is so hard.

The fallacy of magical categories is thinking you can train an AI to solve a real problem by using shallow machine-learning data that reflect an unnatural category (i.e. your preferences). We underestimate the complexity of our own unnatural categories. The problem of Friendly AI is one of communication: transmitting category boundaries like “good” that can’t be fully delineated in any training data you can give the AI during its childhood. Generally, there are no patches or band-aids for Friendly AI!

The standard visualization of the Prisoner’s Dilemma is fake, because neurologically intact human beings are not truly and entirely selfish – we have evolved impulses for honor, fairness, sympathy and so on. We don’t really, truly and entirely prefer the outcome where we defect (D) and our confederate cooperates (C) such that we go free and he spends three years in prison. In the standard payoff matrix, mutual cooperation earns each player a short sentence, mutual defection a longer one, and a lone defector goes free while the lone cooperator serves three years.

So in the True dilemma, (D,C) > (C,C) > (D,D) > (C,D) must actually hold from our perspective. A situation where mutual cooperation doesn’t seem right is as follows: imagine that four billion human beings have a fatal disease that can only be cured by substance S, which can only be obtained by working with an alien paperclip maximizer from another dimension, who wants to use substance S for paperclips.

We would feel indignant at trading off billions of human lives for a few paperclips, yet still prefer (C,C) to (D,D).

Empathy is how we humans predict each other’s minds. Mirror neurons allow us to empathize with other humans, and sympathy reinforces behavior that helps relatives and allies (which is probably why we evolved it). And the most formidable among humankind is not the strongest or the smartest, but the one who can call upon the most friends. But not all optimization processes are sympathetic and worth being friends with (e.g. AI, some aliens, natural selection). An AI could model our minds directly, but we should not expect it to have the human form of sympathy.

Life should not always be made easier for the same reason that video games should not always be made easier. You need games that are fun to play, not just fun to win. As humans we don’t wish to be passive blobs experiencing pleasure; we value ongoing processes like solving challenging problems – thus subjective experience is not enough: it matters that the journey and destination are real. We prefer real goals to fake ones; goals that are good to pursue and not just good to satisfy. So think in terms of eliminating low-quality work to make way for high-quality work, rather than eliminating all challenge. There must be the true effort, true experience, and true victory.

Pain can be much more intense than pleasure, and it seems that we prefer empathizing with hurting, sad, and even dead characters. Stories require conflict; otherwise we wouldn’t want to read them. Do we want the post-Singularity world to contain no stories worth telling? Perhaps Eutopia is not a complete absence of pain, but the absence of systematic pointless sorrow, plus more intense happiness and stronger minds not overbalanced by pain. Or perhaps we should eliminate pain entirely and rewire our neurons so that a life of endless success doesn’t feel as flat as reading a story in which nothing ever goes wrong. Yudkowsky prefers the former approach, but doesn’t know if it can last in the long run.

Goodness is inseparable from an optimization criterion or utility function; there is no ghostly essence of goodness apart from values like life, freedom, truth, happiness, beauty, altruism, excitement, humor, challenge, and many others. So don’t go looking for some pure essence of goodness distinct from, you know, actual good. If we cannot take joy in things that are merely good, our lives shall be empty indeed. Moreover, any future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals will contain almost nothing of worth.

An interesting universe (one that would be incomprehensible to us today) is what the future looks like if things go right. But there are a lot of things that we value such that if we did everything else right when building an AI, but left out that one thing, the future would end up looking flat, pointless or empty. Merely human values do not emerge in all possible minds, and they will not appear from nowhere to rebuke and revoke the utility function of an expected paperclip maximizer. Value is fragile, as there is more than one way to shatter all value, especially when we let go of the steering wheel of the future. A universe devoid of human morals is dull moral noise, because value is physically represented in our brains alone.

Why would evolution produce morally motivated creatures? If evolution is stupid and cruel, how come we experience love and beauty and hope? Because long ago, evolution coughed up minds with values and preferences, and they were adaptive in the context of the hunter-gatherer savanna… these things need a lawful causal story that begins somehow. A complex pattern like love has to be explained by a cause that is not already that complex pattern. So once upon a time, there were lovers created by something that did not love. And because we love, so will our children’s children. Our complicated values are the gift that we give to tomorrow.


23

Quantified Humanism

This chapter is on the tricky question of how we should apply value theory to our ordinary moral intuitions and decision-making. The cash value of a normative theory is how well it translates into normative practice. When should we trust our vague, naïve snap impulses, and when should we ditch them for a more explicit, informed, sophisticated and systematic model?

The human brain can’t represent large quantities. Scope insensitivity (aka scope neglect) manifests itself when the willingness to pay for an altruistic action is affected very little by the scope of the action, even as exponentially more lives are at stake. For example, studies find that an environmental measure (cleaning up polluted lakes) that will save 200,000 birds doesn’t conjure anywhere near a hundred times the emotional impact and willingness-to-pay of a measure that would save 2,000 birds. This may be caused by “evaluation by prototype” or “purchase of moral satisfaction”.

Saving one human life might feel just as good as saving the whole world, but we ought to maximize and save the world when given the choice. Just because you have saved one life does not mean your duty to save lives has been satisfied. Beyond the warm glow of moral satisfaction, there is a huge difference: however valuable a life is, the whole world is billions of times as valuable. Choosing to save one life when you could have saved more is damning yourself as thoroughly as any murderer. To be an effective altruist you need to process those unexciting inky zeroes on paper.

The Allais Paradox shows that experimental subjects, when making decisions, violate the independence axiom of decision theory (which says that if you prefer X over Y, you should prefer a probability P of X over a probability P of Y, regardless of what outcome Z occupies the remaining 1−P). You are offered two pairs of gambles, each with a choice: the first pair gives you $24K with certainty, or a 33/34 chance of winning $27K and a 1/34 chance of nothing; the second pair gives you a 34% chance of winning $24K and a 66% chance of nothing, or a 33% chance of winning $27K and a 67% chance of nothing. Most people prefer 1A over 1B but also prefer 2B over 2A. This is inconsistent, because 2A is just a 34% chance of playing 1A, and 2B is just a 34% chance of playing 1B. This inconsistency turns you into a money pump, because dynamic preference reversals can be exploited by trading bets for a price. Beware departing the Bayesian way!
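A quick arithmetic check of the four gambles (my own calculation, treating dollars as utility for simplicity; the paradox itself is about the independence axiom rather than risk attitudes):

```python
gambles = {
    "1A": [(1.00, 24_000)],
    "1B": [(33 / 34, 27_000), (1 / 34, 0)],
    "2A": [(0.34, 24_000), (0.66, 0)],
    "2B": [(0.33, 27_000), (0.67, 0)],
}

def expected_value(gamble):
    return sum(p * payoff for p, payoff in gamble)

for name, gamble in gambles.items():
    print(name, round(expected_value(gamble), 2))
# 1A 24000.0, 1B 26205.88, 2A 8160.0, 2B 8910.0
# 2A is just a 34% chance of playing 1A, and 2B a 34% chance of playing 1B,
# so a coherent preference between 1A and 1B must carry over to 2A versus 2B.
```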

When hearing about the Allais Paradox, some people defend their silly, incoherent preferences. Their intuition tells them that the certainty of $24K should count for something. But the elegance of Bayesian decision theory is not pointless. You are a flawed piece of machinery. If you want to achieve a goal, warm fuzzies and intuitions can lead you astray – so do the math and maximize expected utility. Preference reversals and the certainty effect (treating a shift from 0.99 to 1 as special) make you run around in circles. It’s not steering the future; it’s a mockery of optimization.

Would you rather have a googolplex people get dust specks in their eyes, which irritates their eyes a little for a fraction of a second (the “least bad” bad thing), or one person horribly tortured for 50 years? Many people would choose dust specks, or refuse to answer. They’d feel indignant at anyone who suggests torture. To merely multiply utilities seems too cold-blooded. But altruism isn’t the warm fuzzy feeling you get from being altruistic; it’s about helping others whatever the means. When you face a difficult and uncomfortable choice, you have to grit your teeth and choose anyway.

Suppose a disease or war is killing people, and you only have enough resources to implement one of the following policies: option 1 saves 400 lives with certainty; option 2 saves 500 lives with 90% probability and no lives with 10% probability. Many people would refuse to gamble with human lives. But if the options were (1) that 100 people die with certainty, or (2) that there is a 90% chance that nobody dies and 10% chance that 500 people die, the majority would choose the second option. The exact same gamble, framed differently, causes circular preferences. People prefer certainty, and they refuse to trade off sacred values (e.g. life) for unsacred ones. But our moral preferences shouldn’t be circular. If policy A is better than B and B is better than C, then A should really be better than C. If you actually want to help people, it’s not about your feelings; just shut up and multiply!
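The multiplication itself is trivial, which is the point; here is the same gamble computed under both framings (my own arithmetic, assuming 500 people are at risk in total):

```python
people_at_risk = 500

# "Save" framing
option1_saved = 400                          # 400 saved with certainty
option2_saved = 0.9 * 500 + 0.1 * 0          # 450 saved in expectation

# "Die" framing (the identical policies, re-described)
option1_saved_again = people_at_risk - 100   # 100 die for sure -> 400 saved
option2_saved_again = 0.9 * people_at_risk   # 10% chance all 500 die -> 450 saved in expectation

print(option1_saved, option2_saved)               # 400 450.0
print(option1_saved_again, option2_saved_again)   # 400 450.0
# The expected outcomes are identical under both descriptions; only the framing
# changes which option feels acceptable.
```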

Why should we accept that e.g. two harmful events are worse than one? Like anything, utilitarianism is built on intuitions; but ignoring utilitarianism leads to moral inconsistencies (like circular preferences). Our intuitions, the underlying cognitive tricks that we use to build our thoughts, are an indispensable part of our cognition; but many of those intuitions are incoherent or undesirable upon reflection. Our intuitions about how to get to the destination are messed up due to things like scope insensitivity, so when lives are at stake we should care more about the destination than the journey. If you try to “renormalize” your intuition, you wind up with what is essentially utilitarianism. So if you want to save as many lives as possible, shut up and multiply.

As Lord Acton said, “power tends to corrupt, and absolute power corrupts absolutely.” The possibility of unchecked power is tempting, and can corrupt even those who aren’t evil mutants; perhaps for the evolutionary reason that exploiting power helped our ancestors leave more offspring (ceteris paribus). Hence, young revolutionaries with good intentions will execute the adaptation for seeking power by arguing that the old power-structure is corrupt and should be overthrown (repeating the cycle). So can you trust your own seeming moral superiority?

The human brain is untrustworthy hardware, so we reflectively endorse deontological principles like “the ends don’t justify the means”. We have bizarre rules like “for the good of the tribe, do not cheat to seize power even for the good of the tribe”, to prevent people’s corrupted hardware from computing the idea that it would be righteous and altruistic to seize power for themselves. Thus we can live in peace. But superintelligent Friendly AI would follow classical decision theory, so it may well (and rightly) kill one person to save five. In the Trolley Problem thought experiment, we humans cannot occupy the epistemic state the thought experiment stipulates, but a hypothetical AI might. If the ends don’t justify the means, what does? In fact, deontological prohibitions are just consequentialist reasoning at one meta-level up.

We evolved to feel ethically inhibited from violating the group code, because ethically cautious individuals reproduced more. The rare instances of punishment outweighed the value of e.g. stealing; likewise for hurting others “for the greater good”. You can get caught even if you think you can get away with it. Some people justify lying by appealing to expected utility; but maintaining lies is complex and when they collapse, you can get hurt. It is easier to recover from honest mistakes when you have ethical constraints. Ethics can sometimes save you from yourself.

Ethical injunctions (simple exceptionless principles) against self-deception or murdering innocents are useful for protecting you from your own cleverness when you’re tempted to do what seems like the right thing, because you are likely to be mistaken that these are right. When you lie, you take a black swan bet that can blow up and undo all the good it ever did. Knowing when you are better off with a false belief is the epistemically difficult part – it’s like trying to know which lottery ticket will win. Do not be impressed with people who say in grave tones, “it is rational to do unethical thing X because it will have benefit Y”. They will abandon their ethics at the very first opportunity.

A rationalist acquires his or her powers from having something to protect. Rationalists must value something more than “rationality”; something more than your own life and pride must be at stake, and you must have a desperate need to succeed. It takes something really scary to cause you to override your intuitions with math. Only then will you grow beyond your teachings and master the Way. In the dilemma where you can save 400 lives with certainty or save 500 lives with 90% probability (and no lives with 10% probability), your daughter is one of the 500 but you don’t know which one. Will you refuse to gamble with human lives and choose the comforting feeling of certainty because you think it is “rational” to choose certainty, or will you shut up and multiply to notice that you have an 80% chance of saving your daughter in the first case, and 90% in the second? Hopefully you care more about your daughter’s life than your pride as a rationalist.

There is a chance, however remote, that novel physics experiments could destroy the Earth. For example, some people fear that the Large Hadron Collider (LHC) might create a black hole. They have even assigned a 1 in 1,000,000 probability of the LHC destroying the world. Banning novel physics experiments may be infeasible, but supposing it could be done, would it be wise given the risk? Should we ban physics? But these made-up numbers give an undue air of authority. Just debate the general rule of banning physics experiments without assigning specific probabilities.

Eliezer Yudkowsky does not always advocate using probabilities. Don’t make up verbal probabilities using a non-numerical procedure. Your brain is not so well-calibrated that you can pull a number out of thin air, call it a “probability”, perform deliberate Bayesian updates, and arrive at an accurate map. You may do better with your nonverbal gut feeling, for example when trying to catch a ball. You have evolved abilities to reason in the presence of uncertainty. The laws of probability theory govern, but often we can’t compute them; don’t bother trying if it won’t help.

A very famous problem in decision theory is Newcomb’s Problem. A superintelligence called Omega (who can perfectly predict you) asks you to pick a transparent box [Box A] with $1000 inside and/or an opaque one [Box B] that has $1 million inside only if you were predicted to pick it, Box B, alone (otherwise it contains nothing). Do you take both boxes, or only Box B? Causal decision theory (the traditional dominant view) says to take both, because the money is already in the boxes, and Omega has already left. But a rationalist should win the $1 million regardless of the algorithm needed, so you should take only Box B. It may appear to causal decision theorists as if the “rational” move of two-boxing is consistently punished, but that’s the wrong attitude to take. Rationalists should not envy others’ choices; the winning choice is the reasonable choice. Rationalists should win. If your particular ritual of cognition consistently fails to yield good results, change the ritual – don’t change the definition of winning. The winning Way is currently Bayescraft, but if it ever turns out that Bayes systematically fails relative to a superior alternative, then it has to go out the window. Pay attention to the money!
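A sketch of why one-boxing wins (my own illustration, not part of the original post): expected payoff as a function of the predictor's accuracy p, treating the prediction as correlated with your actual choice, which is precisely the dependence that causal decision theorists refuse to count.

```python
def one_box(p):
    # Box B contains $1,000,000 iff you were (correctly) predicted to one-box.
    return p * 1_000_000

def two_box(p):
    # You always get Box A's $1,000; Box B is full only if the predictor erred.
    return 1_000 + (1 - p) * 1_000_000

for p in (1.0, 0.99, 0.9):
    print(p, one_box(p), two_box(p))
# p=1.0:  1,000,000 vs   1,000
# p=0.9:    900,000 vs 101,000
# Even at 90% accuracy, one-boxing dominates in expectation. Pay attention to the money!
```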


Interlude: The Twelve Virtues of Rationality

This essay poetically summarizes many of the lessons of Rationality: From AI to Zombies.

The first virtue is curiosity: a burning itch to relinquish your ignorance, and to know. The second virtue is relinquishment: not flinching from experiences that might destroy your beliefs, but evaluating beliefs before arriving at emotions. P.C. Hodgell said, “That which can be destroyed by the truth should be.” The third virtue is lightness: being quick to follow the evidence wherever it leads and surrendering to the truth. Let the winds of evidence blow you about as though you are a leaf, with no direction of your own.

The fourth virtue is evenness: not being selective about which arguments you inspect for flaws or attending only to favorable evidence. Use reason, not rationalization. The fifth virtue is argument: to strive for exact honesty rather than thinking it “fair” to balance yourself evenly between positions. Do not avoid arguing when truth is not handed out in equal portions before a debate. The sixth virtue is empiricism: asking not which beliefs to profess, but which experiences to anticipate. Base your knowledge in the roots of observation and the fruits of prediction.

The seventh virtue is simplicity: keeping additional specifications to a minimum. Each additional detail is another chance for the belief to be wrong or the plan to fail. The eighth virtue is humility: to take specific actions in anticipation of your own errors in your beliefs and plans. You are fallible, but do not boast of modesty. The ninth virtue is perfectionism: always seeking to do better so that you do not halt before taking your first steps, and settling for no less than the answer that is perfectly right. The more errors you correct in yourself, the more you notice, and the more you can advance.

The tenth virtue is precision: to shift your beliefs by exactly the right amount in response to each piece of evidence, in accordance with probability theory. Narrower statements expose themselves to a stricter test, and are more useful. The eleventh virtue is scholarship: studying many sciences and absorbing their power as your own, especially those that impinge upon rationality. Consume fields like decision theory, psychology, and more. Before these eleven virtues is the nameless virtue: to above all carry your map through to reflecting the territory, not by asking whether it is “the Way” to do this or that, or by believing the words of the Great Teacher, but by asking whether the sky is blue or green. Every step of your reasoning must cut through to the correct answer. If you fail to achieve a correct answer, it is futile to protest that you acted with propriety.

These then are twelve virtues of rationality: curiosity, relinquishment, lightness, evenness, argument, empiricism, simplicity, humility, perfectionism, precision, scholarship, and the void.

