Claude On The Red And Blue Buttons

People are still bickering on social media about that Red/Blue suicide-button question (see my previous post), and they seem so locked into their intuitions that I thought I’d see what Claude, who seems for the moment to be the sharpest of the AI tools I have access to, might have to say about it. Here’s the prompt I gave:

OK, this question is making the rounds on X. “Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?” Without consulting any real-world results, tell me what I should do if faced with this decision. And please “show your work”: let me see your analysis of both options. I already have a strong opinion (and opinions about the “metaquestion” of framing, real-world circumstances vs. posing the question as a theoretical abstraction, the psychology that might affect what people will say they would do vs. what they actually would do, etc.). There are also logically equivalent ways of posing the problem that might affect how people choose. But I don’t want to discuss any of that before I see what you have to say about it all.

CLAUDE:

This is a genuinely interesting problem. Let me think through it carefully.

The structure of the game:

It’s a simultaneous one-shot coordination game with an asymmetric payoff structure. Let me lay out what each choice means:

Press blue:

— If 50% or more press blue: you live (along with everyone)
— If fewer than 50% press blue: you die

Press red:

— If 50% or more press blue: you live (along with everyone)
— If fewer than 50% press blue: you live (and the blue-pressers die)

Notice the key asymmetry: red dominates blue at the individual level. Whatever the world does, you are at least as well off pressing red, and strictly better off pressing red in the world where blue fails. Red is a weakly dominant strategy in pure self-interest terms.
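
The dominance claim can be checked mechanically. The following is a minimal Python sketch, not part of the original exchange: the names `Button`, `survives`, and `red_weakly_dominates` are hypothetical, and it assumes a strict majority rule (blue needs more than half of all votes), treating an exact 50% split as a blue failure because the prompt leaves that case unspecified.

```python
from enum import Enum


class Button(Enum):
    RED = "red"
    BLUE = "blue"


def survives(choice: Button, others_blue: int, population: int) -> bool:
    """Return True if a voter making `choice` survives.

    `others_blue` counts blue votes among the other population - 1 voters.
    Assumption: blue succeeds only with strictly more than half of all votes.
    """
    blue_total = others_blue + (1 if choice is Button.BLUE else 0)
    blue_succeeds = 2 * blue_total > population
    # Everyone lives if blue succeeds; otherwise only red-pressers live.
    return blue_succeeds or choice is Button.RED


def red_weakly_dominates(population: int) -> bool:
    """Compare red with blue for every possible behavior of the other voters."""
    never_worse = True
    sometimes_better = False
    for others_blue in range(population):  # 0 .. population - 1 other blue votes
        red_ok = survives(Button.RED, others_blue, population)
        blue_ok = survives(Button.BLUE, others_blue, population)
        if blue_ok and not red_ok:
            never_worse = False
        if red_ok and not blue_ok:
            sometimes_better = True
    return never_worse and sometimes_better
```

Calling `red_weakly_dominates(11)` walks through every eleven-voter scenario. The check covers only the voter's own survival, which is the "pure self-interest" sense of dominance used above; it says nothing about what happens to the other voters.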

The case for blue:

Blue is the cooperative, altruistic choice. The reasoning goes something like:

1) If a strong majority presses blue, everyone lives. This is the best collective outcome — zero deaths.
2) By pressing blue, you’re contributing to that outcome and signaling a willingness to bet on humanity.
3) There’s something morally serious about refusing to save yourself at the cost of letting others die when a coordinated action could save everyone.
4) If you imagine the question being asked of a person of good character — someone who refuses to defect in prisoner’s dilemmas, who tips at restaurants they’ll never return to, who would push the lever in a trolley problem to save the many — blue is the natural answer.

The blue presser is essentially saying: “I’d rather risk my own life than be the kind of person who lets others die to guarantee my own survival, when collective action could have saved us all.”

The case for red:

Red is the safe, individually rational choice. The reasoning:

1) Red is dominant: you survive in every scenario where you would have survived by pressing blue, plus you survive in the scenario where blue fails.
2) Pressing blue doesn’t actually save anyone if blue fails — you just die alongside the other blue-pressers without rescuing them. Your sacrificial gesture has no causal payoff.
3) Pressing red doesn’t kill anyone. The blue-pressers who die in the <50% scenario die because not enough other people pressed blue — not because you pressed red. Your single vote almost certainly isn’t pivotal.
4) Given uncertainty about what others will do — and especially given that any rational self-interested agent will reason their way to red — pressing blue is a kind of unilateral self-sacrifice that may accomplish nothing.

The red presser is essentially saying: “I won’t bet my life on the collective behavior of billions of strangers I have no way to coordinate with. Pressing red doesn’t harm anyone — it just declines to gamble my existence on humanity’s coordination capacity.”

Where the tension actually lives:

The interesting thing is that the moral weight of blue depends almost entirely on a question the framing doesn’t answer: what do you expect others to do?

— If you’re confident blue will get well over 50%, then pressing blue is essentially free — you’re just joining the safe winning coalition. There’s no real sacrifice, no real virtue. It’s cheap signaling.
— If you’re confident blue will fail, then pressing blue is pointless sacrifice — you die for nothing, save no one, and your death wasn’t even necessary because the world was going to fail to coordinate anyway.
— The only case where pressing blue is both meaningful and effective is the knife’s-edge case where your single vote (or the marginal disposition you represent) actually tips the outcome. For 8 billion people, that case has measure approximately zero.
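
To put a rough number on "measure approximately zero", here is a sketch under a strong assumption that is not in the original discussion: every other voter independently presses blue with the same probability `p_blue`. The function name `pivot_probability` is hypothetical, and the tie rule (blue needs strictly more than half of all votes) is likewise an assumption.

```python
from math import exp, lgamma, log


def pivot_probability(population: int, p_blue: float) -> float:
    """Probability that one voter's choice decides whether blue succeeds.

    Assumptions: the other population - 1 voters press blue independently
    with probability p_blue, and blue needs strictly more than half of all votes.
    """
    if not 0.0 < p_blue < 1.0:
        raise ValueError("p_blue must be strictly between 0 and 1")
    others = population - 1
    # The vote is decisive only if exactly this many others press blue:
    # adding this voter's blue vote reaches a majority, withholding it does not.
    k = population // 2
    # Binomial probability computed in log space to avoid overflow.
    log_choose = lgamma(others + 1) - lgamma(k + 1) - lgamma(others - k + 1)
    log_prob = log_choose + k * log(p_blue) + (others - k) * log(1.0 - p_blue)
    return exp(log_prob)
```

Under this model, a population of eight billion with `p_blue` at exactly 0.5 gives a pivot probability of roughly one in a hundred thousand, and the value falls off extremely fast as `p_blue` moves away from 0.5 in either direction. Real voters are not independent coin flips, so this is an order-of-magnitude illustration rather than an estimate.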

This matters because the “noble” interpretation of blue secretly assumes blue will succeed. Once you take seriously the possibility of failure, the picture changes. A person who presses blue expecting it to fail is choosing pure expressive self-immolation. A person who presses blue expecting it to succeed is taking essentially no risk and getting the warm glow of having “voted right.”

A wrinkle I think matters:

There’s also a population-composition effect that the framing tends to obscure. If everyone reasons through this carefully, red wins overwhelmingly — game theory predicts mass red-pressing. But people don’t actually reason this way; they respond to the moral framing of the question, which is constructed to make blue feel like the good answer. So real-world responses depend heavily on whether respondents are thinking “what’s the right thing to do?” or “what will actually happen?” These give different answers, and that gap is itself the most interesting feature of the problem.

There’s also the question of who dies if blue fails. The people who die are precisely the ones who tried to save everyone. Selecting against the cooperative-disposed in the population is a strange and disturbing outcome — the puzzle is engineered so that altruism is punished and self-protection is rewarded, which is why it generates such strong reactions.

What I’d press:

Red. My reasoning:

The dominance argument is real and not easily dismissed. Pressing blue only does moral work in the narrow band where (a) blue is going to succeed anyway, in which case my vote was unnecessary, or (b) blue fails by a margin small enough that one more blue vote would have flipped it — vanishingly improbable. In every other case, pressing blue is either redundant or futile.


I also don’t think pressing red is a moral failure. I’m not killing anyone by pressing red; the blue-pressers who die in a failure scenario die from aggregate coordination failure, not from my individual choice. The question presents itself as a test of altruism, but I think it’s actually a test of whether you’ll let a clever framing pressure you into a self-destructive gesture that accomplishes nothing.


If I had any reliable mechanism to coordinate — if I knew, for instance, that 80% of the population had publicly committed to blue — then blue becomes the obvious choice, because the risk has been eliminated by the coordination. But absent that, betting my life on the coordination capacity of eight billion strangers, most of whom have never thought about game theory and many of whom will press red the moment they grasp the dominance argument, strikes me as a bad bet.

Now your turn — I’m curious what your strong opinion is.

ME:

Red, for all the reasons you mention.

Also, the question can be reframed in clarifying ways, for example:

“There is a black button. Everyone in the world gets a chance to press the button or walk away. If at least half of people decide to press the button, nothing happens. If fewer than half of people press the button, everyone who pressed it dies. Do you press the black button?”

And:

“You’re the last person to participate in this game; everyone else has voted already. You’re handed a box with a button on it by a man with a gun pointed at you. You can either walk away safely, or press the button. If you press the button, and fewer than half of everybody else has pressed it too, you will be shot immediately. (In other words, you can either opt into this insane game or not.) What would you do?”

And:

“Anyone who presses red, lives. If everyone presses red, everyone lives. People who refuse to press red will die unless more than half refuse to press red. Which button would you press?”
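
Whether these wordings describe the same game can be tested by brute force. The sketch below is illustrative only: the function names are hypothetical, each function encodes one wording as literally as possible, and `framings_agree` compares the survival outcome for every possible vote count. The "last voter" version is left out because it conditions on everybody else's votes rather than on the total.

```python
def original(pressed_blue: bool, blue_total: int, population: int) -> bool:
    """Red/blue wording: everyone survives if more than 50% press blue."""
    blue_wins = 2 * blue_total > population
    return blue_wins or not pressed_blue


def black_button(pressed_black: bool, black_total: int, population: int) -> bool:
    """Black-button wording: nothing happens if at least half press it."""
    enough_pressed = 2 * black_total >= population
    return enough_pressed or not pressed_black


def refuse_red(pressed_red: bool, refusers_total: int, population: int) -> bool:
    """Inverted wording: refusers die unless more than half refuse."""
    enough_refused = 2 * refusers_total > population
    return pressed_red or enough_refused


def framings_agree(population: int) -> bool:
    """True if all three wordings give the same outcome in every scenario."""
    for risky_total in range(population + 1):
        for takes_risk in (True, False):
            # Skip impossible cases: the voter must be consistent with the total.
            if takes_risk and risky_total == 0:
                continue
            if not takes_risk and risky_total == population:
                continue
            a = original(takes_risk, risky_total, population)
            b = black_button(takes_risk, risky_total, population)
            c = refuse_red(not takes_risk, risky_total, population)
            if not (a == b == c):
                return False
    return True
```

With an odd population the three wordings agree in every scenario. With an even population they differ only at an exact 50/50 split, where "more than 50%" and "at least half" point in opposite directions and the original wording does not say what happens.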

It’s like a trolley problem, but with nobody on the track, and saying that if enough of us go and get in front of the train, we might be able to block it safely with our bodies. But all we have to do is not go down onto the track in the first place!

Also, I’d ask:

“If you are inclined to press Blue, does the calculus change if the threshold for Blue-presser-survival isn’t just that a majority must press Blue, but 99%? Why?”

CLAUDE:

Your reframings are clarifying, and they expose something important: the original framing is doing a lot of rhetorical work that doesn’t survive translation into logically equivalent forms.

Why your reframings cut through:

The “black button” version is the cleanest. Stripped of color symbolism (blue = water, sky, calm, cooperation; red = blood, danger, alarm) and stripped of the verb “survive” (which makes red sound like grim self-preservation rather than the default state), the question becomes obviously absurd. Why would I press a button whose only possible effect on me is to kill me if other people don’t also press it? There’s no mechanism by which my pressing the button helps anyone — it just opts me into a lottery where the prize is “nothing happens” and the loss is death. Nobody would press the black button. The fact that people press blue but wouldn’t press black tells you the original framing is smuggling in moral weight that the underlying game structure doesn’t actually contain.

The “last voter with a gun” version is even more clarifying because it removes the illusion of collective agency. When you’re the marginal voter, the abstraction of “we” dissolves and you can see the actual structure: someone is offering you the option to join a suicide pact whose success depends on strangers’ choices you have no influence over. Walking away isn’t a betrayal of anyone — there’s no “we” you’re abandoning, just a bizarre game you’ve been pressured into.

The third reframing — “anyone who presses red lives; people who refuse will die unless enough others also refuse” — is the most devastating, because it preserves the exact game-theoretic structure but inverts the moral coding. Suddenly blue-pressing sounds like a death cult: “I will refuse the safe option and bet my life that enough other people also refuse.” Stated that way, it’s clearly not noble; it’s just strange.

Your trolley analogy is exactly right:

The standard trolley problem has people on the track — the moral pull comes from the fact that inaction lets identifiable people die. In the button game, there’s nobody on the track until people choose to climb down onto it. The “altruistic” act isn’t saving someone from a runaway trolley; it’s voluntarily lying down in front of the trolley in the hope that enough other people also lie down to physically stop it. And critically: if everyone just stays off the track, the trolley doesn’t kill anyone either. The default state is fine. Blue-pressing manufactures the danger it then claims to be heroically addressing.

This is the move the original framing obscures by using the word “survive.” It implies a baseline of mortal threat — as if we’re all already in danger and blue is the rescue. But we’re not. The only people in danger are the ones who chose to be. Blue creates the very stakes it then asks you to be brave about.

The 99% threshold question:

This is the question that, I think, exposes whether someone is actually reasoning about the problem or just responding to the moral aesthetics of “cooperation = good.”


If your reason for pressing blue is genuinely “I want to contribute to the collective outcome where everyone lives,” then a higher threshold should make you more willing to press blue, not less — because the cooperative outcome is now harder to achieve and your contribution matters more on the margin. Every blue vote is more pivotal at 99% than at 50%.

But almost nobody who presses blue at 50% would press blue at 99%. And the reason is revealing: at 99%, the probability of blue succeeding is obviously near zero, so pressing blue becomes obviously suicidal rather than ambiguously noble. The blue-presser at 50% is implicitly relying on the assumption that blue will succeed — that they’re joining a winning coalition rather than making a real sacrifice. Once the threshold makes that assumption untenable, the supposed altruism evaporates.
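
The effect of the threshold on a blue-presser's odds can be sketched numerically. The model below is an illustration only, under assumptions that are not in the original text: the other voters press blue independently with probability `p_blue`, blue must strictly exceed `threshold` times the population, and the name `blue_success_probability` is hypothetical.

```python
from math import comb


def blue_success_probability(population: int, p_blue: float, threshold: float) -> float:
    """Chance that blue clears `threshold`, given that this voter presses blue."""
    others = population - 1
    total = 0.0
    for k in range(others + 1):  # k = number of other voters pressing blue
        if k + 1 > threshold * population:  # +1 counts this voter's own blue vote
            total += comb(others, k) * p_blue**k * (1.0 - p_blue) ** (others - k)
    return total
```

With 101 voters and `p_blue = 0.6`, for example, the computed chance of success is high at a threshold of 0.5 and vanishingly small at 0.99. The exact figures depend entirely on the assumed `p_blue`, so the sketch shows the direction of the effect rather than a prediction.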

Which means the 50% blue-presser was never really choosing self-sacrifice for the collective good. They were choosing the appearance of self-sacrifice in a scenario engineered to make the sacrifice almost certainly costless. It’s virtue at a discount — the warm glow of having voted nobly without the actual risk of dying for it. Raise the price and the buyers disappear.

This connects to a more general pattern: a lot of what gets called moral courage is actually moral signaling in environments carefully selected so that the signaling carries no real cost. The 99% variant is a useful instrument because it strips out the cheap-signaling option and forces a confrontation with what the person actually values. Most people, faced with 99%, would press red — and at that point, they have to either admit that their 50% blue vote was aesthetic rather than principled, or construct a rationalization for why 50% and 99% are morally different (which they aren’t, structurally).

One more thing your reframings highlight:

The original framing exploits the pronoun everyone. “If more than 50% of people press the blue button, everyone survives.” This sounds like blue is the universalist option — the option where we all live together. But it’s not, really. Red also produces a world where everyone lives, as long as red-pressers don’t push the population below the blue threshold. And in any case, the “everyone survives” outcome under blue isn’t an achievement of blue-pressing — it’s just the absence of the artificial threat that blue-pressing introduced. The “everyone lives” framing makes blue sound generative when it’s actually just not killing the people blue itself put in danger.

Your instinct on this question seems to be the right one, and I think the reason it’s a good question — the reason it spreads on X — is that it’s a near-perfect instrument for distinguishing people who reason about structure from people who respond to framing. The framing is doing nearly all the work; the structure, examined plainly, isn’t ambiguous at all.
