A common argument against counter-terrorism measures is that more people are killed each year by road accidents than by terrorists. Whilst this statistic may be true, it is a false analogy and a red herring argument against counter-terrorism. It also ignores the fact that counter-terrorism deters and prevents more terrorist attacks than those that are eventually carried out.
This fallacious argument can be generalised as follows: ‘More people are killed by (fill-in-the-blank) than by terrorists, so why should we worry about terrorism?’ In recent media debates, the ‘blank’ has included not only road accidents, but also deaths from falling fridges and bathtub drownings. However, for current purposes let us assume that more people do die from road accidents than would have died from either prevented or successful terrorist attacks.
Whenever we travel in a car, almost all of us are aware that there is a small but real risk of being injured or killed. Yet this risk does not keep us away from cars. We intuitively make an informal risk assessment that the level of this risk is acceptable in the circumstances. In other words, we consent to take the risk of travelling in cars, because we decide that the low level of risk of an accident does not outweigh the benefits of car transport.
On the other hand, in western countries we do not consent to take the risk of being murdered by terrorists, unless we deliberately decide to visit a terrorist-prone area like Syria, northern Iraq or the southern Philippines. A terrorist attack could occur anywhere in the West, so unlike the road accident analogy, there is no real choice a citizen can make to consent or not consent to the risk of a terrorist attack.
This ‘consent to risk’ fallacy omits the critical factor of choice from the equation, so the analogy between terrorism and road accidents is false.
Statistics is a useful tool for understanding the patterns in the world around us. But our intuition often lets us down when it comes to interpreting those patterns. In this series we look at some of the common mistakes we make and how to avoid them when thinking about statistics, probability and risk.
You don’t have to wait long to see a headline proclaiming that some food or behaviour is associated with either an increased or a decreased health risk, or often both. How can it be that seemingly rigorous scientific studies can produce opposite conclusions?
Nowadays, researchers can access a wealth of software packages that can readily analyse data and output the results of complex statistical tests. While these are powerful resources, they also make it easy for people without a full statistical understanding to misread some of the subtleties within a dataset and to draw wildly incorrect conclusions.
Here are a few common statistical fallacies and paradoxes and how they can lead to results that are counterintuitive and, in many cases, simply wrong.
Simpson’s paradox
What is it?
This is where trends that appear within different groups disappear when data for those groups are combined. When this happens, the overall trend might even appear to be the opposite of the trends in each group.
One example of this paradox is where a treatment can be detrimental in all groups of patients, yet can appear beneficial overall once the groups are combined.
How does it happen?
This can happen when the sizes of the groups are uneven. A trial with careless (or unscrupulous) selection of the numbers of patients could conclude that a harmful treatment appears beneficial.
Consider the following double blind trial of a proposed medical treatment. A group of 120 patients (split into subgroups of sizes 10, 20, 30 and 60) receive the treatment, and 120 patients (split into subgroups of corresponding sizes 60, 30, 20 and 10) receive no treatment.
The overall results make it look like the treatment was beneficial to patients, with a higher recovery rate for patients with the treatment than for those without it.
However, when you drill down into the various groups that made up the cohort in the study, you see that in every group of patients, the recovery rate was 50% higher for patients who had no treatment.
But note that the size and age distribution of each group is different between those who took the treatment and those who didn’t. This is what distorts the numbers. In this case, the treatment group is disproportionately stacked with children, whose recovery rates are typically higher, with or without treatment.
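The effect can be reproduced with a small calculation. The sketch below uses the subgroup sizes from the trial described above, but the recovery counts are hypothetical: they were chosen only so that the untreated recovery rate is 50% higher than the treated rate in every subgroup. They are not the trial's actual figures.

```python
from fractions import Fraction

# Subgroup sizes come from the text; the recovery counts are HYPOTHETICAL,
# chosen so the untreated rate is 1.5x the treated rate in every subgroup.
# Each entry is (patients, recovered).
treated = [(10, 1), (20, 4), (30, 12), (60, 36)]
untreated = [(60, 9), (30, 9), (20, 12), (10, 9)]


def rate(patients, recovered):
    """Recovery rate of one subgroup as an exact fraction."""
    return Fraction(recovered, patients)


def overall_rate(groups):
    """Recovery rate after pooling all subgroups together."""
    total_patients = sum(n for n, _ in groups)
    total_recovered = sum(r for _, r in groups)
    return Fraction(total_recovered, total_patients)


for (n_t, r_t), (n_u, r_u) in zip(treated, untreated):
    print(f"treated {float(rate(n_t, r_t)):.0%}   untreated {float(rate(n_u, r_u)):.0%}")

print(f"overall treated   {float(overall_rate(treated)):.1%}")
print(f"overall untreated {float(overall_rate(untreated)):.1%}")
```

With these assumed counts, the untreated patients do better in every subgroup, yet the pooled treated group shows the higher recovery rate, because most treated patients sit in the subgroup that recovers well anyway.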
Base rate fallacy
What is it?
This fallacy occurs when we disregard important information when making a judgement on how likely something is.
If, for example, we hear that someone loves music, we might think it’s more likely they’re a professional musician than an accountant. However, there are many more accountants than there are professional musicians. Here we have neglected that the base rate for the number of accountants is far higher than the number of musicians, so we were unduly swayed by the information that the person likes music.
How does it happen?
The base rate fallacy occurs when the base rate for one option is substantially higher than for another.
Consider testing for a rare medical condition, such as one that affects only 4% (1 in 25) of a population.
Let’s say there is a test for the condition, but it’s not perfect. If someone has the condition, the test will correctly identify them as being ill around 92% of the time. If someone doesn’t have the condition, the test will correctly identify them as being healthy 75% of the time.
So if we test a group of people, and find that over a quarter of them are diagnosed as being ill, we might expect that most of these people really do have the condition. But we’d be wrong.
According to our numbers above, of the 4% of patients who are ill, almost 92% will be correctly diagnosed as ill (that is, about 3.67% of the overall population). But of the 96% of patients who are not ill, 25% will be incorrectly diagnosed as ill (that’s 24% of the overall population).
What this means is that approximately 27.67% of the population are diagnosed as ill, but only the 3.67% of the population who were correctly diagnosed actually have the condition. So of the people who were diagnosed as ill, only around 13% (that is, 3.67%/27.67%) actually are unwell.
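The same arithmetic can be written out as a short calculation. This is a minimal sketch: the 4% prevalence and 75% figure come from the text, while the exact sensitivity of 11/12 is an assumption, picked because it is "almost 92%" and reproduces the 3.67% figure quoted above.

```python
from fractions import Fraction

prevalence = Fraction(1, 25)    # 4% of the population has the condition
sensitivity = Fraction(11, 12)  # ASSUMED exact value behind "almost 92%"
specificity = Fraction(3, 4)    # healthy people are correctly cleared 75% of the time

# Shares of the whole population, not of each subgroup.
true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * (1 - specificity)
diagnosed_ill = true_positives + false_positives

# Of those diagnosed as ill, what fraction really has the condition?
share_really_ill = true_positives / diagnosed_ill

print(f"correctly diagnosed as ill:   {float(true_positives):.2%}")
print(f"incorrectly diagnosed as ill: {float(false_positives):.2%}")
print(f"all diagnosed as ill:         {float(diagnosed_ill):.2%}")
print(f"diagnosed and actually ill:   {float(share_really_ill):.1%}")
```

Changing `prevalence` shows how strongly the answer depends on the base rate: the rarer the condition, the larger the share of positive results that are false alarms.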
Worryingly, when a famous study asked general practitioners to perform a similar calculation to inform patients of the correct risks associated with mammogram results, just 15% of them did so correctly.
Will Rogers paradox
What is it?
This occurs when moving something from one group to another raises the average of both groups, even though no values actually increase.
The name comes from the American comedian Will Rogers, who joked that “when the Okies left Oklahoma and moved to California, they raised the average intelligence in both states”.
Former New Zealand Prime Minister Rob Muldoon provided a local variant on the joke in the 1980s, regarding migration from his nation into Australia.
How does it happen?
When a datapoint is reclassified from one group to another, if the point is below the average of the group it is leaving, but above the average of the one it is joining, both groups’ averages will increase.
Consider the case of six patients whose life expectancies (in years) have been assessed as being 40, 50, 60, 70, 80 and 90.
The patients who have life expectancies of 40 and 50 have been diagnosed with a medical condition; the other four have not. This gives an average life expectancy within diagnosed patients of 45 years and within non-diagnosed patients of 75 years.
If an improved diagnostic tool is developed that detects the condition in the patient with the 60-year life expectancy, then the average within both groups rises by 5 years.
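The six-patient example can be checked directly. The sketch below uses the life expectancies from the text; the variable names are illustrative only.

```python
from statistics import mean

# Life expectancies (years) from the example above.
diagnosed = [40, 50]
not_diagnosed = [60, 70, 80, 90]

before = (mean(diagnosed), mean(not_diagnosed))

# The improved diagnostic tool reclassifies the 60-year patient.
moved = 60
diagnosed_after = diagnosed + [moved]
not_diagnosed_after = [x for x in not_diagnosed if x != moved]

after = (mean(diagnosed_after), mean(not_diagnosed_after))

print("before:", before)
print("after: ", after)
```

No individual's life expectancy changes. The reclassified patient is above the old diagnosed average and below the old non-diagnosed average, so both averages go up.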
Berkson’s paradox
What is it?
Berkson’s paradox can make it look like there’s an association between two independent variables when there isn’t one.
How does it happen?
This happens when we have a set with two independent variables, which means they should be entirely unrelated. But if we only look at a subset of the whole population, it can look like there is a negative trend between the two variables.
This can occur when the subset is not an unbiased sample of the whole population. It has been frequently cited in medical statistics. For example, if patients only present at a clinic with disease A, disease B or both, then even if the two diseases are independent, a negative association between them may be observed.
Consider the case of a school that recruits students based on both academic and sporting ability. Assume that these two skills are totally independent of each other. That is, in the whole population, an excellent sportsperson is just as likely to be strong or weak academically as is someone who’s poor at sport.
If the school admits only students who are excellent academically, excellent at sport or excellent at both, then within this group it would appear that sporting ability is negatively correlated with academic ability.
To illustrate, assume that every potential student is ranked on both academic and sporting ability from 1 to 10. There is an equal proportion of people in each band for each skill. Knowing a person’s band in either skill does not tell you anything about their likely band in the other.
Assume now that the school only admits students who are at band 9 or 10 in at least one of the skills.
If we look at the whole population, the average academic rank of the weakest sportsperson is the same as that of the best sportsperson (5.5).
However, within the set of admitted students, the average academic rank of the elite sportsperson is still that of the whole population (5.5), but the average academic rank of the weakest sportsperson is 9.5, wrongly implying a negative correlation between the two abilities.
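The school example can be enumerated exactly. This is a minimal sketch that assumes, as the text does, one equally likely person for every combination of academic band and sporting band; the function name is illustrative.

```python
from itertools import product
from statistics import mean

bands = range(1, 11)

# One person per (academic, sport) combination: the two skills are independent.
population = list(product(bands, bands))

# The school admits anyone at band 9 or 10 in at least one skill.
admitted = [(a, s) for a, s in population if a >= 9 or s >= 9]


def mean_academic(students, sport_band):
    """Average academic band among students with the given sporting band."""
    return mean(a for a, s in students if s == sport_band)


print("whole population:", mean_academic(population, 1), mean_academic(population, 10))
print("admitted only:   ", mean_academic(admitted, 1), mean_academic(admitted, 10))
```

In the whole population the average academic band is the same for the weakest and strongest sportspeople. Among admitted students, the weakest sportspeople can only have got in on academic merit, which is what creates the apparent negative trend.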
Multiple comparisons fallacy
What is it?
This is where unexpected trends can occur through random chance alone in a data set with a large number of variables.
How does it happen?
When looking at many variables and mining for trends, it is easy to overlook how many possible trends you are testing. For example, with 1,000 variables, there are almost half a million (1,000×999/2) potential pairs of variables that might appear correlated by pure chance alone.
While each pair is extremely unlikely to look dependent, the chances are that from the half million pairs, quite a few will look dependent.
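A rough calculation shows the scale of the problem. The pair count below follows from the text; the 0.1% chance that any single unrelated pair looks dependent is a hypothetical threshold chosen for illustration, not a figure from the article.

```python
from math import comb

n_variables = 1000
n_pairs = comb(n_variables, 2)  # number of distinct pairs of variables

# HYPOTHETICAL: chance that one unrelated pair looks dependent by luck alone.
chance_per_pair = 0.001

# Expected number of pairs that look dependent purely by chance.
expected_spurious = n_pairs * chance_per_pair

print(f"pairs to compare: {n_pairs:,}")
print(f"expected chance 'findings': {expected_spurious:.1f}")
```

Even with a per-pair chance this small, several hundred spurious associations would be expected across the whole dataset under these assumptions.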
The Birthday paradox is a classic example of the multiple comparisons fallacy.
In a group of 23 people (assuming each of their birthdays is an independently chosen day of the year with all days equally likely), it is more likely than not that at least two of the group have the same birthday.
People often disbelieve this, recalling that it is rare that they meet someone who shares their own birthday. If you just pick two people, the chance they share a birthday is, of course, low (roughly 1 in 365, which is less than 0.3%).
However, with 23 people there are 253 (23×22/2) pairs of people who might have a common birthday. So by looking across the whole group you are testing whether any one of these 253 pairings, each of which has roughly a 0.3% chance of coinciding, does indeed match. With so many pairs to check, a coincidental match somewhere in the group becomes statistically very likely.
For a group of as few as 40 people, a shared birthday is about eight times as likely as not.
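The exact probabilities are easy to compute under the stated assumptions (365 equally likely, independent birthdays). The sketch below works out the chance that everyone's birthday is different and subtracts it from one; the function name is illustrative.

```python
from math import comb


def shared_birthday_probability(group_size, days=365):
    """Chance that at least two people in the group share a birthday."""
    p_all_distinct = 1.0
    for k in range(group_size):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for size in (23, 40):
    p = shared_birthday_probability(size)
    print(f"{size} people: {comb(size, 2)} pairs, "
          f"P(shared) = {p:.3f}, odds = {p / (1 - p):.2f} to 1")
```

Running it for other group sizes shows how quickly the probability climbs as the number of pairs grows.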
Many public conversations we have about science-related issues involve communicating risks: describing them, comparing them and trying to inspire action to avoid or mitigate them.
Just think about the ongoing stream of news and commentary on health, alternative energy, food security and climate change.
Good risk communication points out where we are doing hazardous things. It helps us better navigate crises. It also allows us to pre-empt and avoid danger and destruction.
But poor risk communication does the opposite. It creates confusion and helplessness and, worst of all, pushes us to actively work against each other even when it’s against our best interests to do so.
So what’s happening when risk communications go wrong?
People are just irrational and illogical
If you’re science-informed – or at least science-positive – you might confuse being rational with using objective, science-based evidence.
To think rationally is to base your thinking in reason or logic. But a conclusion that’s logical doesn’t have to be true. You can link flawed, false or unsubstantiated premises to come up with a logical-but-scientifically-unsubstantiated answer.
For example, in Australia a few summers back there was an increase in the number of news reports of sharks attacking humans. This led to some dramatic shark baiting and culling. The logic behind this reaction was something like:
there have been more reports of shark attacks this year than before
more reports means more shark attacks are happening
more shark attacks happening means the risk of shark attack has increased
we need to take new measures to keep sharks away from places humans swim to protect us from this increased risk.
You can understand the reasoning here, but it’s likely to have been based on flawed premises, such as not realising that one shark attack was not systematically linked to another (for example, some happened on different sides of the country). People here saw connections between events that probability suggests were actually random.
Prove it’s safe or we’ll say no
If people are already nervous about – or actively against – a risky proposition, one reaction is to demand proof of safety. But safety is a relative term and risk calculation doesn’t work that way.
To demand proof of safety is to demand certainty, and such a demand is scientifically impossible. Uncertainty is at the heart of the scientific method. Or rather, qualifying and communicating degrees of uncertainty is.
In reality, we live in a world where we have to agree on what constitutes acceptable risk, because we simply can’t provide proof of safety. To use an example I’ve noted before, we can’t prove orange juice is 100% safe, yet it remains defiantly on our supermarket shelves.
Don’t worry, this formula will calm your fears
You may have seen this basic risk calculation formula:
Risk (or hazard) = (the probability of something happening) × (the consequences of it happening)
This works brilliantly for insurance assessors and lab managers, but it quickly falls over when you use it to explain risk in the big bad world.
Everyday reactions to how bad a risk seems are more often ruled by the formula (hazard) × (outrage), where “outrage” is fuelled by non-technical, socially-driven matters.
Basically, the more outraged (horrified, frightened) we are by the idea of something happening, the more likely we are to consider it unacceptable, regardless of how statistically unlikely it might be.
The shark attack example serves here, too. The consequences of being attacked by a shark are outrageous, and this horror colours our ability to keep the technical likelihood of an attack in perspective. The emotional reality of our feelings of outrage eclipses technical, detached risk calculations.
Significant means useful
Everyone who’s worked with statistics knows that statistical significance can be a confusing idea. For example, one study looked at potential links between taking aspirin every day and the likelihood of having a heart attack.
Among the 22,000 people in the study, those who took daily aspirin were less likely to have a heart attack than those who didn’t, and the result was statistically significant.
Sounds like something worth paying attention to, until you discover that the difference in the likelihood of having a heart attack between those who were taking aspirin every day and those who weren’t was less than 1%.
Significance ain’t always significant.
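To see how a small difference can still be statistically significant in a large study, here is a sketch of a standard two-proportion z-test. The group sizes and heart attack counts are hypothetical, chosen only to match the scale described above (about 22,000 people, a difference of under one percentage point); they are not the study's actual data.

```python
from math import erfc, sqrt


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return p_a - p_b, z, p_value


# HYPOTHETICAL counts: 11,000 people per group.
difference, z, p_value = two_proportion_z_test(140, 11_000, 240, 11_000)

print(f"difference in heart attack rate: {difference:.2%}")
print(f"z = {z:.2f}, p-value = {p_value:.2g}")
```

With groups this large, a gap of less than one percentage point produces a very small p-value. The p-value says the difference is unlikely to be chance; it says nothing about whether the difference is big enough to matter.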
Surely everyone understands percentages
It’s easy to appreciate that complex statistics and formulae aren’t the best tools for communicating risk beyond science-literate experts. But perhaps simple numbers – such as percentages – could help remove some of the confusion when talking about risk?
We see percentages everywhere – from store discounts, to weather forecasts telling you how likely it is to rain. But percentages can easily confuse, or at least slow people down.
Take this simple investment decision example. If you were offered a choice between the following three opportunities, which would you take?
have your bank balance raised by 50% and then cut by 50%
have your bank balance cut by 50% and then raised by 50%
have your bank balance remain where it is
You probably got this right. But perhaps you didn’t. Or perhaps it took you longer than you’d expected to think it through. Don’t feel bad. (The answer is at the end of this article.)
I have used this in the classroom, and even science-literate university students can get it wrong, especially if they are asked to decide quickly.
Now imagine if these basic percentages were all you had to make a real, life-or-death decision (while under duress).
Just a few simple numbers could be helpful, couldn’t they?
Well actually, not always. Research into a phenomenon known as anchoring and adjustment shows that the mere presence of numbers can affect how likely or common we estimate something might be.
In one study, people were asked one of the following two questions:
how many headaches do you have a month: 0, 1, 2?
how many headaches do you have a month: 5, 10, 15?
Estimates were higher for responses to the second question, simply because the numbers used in the question to prompt their estimates were higher.
At least the experts are evidence-based and rational
Well, not necessarily. It turns out experts can be just as prone to the influences of emotion and the nuances of language as we mere mortals.
In a classic study from 1982, participants were asked to imagine they had lung cancer and were told they would be given a choice of two therapies: radiation or surgery.
They were then informed either (a) that 32% of patients were dead one year after radiation, or (b) that 68% of patients were alive one year after radiation. After this they were asked to hypothetically choose a treatment option.
About 44% of the people who were told the survival statistic chose radiation, compared to only 18% of those who were told the death statistic, even though the percentages reflected the same story about surviving radiation treatment.
What’s most intriguing here is that these kinds of results were similar even when research participants were doctors.
So what can we do?
By now, science-prioritising, reason-loving, evidence-revering readers might be feeling dazed, even a little afraid.
If we humans, who rely on emotional reactions to assess risks, can be confused even by simple numbers, and are easily influenced by oddities of language, what hope is there for making serious progress when trying to talk about huge risky issues such as climate change?
First, don’t knock emotion-driven, instinct-based risk responses: they’re useful. If you’re surfing and you notice a large shadow lurking under your board, it might be better to assume it’s a shark and act accordingly.
Yes, it was probably your board’s shadow, and yes, you’ll feel stupid for screaming and bolting for land. But better to assume it was a shark and be wrong, than assume it was your shadow and be wrong.
But emotion-driven reactions to large, long-term risks are less useful. When assessing these risks, we should resist our gut reactions and try not to be immediately driven by how a risk feels.
We should step back, take a moment to assess our own responses, and give ourselves time to respond in a way that incorporates where the evidence leads us. It’s easy to forget that it’s not just our audiences – be they friends or family, colleagues or clients – who are geared to respond to risks like a human: it’s us as well.
With a bit of breathing space, we can try to see how the tricks and traps of risk perception and communication might be influencing our own judgement.
Perhaps you’ve logically linked flawed premises, or have been overly influenced by a specific word or turn of phrase. It could be your statistical brain has been overwhelmed by outrage, or you tried to process some numbers a little too quickly.
If nothing else, at least be wary of shouting “Everyone’s gotta love apples!” if you’re trying to communicate with a room full of orange enthusiasts. Talking at cross-purposes or simply slamming opposing perspectives on a risk is probably the best way to destroy any risk communication effort – well before these other quirks of being human even get a chance to mess it up.
Answer: Assume you start with $100. Options 1 and 2 leave you with $75, option 3 leaves you with your original $100. Note that no option puts you in a better position.
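The answer can be verified in a few lines. The $100 starting balance is the one assumed in the answer above; the variable names are illustrative.

```python
start = 100.0

# Option 1: raise by 50%, then cut by 50%.
raise_then_cut = start * 1.5 * 0.5

# Option 2: cut by 50%, then raise by 50%.
cut_then_raise = start * 0.5 * 1.5

# Option 3: leave the balance alone.
unchanged = start

print(raise_then_cut, cut_then_raise, unchanged)
```

The two changes are multiplications, so their order does not matter, and 1.5 × 0.5 is 0.75 rather than 1: the 50% cut always applies to a different base than the 50% raise.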
It is often said that our approach to health and safety has gone mad. But the truth is that it needs to go scientific. Managing risk is ultimately linked to questions of engineering and economics. Can something be made safer? How much will that safety cost? Is it worth that cost?
Decisions under uncertainty can be explained using utility, a concept introduced by Swiss mathematician Daniel Bernoulli 300 years ago, to measure the amount of reward received by an individual. But the element of risk will still be there. And where there is risk, there is risk aversion.
Risk aversion itself is a complex phenomenon, as illustrated by psychologist John W. Atkinson’s 1950s experiment, in which five-year-old children played a game of throwing wooden hoops around pegs, with rewards based on successful throws and the varying distances the children chose to stand from the pegs.
The risk-confident stood a challenging but realistic distance away, but the risk averse children fell into two camps. Either they stood so close to the peg that success was almost guaranteed or, more perplexingly, positioned themselves so far away that failure was almost certain. Thus some risk averse children were choosing to increase, not decrease, their chance of failure.
So clearly high aversion to risk can induce some strange effects. These might be unsafe in the real world, as testified by author Robert Kelsey, who said that during his time as a City trader, “bad fear” in the financial world led to either “paralysis… or nonsensical leaps”. Utility theory predicts a similar effect, akin to panic, in a large organisation if the decision maker’s aversion to risk gets too high. At some point it is not possible to distinguish the benefits of implementing a protection system from those of doing nothing at all.
So when it comes to human lives, how much money should we spend on making them safe? Some people prefer not to think about the question, but those responsible for industrial safety or health services do not have that luxury. They have to ask themselves the question: what benefit is conferred when a safety measure “saves” a person’s life?
The answer is that the saved person is simply left to pursue their life as normal, so the actual benefit is the restoration of that person’s future existence. Since we cannot know how long any particular person is going to live, we do the next best thing and use measured historical averages, as published annually by the Office for National Statistics. The gain in life expectancy that the safety measure brings about can be weighed against the cost of that safety measure using the Judgement value, which mediates the balance using risk-aversion.
The Judgement (J) value is the ratio of the actual expenditure to the maximum reasonable expenditure. A J-value of two suggests that twice as much is being spent as is reasonably justified, while a J-value of 0.5 implies that safety spend could be doubled and still be acceptable. It is a ratio that throws some past safety decisions into sharp relief.
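The ratio itself is simple to express. The sketch below takes the maximum reasonable expenditure as a given input, since the text does not spell out how it is derived from life expectancy gains and risk aversion; the function name and the spending figures are hypothetical.

```python
def j_value(actual_spend, max_reasonable_spend):
    """Ratio of actual safety spending to the maximum reasonable spending."""
    if max_reasonable_spend <= 0:
        raise ValueError("max_reasonable_spend must be positive")
    return actual_spend / max_reasonable_spend


# HYPOTHETICAL figures, in pounds, for illustration only.
overspend = j_value(2_000_000, 1_000_000)   # twice what is reasonably justified
underspend = j_value(500_000, 1_000_000)    # spending could double and stay acceptable

print(f"J = {overspend}: spending is {overspend:g}x the justified maximum")
print(f"J = {underspend}: spending could rise by a factor of {1 / underspend:g}")
```

A J-value above 1 indicates more is being spent than the measure justifies, and a value below 1 indicates there is room to spend more.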
For example, a few years ago energy firm BNFL authorised a nuclear clean-up plant with a J-value of over 100, while at roughly the same time the medical quango NICE was asked to review the economic case for three breast cancer drugs found to have J-values of less than 0.05.
The Government of the time seemed happy to sanction spending on a plant that might just prevent a cancer, but wanted to think long and hard about helping many women actually suffering from the disease. A new and objective science of safety is clearly needed to provide the level playing field that has so far proved elusive.
Putting a price on life
Current safety methods are based on the “value of a prevented fatality” or VPF. It is the maximum amount of money considered reasonable to pay for a safety measure that will reduce by one the expected number of preventable premature deaths in a large population. In 2010, that value was calculated at £1.65m.
This figure simplistically applies equally to a 20-year-old and a 90-year-old, and is in widespread use in the road, rail, nuclear and chemical industries. Some (myself included) argue that the method used to reach this figure is fundamentally flawed.
In the modern industrial world, however, we are all exposed to dangers at work and at home, on the move and at rest. We need to feel safe, and this comes at a cost. The problems and confusions associated with current methods reinforce the urgent need to develop a new science of safety. Not to do so would be too much of a risk.