Combining Sources of Information: Base Rates and Likelihoods
People often make predictable errors about probabilities. Here's
a classic example:
"Bill is an unusually tall man in his early thirties. He's the star
of his team, which has an excellent chance of winning the state championship.
What is Bill's sport?"
Most people confidently say basketball, because Bill's description is just
what you would expect for the star of a successful basketball team.
One name for this kind of information is "likelihood."
Statistically, it's the probability that the star of a successful basketball
team fits Bill's description. Let's say this probability is about 85%,
while there's only a 40% probability that someone in any other organized
sport fits the description.
The problem is Bill's age. The vast majority of Americans beyond college
age who compete in organized team sports are bowlers. A conservative
estimate might be that 70 percent of people on an organized sports team
are on a bowling team, only 5 percent are on a basketball team, and 25
percent are in all other organized team sports. This kind of information
is called the "base rate."
Mathematically, given my rough input numbers, the probability that Bill is
a bowler is about 66%, while the probability that he plays basketball is
only about 10%.
We will see how this works in the next lecture.
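For readers who want to check the arithmetic now, here is a minimal sketch that combines the two kinds of information with Bayes' rule: multiply each sport's base rate by its likelihood, then divide by the total so the results sum to 100%. It assumes that bowling counts as one of the "other" sports and so gets the 40% likelihood; the function name posterior and the dictionary layout are illustrative choices, not something taken from the lecture.

```python
def posterior(base_rates, likelihoods):
    """Combine base rates and likelihoods into posterior probabilities."""
    # Weight each sport's base rate by how well Bill's description fits it.
    joint = {sport: base_rates[sport] * likelihoods[sport] for sport in base_rates}
    # Normalize so the probabilities across all sports sum to 1.
    total = sum(joint.values())
    return {sport: weight / total for sport, weight in joint.items()}


# Base rates: share of adult team-sport players in each sport (from the text).
base_rates = {"bowling": 0.70, "basketball": 0.05, "other": 0.25}

# Likelihoods: chance that a player in each sport fits Bill's description.
# Assumption: bowling uses the 40% figure given for "any other organized sport".
likelihoods = {"bowling": 0.40, "basketball": 0.85, "other": 0.40}

result = posterior(base_rates, likelihoods)
for sport, probability in result.items():
    print(f"{sport}: {probability:.0%}")
```

With these inputs the sketch gives about 66% for bowling and about 10% for basketball, matching the figures quoted above. Basketball has the highest likelihood, but its tiny base rate pulls its final probability far below bowling's.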
The point is, you need to consider both likelihood and base rate. You
all know this already; for example, any unsolicited email offer that looks
just like a wonderful opportunity ends up in your trash file because you
know the base rate of unsolicited wonderful opportunities is vanishingly
low. Still, it is all too easy to forget. For example, the one
thing that every lottery winner has in common is that they all bought lottery
tickets; the likelihood of having a ticket given that you are a winner is
100%. If people didn't forget about the base rate, there would be no
lotteries!