Combining Sources of Information:  Base Rates and Likelihoods

People often make predictable errors about probabilities.  Here's a classic example:
"Bill is an unusually tall man in his early thirties.  He's the star of his team, which has an excellent chance of winning the state championship.
What is Bill's sport?"

Most people confidently say basketball, because Bill's description is just what you would expect for the star of a successful basketball team.  One name for this kind of information is "likelihood."  Statistically, it's the probability that the star of a successful basketball team fits Bill's description.  Let's say this probability is about 85%, while there's only a 40% probability that someone in any other organized sport fits the description.

The problem is Bill's age.  The vast majority of Americans beyond college age who compete in organized team sports are bowlers.  A conservative estimate might be that 70 percent of people on an organized sports team are on a bowling team, only 5 percent are on a basketball team, and 25 percent are in all other organized team sports combined.  These proportions are called the "base rate."

Mathematically, given my rough input numbers, the probability that Bill is a bowler is about 66%, while the probability that he plays basketball is only about 10%.  We will see how this works in the next lecture.
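
For readers who want to check the arithmetic now, here is a minimal sketch in Python.  It weights each sport's base rate by its likelihood and then rescales so the results add up to 100% (this is Bayes' rule).  The variable and function names are made up for illustration, and the sketch assumes the 40% figure for "any other organized sport" applies to bowling as well as to the remaining sports.

```python
# Base rates: share of post-college team-sport players in each sport
# (the rough figures from the text).
base_rate = {"bowling": 0.70, "basketball": 0.05, "other": 0.25}

# Likelihoods: probability that a player in each sport fits Bill's description.
# Assumption: the 40% figure for non-basketball sports applies to bowling too.
likelihood = {"bowling": 0.40, "basketball": 0.85, "other": 0.40}


def posterior(base_rate, likelihood):
    """Combine base rates and likelihoods into probabilities that sum to 1."""
    # Weight each base rate by how well the description fits that sport.
    joint = {sport: base_rate[sport] * likelihood[sport] for sport in base_rate}
    # Rescale so the weighted values add up to 1.
    total = sum(joint.values())
    return {sport: joint[sport] / total for sport in joint}


result = posterior(base_rate, likelihood)
for sport, p in result.items():
    print(f"{sport}: {p:.0%}")
# bowling: 66%
# basketball: 10%
# other: 24%
```

Even though the description fits a basketball star more than twice as well as it fits a bowler, bowlers are so much more common that they still dominate the result.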

The point is, you need to consider both likelihood and base rate.  You all know this already; for example, any unsolicited email offer that looks just like a wonderful opportunity ends up in your trash folder because you know the base rate of unsolicited wonderful opportunities is vanishingly low.  Still, it is all too easy to forget.  For example, the one thing that every lottery winner has in common is that they all bought lottery tickets; the likelihood of having a ticket given that you are a winner is 100%, but the base rate of winning is minuscule.  If people didn't forget about the base rate, there would be no lotteries!