## Sunday, February 14, 2010

### Randomness and probability: quantify with caution

One of the main lessons (or reminders, for those trained in statistics) in *The Drunkard's Walk* concerns the reliability of data. Mlodinow makes the important point that people latch on to numerical values. He uses wine ratings as an example: a wine's numerical rating has a huge impact on its price, yet in blind tests the tasters doing the rating are next to useless at telling which wine they are drinking.

Let’s look at a recent example of people latching on to a number without thinking about its error. The latest Australian Bureau of Statistics (ABS) data show the unemployment rate falling by 0.1 percentage points. What is the probability that unemployment actually rose?
To answer this, we need to look closely at the sampling error in the ABS survey. Shown below are the 95% confidence intervals for the most recently published labour force statistics.

You will notice that these intervals run from negative to positive, so they span zero. If we treat the estimate as normally distributed and ask how much of the probability for the unemployment rate lies above zero, we get roughly a 16% chance that the unemployment rate actually increased. That figure reflects sampling error alone. The total error, which also includes non-sampling error such as inaccuracies in the data itself, is likely to be greater still.
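The 16% figure can be reproduced with a normal approximation. The sketch below is a reconstruction under stated assumptions, not the ABS's own method: it assumes each published 95% interval is symmetric and equals the estimate ± 1.96 standard errors, backs the standard error out of the interval width, and treats the true change as normally distributed around the published estimate. The function name `prob_opposite_sign` is hypothetical.

```python
from statistics import NormalDist


def prob_opposite_sign(estimate, lower, upper, level=0.95):
    """Chance that the true change has the opposite sign to the estimate.

    Assumes a normal sampling distribution centred on the estimate and a
    symmetric confidence interval [lower, upper] at the given level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = (upper - lower) / (2 * z)             # standard error implied by the width
    dist = NormalDist(mu=estimate, sigma=se)
    below_zero = dist.cdf(0.0)                 # probability mass below zero
    return 1 - below_zero if estimate < 0 else below_zero


# Rows from the ABS table (November 2009 to December 2009, seasonally adjusted)
p_rate = prob_opposite_sign(-0.1, -0.3, 0.1)             # unemployment rate, pts
p_level = prob_opposite_sign(-10_600, -41_600, 20_400)   # unemployed persons

print(f"P(unemployment rate actually rose):  {p_rate:.1%}")
print(f"P(unemployment level actually rose): {p_level:.1%}")
```

Under these assumptions the unemployment rate row gives about 16%, matching the figure above, while the unemployment level row gives about 25% because its interval is wider relative to its estimate. Both numbers are approximate, since the published estimates and bounds are themselves rounded.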

In academic work, an error of this size would lead to the conclusion that the change in unemployment is statistically indistinguishable from zero, because the 95% confidence interval includes zero.

Of course, over time we can smooth out some of this error. For example, the October to November 2009 movement in the unemployment rate was identical. Assuming the two months' sampling errors are independent, that brings the chance that both consecutive changes have the wrong sign down to 0.16 × 0.16 = 2.56%.
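The 2.56% figure is 0.16 squared, which treats the two months' sampling errors as independent. That independence is an assumption, not something the ABS figures establish. The sketch below checks the arithmetic two ways under it: an exact product, and a small Monte Carlo simulation that draws each month's "true" change from a normal distribution centred on -0.1 points, with the standard error backed out of a ±0.2 point 95% interval. The function name `simulate_two_months`, the trial count and the seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

ESTIMATE = -0.1                          # published monthly change, pts
SE = 0.2 / NormalDist().inv_cdf(0.975)   # half-width of 0.2 pts divided by ~1.96

# Chance that a single month's true change is positive despite a -0.1 estimate
p_single = 1 - NormalDist(ESTIMATE, SE).cdf(0.0)

# Exact answer for two months, assuming independent errors
p_two_exact = p_single ** 2


def simulate_two_months(trials=200_000, seed=2010):
    """Monte Carlo estimate of P(both months' true changes are positive)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        month1 = rng.gauss(ESTIMATE, SE)
        month2 = rng.gauss(ESTIMATE, SE)  # drawn independently of month1
        if month1 > 0 and month2 > 0:
            hits += 1
    return hits / trials


print(f"One month:              {p_single:.2%}")
print(f"Two months (exact):     {p_two_exact:.2%}")
print(f"Two months (simulated): {simulate_two_months():.2%}")
```

Using the unrounded single-month probability (about 16.4%) the two-month figure comes out nearer 2.7% than 2.56%; the gap is only rounding. If the two months' errors were not independent, neither number would hold.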

Movements in seasonally adjusted series between November 2009 and December 2009

| Series | Monthly change | 95% confidence interval |
|---|---|---|
| Total employment | 35 200 | -17 800 to 88 200 |
| Total unemployment | -10 600 | -41 600 to 20 400 |
| Unemployment rate | -0.1 pts | -0.3 pts to 0.1 pts |
| Participation rate | 0.0 pts | -0.4 pts to 0.4 pts |
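The "indistinguishable from zero" test amounts to asking whether each 95% interval contains zero. The sketch below applies it to every row of the table and, under the same symmetric-normal assumption as before, prints an approximate z-score (estimate divided by implied standard error). The lower bounds of the two rate rows are read as negative, since each interval must contain its own estimate; the `rows` and `results` containers are illustrative.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

# (series, monthly change, lower bound, upper bound) from the table above
rows = [
    ("Total employment", 35_200, -17_800, 88_200),
    ("Total unemployment", -10_600, -41_600, 20_400),
    ("Unemployment rate (pts)", -0.1, -0.3, 0.1),
    ("Participation rate (pts)", 0.0, -0.4, 0.4),
]

results = {}
for name, change, lower, upper in rows:
    se = (upper - lower) / (2 * Z95)   # implied standard error
    z = change / se                    # distance from zero in standard errors
    spans_zero = lower <= 0 <= upper   # True means not significant at the 5% level
    results[name] = spans_zero
    verdict = "indistinguishable from zero" if spans_zero else "significant"
    print(f"{name:<26} z = {z:+.2f}  {verdict}")
```

Every interval in the table spans zero, so by this test none of the four monthly movements is statistically significant at the 5% level.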

The point of this exercise is to bear in mind that all measurements contain error. There are surely thousands of statistics and rankings that have a surprisingly large impact on our lives, yet become close to meaningless once their inherent randomness is taken into account.