Means Are Not Enough

If you keep a running log (which you should), then one thing that you may like to do with it is compare your average pace in one training season to your average pace in another.

For example, suppose that you followed an identical running schedule in training seasons A and B, and that you did some sort of boot camp, cross-training, or gym workouts between the two seasons. And suppose that you had a mean pace of 11:05/mile in season A and a mean pace of 10:55/mile in season B.

Would it be fair to say that there is a statistically significant improvement in your mean pace in season B in comparison to season A?

After all, the supposition is that season A had an 11:05/mile mean and that season B had a 10:55/mile mean — an improvement of ten seconds per mile!

Unfortunately, it is not enough to look at two means and say that one is statistically different from the other.

Of course, season B’s mean pace could be better than season A’s mean pace in a statistically significant way. But we cannot tell from looking at the means alone.

The path to statistical significance requires us to know how many values went into each sample mean and to know the variability of those values.

The sample means from seasons A and B in this scenario would be written in statistical notation as x-bar-subscript-A and x-bar-subscript-B, respectively.

The “how many values” part is easy to identify in this scenario. It is simply the number of running workouts on the running schedule. In statistical notation, these two numbers would be written as n-subscript-A and n-subscript-B for seasons A and B, respectively. And, because we originally supposed that you followed exactly the same schedule in those two seasons, n-subscript-A equals n-subscript-B, which equals the total number of runs prescribed by the schedule.

The “variability of those values” part is measured by what is called the sample standard deviation, which describes how widely the individual paces spread around their mean. It would be written in statistical notation as s-subscript-A and s-subscript-B for seasons A and B, respectively. You can search the Web for information on how to calculate a sample standard deviation, and most spreadsheet programs will calculate it for you.
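
If you would rather use a few lines of code than a spreadsheet, here is a minimal sketch in Python of how the three ingredients could be pulled from a running log. The paces below are made up for illustration (they are not from any real log), and the name `summarize` is just a hypothetical helper. Paces are stored in seconds per mile so that the arithmetic stays simple.

```python
import statistics

def summarize(paces):
    """Return (n, mean, sample standard deviation) for a list of paces in seconds per mile."""
    n = len(paces)
    mean = statistics.mean(paces)
    s = statistics.stdev(paces)  # sample standard deviation (divides by n - 1)
    return n, mean, s

# Hypothetical paces for five runs in season A, in seconds per mile.
paces_a = [670, 655, 680, 662, 658]

n_a, mean_a, s_a = summarize(paces_a)

# Show the mean as minutes:seconds per mile.
minutes, seconds = divmod(round(mean_a), 60)
print(f"n = {n_a}, mean = {minutes}:{seconds:02d}/mile, s = {s_a:.1f} seconds")
```

Running the same helper on season B's paces would give you n-subscript-B, x-bar-subscript-B, and s-subscript-B.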

With the sample means, sample sizes, and sample standard deviations in hand, you can calculate what is called a “two-sample t statistic” (the formula for which is taught in many textbooks and on many websites) and determine from this calculation whether season B truly was better than season A at a specific confidence level (such as 95%).
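
To see why the means alone cannot settle the question, here is a rough sketch in Python. It uses one common form of the two-sample t statistic, Welch's version, which does not assume that the two seasons have the same variability; a textbook may present a pooled version instead. The sample size of 40 runs and the standard deviations of 60 and 15 seconds are assumptions chosen for illustration; only the two means (665 and 655 seconds, which are 11:05 and 10:55) come from the scenario above.

```python
import math

def welch_t(mean_a, s_a, n_a, mean_b, s_b, n_b):
    """Return (t, df) for Welch's two-sample t statistic from summary values."""
    var_a = s_a ** 2 / n_a  # squared standard error of season A's mean
    var_b = s_b ** 2 / n_b  # squared standard error of season B's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Same means in both cases; only the assumed variability differs.
t_noisy, df_noisy = welch_t(665, 60, 40, 655, 60, 40)    # paces vary a lot
t_steady, df_steady = welch_t(665, 15, 40, 655, 15, 40)  # paces vary little

print(f"noisy paces:  t = {t_noisy:.2f}, df = {df_noisy:.0f}")
print(f"steady paces: t = {t_steady:.2f}, df = {df_steady:.0f}")
```

You would then compare each t value against the critical value from a t table for the matching degrees of freedom. For a one-sided test at the 95% level with about 78 degrees of freedom, that critical value is roughly 1.66. Under these assumed numbers, the same ten-second improvement clears the bar when your paces are steady and falls short when they are noisy.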

Kirk Mahoney