Thursday, June 19, 2014

Comparing types of means

There are four basic types of average: the arithmetic (what most people know), the geometric, the harmonic, and the quadratic.

Everyone knows how to calculate the arithmetic mean: add up all the numbers, then divide by the total number of items. The geometric mean is also fairly simple: multiply all the numbers, then take the nth root, where n is the number of items. Two numbers would be multiplied and square-rooted; three numbers would be multiplied and cube-rooted. The quadratic mean, also known as the root mean square, is moderately straightforward: square each number, add up the squares, divide by the total number of items, then take the square root. An equivalent way to define the quadratic mean is as the square root of the arithmetic mean of the squared terms. (Another connection: the standard deviation in statistics is the root mean square of the deviations from the arithmetic mean.) Finally, the harmonic mean is the most complicated: take the reciprocal of each number, add up the reciprocals, divide by the total number of items, then take the reciprocal of the result. An equivalent way to define the harmonic mean is as the reciprocal of the arithmetic mean of the reciprocals of the original terms.
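The four definitions above can be sketched in a few lines of Python (a minimal sketch; the function names are my own):

```python
from math import sqrt, prod

def arithmetic_mean(xs):
    # Add up all the numbers, divide by the count
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Multiply all the numbers, take the nth root
    return prod(xs) ** (1 / len(xs))

def quadratic_mean(xs):
    # Square root of the arithmetic mean of the squares
    return sqrt(sum(x * x for x in xs) / len(xs))

def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals
    return 1 / (sum(1 / x for x in xs) / len(xs))

data = [1, 4, 4]
print(arithmetic_mean(data))  # 3.0
print(geometric_mean(data))   # 16 ** (1/3), about 2.52
print(quadratic_mean(data))   # sqrt(11), about 3.32
print(harmonic_mean(data))    # 2.0
```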

I've seen the comparison between different means before. It is a well-known inequality that $H\le G\le A\le Q$ for the same set of data. What I haven't seen is a comparison within one type of mean between multiple data sets. I decided to look at two-item data sets so that they can be plotted on a two-dimensional coordinate grid. I plotted a dozen (x, y) points: the x and y coordinates represent the two values in the data sets.

Here, the lines represent constant arithmetic means. For example, (0, 1), (1, 0), and (0.5, 0.5) all have the same arithmetic mean: 0.5. These lines are like contour lines on a topographical map: different data sets can have the same mean, just as different places can have the same elevation.
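A quick numerical check of that example: every point on the line x + y = 1 has arithmetic mean 0.5.

```python
# Points on the contour line x + y = 1
points = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5), (0.25, 0.75)]

# The arithmetic mean of each two-item data set (x, y)
means = [(x + y) / 2 for x, y in points]
print(means)  # [0.5, 0.5, 0.5, 0.5]
```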

Here, the curves represent constant geometric means. Two points have equal geometric means if the products x*y are equal, so the contours of constant geometric mean are reciprocal functions of the form y = k/x. Imagine one starts at a point where x = y, then adds a constant c to x and subtracts c from y. The arithmetic mean would remain the same, but the geometric mean would decrease:

$x^{2} > (x + c)(x - c)$

That means that making the two terms more unbalanced makes the geometric mean smaller. One way to conceptualize it: with unbalanced terms, the geometric mean is pulled closer to the smaller term than the arithmetic mean is. The geometric mean is more sensitive to small values.
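The effect is easy to verify numerically (a small sketch; the values x = 5 and c = 2 are arbitrary):

```python
from math import sqrt

x, c = 5.0, 2.0

# Balanced pair (x, x) versus unbalanced pair (x + c, x - c)
arith_balanced = (x + x) / 2                # 5.0
arith_unbalanced = ((x + c) + (x - c)) / 2  # still 5.0
geom_balanced = sqrt(x * x)                 # 5.0
geom_unbalanced = sqrt((x + c) * (x - c))   # sqrt(21), about 4.58

print(arith_balanced, arith_unbalanced)  # same
print(geom_balanced, geom_unbalanced)    # geometric mean dropped
```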

The quadratic mean is the exact opposite:

Starting from a point where x=y, increasing x by c and decreasing y by c will actually increase the quadratic mean. This is because

$x^{2} + x^{2} < (x + c)^{2} + (x - c)^{2}$

With unbalanced terms, the quadratic mean is dragged closer to the larger term than the arithmetic mean is.
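The same numerical check, this time for the quadratic mean (again with the arbitrary values x = 5 and c = 2):

```python
from math import sqrt

x, c = 5.0, 2.0

quad_balanced = sqrt((x * x + x * x) / 2)                  # 5.0
quad_unbalanced = sqrt(((x + c) ** 2 + (x - c) ** 2) / 2)  # sqrt(29), about 5.39

print(quad_balanced, quad_unbalanced)  # quadratic mean increased
```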

The harmonic mean looks basically like the geometric mean, except that the contour lines are hyperbolas whose asymptotes are shifted away from the axes, rather than reciprocal functions asymptotic to the axes.

The only difference is that unbalancing the terms decreases the harmonic mean even faster than it decreases the geometric mean. Thus, $H\le G\le A\le Q$.
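As a final check, the inequality $H\le G\le A\le Q$ can be verified numerically on random positive data sets (a sketch; the seed, ranges, and set size are arbitrary):

```python
import random
from math import sqrt, prod

def means(xs):
    """Return (harmonic, geometric, arithmetic, quadratic) means of xs."""
    n = len(xs)
    h = n / sum(1 / x for x in xs)
    g = prod(xs) ** (1 / n)
    a = sum(xs) / n
    q = sqrt(sum(x * x for x in xs) / n)
    return h, g, a, q

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    h, g, a, q = means(xs)
    assert h <= g <= a <= q

print("H <= G <= A <= Q held on all 1000 random data sets")
```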