I show my students histograms of more or less normally distributed real-life data. I have found it difficult, though, to get a normal curve that fits nicely on top of the histogram. Is there a way to do a best-fit regression in this situation? I looked around and couldn't find one, so here's a procedure I came up with. I'm not sure it gives the best possible fit, but it gives a good one.
Using the fact that a normal distribution is given by the equation
data:image/s3,"s3://crabby-images/cc0ea/cc0eaf70fd769ff5aab4b8989f2d8f89174e92ba" alt=""
you can work backwards to see how to make the data linear. That is,
data:image/s3,"s3://crabby-images/97c54/97c5438078e84f7f41191e76f3b9bf9f13e9e2fb" alt=""
is a transformation that makes normally distributed data linear. Here is a histogram.
data:image/s3,"s3://crabby-images/63ec1/63ec1801d0f20885bedad450e908d4f2eddde489" alt=""
I used the midpoint of each bin as the x data, and then I transformed the y data as described.
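Here is a minimal sketch of those two steps in Python. It assumes the transformation t = ±√(ln(A/y)) with A taken to be the tallest bar; that choice of A is a simplification (if the true peak falls between two midpoints, the tallest bar slightly underestimates it). The function names `bin_midpoints` and `linearize` are hypothetical helpers, not part of any library.

```python
import math

def bin_midpoints(edges):
    """Return the midpoint of each bin, given the bin edges (one more edge than bins)."""
    return [(left + right) / 2 for left, right in zip(edges[:-1], edges[1:])]

def linearize(xs, heights, mean_guess, peak=None):
    """Transform bar heights y into t = +/- sqrt(ln(peak / y)).

    If y = peak * exp(-(x - mu)**2 / (2 * sigma**2)), then
    t = (x - mu) / (sigma * sqrt(2)), which is a straight line in x.
    Bins left of mean_guess get a negative sign.
    Empty bins are dropped, since ln(peak / 0) is undefined.
    """
    if peak is None:
        # Assumption: use the tallest bar as the peak height A.
        peak = max(heights)
    if peak < max(heights):
        raise ValueError("peak must be at least as tall as every bar")
    out_x, out_t = [], []
    for x, y in zip(xs, heights):
        if y <= 0:
            continue
        t = math.sqrt(math.log(peak / y))
        out_x.append(x)
        out_t.append(-t if x < mean_guess else t)
    return out_x, out_t
```

For example, with hypothetical bin edges `[0, 1, 2, 3, 4, 5]` and counts `[2, 8, 14, 9, 3]`, you would call `linearize(bin_midpoints(edges), counts, mean_guess=2.5)` to get the points for the regression.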
data:image/s3,"s3://crabby-images/0471d/0471d69e4b97ef75834df99a82d65496c2b09fa3" alt=""
The one trick is that, for x values below the mean, the transformed y data points need to be negative. That can create a little ambiguity for the middle bin, but here it's not too hard to tell that 2.5 is a little below the mean. If in doubt, try both signs and keep whichever gives the better fit. I ran a linear regression on
data:image/s3,"s3://crabby-images/b6ccf/b6ccfcfd5e4b18d8bad7a2af6d0cc7bb77c9762c" alt=""
and found that
data:image/s3,"s3://crabby-images/ebfef/ebfef8d6f7ae2ae0714cb7a07dfbb3474901aa7c" alt=""
This can be transformed back into
data:image/s3,"s3://crabby-images/bfa27/bfa272bb61f98fe9329f54e82e7487e5215f15bd" alt=""
which is almost ready to graph. All it's missing is the leading coefficient. Comparing the exponent with the one in the normal distribution equation shows that the standard deviation is 0.88, and therefore the final equation is
data:image/s3,"s3://crabby-images/8f340/8f340bd21704dbdd377895bae4f634afc2b6718c" alt=""
Here's the histogram with the normal curve overlaid. The curve does not fit especially well.
data:image/s3,"s3://crabby-images/b0412/b0412de652a89594e4937ae4ca89ccfa7b2f616b" alt=""
Still, since this is about as good a fit as we could hope for, it shows that this real-life data does not exactly follow a normal distribution.
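To put a number on "does not fit especially well," one option (an addition here, not part of the procedure above) is to compare each bar with the curve's height at the bin midpoint. `fit_report` is a hypothetical helper; `curve` can be any Python function of x, such as the final equation written in code.

```python
import math

def fit_report(xs, heights, curve):
    """Compare each bar height with the fitted curve at the bin midpoint.

    Returns a list of (x, observed, fitted, residual) rows and the
    root-mean-square residual, in the same units as the bar heights.
    """
    rows = []
    for x, y in zip(xs, heights):
        fitted = curve(x)
        rows.append((x, y, fitted, y - fitted))
    rms = math.sqrt(sum(row[3] ** 2 for row in rows) / len(rows))
    return rows, rms
```

Looking at where the large residuals fall (near the peak, or out in the tails) gives a more specific picture than the overlay alone.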