Real World Stats...What They Don't Teach You in a Traditional Statistics Course
One of our first-semester courses for the M.S. in Business Analytics program is an introduction to statistics. The five-week intensive crash course is designed to review undergraduate statistics material and some more advanced statistics from an applied/big data perspective. We have covered all the basics: probability distributions, discrete and continuous random variables, confidence intervals, and hypothesis testing. But there is one problem with most traditional statistics courses: they are too normal.
Real-world data are often not normally distributed. With big data, we can often still rely on normality, because when sample sizes are large enough the sampling distribution of the mean becomes approximately normal (the central limit theorem), even if the underlying data are skewed. Other times, data categorization can help us break down a data set into intervals that fit a normal distribution better.
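As a rough illustration of the large-sample point, here is a minimal Python sketch. It uses simulated exponential data (an assumption for illustration, not data from my project) and a small hypothetical helper, `skewness`, to compare how lopsided the raw values are with how lopsided the means of many samples are.

```python
import random
import statistics

def skewness(values):
    """Third standardized moment: roughly 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Raw data: 10,000 draws from a right-skewed exponential distribution.
raw = [rng.expovariate(1.0) for _ in range(10_000)]

# Sample means: 2,000 samples of 100 draws each, one mean per sample.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(100)])
    for _ in range(2_000)
]

raw_skew = skewness(raw)
means_skew = skewness(sample_means)
print(f"skewness of raw data:     {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The raw values stay strongly skewed no matter how many of them we collect, while the sample means end up much closer to symmetric. A large sample helps with inference about the mean; it does not reshape the data itself.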
Still, in some situations, we can't apply statistical theories based on normality without more complicated mathematics. The frequency histograms below represent one of these instances.
These histograms come from a data project I have been working on for the last several months. In data sets such as these, mean values are not very meaningful, so a normal distribution, which is centered on the population mean μ, is not very helpful. At first glance, it seems like a Poisson or exponential distribution might be a better fit. However, the large low-frequency gaps just above 0 have sent me in search of a better distribution. Because, as we all know well, real life is never normal.
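One quick way to sanity-check a Poisson candidate is to compare what it implies with what the data show. The sketch below is only an assumption-laden illustration: the data are synthetic stand-ins that I made up to mimic a spike at 0 followed by a gap, not the project data, and the variable names are hypothetical. It relies on two standard Poisson facts: the variance equals the mean, and the probability of a zero is exp(-λ).

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in data (NOT the project's data): about 60% exact zeros,
# the rest scattered between 20 and 60, leaving a gap just above 0.
data = [0 if rng.random() < 0.6 else rng.randint(20, 60) for _ in range(5_000)]

mean = statistics.fmean(data)
variance = statistics.pvariance(data)
observed_zero_share = data.count(0) / len(data)

# A Poisson distribution with rate lambda = mean has variance equal to
# the mean and assigns probability exp(-lambda) to the value 0.
poisson_zero_share = math.exp(-mean)
dispersion = variance / mean  # close to 1 when data are Poisson-like

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"variance / mean = {dispersion:.2f}")
print(f"observed share of zeros:     {observed_zero_share:.3f}")
print(f"Poisson-implied zero share:  {poisson_zero_share:.3g}")
```

When the variance is far above the mean and the observed share of zeros is far above what a Poisson with the same mean would allow, a plain Poisson is a poor description, which is the kind of mismatch that pushes the search toward other distributions.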
Stay tuned for more on my real-world stats journey.