For Whom The Bell Curves
Metrics and analytics are the foundation of the statistical analysis of website data. Once applicable metrics have been established and a reliable, accurate analytics package chosen, one can begin to make inferences from the data every website generates. While most people in the eCommerce trade are aware of popular metrics such as conversion rate, average order value and value/click, the ability to distinguish random, yet expected, metric fluctuations from signs of impending doom or unheralded success remains a more closely guarded secret. Every website will encounter variance in its reporting metrics, but determining whether that variance is acceptable is the key to eCommerce data analysis.
Let’s examine the simplest of metrics first: the number of website visitors. This is a metric even the most lay of eCommerce people are aware of, and one that many place far too much stock in. The easiest thing in the world is bringing more visitors to a website; the difficult part is bringing qualified visitors to a website, but I digress. What we wish to examine in this article is how to determine whether the daily or weekly change in visitors is the product of normal data distribution patterns or a signal that something on the website has changed your expected distribution.
In order to make this determination we are going to rely on a few long lost high school friends: the bell curve and standard deviation. Everyone remembers the days of teachers grading ‘on a curve.’ Well, this ‘curve’ is the bell curve, which is used constantly in statistical analysis. Let’s start by looking at the bell curve minus the statistical explanations.

As we can see, the bell curve shows that data points, when normally distributed, will follow a classic pattern. Using the high school analogy again, extremely unsuccessful (Grade F) and extremely successful (Grade A) outcomes will occur less frequently than very unsuccessful (Grade D) and very successful (Grade B) outcomes. The most common outcome will be the somewhat successful (or somewhat unsuccessful) one: the classic Grade C, or average, outcome. But so far you haven’t learned anything you don’t already know, so how is any of this useful in the statistical analysis of websites?
To begin applying the bell curve to website data, we have to talk about a few statistical formulas, namely, variance and standard deviation. Variance is the statistical measure of how spread out a set of data is. Let’s look at an extremely simple example to see what variance is and how it can be used as a statistical tool.
Let’s assume our high school math class has 5 students who earned the following scores on a recent pop quiz: {70, 73, 76, 79, 82}. To illustrate how to calculate variance, let’s look at a table that shows how far each grade is from every other grade in the set:

Now let’s square the differences to get to all positive numbers and add up the values of each row, and then the sum of all rows:

Now we simply take ½ of the sum of the squared differences (900) and divide by the number of differences (20: each of the 5 scores is compared with the other 4, and 5 × 4 = 20).
Hence (0.5*900)/20 = 22.5 = Variance of the set {70, 73, 76, 79, 82}.
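The table-based calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `pairwise_variance` is our own invention, and the standard library's `statistics.variance` (which computes the sample variance) is included purely as a cross-check.

```python
from itertools import permutations
from statistics import variance


def pairwise_variance(values):
    """Half the sum of squared differences over every ordered pair of
    distinct elements, divided by the number of such pairs."""
    squared = [(a - b) ** 2 for a, b in permutations(values, 2)]
    return 0.5 * sum(squared) / len(squared)


scores = [70, 73, 76, 79, 82]
print(pairwise_variance(scores))  # 22.5
print(variance(scores))           # 22.5, the sample variance
```

Both lines print the same number, which suggests the table method is simply a long-hand route to the sample variance.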
So what does variance tell us? Basically, variance illustrates how tightly bunched the numbers in the set are. A low variance indicates that the numbers in the set are tightly bunched, while a high variance would indicate that the numbers in the set are much more diverse or spaced out. Let’s illustrate this a bit more clearly.
Our original set of data {70, 73, 76, 79, 82} has an average or mean of 76. Let’s look at a second group of data with the same mean, namely {64, 70, 76, 82, 88}. Again we have an average, or mean, of 76. Now let’s calculate the variance of this set of data.

Square of differences

Variance = (0.5*3600)/20 = 90
So while both groups have the same average/mean of 76, the first group has a variance of 22.5 while the second group has a variance of 90. This agrees with our definition of variance as the first group of data is tightly bunched, while the second set of data is more spread out.
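For readers who prefer a formula to a table, the pairwise calculation used above is equivalent to the more familiar textbook form of the sample variance. The notation here is ours rather than the original tables': n is the number of values, x_i are the individual values, and x̄ is their mean.

```latex
s^2 \;=\; \frac{1}{2\,n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2
    \;=\; \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

As a check, the squared distances from the mean of 76 add up to 90 for the first set and 360 for the second, and dividing each by n - 1 = 4 gives 22.5 and 90 respectively.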
Now let’s take the Bell Curve at full speed. The other statistical measurement we wanted to look at was standard deviation. Standard Deviation (often denoted SD or by the lower case Greek letter sigma “σ”) is simply the square root of Variance. Standard deviation is the tool that will help us identify when data points are unusually out of line as compared to data points that are simply fluctuating within acceptable amounts. In order to illustrate how Standard Deviation accomplishes this, we need to take a second look at the Bell Curve, this time with a small amount of statistics applied.

Remembering that σ denotes SD or Standard Deviation, we see the Bell Curve now divided into SD sections, with u denoting the arithmetic mean (usually written μ). When data is normally distributed, about 68.3% of all data points should fall within plus or minus 1 SD of the arithmetic mean, 95.4% within plus or minus 2 SD, and 99.7% within plus or minus 3 SD.
So going back to our 2 sets of data, taking the square root of each variance gives an SD of about 4.74 for the first set (√22.5) and about 9.49 for the second (√90); both have a mean of 76:
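These percentages can be checked directly against the normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `share_within` is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```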

In both sets of data, 3 of the 5 grades fall between -1SD and 1SD. One element falls between -2SD and -1SD, and one element falls between 1SD and 2SD. Referring back to the first diagram of the Bell Curve, we can conclude that 73, 76 and 79 from data set 1 are acceptable fluctuations, while the scores 70 and 82 are slightly unusual. In data set 2, {70, 76, 82} are acceptable fluctuations, while 64 and 88 are slightly unusual scores. The extremely small data set size and the attempt to keep these examples relatively simple have precluded the existence of highly unusual data points. However, let’s look at a third data set to view these statistics in a more real-world example.
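The same sorting of scores into SD bands can be sketched in Python. The helper `sd_band` is a hypothetical name of our own; it reports 1 for a value within 1 SD of the mean, 2 for a value between 1 and 2 SD away, and so on, using the sample standard deviation from the standard library.

```python
from math import ceil
from statistics import mean, stdev


def sd_band(value, mu, sigma):
    """Return 1 if value is within 1 SD of the mean, 2 if within 2 SD, etc."""
    distance = abs(value - mu) / sigma
    return max(1, ceil(distance))


for scores in ([70, 73, 76, 79, 82], [64, 70, 76, 82, 88]):
    mu, sigma = mean(scores), stdev(scores)
    print({score: sd_band(score, mu, sigma) for score in scores})
# {70: 2, 73: 1, 76: 1, 79: 1, 82: 2}
# {64: 2, 70: 1, 76: 1, 82: 1, 88: 2}
```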
Let’s say the scores of a Math Quiz are the following set:
{18,43,49,56,68,75,77,79,81,83,85,89,92,99}
Now instead of doing the giant table of differences and their squares, we use a tool everyone has at their disposal: Excel formulas! Excel has formulas for Variance and Standard Deviation, so it’s a simple matter of plug and chug to get the numbers we want. Both formulas are constructed the same way a Sum formula is, so simply put your 14 values in a column or row (say, cells A1 through A14) and enter =VAR(A1:A14) and =STDEV(A1:A14) to get the answers.
In this case our answers are:
Mean/Avg = 71
Variance = 493.54
Standard Deviation = 22.22
-3SD = 4.34
-2SD = 26.56
-1SD = 48.78
u = 71
1SD = 93.22
2SD = 115.44
3SD = 137.66
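If a spreadsheet is not handy, the same figures can be reproduced with Python's standard library. This is a sketch rather than a replacement for the Excel approach; `statistics.variance` and `statistics.stdev` are the sample versions, which is what the figures above correspond to.

```python
from statistics import mean, stdev, variance

scores = [18, 43, 49, 56, 68, 75, 77, 79, 81, 83, 85, 89, 92, 99]

mu = mean(scores)
var = variance(scores)   # sample variance
sigma = stdev(scores)    # sample standard deviation

print(f"mean = {mu}, variance = {var:.2f}, SD = {sigma:.2f}")

# Band boundaries from -3 SD to +3 SD around the mean.
for k in range(-3, 4):
    print(f"{k:+d} SD = {mu + k * sigma:.2f}")
```

Because the code works from the unrounded SD, a boundary may differ in the last decimal place from one computed by hand with the rounded value of 22.22.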
So in our set,
- 1 element (18) falls between -3SD and -2SD: about 7% of the data
- 1 element (43) falls between -2SD and -1SD and 1 element (99) falls between 1SD and 2SD: about 14% of the data
- 11 elements (49,56,68,75,77,79,81,83,85,89,92) fall within 1SD of the mean: about 79% of the data
The Empirical Rule is 68-95-99.7, which states what percentage of data should fall within 1SD, 2SDs, and 3SDs respectively. In this example, 79% falls within 1SD, 93% within 2SD, and 100% within 3SD.
Our conclusions from this set of grades should be:
- The student who scored 18 has serious issues, as this is an extremely unsuccessful performance. This should be acted on and analyzed immediately.
- The student who scored 43 should be watched closely to see if this was just an aberration or a sign of problems, while the student scoring 99 has either mastered the material or could be cheating. These should be watched.
- Students with scores (49,56,68,75,77,79,81,83,85,89,92) are within the normal expected fluctuations of data, although 49 sits only just inside the -1SD boundary of 48.78. No actions need to be taken at this time.
Now that you understand Variance, Standard Deviation, and the Bell Curve, try applying these statistics to your daily or weekly metrics. By utilizing these statistical formulas and measurements, you can quickly identify days or weeks when the data is “out of whack”. Perhaps more importantly, these formulas can help identify normal fluctuation in your metrics, which will save you a lot of time otherwise spent trying to explain a drop or rise that is simply a case of normal distribution rather than any sign of impending disaster. By being able to separate true issues from random fluctuations, you can use your analytical time more wisely and quickly identify true problems when they arise.
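As a starting point, here is a minimal sketch of how that check might look for daily visitor counts. The visitor numbers are invented purely for illustration, the `classify` function and its labels are our own, and the thresholds simply mirror the conclusions drawn from the quiz example (within 1 SD: no action; between 1 and 2 SD: watch; beyond 2 SD: act). It also assumes the baseline counts are roughly normally distributed.

```python
from statistics import mean, stdev

# Hypothetical daily visitor counts for a baseline period (illustrative only).
baseline = [1180, 1225, 1190, 1310, 1275, 1240, 1205, 1260, 1230, 1295]


def classify(visitors, history):
    """Label a day's count by how many SDs it sits from the baseline mean."""
    mu, sigma = mean(history), stdev(history)
    z = (visitors - mu) / sigma
    if abs(z) <= 1:
        return "normal fluctuation"
    if abs(z) <= 2:
        return "watch closely"
    return "investigate"


for today in (1250, 1320, 900):
    print(today, classify(today, baseline))
# 1250 normal fluctuation
# 1320 watch closely
# 900 investigate
```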