
4.10: Chebyshev’s Theorem to Interpret Standard Deviation


Chebyshev's theorem helps interpret the value of a standard deviation. It applies to almost any dataset, whether its distribution is normal, unknown, or skewed.

In contrast, the empirical rule only applies to normally distributed data.

Consider a dataset of the lifespans of animals in a zoo, with a mean of 13 years and a standard deviation of 1.5 years.

According to Chebyshev's theorem, the proportion of animal ages within K standard deviations of the mean is at least one minus one divided by K squared. Here, K is any number greater than one.

For K equal to two, at least 75 percent of the animals' ages are within two standard deviations of the mean. Similarly, for K equal to three, at least 89 percent of the animals' ages fall within three standard deviations of the mean.
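The arithmetic behind these intervals can be sketched in a few lines of Python. The mean of 13 years and the standard deviation of 1.5 years come from the zoo dataset above; the chebyshev_minimum_proportion helper is purely illustrative, not part of any library.

def chebyshev_minimum_proportion(k):
    # Minimum proportion of values within k standard deviations of the
    # mean, per Chebyshev's theorem (informative only for k > 1).
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

mean, std = 13.0, 1.5  # lifespan statistics from the zoo dataset above

for k in (2, 3):
    lower, upper = mean - k * std, mean + k * std
    proportion = chebyshev_minimum_proportion(k)
    print(f"At least {proportion:.0%} of lifespans lie between "
          f"{lower} and {upper} years")

# Prints:
# At least 75% of lifespans lie between 10.0 and 16.0 years
# At least 89% of lifespans lie between 8.5 and 17.5 years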

Although Chebyshev's theorem has wide statistical applications, it provides only lower bounds, and these are informative only when K is greater than one. It is also important to note that the theorem gives approximate limits rather than exact proportions.


Chebyshev’s theorem, also known as Chebyshev’s inequality, states that the proportion of values of a dataset that fall within K standard deviations of the mean is at least the value given by the equation:

\[ 1 - \frac{1}{K^{2}} \]

Here, K is any number greater than one; it need not be an integer. For example, if K is 1.5, at least 56% of the data values lie within 1.5 standard deviations of the mean; if K is 2, at least 75% of the data values lie within two standard deviations of the mean; and if K is 3, at least 89% of the data values lie within three standard deviations of the mean.

Chebyshev’s theorem therefore bounds both the proportion of data that falls inside a given number of standard deviations (a minimum proportion) and the proportion that falls outside it (a maximum proportion). If K equals 2, at least 75% of the data values lie within two standard deviations of the mean, and at most 25% lie outside that range. It is important to understand that the theorem provides only these approximate limits, not exact answers; a short sketch of the calculation follows.
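A minimal Python sketch of these inside/outside bounds, assuming only the standard library; the chebyshev_bounds helper is hypothetical, written only to mirror the percentages quoted above.

def chebyshev_bounds(k):
    # Returns (minimum proportion inside, maximum proportion outside)
    # k standard deviations of the mean, for k > 1.
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    inside = 1 - 1 / k**2
    return inside, 1 - inside

for k in (1.5, 2, 3):
    inside, outside = chebyshev_bounds(k)
    print(f"K = {k}: at least {inside:.0%} inside, at most {outside:.0%} outside")

# Prints:
# K = 1.5: at least 56% inside, at most 44% outside
# K = 2: at least 75% inside, at most 25% outside
# K = 3: at least 89% inside, at most 11% outside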

One of the advantages of this theorem is that it can be applied to datasets having normal, unknown, or skewed distributions. In contrast, the empirical or three-sigma rule can only be used for datasets with a normal distribution.