Chapter 1: Research Methods

1.18:

Variation: Normal Distribution, Range, and Standard Deviation

JoVE Core
Social Psychology

During a data collection project, a group of students gathers the heights of adult men in their city. Upon returning to the classroom, the students graph the frequency of heights in the sample. The resulting curve is bell-shaped, with a single peak at the center, where the mean lies. The graph is also symmetric, with half of the individuals taller than the average and half shorter. This is referred to as a normal distribution, or normal curve.

While a single summary value, such as the mean, is crucial to the analysis of these results, so too is the variation. Defined as the dispersion of measurements within a data set, this quantity describes the spread of results, giving a sense of the distance between points.

To evaluate the variation, the students first calculate the range of the results, which is the difference between the highest and lowest heights. Although the range describes the spread of data, it can be dramatically affected by outliers, like the school's tallest basketball player, and doesn't show how measurements are positioned around the mean.

To address this, the students use an equation to compute a second gauge of variation, termed the standard deviation: roughly, the average amount by which measurements differ from the mean. Here, the standard deviation is two-and-a-half inches, so the men are, on average, two-and-a-half inches shorter or taller than the mean.

Based on the properties of a normal distribution, about 68% of individuals fall within one standard deviation above or below the mean. This number increases to about 95% for two standard deviations, here five inches above or below the average height, and to about 99.7% for three standard deviations.

Importantly, the lower the standard deviation, the more tightly results cluster around the mean, which produces a tall and narrow normal curve. So, if a data set has a small standard deviation, it has low variation. Thus, means for measurements with low variability are more likely to be a reliable representation of the sample than those derived from results with high variation, which may be disproportionately affected by outliers.
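The 68%, 95%, and 99.7% figures can be checked with a short Python sketch. The standard deviation of 2.5 inches comes from the narration; the mean of 69 inches is an assumption made purely for illustration, since the narration does not state it, and the helper name `share_within` is likewise hypothetical.

```python
from statistics import NormalDist

# Assumed for illustration only: the narration gives the standard deviation
# (2.5 inches) but not the mean, so 69.0 inches is a made-up value.
ASSUMED_MEAN_IN = 69.0
SD_IN = 2.5

heights = NormalDist(mu=ASSUMED_MEAN_IN, sigma=SD_IN)


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    lower = ASSUMED_MEAN_IN - k * SD_IN
    upper = ASSUMED_MEAN_IN + k * SD_IN
    return heights.cdf(upper) - heights.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} SD (+/- {k * SD_IN} in): {share_within(k):.1%}")
```

The printed proportions do not depend on the assumed mean; only the positions of the interval endpoints on the height axis would shift with a different value.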

In the field of psychology, there are several ways to organize measurements of a trait, feature, or characteristic (i.e., a variable). Qualitative data, such as ethnicity, can be tabulated into frequency counts to show the variety of groups in a sample or population and the proportion that each represents. Researchers can perform a wider set of calculations on quantitative data. The mean, median, and mode, for instance, are measures of central tendency that identify a typical value of a variable within a numerical data set. Likewise, there are a few approaches to estimating how far scores lie from each other, referred to as variability or variation, including the range, variance, and standard deviation.
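As a minimal sketch of the two kinds of summary described above, the Python snippet below tabulates a frequency count for a categorical variable and computes the three central tendency measures for a numerical one. Both data sets are hypothetical and invented for illustration; the group labels "A", "B", and "C" stand in for any set of categories.

```python
from collections import Counter
import statistics

# Hypothetical categorical responses (qualitative data), invented for illustration.
groups = ["A", "B", "A", "C", "A", "B"]
counts = Counter(groups)                                        # frequency of each group
proportions = {g: c / len(groups) for g, c in counts.items()}   # share of each group

# Hypothetical numerical scores (quantitative data), invented for illustration.
scores = [2, 3, 3, 5, 7]
mean_score = statistics.mean(scores)      # sum of scores divided by their count
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequent value

print(counts)
print(proportions)
print(mean_score, median_score, mode_score)
```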

Range

The range is the difference between the highest and lowest scores of a variable; it provides no details about the scores in between. A larger value denotes a wider spread of scores, but a single outlier can inflate the range and lead to misinterpretation. For these reasons, the range is considered a less precise measure of variation.
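The sensitivity of the range to outliers can be seen in a small Python sketch. The heights below are hypothetical values in inches, invented for illustration, and `value_range` is a helper name chosen here rather than a standard function.

```python
def value_range(scores):
    """Range: difference between the highest and lowest score."""
    return max(scores) - min(scores)


# Hypothetical heights in inches, invented for illustration.
heights = [66, 67, 68, 69, 70, 71, 72]
with_outlier = heights + [84]  # add one unusually tall individual

print(value_range(heights))       # 6
print(value_range(with_outlier))  # 18
```

A single added score triples the range here, even though the other seven scores are unchanged.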

Variance

Researchers typically use variance to estimate the average squared distance of all scores in the sample or population from the mean. First, the mean is determined by dividing the sum of all the raw scores of a specific variable by the total number of scores. Subtracting the mean from each raw score produces a set of deviation scores that comprises both positive and negative values, depending on whether the scores are higher or lower than the mean. Averaging the deviation scores directly is uninformative, because the positive deviations cancel the negative ones and the sum is always zero. Squaring the deviations makes every value non-negative while still reflecting the distance between the mean and each data point. Totaling the squared deviations forms the sum of squares (SS).
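The steps leading to the sum of squares can be traced in a few lines of Python. This is a sketch on hypothetical scores, invented for illustration so that the arithmetic is easy to follow by hand.

```python
# Hypothetical raw scores, invented for illustration.
scores = [4, 6, 7, 9, 14]

mean = sum(scores) / len(scores)          # 40 / 5 = 8.0
deviations = [x - mean for x in scores]   # [-4.0, -2.0, -1.0, 1.0, 6.0]
squared = [d ** 2 for d in deviations]    # [16.0, 4.0, 1.0, 1.0, 36.0]
sum_of_squares = sum(squared)             # SS = 58.0

print(sum(deviations))   # 0.0, the positive and negative deviations cancel
print(sum_of_squares)    # 58.0
```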

The SS is then divided by either the total number of data points (N), when the variance describes a complete set of scores such as an entire population, or by the degrees of freedom (N-1), when the variance of a population is estimated from a sample. The result is an aggregate measure of the general distance between the scores and the mean, expressed in squared units.
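The two divisors can be compared side by side in Python. The sketch below uses hypothetical scores, invented for illustration, and checks the hand computation against `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N-1) from the standard library.

```python
import math
import statistics

# Hypothetical raw scores, invented for illustration.
scores = [4, 6, 7, 9, 14]

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squares

variance_n = ss / n                  # divisor N: scores treated as the full population
variance_n_minus_1 = ss / (n - 1)    # divisor N-1: population variance estimated from a sample

# The standard library implements the same two definitions.
assert math.isclose(variance_n, statistics.pvariance(scores))
assert math.isclose(variance_n_minus_1, statistics.variance(scores))

print(variance_n, variance_n_minus_1)
```

Dividing by N-1 always gives the larger of the two values, and the gap shrinks as the number of scores grows.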

Standard Deviation

The square root of the variance is the standard deviation. Taking the square root undoes the squaring of deviations in the variance formula, returning the measure to the original units of the scores. The standard deviation not only describes the general spread of scores in a population or sample, but is also used to assess the distance of a particular score from the mean. If the scores follow a normal curve, the location of a score relative to the center of the curve indicates its likelihood of occurrence (probability).
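A short Python sketch can illustrate both uses of the standard deviation: recovering it from the variance, and locating a single score relative to the mean. All numbers are assumptions chosen for illustration (a variance of 6.25 squared inches, a mean of 69 inches, and a score of 74 inches), and `z_score` is a hypothetical helper name. The probability step is only meaningful if the scores are assumed to follow a normal curve.

```python
import math
from statistics import NormalDist

# All values below are assumptions chosen for illustration.
variance = 6.25        # squared inches
assumed_mean = 69.0    # inches
score = 74.0           # inches

sd = math.sqrt(variance)   # back in the original units: 2.5 inches


def z_score(value: float, mean: float, standard_deviation: float) -> float:
    """Distance of a score from the mean, measured in standard deviations."""
    return (value - mean) / standard_deviation


z = z_score(score, assumed_mean, sd)   # (74 - 69) / 2.5 = 2.0

# Under a normal curve, the proportion of scores expected to fall below this one.
proportion_below = NormalDist().cdf(z)

print(sd, z, round(proportion_below, 3))
```

A score two standard deviations above the mean sits above roughly 97.7% of a normal distribution, which is why scores that far from the center are comparatively rare.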

Implications of Variation

A reduced range of scores in a sample or population generally corresponds to a lower variance. For example, data from females have been reported to show a lower spread than data from males in characteristics such as verbal performance, math performance, and height. In these cases, understanding the sources of the greater variation among males, such as environmental or biological factors, is as important as recognizing between-group differences.
