11.2:

Coefficient of Correlation

JoVE Core
Statistics

Consider the heights and weights of 5 athletes. As the athletes' height increases, their weight also tends to increase, so height and weight are positively correlated.

The scatter plot of the athletes' weight vs. height shows a linear pattern, which needs to be confirmed using a quantitative measure.

The linear correlation coefficient, denoted by r, provides a quantitative measure of the strength of such a linear correlation between two variables.

For such a dataset with n scatter points whose x and y values are known, r can be calculated from the deviations of the x and y values from their respective means.
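As a minimal sketch of that calculation, the function below computes r from the standard definition: the sum of products of the x and y deviations from their means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the five height and weight values are illustrative assumptions, not data from the source.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired samples xs and ys.

    r = sum((x - mean_x) * (y - mean_y))
        / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for 5 athletes (made up for illustration).
heights_cm = [165, 170, 175, 180, 185]
weights_kg = [60, 66, 70, 77, 82]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

For these made-up values, r comes out close to +1, consistent with a strong positive linear pattern in the scatter plot.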

The value of r always lies between -1 and +1. The larger the absolute value of r, the stronger the linear correlation between the variables.

If the x and y values are swapped, or one of the variables is converted to a different scale, the value of r is not affected.
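Both properties can be checked numerically. The sketch below uses hypothetical athlete data and an illustrative helper named pearson_r. It assumes the change of scale is an ordinary unit conversion with a positive factor (centimeters to inches, kilograms to pounds); a negative scale factor would flip the sign of r.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for 5 athletes (made up for illustration).
heights_cm = [165, 170, 175, 180, 185]
weights_kg = [60, 66, 70, 77, 82]

r_original = pearson_r(heights_cm, weights_kg)

# Swap the roles of x and y.
r_swapped = pearson_r(weights_kg, heights_cm)

# Convert both variables to different units (positive scale factors).
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_rescaled = pearson_r(heights_in, weights_lb)

print(math.isclose(r_original, r_swapped))   # same r after swapping
print(math.isclose(r_original, r_rescaled))  # same r after rescaling
```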

The coefficient of correlation is strongly affected by outliers. Hence, if such data points are known to be errors, they can be removed to improve the accuracy of the value of r.
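To see how much a single bad point can matter, the sketch below compares r with and without one erroneous observation. The data are hypothetical: the sixth athlete's weight of 8 kg stands in for a data-entry error (for example, a dropped digit), and pearson_r is an illustrative helper name.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data; the last pair is a deliberate recording error.
heights_cm = [165, 170, 175, 180, 185, 190]
weights_kg = [60, 66, 70, 77, 82, 8]

r_with_outlier = pearson_r(heights_cm, weights_kg)

# Drop the point known to be an error and recompute.
r_clean = pearson_r(heights_cm[:-1], weights_kg[:-1])

print(f"with outlier:    r = {r_with_outlier:.3f}")
print(f"without outlier: r = {r_clean:.3f}")
```

In this made-up example, one erroneous point is enough to turn a strong positive correlation into a negative one. Removing a point is justified only when it is known to be an error, as stated above.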

The correlation coefficient, r, developed by Karl Pearson in the early 1900s, is a numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y.

If you suspect a linear relationship between x and y, then r can measure how strong the linear relationship is.

What the VALUE of r tells us:

The value of r is always between –1 and +1: –1 ≤ r ≤ 1.

The magnitude of r indicates the strength of the linear relationship between x and y. Values of r close to –1 or to +1 indicate a stronger linear relationship between x and y.

If r = 0, there is likely no linear correlation. It is important to view the scatterplot, however, because data that exhibit a curved or horizontal pattern may have a correlation of 0.
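A small illustration of why the scatterplot matters: in the hypothetical data below, y is completely determined by x (y = x squared), yet r is 0 because the relationship is curved and symmetric rather than linear. The helper name pearson_r and the data values are assumptions made for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curved pattern: y depends on x exactly, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(f"r = {r_curved:.3f}")
```

Here the positive and negative deviation products cancel, so r = 0 even though x and y are clearly related.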

If r = 1, there is a perfect positive correlation. If r = –1, there is a perfect negative correlation. In both of these cases, all of the original data points lie on a straight line. Of course, in the real world, this will not generally happen.

What the SIGN of r tells us:

A positive value of r means that when x increases, y tends to increase, and when x decreases, y tends to decrease (positive correlation).

A negative value of r means that when x increases, y tends to decrease, and when x decreases, y tends to increase (negative correlation).

The sign of r is the same as the sign of the slope, b, of the best-fit line.
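The shared sign follows from the formulas: the least-squares slope b and r have the same numerator (the sum of products of deviations), and both denominators are positive. The sketch below checks this on two hypothetical datasets; the function name slope_and_r and the data values are assumptions made for illustration.

```python
import math

def slope_and_r(xs, ys):
    """Return (b, r): least-squares slope and correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                    # slope of the best-fit line
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return b, r

# Hypothetical datasets: one trending upward, one trending downward.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]
falling = [9, 7, 8, 5, 4]

for label, ys in (("rising", rising), ("falling", falling)):
    b, r = slope_and_r(xs, ys)
    print(f"{label}: b = {b:.2f}, r = {r:.2f}")
```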

This text is adapted from OpenStax, Introductory Statistics, Section 12.3, The Regression Equation.