Linear Correlation Coefficient

From CheLabWiki

…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #831; 6 January 2009.

It is conventional to quantify the linearity of x-y plots by computing a linear correlation coefficient, r, which is defined by


r \equiv \frac{\sum_{i=1}^{N} (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_{i=1}^{N} (x_i - x_m)^2} \, \sqrt{\sum_{i=1}^{N} (y_i - y_m)^2}}


Here x_m is the mean of the N values of the independent variable x_i, and y_m is the mean of the N measured values of the dependent variable y_i. Values of r lie on [-1, 1]. A value of r = 1 implies a perfectly straight line with positive slope; r = -1 indicates a perfectly straight line with negative slope; r = 0 implies no linearity in the relation between x and y.
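
As a minimal sketch of the definition above, the sums can be evaluated directly in a few lines of Python. The function name linear_correlation and the sample data are assumptions made for illustration only; they do not come from the original article.

```python
import math


def linear_correlation(x, y):
    """Linear correlation coefficient r, computed directly from its definition.

    The function name is hypothetical; x and y are equal-length sequences of numbers.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_m = sum(x) / n  # mean of the independent variable
    y_m = sum(y) / n  # mean of the dependent variable
    s_xy = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_m) ** 2 for xi in x)
    s_yy = sum((yi - y_m) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


# Illustrative data (assumed): two exact straight lines with opposite slopes.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = linear_correlation(x, [2.0 * xi + 1.0 for xi in x])    # positive slope
r_down = linear_correlation(x, [7.0 - 3.0 * xi for xi in x])  # negative slope
print(r_up, r_down)
```

Up to floating-point rounding, the two lines should give r = 1 and r = -1, the limits of the interval [-1, 1].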


A Value for |r| < 1 Does Not Imply a Unique Relation Between x and y

Data having values of | r | close to unity are more linear than data with | r | close to zero. However, many kinds of nonlinearities are possible and the coefficient r cannot distinguish among them.[1] For example, different kinds of nonlinearities can produce the same value for r, as in Figure 1. So at best, the linear correlation coefficient is only a crude measure of linearity.

Figure 1. The linear correlation coefficient only crudely measures nonlinearity. Here, for example, are three sets of data that differ in their nonlinearities, although all three sets have the same value of r = 0.950.[1]


Weak Nonlinearity Does Not Imply Lack of Correlation

If the relation between x and y is nearly linear, then the value of r measures the strength of the correlation. As | r | decreases from 1 toward 0, the evidence for a linear correlation weakens, as illustrated in Figure 2. However, when r = 0 we can conclude only that there is no linear correlation: x and y might still be strongly correlated by some nonlinear relation, as shown in Figure 3. In other words, r measures the strength of linear correlations only. If your data are nonlinear, try linearizing them (for example, by transforming x or y) before computing r.
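
Both points can be checked numerically. The sketch below uses made-up data, not the data plotted in the figures: six points lying exactly on y = x^2 and placed symmetrically about x = 0, and a hypothetical exponential data set c = 2 exp(0.8 t) that becomes linear when ln c is used in place of c. The helper name linear_correlation is likewise an assumption.

```python
import math


def linear_correlation(x, y):
    """Linear correlation coefficient r from its definition (hypothetical helper)."""
    n = len(x)
    x_m = sum(x) / n
    y_m = sum(y) / n
    s_xy = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_m) ** 2 for xi in x)
    s_yy = sum((yi - y_m) ** 2 for yi in y)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


# Assumed data: six points exactly on y = x**2, symmetric about x = 0.
# The symmetry makes the numerator of r cancel, so r vanishes (to rounding)
# even though y is determined exactly by x.
x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
y = [xi ** 2 for xi in x]
r_quadratic = linear_correlation(x, y)

# Assumed data: c = 2 exp(0.8 t) is nonlinear in t, but ln c is linear in t.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
c = [2.0 * math.exp(0.8 * ti) for ti in t]
r_raw = linear_correlation(t, c)
r_linearized = linear_correlation(t, [math.log(ci) for ci in c])

print(r_quadratic, r_raw, r_linearized)
```

For these assumed data, r_quadratic should be zero to within rounding, r_raw should fall noticeably below 1, and r_linearized should be 1 to within rounding, because the logarithm turns the exponential into a straight line.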

Figure 2. Strength of linear correlation as measured by the linear correlation coefficient r. Closed circles have r = +1; squares have r = 0.96; open circles have r = 0.90.
Figure 3. These six points are exactly correlated by a quadratic polynomial, even though the linear correlation coefficient r = 0.

The lesson here is that if, before performing the experiment, you already know that a significant linear correlation exists between x and y, then the linear correlation coefficient r is a good way to measure its strength. However, if you do not know whether x and y are correlated, then r is a poor way to test for a correlation.[2]

Rather than use the linear correlation coefficient r to test for a correlation, it is usually better to rank the measured x and y values and then test for a correlation between the two rankings by computing the rank-order correlation coefficient ρ.
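
One common way to carry out this ranking procedure, usually called Spearman's rank-order coefficient, is to replace each value by its rank and then apply the formula for r to the ranks. The sketch below assumes that approach and assigns tied values the average of their ranks; the function names and the sample data are illustrative assumptions, not taken from the references.

```python
import math


def linear_correlation(x, y):
    """Linear correlation coefficient r from its definition (hypothetical helper)."""
    n = len(x)
    x_m = sum(x) / n
    y_m = sum(y) / n
    s_xy = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_m) ** 2 for xi in x)
    s_yy = sum((yi - y_m) ** 2 for yi in y)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_correlation(x, y):
    """Rank-order correlation: the linear correlation of the two rankings."""
    return linear_correlation(ranks(x), ranks(y))


# Assumed data: y increases monotonically but nonlinearly with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(xi) for xi in x]
r = linear_correlation(x, y)
rho = rank_correlation(x, y)
print(r, rho)
```

Because y rises monotonically with x in this example, the two rankings agree and ρ should be 1 to within rounding, whereas r falls below 1 because the relation is not a straight line.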

But we caution that the existence of a correlation does not necessarily imply existence of a causal connection; see Correlation vs Connection.

References

  1. J. M. Haile, Analysis of Data, Macatea Productions, Central, SC, 2003. ISBN 0-9728602-0-7.
  2. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes, Cambridge University Press, Cambridge, 1986. ISBN 0521308119.