Linear Correlation Coefficient
…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #831; 6 January 2009.
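It is conventional to quantify the linearity of x-y plots by computing a linear correlation coefficient, r, which is defined by

$$ r = \frac{\sum_{i=1}^{N} (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_{i=1}^{N} (x_i - x_m)^2 \; \sum_{i=1}^{N} (y_i - y_m)^2}} $$
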
Here x_m is the mean of the N values of the independent variable x_i, and y_m is the mean of the N measured values of the dependent variable y_i. Values of r lie on the closed interval [−1, 1]. A value of r = 1 implies a perfectly straight line with positive slope; r = −1 indicates a perfectly straight line with negative slope; r = 0 implies no linearity in the relation between x and y.
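As a minimal sketch, r can be computed directly from the sums over deviations from the two means, assuming the standard (Pearson) form of the definition: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `linear_correlation` and the sample data are illustrative assumptions, not part of the original article.

```python
import math

def linear_correlation(x, y):
    """Return the linear correlation coefficient r for paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xm = sum(x) / n  # mean of the independent variable
    ym = sum(y) / n  # mean of the dependent variable
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data: two exact straight lines, one rising and one falling.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = linear_correlation(x, [2.0 * xi + 1.0 for xi in x])
r_down = linear_correlation(x, [-3.0 * xi + 7.0 for xi in x])
print(r_up, r_down)
```

The two limiting cases in the text are recovered: the rising line gives r = 1 and the falling line gives r = −1, to within rounding.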
A Value for |r| < 1 Does Not Imply a Unique Relation Between x and y
Data with |r| close to unity are more nearly linear than data with |r| close to zero. However, many kinds of nonlinearity are possible, and the coefficient r cannot distinguish among them.[1] For example, different kinds of nonlinearity can produce the same value of r, as in Figure 1. At best, then, the linear correlation coefficient is only a crude measure of linearity.
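The point can be illustrated with a small, hypothetical example (these are not the data of Figure 1): a convex curve, y = x², and a concave curve, y = 1 − (1 − x)², sampled on the same evenly spaced grid. The second data set is the first one reflected in both x and y, and r is unchanged by that reflection, so both curves should give the same r even though they bend in opposite directions. The helper `linear_correlation` is an assumed name for a direct implementation of the usual definition.

```python
import math

def linear_correlation(x, y):
    """Linear correlation coefficient r for paired samples."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Eleven evenly spaced points on [0, 1] (illustrative grid).
x = [i / 10 for i in range(11)]

convex = [xi ** 2 for xi in x]                  # curves upward
concave = [1.0 - (1.0 - xi) ** 2 for xi in x]   # curves downward

r_convex = linear_correlation(x, convex)
r_concave = linear_correlation(x, concave)
print(r_convex, r_concave)
```

Both values come out a little below 1 and agree with each other to within rounding, so r alone gives no hint of which way the data curve.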
Weak Linearity Does Not Imply Lack of Correlation
If the relation between x and y is nearly linear, then the value of r measures the strength of the correlation. As |r| decreases from 1 toward 0, the evidence for a linear correlation weakens; this is illustrated in Figure 2. However, when r = 0 we can conclude only that there is no linear correlation. The variables x and y might still be strongly correlated by some nonlinear relation, as shown in Figure 3. In other words, r measures the strength of linear correlations only; if your data are nonlinear, try linearizing them before computing r.
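Both remarks can be sketched with made-up data (not the data of Figures 2 or 3). First, a parabola sampled symmetrically about x = 0 is a perfectly deterministic relation, yet it gives r = 0. Second, an exponential relation gives |r| < 1 on the raw data but |r| very close to 1 once y is replaced by ln y. The choice of a logarithmic transform is an assumption that suits this particular example; other nonlinear relations need other transforms.

```python
import math

def linear_correlation(x, y):
    """Linear correlation coefficient r for paired samples."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Case 1: y = x^2 on a grid symmetric about zero. y is completely
# determined by x, but the positive and negative halves cancel in r.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
r_parabola = linear_correlation(xs, [xi ** 2 for xi in xs])

# Case 2: y = 3 exp(0.5 x). Taking ln y linearizes the relation,
# since ln y = ln 3 + 0.5 x.
xe = [0.0, 1.0, 2.0, 3.0, 4.0]
ye = [3.0 * math.exp(0.5 * xi) for xi in xe]
r_raw = linear_correlation(xe, ye)
r_linearized = linear_correlation(xe, [math.log(yi) for yi in ye])

print(r_parabola, r_raw, r_linearized)
```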
The lesson here is that if, before performing the experiment, you already know that a significant linear correlation exists between x and y, then the linear correlation coefficient r is a good way to measure its strength. However, if you do not know whether x and y are correlated, then r is a poor way to test for a correlation.[2]
Rather than use the linear correlation coefficient r to test for a correlation, it is usually better to rank the measured x and y values and then test for a correlation between the two rankings by computing the rank-order correlation coefficient ρ.
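A minimal sketch of the rank-order approach follows, assuming the common (Spearman) construction: replace each x and each y by its rank, giving tied values the average of the ranks they span, and then compute the linear correlation coefficient of the two rank lists. The function names and the sample data are illustrative assumptions.

```python
import math

def linear_correlation(x, y):
    """Linear correlation coefficient r for paired samples."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_order_correlation(x, y):
    """Rank-order correlation: linear correlation of the two rank lists."""
    return linear_correlation(average_ranks(x), average_ranks(y))

# Made-up monotonic but strongly nonlinear data: y = exp(x).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(xi) for xi in x]
r = linear_correlation(x, y)
rho = rank_order_correlation(x, y)
print(r, rho)
```

For these data ρ equals 1 because y increases whenever x does, while r falls noticeably below 1 because the relation is far from a straight line.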
Note, however, that the existence of a correlation does not necessarily imply the existence of a causal connection; see Correlation vs Connection.
References
- [1] J. M. Haile, Analysis of Data, Macatea Productions, Central, SC, 2003. ISBN 0-9728602-0-7.
- [2] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes, Cambridge University Press, Cambridge, 1986. ISBN 0-521-30811-9.



