Rank-Order Correlation Coefficient
…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #843; 6 January 2009.
Rather than use the linear correlation coefficient r to test for a correlation, it is usually better to rank the measured x and y values and then test for a correlation between the two rankings. The test is applied by computing the rank-order correlation coefficient ρ.
Procedure for Computing ρ
For a set of N pairs of measured (x, y) values, the procedure is as follows:
- 1. Sort the x_i in either increasing or decreasing order and assign a rank R_i to each, i = 1, 2, ..., N.
- 2. Likewise, sort the y_i in the same order and assign a rank Q_i to each of them.
Table 1: Five sample x values and their ranks.

| k | x_k | R_k |
|---|---|---|
| 1 | 7 | 4 |
| 2 | 4 | (2+3)/2 = 2.5 |
| 3 | 3 | 1 |
| 4 | 4 | (2+3)/2 = 2.5 |
| 5 | 8 | 5 |

Table 2: The values of Table 1 ordered by rank.

| x_k | R_k |
|---|---|
| 3 | 1 |
| 4 | 2.5 |
| 4 | 2.5 |
| 7 | 4 |
| 8 | 5 |

- 3. If any two or more x (or y) values are the same, give each the same rank computed as the mean of the ranks they would have if their values differed slightly.[1][2] This mean will be either an integer or a half integer. For example, consider the five x values in Table 1. Of these five, the value x = 4 appears twice. Those two values should have taken ranks 2 and 3, but since they are the same, they are both given the mean rank 2.5. When ordered by rank, the values appear as in Table 2.
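- 4. Determine the mean of each ranking. Let R_m be the mean of the ranks R_i and let Q_m be the mean of the ranks Q_i. Because each set of ranks sums to N(N + 1)/2, with or without ties, these means are always the same and are given by

$$R_m = Q_m = \frac{N + 1}{2}$$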
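- 5. Now test for a correlation between the ranks of x and y by computing the rank-order coefficient,

$$\rho = \frac{\sum_{i=1}^{N} (R_i - R_m)(Q_i - Q_m)}{\sqrt{\sum_{i=1}^{N} (R_i - R_m)^2 \, \sum_{i=1}^{N} (Q_i - Q_m)^2}}$$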
- 6. If x and y are correlated monotonically, then |ρ| ≅ 1, with ρ ≅ +1 when y increases with x and ρ ≅ −1 when y decreases with x; if they are uncorrelated, then ρ ≅ 0.
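The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `midranks` and `rank_order_rho` are chosen here for convenience, the x values are those of Table 1, and the y values are invented purely for the example.

```python
from math import sqrt

def midranks(values):
    """Rank values in increasing order (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based ranks start+1 .. end+1 that the tied values would occupy.
        mean_rank = (start + 1 + end + 1) / 2
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks

def rank_order_rho(x, y):
    """Rank-order coefficient: linear correlation of the ranks of x and y.

    Undefined (division by zero) if all x values or all y values are equal.
    """
    R, Q = midranks(x), midranks(y)
    mean = (len(x) + 1) / 2  # common mean of both rankings (step 4)
    num = sum((r - mean) * (q - mean) for r, q in zip(R, Q))
    den = sqrt(sum((r - mean) ** 2 for r in R) * sum((q - mean) ** 2 for q in Q))
    return num / den

x = [7, 4, 3, 4, 8]        # x values from Table 1
y = [10, 5, 1, 6, 12]      # hypothetical y values, for illustration only
print(midranks(x))         # [4.0, 2.5, 1.0, 2.5, 5.0], matching Table 1
rho = rank_order_rho(x, y)
print(round(rho, 3))
```

Both variables are ranked in increasing order here, so a relation in which y rises with x gives a positive ρ.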
Advantages to Using ρ
Table 3: Minimum values of ρ needed for 5% and 1% significance, for N measurements.

| N | 5% | 1% |
|---|---|---|
| 5 | 1.0 | . . . |
| 6 | 0.89 | 1.0 |
| 7 | 0.79 | 0.93 |
| 8 | 0.74 | 0.88 |
| 9 | 0.68 | 0.83 |
| 10 | 0.65 | 0.79 |
| 12 | 0.59 | 0.78 |
| 14 | 0.54 | 0.72 |
| 16 | 0.51 | 0.66 |
| 18 | 0.48 | 0.62 |
| 20 | 0.45 | 0.59 |
| 25 | 0.40 | 0.53 |
| 30 | 0.36 | 0.48 |
The first advantage to using ρ instead of the linear coefficient r is that, for ρ to apply, the relation between x and y need only be monotonic; it need not be linear. This suggests that ρ is more general than r.
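As a small illustration of this point, the sketch below compares r and ρ for a relation that is monotonic but not linear. The data (y = x³ for x = 1, ..., 8) are invented for the example, the helper names are arbitrary, and the ranking helper assumes no tied values.

```python
from math import sqrt

def pearson(a, b):
    """Linear correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den

def ranks_no_ties(values):
    """Ranks in increasing order (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

x = list(range(1, 9))
y = [v ** 3 for v in x]    # monotonic in x, but far from linear

r = pearson(x, y)                                   # computed on the raw values
rho = pearson(ranks_no_ties(x), ranks_no_ties(y))   # computed on the ranks
print(f"r = {r:.3f}, rho = {rho:.3f}")              # r = 0.932, rho = 1.000
```

The ranks of x and y are identical, so ρ reaches 1 even though the linear coefficient r does not.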
The second advantage is that ρ is a more robust measure of correlation than is r: it is less sensitive to uncertainties in values measured for x and y. This is because a small change in a measured value usually leaves its rank unchanged. Consequently, the rank-order coefficient ρ is often unaffected by small statistical errors or modest changes in x or y.[2] Even systematic errors in x and y may not affect the rankings R and Q.
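The following sketch illustrates this robustness with invented numbers: a set of x values is perturbed first by small random-looking errors and then by a systematic offset and scale error, and the ranks are compared. All values and the helper name are assumptions made for the example, and the helper assumes no ties.

```python
def ranks_no_ties(values):
    """Ranks in increasing order (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

x_true = [5.1, 2.0, 9.0, 3.5, 7.4]            # hypothetical "true" values
errors = [0.15, 0.10, 0.20, -0.20, -0.10]     # small statistical errors
x_noisy = [v + e for v, e in zip(x_true, errors)]
x_biased = [1.02 * v + 0.5 for v in x_true]   # systematic scale and offset error

print(ranks_no_ties(x_true))                             # [3, 1, 5, 2, 4]
print(ranks_no_ties(x_noisy) == ranks_no_ties(x_true))   # True
print(ranks_no_ties(x_biased) == ranks_no_ties(x_true))  # True
```

Because none of the perturbations changes the ordering of the values, the ranks, and hence ρ, are unchanged. Errors large enough to swap neighboring values would of course alter the ranks.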
The third advantage is that we know the distributions for the ranks R and Q:[2] in the absence of ties, each takes the integer values 1, 2, ..., N exactly once, regardless of how x and y themselves are distributed. Therefore we can assign a significance to our computed ρ. If ρ reaches the 5% level, then the correlation between x and y is said to be significant; if it reaches the 1% level, then the correlation is said to be highly significant. The latter means that if x and y were in fact uncorrelated and the experiment were repeated 100 times, then in only about one of those experiments would chance alone produce a value of ρ this large. The significance depends on the number of measurements N, as well as on the value of ρ. Table 3 gives the minimum values for ρ needed for 5% and 1% significance.
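One way to apply Table 3 is a simple lookup, sketched below. The dictionary copies the tabulated values; the function name `classify` is arbitrary, and comparing the magnitude |ρ| with the tabulated minimum (so that strong negative correlations also count) is an assumption of this sketch, not something the table states. Values of N not listed in Table 3 are rejected rather than interpolated.

```python
# Minimum rho for (5%, 1%) significance, copied from Table 3.
# None marks the entry the table leaves blank (N = 5 at the 1% level).
CRITICAL_RHO = {
    5: (1.0, None), 6: (0.89, 1.0), 7: (0.79, 0.93), 8: (0.74, 0.88),
    9: (0.68, 0.83), 10: (0.65, 0.79), 12: (0.59, 0.78), 14: (0.54, 0.72),
    16: (0.51, 0.66), 18: (0.48, 0.62), 20: (0.45, 0.59), 25: (0.40, 0.53),
    30: (0.36, 0.48),
}

def classify(rho, n):
    """Label a computed rho for n measurements using the Table 3 thresholds."""
    if n not in CRITICAL_RHO:
        raise ValueError(f"N = {n} is not listed in Table 3")
    five_percent, one_percent = CRITICAL_RHO[n]
    magnitude = abs(rho)  # assumption: the sign of rho is ignored
    if one_percent is not None and magnitude >= one_percent:
        return "highly significant"
    if magnitude >= five_percent:
        return "significant"
    return "not significant"

print(classify(0.70, 10))   # significant
print(classify(0.85, 10))   # highly significant
print(classify(0.50, 10))   # not significant
```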
Disadvantage to Using ρ
The principal disadvantage to using ρ is that we lose information when we replace x and y with their ranks [15]. Specifically, from ρ we can only learn whether or not x and y are correlated. If they are correlated, the ranks cannot help us identify a function that could model the relation. Nevertheless, in many situations, simply knowing whether or not a correlation exists can be valuable.
But we caution that the existence of a correlation does not necessarily imply existence of a causal connection; see Correlation vs Connection.
References
- [1] J. R. Taylor, An Introduction to Error Analysis, University Science Books, Mill Valley, CA, 1982.
- [2] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes, Cambridge University Press, Cambridge, 1986. ISBN 0521308119
- [3] B. J. Underwood, C. P. Duncan, J. T. Spence, and J. W. Cotton, Elementary Statistics, Appleton-Century-Crofts, New York, 1954.

