Residuals

From CheLabWiki


…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #793; 6 January 2009.


Given a set of x-y data, let’s assume we have linearized the data and have fit a least-squares line to the linear form. Although we now have an approximate representation of the data, that representation will not be exact: there are errors in the measurements, and the relation between x and y may be more complicated than our (relatively) simple straight line. To explore these possibilities further, we need to amplify the remaining nonlinearities.[1] This is often done by computing residuals, which are the differences between the measured y-values and the corresponding fitted y-values:


(1)
\Delta y_i \ = \ y_{i,mea} - y_{i,fit}
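As a minimal sketch of (1), the following fits an unweighted least-squares line and forms the residuals. The function names `fit_line` and `residuals` and the five data points are hypothetical, chosen only for illustration; they do not come from the figures in this article.

```python
def fit_line(x, y):
    """Unweighted least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx               # slope
    a = y_mean - b * x_mean     # intercept
    return a, b


def residuals(x, y, a, b):
    """Equation (1): measured y minus fitted y at each x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical, already-linearized data (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
dy = residuals(x, y, a, b)
for xi, ri in zip(x, dy):
    print(f"x = {xi:4.1f}   residual = {ri:+.3f}")
```

For an unweighted fit that includes an intercept, the residuals sum to zero (to within round-off), which is a quick check on the arithmetic.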


Example of Residuals

Figure 1. Experimental data for product concentration c at discrete times t during a chemical reaction at 35°C.
Figure 2. Same data as in Figure 1 but on log-log axes; line is least-squares fit (2).

To illustrate, consider Figure 1, which shows product concentration at discrete times during a chemical reaction. These data can be linearized by plotting on log-log axes, as in Figure 2. Assuming equally weighted points and fitting a power law, we obtain (with t in minutes and c in moles/liter)

Figure 3. Residuals computed from (1) using the points and line in Figure 2.


(2)
c \ = \ 2.096\,t^{0.3375}


Using the data and line from Figure 2, we form the residuals, which are plotted in Figure 3. Note that since we fitted ln(c) to obtain the line in Figure 2, the residuals in Figure 3 are computed from


(3)
\Delta(\ln c) \ = \ \ln{c_{mea}} - \ln{c_{fit}}
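The sketch below shows one way to fit a power law of the form (2) by least squares on ln(c) versus ln(t) and then form the logarithmic residuals (3). The times and concentrations are hypothetical values invented for illustration, not the data of Figure 1, and the names `fit_power_law` and `log_residuals` are likewise assumptions.

```python
import math


def fit_power_law(t, c):
    """Fit c = prefactor * t**exponent by an unweighted
    least-squares line through the points (ln t, ln c)."""
    lt = [math.log(v) for v in t]
    lc = [math.log(v) for v in c]
    n = len(t)
    lt_mean = sum(lt) / n
    lc_mean = sum(lc) / n
    sxx = sum((u - lt_mean) ** 2 for u in lt)
    sxy = sum((u - lt_mean) * (v - lc_mean) for u, v in zip(lt, lc))
    exponent = sxy / sxx
    prefactor = math.exp(lc_mean - exponent * lt_mean)
    return prefactor, exponent


def log_residuals(t, c, prefactor, exponent):
    """Equation (3): ln(c_mea) - ln(c_fit) at each time."""
    return [math.log(ci) - math.log(prefactor * ti ** exponent)
            for ti, ci in zip(t, c)]


# Hypothetical measurements (t in minutes, c in moles/liter).
t = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
c = [2.0, 2.7, 3.6, 4.6, 5.6, 8.0]

prefactor, exponent = fit_power_law(t, c)
d_ln_c = log_residuals(t, c, prefactor, exponent)
```

Because the line was fitted to ln(c), it is these logarithmic residuals, not the differences in c itself, that sum to zero.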


From a plot such as Figure 3, we seek to answer these kinds of questions:[2]

  1. What is the range of the residuals?
  2. How are they distributed about zero?
  3. How do they change with x? Are there any patterns?
  4. Are some residuals unusual compared to the others?

Answers to these can help us judge the quality of the data and the reliability of the linearizing function; see Table 1.
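A plot remains the primary tool, but the four questions can also be given rough numerical counterparts. The sketch below is one possible set of indicators, not a standard procedure: the function name `summarize_residuals`, the threshold `k = 2.0`, and the sample residuals are all assumptions made for illustration. It assumes the residuals are ordered by increasing x.

```python
import statistics


def summarize_residuals(x, r, k=2.0):
    """Crude numerical indicators for a list of residuals r at positions x.

    k is a hypothetical threshold: a residual is flagged as unusual
    when its magnitude exceeds k times the standard deviation of r.
    """
    spread = statistics.pstdev(r)
    return {
        # Question 1: range of the residuals
        "range": (min(r), max(r)),
        # Question 2: distribution about zero
        "n_positive": sum(v > 0 for v in r),
        "n_negative": sum(v < 0 for v in r),
        # Question 3: few sign changes along x hint at a systematic trend
        "sign_changes": sum(1 for p, q in zip(r, r[1:]) if p * q < 0),
        # Question 4: residuals that stand out from the rest
        "unusual": [xi for xi, v in zip(x, r) if abs(v) > k * spread],
    }


# Hypothetical residuals, ordered by increasing x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
r = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1, 1.5, -0.1]

summary = summarize_residuals(x, r)
for name, value in summary.items():
    print(name, value)
```

Numbers like these only point to where to look; the interpretation still has to come from the plot and from knowledge of the experiment.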

Possible Interpretations of Residuals

Table 1. Possible interpretations of behavior of residuals when plotted against x, as in Figure 3. Note these are not definitive interpretations, but suggestions to help start your thinking.
Feature | Behavior | Possible Meaning
Range | (a) wide range compared to mean y-value | (a) large errors in data or poor choice of linearizing function
Range | (b) narrow range compared to mean y-value | (b) small statistical errors, but systematic error could be present
Distribution | (a) residuals randomly scattered about zero | (a) error probably dominated by statistical errors, but constant systematic error could be present
Distribution | (b) nonuniform scatter about zero | (b) bias in least-squares fit due to unequal weights or error in solving least-squares problem
Distribution | (c) systematic variation | (c) systematic error in data or some nonlinearity not captured by linearizing function
Distribution | (d) magnitudes of residuals small except at large x or small x or both | (d) large systematic error at one or both extremes in x or linearizing function fails at extremes
Patterns | (a) magnitudes of residuals increase with y | (a) possible constant relative error; check fractional residuals
Patterns | (b) residuals oscillate about zero | (b) periodic systematic error or poor linearizing function
Unusual Magnitudes | (a) a few residuals have relatively large magnitudes | (a) possible outliers
Unusual Magnitudes | (b) magnitudes of some residuals, closely grouped, differ from others | (b) anomaly in measurement over small range of x values


Fractional Residuals

Figure 4. Fractional residuals (4) computed from the line (fit) and points (measured) in Figure 2.

In Figure 3 the magnitudes of the residuals are increasing with time; hence, by Figure 2, they are increasing with concentration. This behavior is common; that is, the magnitudes of residuals often increase as the magnitude of the dependent variable increases. This may be caused by a constant relative error in the data. To test for this, you should compute fractional residuals,


(4)
\frac{\Delta y}{y} \ = \ \frac{y_{mea} - y_{fit}}{y_{mea}}


For the data and fit in Figure 2, the fractional residuals are plotted in Figure 4. Compared with the absolute residuals in Figure 3, those in Figure 4 show a more uniform distribution about zero, which suggests a constant relative error. Further, Figure 4 suggests that the measurements at t = 20 and 30 minutes are outliers; it is less certain whether the measurement at 50 minutes is also one.
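To see why fractional residuals expose a constant relative error, consider the following sketch of (4). The measured and fitted values are hypothetical, constructed so that every fitted value lies 10% below its measurement; the name `fractional_residuals` is also an assumption.

```python
def fractional_residuals(y_mea, y_fit):
    """Equation (4): (y_mea - y_fit) / y_mea for each point."""
    return [(m - f) / m for m, f in zip(y_mea, y_fit)]


# Hypothetical values: each fitted value is 10% below the measurement.
y_mea = [10.0, 20.0, 40.0, 80.0]
y_fit = [9.0, 18.0, 36.0, 72.0]

absolute = [m - f for m, f in zip(y_mea, y_fit)]
fractional = fractional_residuals(y_mea, y_fit)

for m, da, df in zip(y_mea, absolute, fractional):
    print(f"y_mea = {m:5.1f}   absolute = {da:4.1f}   fractional = {df:.3f}")
```

The absolute residuals grow in proportion to y, while the fractional residuals stay the same at every point, which is the signature of a constant relative error.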

Use of Residuals to Aid Visual Interpretations

Plots of residuals can sometimes clarify behavior and trends that might be overlooked or misinterpreted from simple x-y plots. To illustrate, consider the x-y plot in Figure 5; in the figure, the line is an unweighted least-squares fit to the points. Comparing the points with the line by eye, we appear to see more scatter and larger deviations at large x than at small x. However, the human eye has trouble judging vertical distances between points and a nearly vertical line (x < 2 in the figure). To compensate for this, we plot in Figure 6 the residuals between the points and the line. Figure 6 clearly shows that the points at small x are, in fact, farther from the line than are the points at large x.

Figure 5. Simulated x-y data with least-squares fit (line). Points at small x (x < 2) appear closer to the line than points at large x.
Figure 6. Plot of the residuals (1) from Figure 5 showing that points at small x, in fact, deviate more from the line than points at large x.

References

  1. J. M. Haile, Analysis of Data, Macatea Productions, Central, SC, 2003. ISBN 0-9728602-0-7.
  2. J. W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977. ISBN 0-201-07616-0.