Residuals
…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #793; 6 January 2009.
Given a set of x-y data, let’s assume we have linearized the data and have fitted a least-squares line to the linear form. Although we now have an approximate representation of the data, that representation will not be exact: there are errors in the measurements, and the relation between x and y may be more complicated than our (relatively) simple straight line. To explore these possibilities further, we need to amplify the remaining nonlinearities.[1] This is often done by computing residuals, which are the differences between the measured y-values and the corresponding fitted y-values:

$$ r_i = y_i - \hat{y}_i $$

where $y_i$ is the i-th measured value and $\hat{y}_i$ is the value of the fitted line at $x_i$.
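As a minimal sketch of this definition, the following Python fits an unweighted least-squares line and forms the residuals. The function names `fit_line` and `residuals` and the sample data are hypothetical, chosen only for illustration; they are not taken from the figures in this article.

```python
def fit_line(x, y):
    """Unweighted least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept
    return a, b


def residuals(x, y, a, b):
    """Measured y minus fitted y at each x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data (assumption), already in linear form.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
r = residuals(x, y, a, b)
```

For an unweighted fit that includes an intercept, the residuals sum to zero to within round-off, which gives a quick check on the arithmetic.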

Example of Residuals
To illustrate, consider Figure 1, which shows product concentration at discrete times during a chemical reaction. These data can be linearized by plotting on log-log axes, as in Figure 2. Assuming equally weighted points and fitting a power law, we obtain (with t in minutes and c in moles/liter)

Using the data and line from Figure 2, we form the residuals, which are plotted in Figure 3. Note that since we fitted ln(c) to obtain the line in Figure 2, the residuals in Figure 3 are computed from
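A sketch of how such log-space residuals might be formed is given here. It assumes the power law has the form c = A·t^n (the symbols `A` and `n` are introduced only for this illustration) and that the residuals are differences in the fitted variable ln(c). The times and concentrations are hypothetical and are not the data of Figure 1.

```python
import math


def fit_line(x, y):
    """Unweighted least-squares fit of y = a + b*x; returns (a, b)."""
    n_pts = len(x)
    x_mean = sum(x) / n_pts
    y_mean = sum(y) / n_pts
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical times (min) and concentrations (mol/L); assumption, not Figure 1.
t = [5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
c = [0.11, 0.16, 0.21, 0.28, 0.31, 0.36]

# Linearize: ln(c) = ln(A) + n * ln(t)
ln_t = [math.log(ti) for ti in t]
ln_c = [math.log(ci) for ci in c]

ln_A, n = fit_line(ln_t, ln_c)
A = math.exp(ln_A)

# Residuals in the fitted variable: measured ln(c) minus fitted ln(c).
ln_c_fit = [ln_A + n * lt for lt in ln_t]
r = [lc - lcf for lc, lcf in zip(ln_c, ln_c_fit)]
```

Because the fit is done on ln(c), these residuals are dimensionless differences of logarithms, not concentration differences in mol/L.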

From a plot such as Figure 3, we seek to answer these kinds of questions:[2]
- What is the range of the residuals?
- How are they distributed about zero?
- How do they change with x? Are there any patterns?
- Are some residuals unusual compared to the others?
Answers to these can help us judge the quality of the data and the reliability of the linearizing function; see Table 1.
Table 1. Possible Interpretations of Residuals
| Feature | Behavior | Possible Meaning |
|---|---|---|
| Range | (a) wide range compared to mean y-value | (a) large errors in data or poor choice of linearizing function |
| Range | (b) narrow range compared to mean y-value | (b) small statistical errors, but systematic error could be present |
| Distribution | (a) residuals randomly scattered about zero | (a) error probably dominated by statistical errors, but constant systematic error could be present |
| Distribution | (b) nonuniform scatter about zero | (b) bias in least-squares fit due to unequal weights or error in solving least-squares problem |
| Distribution | (c) systematic variation | (c) systematic error in data or some nonlinearity not captured by linearizing function |
| Distribution | (d) magnitudes of residuals small except at large x or small x or both | (d) large systematic error at one or both extremes in x or linearizing function fails at extremes |
| Patterns | (a) magnitudes of residuals increase with y | (a) possible constant relative error; check fractional residuals |
| Patterns | (b) residuals oscillate about zero | (b) periodic systematic error or poor linearizing function |
| Unusual Magnitudes | (a) a few residuals have relatively large magnitudes | (a) possible outliers |
| Unusual Magnitudes | (b) magnitudes of some residuals, closely grouped, differ from others | (b) anomaly in measurement over small range of x values |
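The four questions above can also be given rough numerical answers to accompany the plot. The sketch below is one possible way to do so and is not a method prescribed by the references: the function name `summarize_residuals`, the threshold `k`, and the sample values are all assumptions made for illustration.

```python
import statistics


def summarize_residuals(x, r, k=3.0):
    """Rough numerical companions to a residual plot.

    k is an arbitrary (assumed) multiple of the median magnitude above
    which a residual is flagged as unusual.
    """
    pairs = sorted(zip(x, r))                 # order the points by x
    mags = [abs(ri) for _, ri in pairs]
    half = len(pairs) // 2
    median_mag = statistics.median(mags)
    return {
        # Range of the residuals
        "range": (min(r), max(r)),
        # Distribution about zero
        "n_positive": sum(1 for ri in r if ri > 0),
        "n_negative": sum(1 for ri in r if ri < 0),
        # Change with x: typical magnitude at small x versus large x
        "mean_abs_low_x": sum(mags[:half]) / half,
        "mean_abs_high_x": sum(mags[half:]) / (len(mags) - half),
        # Unusual residuals: x-values whose residual is large relative to the rest
        "unusual_x": [xi for xi, ri in pairs if abs(ri) > k * median_mag],
    }


# Hypothetical residuals (assumption); the value at x = 7 is made large on purpose.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
r = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1, 1.5, -0.2]

summary = summarize_residuals(x, r)
```

Such numbers are a supplement to the plot, not a substitute for it; patterns such as oscillation about zero are usually easier to see than to compute.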
Fractional Residuals
In Figure 3 the magnitudes of the residuals are increasing with time; hence, by Figure 2, they are increasing with concentration. This behavior is common; that is, the magnitudes of residuals often increase as the magnitude of the dependent variable increases. This may be caused by a constant relative error in the data. To test for this, you should compute fractional residuals,
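As a hedged illustration, the sketch below takes a fractional residual to be the residual divided by the measured y-value; this definition is an assumption made here, and some texts divide by the fitted value instead. The data are hypothetical and are built to carry a constant 5% relative error, so that the contrast between absolute and fractional residuals is easy to see.

```python
def fractional_residuals(y_meas, y_fit):
    """Residual divided by the measured value (assumed definition)."""
    return [(ym - yf) / ym for ym, yf in zip(y_meas, y_fit)]


# Hypothetical fitted values and "measurements" with a constant 5% relative
# error of alternating sign (assumption, for illustration only).
y_fit = [1.0, 2.0, 4.0, 8.0, 16.0]
signs = [1, -1, 1, -1, 1]
y_meas = [yf * (1 + 0.05 * s) for yf, s in zip(y_fit, signs)]

# Absolute residuals grow with y ...
r = [ym - yf for ym, yf in zip(y_meas, y_fit)]

# ... while fractional residuals keep roughly the same magnitude.
f = fractional_residuals(y_meas, y_fit)
```

In this constructed case the absolute residuals grow sixteen-fold across the data while the fractional residuals all stay near 0.05 in magnitude, which is the signature of a constant relative error.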

For the data in Figure 3, the fractional residuals are plotted in Figure 4. Compared with the absolute residuals in Figure 3, those in Figure 4 show a more uniform distribution about zero, which is consistent with a constant relative error. Further, Figure 4 suggests that the measurements at t = 20 and 30 minutes are outliers, but it is less certain whether the measurement at 50 minutes is also one.
Use of Residuals to Aid Visual Interpretations
Plots of residuals can sometimes clarify behavior and trends that might be overlooked or misinterpreted from simple x-y plots. To illustrate, consider the x-y plot in Figure 5; in the figure, the line is an unweighted least-squares fit to the points. Comparing the points with the line, we appear to see more scatter and larger deviations in the points at large x than in those at small x. However, the human eye has trouble judging vertical distances between points and a nearly vertical line (x < 2 in the figure). To compensate for this, we plot in Figure 6 the residuals between the points and the line. Figure 6 clearly shows that the points at small x are, in fact, farther from the line than are the points at large x.
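One way to put a number on this effect, assuming the eye tends to judge the shortest (perpendicular) distance from a point to the line and not the vertical distance: for a line of slope m, measured in the plotted units, a vertical residual r corresponds to a perpendicular distance |r| / sqrt(1 + m²). The function name and the slopes below are hypothetical.

```python
import math


def perpendicular_distance(r, slope):
    """Shortest distance to a line of the given slope from a point whose
    vertical residual is r (geometry only; slope in plotted units)."""
    return abs(r) / math.sqrt(1.0 + slope ** 2)


r = 1.0  # same vertical residual in both cases

steep = perpendicular_distance(r, slope=10.0)    # nearly vertical line
shallow = perpendicular_distance(r, slope=0.1)   # nearly horizontal line
```

With the same vertical residual, the point near the steep segment lies about ten times closer to the line in perpendicular distance than the point near the shallow segment, so it looks like a better fit than it is. The apparent slope depends on how the axes are scaled, so this is only a rough guide.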
References
- [1] J. M. Haile, Analysis of Data, Macatea Productions, Central, SC, 2003. ISBN 0-9728602-0-7.
- [2] J. W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977. ISBN 0-201-07616-0.







