Least Squares:Generalized Fits
From CheLabWiki
…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #762; 6 January 2009.
The normal least-squares equations are used to fit a straight line to x-y data. Here we show how that procedure generalizes to many nonlinear relations between x and y. That is, the requirement for applying the normal equations is that the fitting function be linear in the unknowns (the slope and intercept): the relation between x and y need not be linear. This is another reason why we search for linearizing plots of our data.[1]
In general, we only need a relation of the form
$$ f(y) = m\,g(x) + b \qquad (1) $$
where m and b are the unknown slope and intercept.
To perform the fit, we still use the normal equations, but now we replace y with f(y) and x with g(x).
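As a sketch, assuming an unweighted fit to N data points and writing u_i = g(x_i) and v_i = f(y_i) (notation introduced here for illustration, not taken from the source), the normal equations for the slope m and intercept b keep their familiar straight-line form:

```latex
\begin{aligned}
m &= \frac{N\sum_i u_i v_i - \sum_i u_i \sum_i v_i}
          {N\sum_i u_i^2 - \left(\sum_i u_i\right)^2}, \\
b &= \frac{1}{N}\left(\sum_i v_i - m\sum_i u_i\right),
\qquad u_i = g(x_i),\quad v_i = f(y_i).
\end{aligned}
```

Only the data entering the sums change; the algebra is the same as for a straight-line fit of y against x.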
For example, if we find that a semilog plot linearizes the data, then the functional form is exponential,
$$ y = A\,e^{mx} \qquad (2) $$
We linearize by taking the natural logarithm of both sides,
$$ \ln y = \ln A + m\,x \qquad (3) $$
and we perform the least-squares fit on ln(y) vs. x. The results from that fit are the slope m and the intercept ln(A); the prefactor A is recovered by exponentiating the intercept.
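The exponential case can be sketched in a few lines of Python. The helper name `fit_line`, the data, and the values A = 2.0 and m = 0.5 are assumptions made for illustration: the data are synthetic and noise-free, so the fit should simply recover the values used to generate them.

```python
import math

def fit_line(u, v):
    """Solve the unweighted normal equations for v = m*u + b; return (m, b)."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    m = (n * suv - su * sv) / (n * suu - su * su)
    b = (sv - m * su) / n
    return m, b

# Synthetic, noise-free data from y = A*exp(m*x) with assumed A = 2.0, m = 0.5
A_true, m_true = 2.0, 0.5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [A_true * math.exp(m_true * xi) for xi in x]

# Fit ln(y) against x: the slope is m and the intercept is ln(A)
m_fit, lnA_fit = fit_line(x, [math.log(yi) for yi in y])
A_fit = math.exp(lnA_fit)

print(f"m = {m_fit:.6f}, A = {A_fit:.6f}")
```

Note that this procedure minimizes the squared residuals in ln(y), not in y, so with noisy data it weights the points differently than a direct fit of the exponential would.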
Example: Vapor Pressures of Water
To illustrate, we consider vapor pressures for water from its triple point to its critical point. On another page of this site we found that the data could be linearized as shown in Figure 1. The figure shows that the data approximately obey
$$ \ln P^{s} = m\left(\frac{1}{T}\right) + b \qquad (4) $$
This implies we are to perform a least-squares fit using y = ln(Ps) and x = 1/T. For Ps in bar and T in kelvins, the normal equations yield the line shown in Figure 2,[1]

At 100°C = 373.15 K, the fitted correlation (5) gives a vapor pressure of 0.926 bar. But at this temperature, we know that the vapor pressure of water is 1.013 bar, so the correlation is in error by almost 9%. Note that Figure 2 is, in a real sense, deceptive: the logarithmic and reciprocal scales fail to give us a sense that the line and points disagree by 9% near 1/T = 0.0027.
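The size of that discrepancy follows directly from the two pressures quoted above. As a quick check of the arithmetic (the subscripts "known" and "fit" are labels introduced here):

```latex
\frac{P^{s}_{\text{known}} - P^{s}_{\text{fit}}}{P^{s}_{\text{known}}}
  = \frac{1.013 - 0.926}{1.013} \approx 0.086,
\qquad
\frac{1}{T} = \frac{1}{373.15\ \text{K}} \approx 0.00268\ \text{K}^{-1}
```

That is, the relative error is about 8.6%, which rounds to the "almost 9%" cited.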
A 9% error is understandable, considering that the fit was done over a temperature range of almost 375 K. Whether or not a 9% error is tolerable depends on the use to be made of the fit. If it is unacceptable, then we must either
- find a nonlinear representation of the data or
- refit (4) to a more restricted range of temperatures.
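The effect of the second option can be sketched with synthetic data. The example below assumes a hypothetical Antoine-like curve, ln(Ps) = A - B/(T + C), with made-up constants; it is not real water data and is used only because it is slightly curved on a plot of ln(Ps) vs 1/T. The names `fit_line`, `ln_ps_model`, and `max_rel_error` are likewise illustrative.

```python
import math

def fit_line(u, v):
    """Solve the unweighted normal equations for v = m*u + b; return (m, b)."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    m = (n * suv - su * sv) / (n * suu - su * su)
    b = (sv - m * su) / n
    return m, b

def ln_ps_model(T):
    """Hypothetical curve that generates the synthetic 'data' (assumed constants)."""
    A, B, C = 11.0, 3800.0, -45.0  # illustrative only, not fitted to real water data
    return A - B / (T + C)

def max_rel_error(T_values):
    """Fit ln(Ps) = m*(1/T) + b on T_values; return the largest relative error in Ps."""
    x = [1.0 / T for T in T_values]
    y = [ln_ps_model(T) for T in T_values]
    m, b = fit_line(x, y)
    # exp(fitted ln Ps - true ln Ps) - 1 is the relative error in Ps itself
    return max(abs(math.exp(m * xi + b - yi) - 1.0) for xi, yi in zip(x, y))

full_range = [275.0 + 10.0 * k for k in range(38)]    # 275 K to 645 K
narrow_range = [350.0 + 5.0 * k for k in range(11)]   # 350 K to 400 K

err_full = max_rel_error(full_range)
err_narrow = max_rel_error(narrow_range)

print(f"max relative error, full range:   {100 * err_full:.2f}%")
print(f"max relative error, narrow range: {100 * err_narrow:.2f}%")
```

Because the straight line only has to follow the curve over a short interval of 1/T, the restricted fit reproduces its own range far more closely than the full-range fit does; the price is that it should not be used outside that range.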
Using Uncertainties to Guide Choice of Function to Fit
Since least-squares fits are relatively easy to perform, it is tempting to apply them indiscriminately; that is, it is often easy to simply choose a nonlinear function that appears to correlate the data, then force the fit by solving the least-squares problem. Such activities are common, but they are not sound engineering. Engineering requires judgment, and in correlating data, the judgment occurs in choosing the proper function and in assigning appropriate weights to the data. These decisions must be made before a least-squares calculation can be done.[1]
When choosing a function, we should be guided, not only by the pattern appearing on an x-y plot, but also by the uncertainties assigned to the measurements. For example, consider the five points plotted in the top panel of Figure 3. With five points we can compute values for five unknowns; hence, we can force a fourth-order polynomial to pass exactly through those five points. That curve is also shown at the top of Figure 3. But note that the polynomial must oscillate if it is to exactly reproduce the original data. (Such oscillations are common to high-order polynomial fits.) Unless we have information that supports the presence of such oscillations, we can only justify the least complex function that reproduces the data within their uncertainties. For the data and uncertainties in Figure 3, that function is a straight line, as at the bottom of the figure.
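The contrast between the two fits can be sketched as follows. The five points and the uncertainty of 0.3 on each y value are hypothetical numbers chosen for illustration; they are not the data of Figure 3. The quartic is built by Lagrange interpolation, which is one way (assumed here) to obtain the unique fourth-order polynomial through five points.

```python
def fit_line(u, v):
    """Solve the unweighted normal equations for v = m*u + b; return (m, b)."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    m = (n * suv - su * sv) / (n * suu - su * su)
    b = (sv - m * su) / n
    return m, b

def lagrange(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs)-1 through the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical measurements; sigma is the assumed uncertainty on each y value
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma = 0.3

m, b = fit_line(xs, ys)
line_resid = [yi - (m * xi + b) for xi, yi in zip(xs, ys)]
quartic_resid = [yi - lagrange(xs, ys, xi) for xi, yi in zip(xs, ys)]

# The line is acceptable if every residual lies within the stated uncertainty
line_ok = all(abs(r) <= sigma for r in line_resid)

print("line reproduces data within uncertainties:", line_ok)
print("largest quartic residual:", max(abs(r) for r in quartic_resid))
```

The quartic reproduces every point exactly, but it uses five parameters to do so; the two-parameter line already agrees with each point to within its uncertainty, so these data give no grounds for the more complex function.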
References
- [1] J. M. Haile, Analysis of Data, Macatea Productions, Central, SC, 2003. ISBN 0-9728602-0-7.


