Least Squares Fits:Comments
…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #756; 6 January 2009.
Test for Correct Fit
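For the straight line y = a + bx, with each point weighted by 1/σ_i², one of the normal equations (the one obtained by minimizing the weighted sum of squares with respect to the intercept a) forces the sum of the weighted deviations to be exactly zero, within round-off error. For an unweighted fit all σ_i are equal and the weights cancel.
$$\sum_{i=1}^{N} \frac{y_i - a - b\,x_i}{\sigma_i^2} = 0 \tag{1}$$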
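This provides a convenient check on whether the least-squares calculations have been done correctly. Note, however, that Eq. (1) is a necessary but not sufficient test for correctness.[1] Any line that passes through the weighted centroid of the data satisfies it, whatever its slope.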
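The check can be automated. The Python sketch below is only an illustration, not CheLabWiki's own code: the function names and the sample data are hypothetical, and the fit uses weighted sums S, S_x, S_y, S_xx, S_xy in the Numerical Recipes style with weights 1/σ_i². After the correct fit passes the check, the sketch builds a line with the wrong slope through the weighted centroid, which passes the same check.

```python
def weighted_line_fit(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x with weights 1/sigma_i**2."""
    w = [1.0 / s**2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta  # intercept
    b = (S * Sxy - Sx * Sy) / delta    # slope
    return a, b


def weighted_deviation_sum(x, y, sigma, a, b):
    """Left-hand side of Eq. (1); should be ~0 after a correct fit."""
    return sum((yi - a - b * xi) / s**2 for xi, yi, s in zip(x, y, sigma))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma = [0.1, 0.1, 0.2, 0.2, 0.3]

a, b = weighted_line_fit(x, y, sigma)
check = weighted_deviation_sum(x, y, sigma, a, b)
print(f"a = {a:.4f}, b = {b:.4f}, check = {check:.1e}")

# Necessary but not sufficient: change the slope, then pick the intercept so
# the line still passes through the weighted centroid (xbar, ybar).
w = [1.0 / s**2 for s in sigma]
xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
b_wrong = b + 0.5
a_wrong = ybar - b_wrong * xbar
check_wrong = weighted_deviation_sum(x, y, sigma, a_wrong, b_wrong)
print(f"wrong line: check = {check_wrong:.1e}")
```

Both checks print values at round-off level, even though only the first line minimizes the weighted sum of squares.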
Increasing Number of Points to Fit
The least-squares line has a smaller sum of squared deviations than any other straight line fitted to the given N points. However, if you make more measurements, the line calculated from the original N points is generally no longer the best representation of all the data; the least-squares solution should be recomputed using the full data set.
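A small numerical illustration of this point, in Python. The data are made up, the helper names are hypothetical, and the fit is the ordinary (equally weighted) least-squares line. A line fitted to the first four points is evaluated on all six points, and its sum of squared deviations is compared with that of a line refitted to all six.

```python
def line_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x (all points weighted equally)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * sxx - sx**2
    a = (sxx * sy - sx * sxy) / delta
    b = (n * sxy - sx * sy) / delta
    return a, b


def sum_sq_dev(x, y, a, b):
    """Sum of squared deviations of the data from the line y = a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Made-up data: four original points, then two additional measurements.
x_old, y_old = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]
x_all, y_all = x_old + [5.0, 6.0], y_old + [5.3, 6.4]

a_old, b_old = line_fit(x_old, y_old)  # line from the original points only
a_new, b_new = line_fit(x_all, y_all)  # line recomputed from all the data

sse_stale = sum_sq_dev(x_all, y_all, a_old, b_old)
sse_refit = sum_sq_dev(x_all, y_all, a_new, b_new)
print(f"old line on all data: {sse_stale:.4f}; refitted line: {sse_refit:.4f}")
```

The refitted line can never do worse on the enlarged data set than the old line, because it is by construction the minimum for those points.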
Uncertainties in the Slope and Intercept
Once the fit has been performed, the usual procedure for propagating uncertainties can be applied to find the uncertainties in the fitted slope and intercept.[2] For the line y = a + bx, with each y_i measured with a known uncertainty σ_i, that procedure gives the uncertainty in the slope as
$$\sigma_b = \sqrt{\frac{S}{\Delta}}$$
and the uncertainty in the intercept as
$$\sigma_a = \sqrt{\frac{S_{xx}}{\Delta}}$$
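Here S, S_x, S_xx, and Δ are sums defined in the derivation of the normal equations:
$$S = \sum_{i=1}^{N} \frac{1}{\sigma_i^2}, \qquad S_x = \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2}, \qquad S_{xx} = \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2}$$
$$\Delta = S\,S_{xx} - S_x^2$$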
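From these definitions one can show that, for a fixed measurement uncertainty and a fixed spread of x values, both uncertainties are inversely proportional to the square root of the number of measurements N:
$$\sigma_a \propto \frac{1}{\sqrt{N}}, \qquad \sigma_b \propto \frac{1}{\sqrt{N}}$$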
This is an example of the law of large numbers: if we want to decrease the uncertainties in the slope and intercept by a factor of two, then we must increase the number of measurements by a factor of four.
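A minimal Python sketch of these uncertainty calculations, assuming each σ_i is a known measurement uncertainty and using σ_b = √(S/Δ), σ_a = √(S_xx/Δ), and Δ = S S_xx − S_x² (the Numerical Recipes sums). The function name and the sample x values are hypothetical. Note that σ_a and σ_b depend only on the x_i and σ_i, not on the measured y_i. Measuring each x value four times with the same σ multiplies S, S_x, and S_xx by 4 and Δ by 16, so both uncertainties fall by exactly a factor of two.

```python
import math


def fit_uncertainties(x, sigma):
    """Return (sigma_a, sigma_b) for the weighted fit y = a + b*x.

    Uses S, S_x, S_xx and Delta = S*S_xx - S_x**2; the y values are not needed.
    """
    w = [1.0 / s**2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sxx = sum(wi * xi**2 for wi, xi in zip(w, x))
    delta = S * Sxx - Sx**2
    sigma_a = math.sqrt(Sxx / delta)  # uncertainty in the intercept a
    sigma_b = math.sqrt(S / delta)    # uncertainty in the slope b
    return sigma_a, sigma_b


# Hypothetical design: four x values, each y measured with sigma = 0.5.
x = [0.0, 1.0, 2.0, 3.0]
sigma = [0.5] * 4
sa1, sb1 = fit_uncertainties(x, sigma)
print(f"N = 4:  sigma_a = {sa1:.4f}, sigma_b = {sb1:.4f}")  # 0.4183, 0.2236

# Repeat every measurement four times (N = 16, same x range and sigma).
sa4, sb4 = fit_uncertainties(x * 4, sigma * 4)
print(f"N = 16: sigma_a = {sa4:.4f}, sigma_b = {sb4:.4f}")
print(f"ratios: {sa1 / sa4:.3f}, {sb1 / sb4:.3f}")  # 2.000, 2.000
```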
Proper Use of Linear Correlation Coefficient, r
In reporting the least-squares line, many people also report the value of the linear correlation coefficient r. But note that the definition of r involves only the data pairs (x_i, y_i), not the slope or intercept of the least-squares line; hence, r can be computed before the fit is done. In this way, r can help you judge whether you have adequately linearized the data.[1]
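As a hedged illustration of using r before any fit, the Python sketch below computes r directly from the data with the usual product-moment definition, for made-up values that grow roughly like e^x, both as measured and after taking ln y. The function name and the data are hypothetical. A value of r much closer to 1 for the transformed data is consistent with the transformation having linearized the relationship.

```python
import math


def correlation_coefficient(x, y):
    """Linear correlation coefficient r; no slope or intercept is computed."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data that grow roughly like exp(x).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.7, 7.4, 20.1, 54.6]

r_raw = correlation_coefficient(x, y)
r_log = correlation_coefficient(x, [math.log(yi) for yi in y])
print(f"r for y vs x:    {r_raw:.4f}")
print(f"r for ln y vs x: {r_log:.6f}")
```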
Ends of Curves and Outliers
When all points are weighted equally, the least-squares procedure tends to place more emphasis on points at the ends of the range of the fit. But in many experiments, extreme values are measured less reliably than other values. Therefore, it may be important to give special attention to extreme values and weight their deviations accordingly.
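One way to see the extra emphasis on the ends, for the unweighted straight-line fit: changing a single y_k by δ changes the fitted slope by δ(x_k − x̄)/Σ(x_i − x̄)², so points far from the mean x̄ move the slope most. The Python sketch below (hypothetical function name, made-up data) shifts either the last point or the middle point by the same amount and compares the resulting slopes.

```python
def ols_slope(x, y):
    """Slope of the equally weighted least-squares line through (x, y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]  # made-up data lying exactly on y = 2x
b0 = ols_slope(x, y)

y_end = y.copy()
y_end[-1] += 1.0  # perturb the point at the end of the range
y_mid = y.copy()
y_mid[2] += 1.0   # same perturbation at the middle point (x = xbar = 3)

print(f"slope change, end point:    {ols_slope(x, y_end) - b0:.3f}")
print(f"slope change, middle point: {ols_slope(x, y_mid) - b0:.3f}")
```

Here the end-point shift changes the slope by 1 × (5 − 3)/10 = 0.2, while the middle point, sitting at x̄, leaves the slope unchanged (up to round-off).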
Similar comments apply to outliers. Outliers have anomalously large deviations, and since least squares is trying to minimize the sum of squares of deviations, the least-squares calculation tends to unduly emphasize the importance of outliers. So if you have a probable outlier, but can’t reach a decision to ignore it completely, then consider weighting it less than other points.
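A minimal sketch of down-weighting a suspect point, in Python. It assumes the weights enter as 1/σ_i², so a point is weighted less by assigning it a larger σ. The data, the choice of σ = 5 for the suspect point, and the function name are hypothetical, not a rule from the source; the fit uses weighted sums in the Numerical Recipes style.

```python
def weighted_line_fit(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x with weights 1/sigma_i**2."""
    w = [1.0 / s**2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta  # intercept
    b = (S * Sxy - Sx * Sy) / delta    # slope
    return a, b


# Made-up data scattered about y = 2x, with a probable outlier at x = 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.1, 5.9, 8.0, 15.0, 12.1]

equal = [1.0] * 6                              # every point weighted the same
downweighted = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0]  # suspect point: weight 1/25

a_eq, b_eq = weighted_line_fit(x, y, equal)
a_dw, b_dw = weighted_line_fit(x, y, downweighted)
print(f"equal weights:        a = {a_eq:+.3f}, b = {b_eq:.3f}")
print(f"outlier downweighted: a = {a_dw:+.3f}, b = {b_dw:.3f}")
```

With equal weights the suspect point pulls the slope well above 2; giving it a weight of 1/25 keeps it in the fit but brings the slope and intercept much closer to the trend of the other points.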
References
- [1] J. M. Haile, Analysis of Data, Macatea Productions, Central, SC, 2003. ISBN 0-9728602-0-7.
- [2] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes, Cambridge University Press, Cambridge, 1986. ISBN 0-521-30811-9.

