Repeated Measurements

From CheLabWiki


…from CheLabWiki, an online resource for chemical-engineering laboratories located at www.chelabwiki.org; Site Revision #422; 6 January 2009.



Table 1. Ten measured values for the flow rate of water through a pipe, in gallons per minute (gpm).
Run #   Flow rate (gpm)   Run #   Flow rate (gpm)
1       5.5               6       5.6
2       5.85              7       5.75
3       5.55              8       5.65
4       5.8               9       5.4
5       5.9               10      5.7

We repeat measurements to quantify uncertainties, but repeated measurements can also reveal other information. The analyses introduced here are simple, but they are not unimportant. We illustrate them with the following example:

The steady flow of water through a pipe has been measured ten times; the results in gallons per minute (gpm) are given in Table 1. We explore these data by determining three characteristics: central values, variability, and distribution.

Central Values

Central values indicate the typical magnitude of the measurements. Two kinds of central values are in common use: the mean and the median. For the data in Table 1, the mean (also called the average) is 5.67 gpm. To obtain the median, it is helpful to list the values in increasing order:

5.4, 5.5, 5.55, 5.6, 5.65, 5.7, 5.75, 5.8, 5.85, 5.9

For an odd number of measurements, the median is the one in the middle. For an even number, it is the average of the two in the middle. So for our ten values, we find

\text{median} = \tfrac{1}{2}(5.65 + 5.70) = 5.675 \approx 5.68\ \text{gpm} \qquad (1)


This median (5.68 gpm) is close to the mean (5.67 gpm), suggesting that the distribution is roughly symmetric about its center. Had the two not been close, that would have suggested an asymmetric distribution.
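The mean and median above can be checked with a few lines of Python. This is a minimal sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the original page.

```python
from statistics import mean, median

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]

y_mean = mean(flow_gpm)      # arithmetic mean
y_median = median(flow_gpm)  # N is even, so this averages the two middle values

print(f"mean   = {y_mean:.2f} gpm")
print(f"median = {y_median:.3f} gpm")
```

It prints a mean of 5.67 gpm and a median of 5.675 gpm, which rounds to the 5.68 gpm of Eq. (1).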

Note that the median is less sensitive to extreme values than is the mean. For example, if the largest value had been 10.0 gpm rather than 5.9 gpm, the median would not have changed, but the mean would have increased from 5.67 to 6.08 gpm. For this reason, the median is said to be a more robust measure of the center than the mean.
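The robustness argument can be checked numerically. The sketch below assumes the hypothetical case described in the text, in which run 5 (the largest value, 5.9 gpm) had instead read 10.0 gpm; the name `modified` is illustrative.

```python
from statistics import mean, median

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]

# Hypothetical variant: run 5 (index 4, value 5.9) replaced by 10.0
modified = list(flow_gpm)
modified[4] = 10.0

print(f"mean:   {mean(flow_gpm):.3f} -> {mean(modified):.3f} gpm")
print(f"median: {median(flow_gpm):.3f} -> {median(modified):.3f} gpm")
```

The mean moves from 5.670 to 6.080 gpm, while the median stays at 5.675 gpm because the two middle values are unchanged.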

Variability

Two simple measures of variability are in common use. The first is the range, the difference between the largest and smallest values:


\text{range} = 5.9 - 5.4 = 0.5\ \text{gpm} \qquad (2)


A narrow range centered near the median suggests precise measurements distributed symmetrically about the central value. For the data in Table 1, the range is about 9% of the median, and the center of the range, (5.4 + 5.9)/2 = 5.65 gpm, is close to the median of 5.68 gpm.
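The range comparisons in the paragraph above take only a few lines. This is a minimal sketch with illustrative variable names; it reproduces Eq. (2), the range as a percentage of the median, and the midpoint of the range.

```python
from statistics import median

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]

y_min, y_max = min(flow_gpm), max(flow_gpm)
y_range = y_max - y_min             # Eq. (2)
range_center = (y_max + y_min) / 2  # midpoint of the range
y_median = median(flow_gpm)

print(f"range        = {y_range:.2f} gpm")
print(f"range/median = {100 * y_range / y_median:.1f} %")
print(f"range center = {range_center:.2f} gpm (median {y_median:.3f} gpm)")
```

It reports a range of 0.50 gpm, about 8.8 % of the median, centered at 5.65 gpm.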

Table 2. Volumetric flow rates y from Table 1, with their deviations δy and squared deviations (δy)^2 from the mean, y_m = 5.67 gpm.
Run #   y (gpm)   δy (gpm)   (δy)^2 (gpm^2)
1       5.5       -0.17      0.029
2       5.85       0.18      0.032
3       5.55      -0.12      0.014
4       5.8        0.13      0.017
5       5.9        0.23      0.053
6       5.6       -0.07      0.005
7       5.75       0.08      0.006
8       5.65      -0.02      0.0004
9       5.4       -0.27      0.073
10      5.7        0.03      0.0009

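Variability is also measured by the standard deviation s, defined by

s = \frac{1}{\sqrt{N-1}} \sqrt{ \sum_{i=1}^{N} (y_i - y_m)^2 } \qquad (3)

where y_i is the i-th measured value, y_m is the mean of the measurements, and N is the number of measurements.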


For our flow example, the calculation of the standard deviation is outlined in Table 2, which lists the deviation from the mean and the squared deviation for each measured value. Note that the deviations sum to zero, while the squared deviations sum to 0.23 gpm^2. Then from (3), with N - 1 = 9, we find s = (0.23/9)^{1/2} = 0.16 gpm, which is about 3% of the mean.
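The columns of Table 2 and Eq. (3) translate directly into code. This is a minimal sketch with illustrative names; it computes s from the formula and, as a cross-check, also calls `statistics.stdev`, which uses the same N - 1 denominator.

```python
import math
from statistics import mean, stdev

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]
N = len(flow_gpm)
y_m = mean(flow_gpm)

deviations = [y - y_m for y in flow_gpm]  # delta-y column of Table 2
squared = [d * d for d in deviations]     # (delta-y)^2 column of Table 2

s = math.sqrt(sum(squared) / (N - 1))     # Eq. (3)

print("deviations sum to ~0:", abs(sum(deviations)) < 1e-12)
print(f"sum of squared deviations = {sum(squared):.3f} gpm^2")
print(f"s = {s:.2f} gpm ({100 * s / y_m:.1f} % of the mean)")
print(f"statistics.stdev: {stdev(flow_gpm):.2f} gpm")
```

The unrounded sum of squares is 0.231 gpm^2, giving s = 0.16 gpm, about 2.8 % of the mean.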

The range is simpler to compute than the standard deviation, but because it depends only on the two extreme values, it provides less information. The standard deviation measures the dispersion of all the values about their mean: a small s implies that the values cluster near the mean. Because contributions to the sum in s are squared, the value of s is strongly influenced by data that fall far from the mean.
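To see how each measure of variability responds to a single extreme value, the sketch below recomputes the range and s for the hypothetical case used earlier for the mean and median, in which run 5 reads 10.0 gpm instead of 5.9 gpm. The names are illustrative.

```python
from statistics import stdev

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]

# Hypothetical variant: run 5 (index 4) reads 10.0 instead of 5.9
modified = list(flow_gpm)
modified[4] = 10.0

results = {}
for label, data in (("original", flow_gpm), ("run 5 = 10.0", modified)):
    r = max(data) - min(data)  # range, Eq. (2)
    s = stdev(data)            # standard deviation, Eq. (3)
    results[label] = (r, s)
    print(f"{label:>12}: range = {r:.2f} gpm, s = {s:.2f} gpm")
```

The range goes from 0.50 to 4.60 gpm and s from 0.16 to 1.38 gpm, so both grow by a factor of roughly nine.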

Frequency Distribution

Figure 1. Examples of common frequency distributions (uniform, normal, bimodal, and skewed), shown here (in each case) for ten measurements of the same quantity.[1]

As a third characteristic, we plot the frequency distribution: a histogram of the number of times each value appears in the set of measurements. We want to know the general shape of this distribution, that is, whether it is approximately normal (Gaussian), uniform, clustered, skewed, and so on; examples are shown in Figure 1. We also want to identify possible outliers: values that seem far removed from the others. Outliers may be artifacts, that is, values caused by some blunder in the measurement. Or they may be real, caused by some unsuspected phenomenon or by a heightened sensitivity to a manipulated or controlled variable.

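For the flow rates in Table 2, the distribution is roughly uniform: each value from 5.5 to 5.9 gpm appears once, at intervals of 0.05 gpm, while the value 5.4 gpm is separated from the others by a gap and is a possible outlier.

[Histogram of the ten flow rates in Table 2 (figure not reproduced).]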
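A text histogram is enough to see this shape. The sketch below assumes bins of width 0.05 gpm, matching the resolution of the data; values are converted to integer hundredths of a gpm so that floating-point keys compare reliably. Names are illustrative.

```python
from collections import Counter

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]

# Count measurements per 0.05 gpm bin, keyed by integer hundredths of a gpm
counts = Counter(round(y * 100) for y in flow_gpm)

# One row per bin from the smallest to the largest value, including empty bins
for h in range(min(counts), max(counts) + 1, 5):
    print(f"{h / 100:.2f} | {'*' * counts[h]}")
```

Every bin from 5.50 to 5.90 gpm holds one measurement, and the empty 5.45 gpm row isolates the value at 5.40 gpm.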

To check whether the point at 5.4 is indeed an outlier, we apply the one-point test: does the removal of that one point significantly change our impression of the distribution, its mean, or its median? Ignoring that point, the mean of the nine remaining points is 5.7 and their median is also 5.7: the mean, median, and overall distribution are little affected. This suggests that the point is not an outlier; an additional test for outliers is discussed on the page for discarding data.
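The numerical part of the one-point test can be sketched as follows; it only recomputes the mean and median without the suspect point, and the judgment of whether the change is "significant" is left to the analyst. Names are illustrative.

```python
from statistics import mean, median

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]

suspect = 5.4  # the possible outlier (run 9)
remaining = [y for y in flow_gpm if y != suspect]

print(f"all 10 points: mean = {mean(flow_gpm):.3f}, median = {median(flow_gpm):.3f}")
print(f"without {suspect}:   mean = {mean(remaining):.3f}, median = {median(remaining):.3f}")
```

Without the point at 5.4 gpm, both the mean and the median are 5.700 gpm, compared with 5.670 and 5.675 gpm for all ten points.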

Note that the shape of a frequency distribution is influenced by the width of the bins used to construct the histogram; for raw data, that width is set by the number of significant figures retained. For example, the histogram of the flow rates is based on three significant figures for the data in Table 2. If we instead round to two significant figures, the frequency distribution takes a different shape.

[Histogram of the flow rates rounded to two significant figures (figure not reproduced).]
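The effect of rounding on the histogram can be sketched with the standard-library `decimal` module. The page does not say which rounding rule was used; this sketch assumes that exact halves are rounded up (`ROUND_HALF_UP`), and a different rule (for example round-half-to-even) would give different counts. The helper `to_two_sig_figs` is hypothetical.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP

# Table 1 flow rates (gpm) as strings so that Decimal represents them exactly
flow_gpm = ["5.5", "5.85", "5.55", "5.8", "5.9", "5.6", "5.75", "5.65", "5.4", "5.7"]

def to_two_sig_figs(value: str) -> Decimal:
    """Round a value between 1 and 10 to two significant figures (nearest 0.1),
    rounding exact halves up (an assumed convention)."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

coarse = Counter(to_two_sig_figs(v) for v in flow_gpm)

# Text histogram: one row per 0.1 gpm bin, one '*' per measurement
for value in sorted(coarse):
    print(f"{value} | {'*' * coarse[value]}")
```

Under this rule the bins at 5.4 and 5.5 gpm hold one value each and the bins from 5.6 to 5.9 gpm hold two each; the gap that isolated 5.4 gpm at the finer resolution disappears.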



This is an example of coarse-graining, and we must be aware that our interpretations are influenced by the scale at which we make observations. The simple tests applied to repeated measurements are summarized in the following table.


Table 3. Simple tests to apply to repeated measurements of the same quantity.[1]
Test                                     Implication
Compare median with mean                 A median not near the mean suggests strong asymmetry in the distribution.
Compare range with median                A small range suggests high precision; a large range suggests low precision.
Compare standard deviation with mean     A small standard deviation suggests tight clustering about the mean.
Compare range with standard deviation    A range much larger than the standard deviation suggests the presence of an outlier.
Plot and identify the distribution       Strong clustering or skewness suggests that something changed during the measurements.
Identify any outliers                    Outliers might be caused by blunders in performing some measurements.
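The numerical comparisons in Table 3 can be gathered into one small function. This is a sketch only: `summarize` is a hypothetical helper, and it reports the quantities being compared without applying any threshold for "small" or "much larger", since the table leaves those judgments to the analyst.

```python
from statistics import mean, median, stdev

def summarize(data):
    """Return the quantities compared in Table 3 for a list of repeated measurements."""
    y_mean, y_median, s = mean(data), median(data), stdev(data)
    y_range = max(data) - min(data)
    return {
        "mean": y_mean,
        "median": y_median,
        "median - mean": y_median - y_mean,  # asymmetry check
        "range / median": y_range / y_median,  # precision check
        "s / mean": s / y_mean,  # clustering check
        "range / s": y_range / s,  # outlier check
    }

# Flow rates (gpm) from Table 1, in run order 1-10
flow_gpm = [5.5, 5.85, 5.55, 5.8, 5.9, 5.6, 5.75, 5.65, 5.4, 5.7]
for name, value in summarize(flow_gpm).items():
    print(f"{name:>15}: {value:.3f}")
```

For the flow data, the median exceeds the mean by only about 0.005 gpm, the range is about 9% of the median, s is about 3% of the mean, and the range is roughly three times s.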

Reference

  1. J. M. Haile, Analysis of Data, Macatea Productions, Central, SC, 2003. ISBN 0-9728602-0-7.