The Goodness-of-Fit Operator

The goodnessOfFit operator requires a distribution function, a relative frequency distribution, or a sample vector as its left operand and a relational function as its right operand.

Most of the time the right operand will be the relational function =, although < or > may be used for one-sided tests. The left argument is an optional parameter list, which applies only when the left operand is a distribution function. The right argument is a sample vector. There are several types of goodness-of-fit tests. These include:

Chi-Square Goodness-of-Fit Test - Discrete distributions
Kolmogorov Test - Continuous distributions with parameters
Lilliefors Test - Continuous distributions without parameters; specifically normal and exponential
Smirnov Test - Comparing two samples with each other to determine whether they come from the same distribution
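
The selection rule implied by this list can be sketched in a few lines. This is only an illustration written in Python rather than TamStat: the function name choose_test and the operand labels "discrete", "continuous" and "sample" are hypothetical, and the logic is inferred from the list above, not taken from TamStat's source.

```python
def choose_test(left_operand: str, has_parameters: bool = False) -> str:
    """Return the test suggested by the list above (hypothetical helper).

    left_operand: "discrete" or "continuous" for a distribution function,
                  "sample" when the left operand is a second sample vector.
    has_parameters: True when a parameter list is supplied as left argument.
    """
    if left_operand == "sample":
        return "Smirnov"                     # two samples compared with each other
    if left_operand == "discrete":
        return "Chi-square"                  # discrete distributions
    if left_operand == "continuous":
        # fully specified distribution vs. family with estimated parameters
        return "Kolmogorov" if has_parameters else "Lilliefors"
    raise ValueError(f"unknown operand kind: {left_operand!r}")


print(choose_test("continuous", has_parameters=True))
```

Under these assumptions, a normal distribution with a stated mean and standard deviation maps to the Kolmogorov test, while the same distribution with no parameter list maps to the Lilliefors test.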

The following flow chart shows how TamStat determines which type of test to perform:

The syntax of the goodnessOfFit operator is:

[ConfLevel] report [Parameters] distributionFunction|relativeFrequency|SampleVector goodnessOfFit relationalFunction SampleVector

Some examples follow:

Chi-Square Goodness-of-Fit Test

To test whether a sample is from a particular distribution, we perform a goodness-of-fit test. Let’s open a package of regular M&M’s and count the number of each color. Suppose there are 15 brown M&M’s, 13 yellow, 12 red, 32 blue, 20 orange, and 16 green. First we create a list of colors:

COLORS ← 'Brown,Yellow,Red,Blue,Orange,Green'

Then we create a list of the corresponding counts:

FREQ ← 15 13 12 32 20 16

We will test whether the sample came from a uniform distribution; that is, whether the manufacturer produces the same number of each color:

report uniform goodnessOfFit = COLORS FREQ

─────────────────────────────────────────────────────
  Value   Observed  Expected  Difference  ChiSquare  
  Brown         15        18          -3    0.5      
  Yellow        13        18          -5    1.3889   
  Red           12        18          -6    2        
  Blue          32        18          14   10.889    
  Orange        20        18           2    0.22222  
  Green         16        18          -2    0.22222  
  Total        108       108           0   15.222    
                                                     
                                                     
                                                     
  H₀:  Uniform   H₁: not  Uniform                    
 ┌─────────────────┬───────────────────┐             
 │Test Statistic:  │P-Value:           │             
 │χ²=15.22222222   │p=0.00945          │             
 ├─────────────────┼───────────────────┤             
 │Critical Value:  │Significance Level:│             
 │χ²(α;df=5)=11.070│α=0.05             │             
 └─────────────────┴───────────────────┘             
  Conclusion:  Reject H₀                             
─────────────────────────────────────────────────────

We reject the null hypothesis because the p-value (0.00945) is less than 0.05 and the test statistic (15.22) is greater than the critical value (11.070). The sample gives strong evidence that the colors are not uniformly distributed.
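
The figures in this report can be cross-checked outside TamStat. The following Python sketch recomputes the chi-square statistic from the observed counts and obtains the p-value from the closed-form upper tail of the chi-square distribution with 5 degrees of freedom. The helper name chi2_sf_df5 is an illustrative choice, and the closed form applies only to df = 5.

```python
import math

freq = [15, 13, 12, 32, 20, 16]           # observed counts from the example
expected = sum(freq) / len(freq)          # 108 / 6 = 18 under the uniform hypothesis

# Pearson statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - expected) ** 2 / expected for o in freq)


def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 5 degrees of freedom."""
    tail = math.erfc(math.sqrt(x / 2))
    correction = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)
    return tail + correction


p_value = chi2_sf_df5(chi_square)
print(round(chi_square, 4), round(p_value, 5))
```

The statistic and p-value should agree with the report to the digits shown there.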

M&M/Mars used to publish the proportions of each color on the internet. They were 13% brown, 14% yellow, 13% red, 24% blue, 20% orange, and 16% green. To test whether the M&M’s are still distributed this way, we perform a multinomial goodness-of-fit test. First we set the proportions, making sure that they total 1:

PROP ← 0.13 0.14 0.13 0.24 0.2 0.16

sum PROP

1

Then we run the test and display the report showing a much better fit:

report COLORS PROP multinomial goodnessOfFit = COLORS FREQ

─────────────────────────────────────────────────────
  Value   Observed  Expected  Difference  ChiSquare  
  Brown         15     14.04        0.96   0.065641  
  Yellow        13     15.12       -2.12   0.29725   
  Red           12     14.04       -2.04   0.29641   
  Blue          32     25.92        6.08   1.4262    
  Orange        20     21.6        -1.6    0.11852   
  Green         16     17.28       -1.28   0.094815  
  Total        108    108           0      2.2988    
                                                     
                                                     
                                                     
  H₀:  Multinomial   H₁: not  Multinomial            
 ┌─────────────────┬───────────────────┐             
 │Test Statistic:  │P-Value:           │             
 │χ²=2.298806132   │p=0.80644          │             
 ├─────────────────┼───────────────────┤             
 │Critical Value:  │Significance Level:│             
 │χ²(α;df=5)=11.070│α=0.05             │             
 └─────────────────┴───────────────────┘             
  Conclusion:  Fail to reject H₀                     
─────────────────────────────────────────────────────
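
The Expected column in this report is the sample size multiplied by each hypothesized proportion. A minimal Python sketch of that arithmetic, using only the counts and proportions given above, is shown here as a cross-check rather than as TamStat's own implementation:

```python
import math

freq = [15, 13, 12, 32, 20, 16]                   # observed counts
prop = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]       # published proportions

# Floating-point sums can miss 1 by a rounding error, so compare with a tolerance.
assert math.isclose(sum(prop), 1.0)

n = sum(freq)                                     # 108 candies in the package
expected = [n * p for p in prop]                  # expected count per color

# Pearson statistic against the published proportions
chi_square = sum((o - e) ** 2 / e for o, e in zip(freq, expected))
print([round(e, 2) for e in expected], round(chi_square, 4))
```

The largest single contribution comes from blue (32 observed against 25.92 expected), but the total is still far below the critical value of 11.070.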

Kolmogorov Test

Five children were selected from a class at random and timed in a short race. The times were 6.3, 4.2, 4.7, 6 and 5.7 seconds. The previous race times were uniformly distributed between 4 and 8 seconds. Test whether the race time distribution has improved.

       report 4 8 rectangular goodnessOfFit < 6.3 4.2 4.7 6 5.7
────────────────────────────────────────
kolmogorov Test
i  Value  S(x)  F(x)       T+     T-
1    4.2   0.2  0.05     0.05   0.15   *
2    4.7   0.4  0.175   -0.025  0.225
3    5.7   0.6  0.425    0.025  0.175
4    6     0.8  0.5     -0.1    0.3
5    6.3   1    0.575   -0.225  0.425  *

   Mean: 5.38    Sample Size: N =   5

H₀:F(x)≥F*(x) H₁:F(x)<F*(x)
┌────────────────┬───────────────────┐
│Test Statistic: │P-Value:           │
│T=0.425         │p=0.12367          │
├────────────────┼───────────────────┤
│Critical Value: │Significance Level:│
│T(α)=0.509      │α=0.05             │
└────────────────┴───────────────────┘
Conclusion: Fail to reject H₀
────────────────────────────────────────

The report above shows no significant improvement in race times: we fail to reject H₀, so the data are consistent with race times that are still uniformly distributed between 4 and 8 seconds. Note that we use the “rectangular” distribution in TamStat, which is the continuous analog of the discrete uniform distribution. Also note that the largest positive and largest negative differences (the largest values of T+ and T-) are flagged with an asterisk in the report.
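
The columns of the race-time report can be reproduced by hand. The Python sketch below assumes, based on the printed values, that S(x) is the empirical distribution function i/n, that T+ is F(x) minus the empirical value just before the step, and that T- is S(x) minus F(x). These definitions are inferred from the output and are not taken from TamStat's documentation.

```python
times = sorted([6.3, 4.2, 4.7, 6, 5.7])   # race times in seconds
n = len(times)
low, high = 4.0, 8.0                      # rectangular distribution limits


def rectangular_cdf(x: float) -> float:
    """CDF of the continuous uniform distribution on [low, high]."""
    return min(max((x - low) / (high - low), 0.0), 1.0)


rows = []
for i, x in enumerate(times, start=1):
    s = i / n                             # empirical CDF at x
    f = rectangular_cdf(x)                # hypothesized CDF at x
    t_plus = f - (i - 1) / n              # hypothesized above empirical (before the step)
    t_minus = s - f                       # empirical above hypothesized (after the step)
    rows.append((x, s, f, t_plus, t_minus))

t_stat = max(row[4] for row in rows)      # largest T- value, as used in the report
print(round(t_stat, 3))
```

With these definitions the largest T- value occurs at 6.3 seconds, matching the flagged row and the reported test statistic.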

Automobile emissions from a previous year have been measured and were normally distributed with a mean of 5.6 and a standard deviation of 1.2. Twelve cars were randomly selected, and the following emissions measurements taken:

4.8, 6.2, 6.0, 5.9, 6.6, 5.5, 5.8, 5.9, 6.3, 6.6, 6.2, 5.0

Do the current emissions have the same distribution as the previous year?

X ← 4.8 6.2 6 5.9 6.6 5.5 5.8 5.9 6.3 6.6 6.2 5

     report 5.6 1.2 normal goodnessOfFit = X
──────────────────────────────────────────────────────
kolmogorov Test
 i  Value  S(x)      F(x)     T+         T-
 1    4.8  0.083333  0.25249  0.25249   -0.16916
 2    5    0.16667   0.30854  0.2252    -0.14187
 3    5.5  0.25      0.46679  0.30013   -0.21679
 4    5.8  0.33333   0.56618  0.31618   -0.23285   *
 5    5.9  0.5       0.59871  0.18204   -0.098706
 6    5.9  0.5       0.59871  0.18204   -0.098706
 7    6    0.58333   0.63056  0.13056   -0.047225
 8    6.2  0.75      0.69146  0.024796   0.058537
 9    6.2  0.75      0.69146  0.024796   0.058537
10    6.3  0.83333   0.72017 -0.029834   0.11317
11    6.6  1         0.79767 -0.11899    0.20233   *
12    6.6  1         0.79767 -0.11899    0.20233

   Mean: 5.9    Sample Size: N =   12

H₀:F(x)=F*(x) H₁:F(x)≠F*(x)
┌────────────────┬───────────────────┐
│Test Statistic: │P-Value:           │
│T=0.3161839595  │p=0.14499          │
├────────────────┼───────────────────┤
│Critical Value: │Significance Level:│
│T(α)=0.375      │α=0.05             │
└────────────────┴───────────────────┘
Conclusion: Fail to reject H₀
──────────────────────────────────────────────────────
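
The two-sided statistic in this report can be checked with the Python standard library. The sketch below uses the textbook Kolmogorov statistic (the largest gap between the hypothesized normal CDF and the empirical CDF, measured just before and just after each step). Its treatment of tied observations may differ row by row from TamStat's table, so only the overall statistic is compared.

```python
from statistics import NormalDist

x = sorted([4.8, 6.2, 6.0, 5.9, 6.6, 5.5, 5.8, 5.9, 6.3, 6.6, 6.2, 5.0])
n = len(x)
hypothesized = NormalDist(mu=5.6, sigma=1.2)   # last year's emissions distribution

# Largest amount by which the hypothesized CDF exceeds the empirical CDF
above = max(hypothesized.cdf(v) - (i - 1) / n for i, v in enumerate(x, start=1))
# Largest amount by which the empirical CDF exceeds the hypothesized CDF
below = max(i / n - hypothesized.cdf(v) for i, v in enumerate(x, start=1))

t_stat = max(above, below)                     # two-sided statistic
print(round(t_stat, 5))
```

The maximum is reached at 5.8, where the normal CDF is about 0.566 and the empirical CDF has only reached 0.25 just before that observation.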

Lilliefors Test

When the exact distribution is unknown, one can test whether the data come from a family of distributions. For example, if the data appear bell-shaped, we can test whether the sample comes from a normal distribution. The student survey contains the weights of students. Let us test whether the student weights are normally distributed:

   report normal goodnessOfFit = Weight
─────────────────────────────────────────────────────────────────────────────
Lillefors Test
 i  Xi      Zi       S(Zi)     F(Zi)      T+         T-
 1  100    -1.6545   0.026316  0.049017   0.049017  -0.022702
 2  105    -1.5359   0.052632  0.062281   0.035965  -0.0096497
 3  115    -1.2988   0.078947  0.097008   0.044376  -0.01806
 4  115    -1.2988   0.10526   0.097008   0.01806    0.0082556
 5  120    -1.1802   0.13158   0.11895    0.013689   0.012626
...............................................................
 8  139.5  -0.71788  0.21053   0.23642    0.052206  -0.02589    *
...............................................................
22  165    -0.11325  0.57895   0.45492   -0.097716   0.12403    *
...............................................................
34  220     1.1908   0.89474   0.88314    0.014722   0.011594
35  225     1.3094   0.92105   0.9048     0.010064   0.016252
36  245     1.7836   0.94737   0.96276    0.041705  -0.015389
37  260     2.1393   0.97368   0.98379    0.036425  -0.010109
38  280     2.6135   1         0.99552    0.021835   0.0044811

   Mean: 169.7763158 Standard Deviation: 42.17478069    Sample Size: N =   38

H₀:normal H₁:not normal
┌────────────────┬───────────────────┐
│Test Statistic: │P-Value:           │
│T=0.1240315761  │p=0.16469          │
├────────────────┼───────────────────┤
│Critical Value: │Significance Level:│
│T(α)=0.156      │α=0.05             │
└────────────────┴───────────────────┘
Conclusion: Fail to reject H₀
────────────────────────────────────────────────────────────────────────────

Note that when the number of observations is large, the report displays only the first five and last five observations, as well as the observations containing the largest and smallest differences (flagged with an asterisk).
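
In the Lilliefors report, each weight Xi is first standardized with the sample mean and standard deviation, and the resulting Zi is compared against the standard normal distribution. The Python sketch below reproduces the Zi and F(Zi) columns for two of the displayed rows. The full Weight variable is not listed in this section, so the mean and standard deviation are copied from the report, and the function name standardized_cdf is an illustrative choice.

```python
from statistics import NormalDist

mean, sd = 169.7763158, 42.17478069       # sample estimates printed in the report
standard_normal = NormalDist()            # mean 0, standard deviation 1


def standardized_cdf(x: float) -> tuple[float, float]:
    """Return (Zi, F(Zi)) for one observation Xi."""
    z = (x - mean) / sd                   # standardize with the sample estimates
    return z, standard_normal.cdf(z)


for weight in (100, 165):                 # rows 1 and 22 of the report
    z, f = standardized_cdf(weight)
    print(weight, round(z, 4), round(f, 5))
```

Because the parameters are estimated from the same sample, the critical values differ from those of the Kolmogorov test even though the differences are computed in the same way.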

Smirnov Test

A random sample of 9 packages from a delivery service is taken and each parcel is weighed. A random sample of 15 packages from another delivery service is taken and those packages are also weighed. Are the distributions of weights for each delivery service the same?

⍝ Weights for delivery service X

X ← 7.6 8.4 8.6 8.7 9.3 9.9 10.1 10.6 11.2

⍝ Weights for delivery service Y

Y ← 5.2 5.7 5.9 6.5 6.8 8.2 9.1 9.8 10.8 11.3 11.5 12.3 12.5 13.4 14.6

    report X goodnessOfFit = Y
────────────────────────────────────────
smirnov Test
 i   X     Y    S1 S2        S1-S2
 1   0     5.2  0  0.066667  0.066667
 2   0     5.7  0  0.13333   0.13333
 3   0     5.9  0  0.2       0.2
 4   0     6.5  0  0.26667   0.26667
 5   0     6.8  0  0.33333   0.33333
......................................
18  11.2   0    1  0.6       0.4       *
......................................
20   0    11.5  1  0.73333   0.26667
......................................
21   0    12.3  1  0.8       0.2
22   0    12.5  1  0.86667   0.13333
23   0    13.4  1  0.93333   0.066667
24   0    14.6  1  1         0         *
24   0    14.6  1  1         0         *

Sample Size: N = 9 M = 15

H₀:F(x)=G(x) H₁:F(x)≠G(x)
┌────────────────┬───────────────────┐
│Test Statistic: │P-Value:           │
│T=0.4           │p=0.33060          │
├────────────────┼───────────────────┤
│Critical Value: │Significance Level:│
│T(α)=0.573      │α=0.05             │
└────────────────┴───────────────────┘
Conclusion: Fail to reject H₀
────────────────────────────────────────

We fail to reject H₀, so there is no significant evidence that the distributions of weights for the two delivery services differ.
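
The Smirnov statistic is the largest absolute difference between the two empirical distribution functions, evaluated over the pooled observations. A minimal Python cross-check using the two samples above follows; the helper name ecdf is an illustrative choice.

```python
from bisect import bisect_right

x = [7.6, 8.4, 8.6, 8.7, 9.3, 9.9, 10.1, 10.6, 11.2]
y = [5.2, 5.7, 5.9, 6.5, 6.8, 8.2, 9.1, 9.8, 10.8,
     11.3, 11.5, 12.3, 12.5, 13.4, 14.6]

xs, ys = sorted(x), sorted(y)


def ecdf(ordered: list[float], value: float) -> float:
    """Fraction of the sorted sample that is less than or equal to value."""
    return bisect_right(ordered, value) / len(ordered)


# Evaluate both empirical CDFs at every pooled observation and keep the largest gap.
t_stat = max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in xs + ys)
print(round(t_stat, 4))
```

The largest gap occurs at 11.2, where all 9 of the X weights but only 9 of the 15 Y weights have been reached, which is the row flagged in the report.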