Class Lilliefors

java.lang.Object
  extended by Lilliefors

public class Lilliefors
extends java.lang.Object

Lilliefors - a Java utility to perform a test for normality on a data set.

The Lilliefors test is an extension of the Kolmogorov-Smirnov test for normality. Both are tests of the null hypothesis that a sample of data points is normally distributed.

The K-S test is used when the population mean and variance are known or assumed. The Lilliefors test is used when the population mean and variance are not known but are estimated from the sample data.

The procedure for calculating the test statistic is the same for both tests, except that the Lilliefors test uses the sample mean and sample standard deviation to standardize the data before the observed (i.e., sample) distribution is compared against the expected (i.e., normal) distribution. Because the parameters are fitted to the sample itself, the test statistic tends to be smaller than it would be with known parameters. Judged against the K-S critical values, the test would therefore be conservative, with a greater likelihood of retaining the null hypothesis. To compensate for this and to maintain the desired alpha, the Lilliefors test uses a more stringent set of critical values in assessing the test statistic.

As with the chi-square goodness-of-fit test, the Lilliefors test compares observed and expected values. However, the comparisons are not between the counts or frequencies among nominal-scale attributes of people, systems, interaction techniques, or such. Instead, the comparisons are between the observed and expected cumulative frequencies in two distributions. The expected distribution is the normal distribution, as represented in standardized z scores. The observed distribution is developed from the sample data.

In the image below, taken from Wikipedia, the expected cumulative frequency (ECF) distribution is shown in red while the observed cumulative frequency (OCF) distribution is shown in blue.

The test statistic (shown in black) is the largest vertical difference between the ECF and the OCF. If it exceeds a threshold – the critical value – the null hypothesis of normality is rejected. If the test statistic is less than the critical value, the null hypothesis is retained.
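
The calculation described above can be sketched in Java as follows. This is a minimal illustration, not the utility's actual source: the class name LillieforsSketch and its methods are hypothetical, the normal CDF uses the Abramowitz-Stegun 7.1.26 approximation of erf (an assumption made here because the Java standard library has no erf), the standard deviation is assumed to use the n - 1 denominator, and repeated values are not treated specially. Because the OCF is a step function, the difference is checked against the step height both at and just before each sample point.

```java
import java.util.Arrays;

final class LillieforsSketch {

    // Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation
    // (absolute error on the order of 1e-7; an assumption of this sketch).
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    // Largest vertical difference between the OCF and the ECF.
    // Assumes at least two values and no repeated values.
    static double statistic(double[] sample) {
        if (sample.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double[] x = sample.clone();
        Arrays.sort(x);
        int n = x.length;

        double mean = Arrays.stream(x).average().orElse(Double.NaN);
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSq / (n - 1)); // sample standard deviation

        double max = 0.0;
        for (int i = 0; i < n; i++) {
            double ecf = normalCdf((x[i] - mean) / sd);
            double atStep = Math.abs((i + 1.0) / n - ecf);     // OCF at this point
            double beforeStep = Math.abs((double) i / n - ecf); // OCF just before it
            max = Math.max(max, Math.max(atStep, beforeStep));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] demo = {-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83,
                3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09,
                44.96, 45.83, 63.09, 71.09};
        System.out.printf("Test statistic = %.4f%n", statistic(demo));
    }
}
```

The data in main are the selection coordinates from the first example below; the printed value should land close to the test statistic reported there.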

As an example, consider an experiment where participants interact with a touchscreen device. The experiment is called FittsTouch and engages 16 participants to perform touch-based target selection tasks. Each task (aka trial) involves moving a finger toward a target and tapping it. The trials are organized as sequences of 20 selections, with each sequence corresponding to a test condition. There is an assumption that the selection coordinates along the axis of approach to the target are normally distributed, with some selections in or near the center of the target and some selections on either side. For one such sequence, the following selection coordinates are logged:

-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83, 3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09, 44.96, 45.83, 63.09, 71.09
The coordinate system for the data above is adjusted such that a selection in the center of the target is at coordinate 0.0. Selections on the near side are negative, while selections on the far side are positive. Although the data above are sorted in ascending order, during testing the selections may have occurred in any order. To test the hypothesis that the selection coordinates are normally distributed, the Lilliefors utility may be used:


      PROMPT>type fittstouchdata1.txt
      -29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83, 3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09, 44.96, 45.83, 63.09, 71.09
      
      PROMPT>java Lilliefors fittstouchdata1.txt -t
      File: fittstouchdata1.txt
      =====================================================================
         A        B          C          D          E             F
         x       OCF         z         ECF     |OCFi-ECFi|  |OCFi-1-ECFi|
      ---------------------------------------------------------------------
      -29.41    0.05000   -1.45253    0.07318    0.02318      0.07318
      -27.16    0.10000   -1.37441    0.08466    0.01534      0.03466
      -19.91    0.15000   -1.12270    0.13078    0.01922      0.03078
      -16.60    0.20000   -1.00778    0.15678    0.04322      0.00678
      -15.71    0.25000   -0.97688    0.16431    0.08569      0.03569
       -8.91    0.30000   -0.74079    0.22941    0.07059      0.02059
       -1.83    0.35000   -0.49499    0.31031    0.03969      0.01031
        3.84    0.40000   -0.29813    0.38280    0.01720      0.03280
        4.34    0.45000   -0.28077    0.38944    0.06056      0.01056
        7.09    0.50000   -0.18529    0.42650    0.07350      0.02350
        8.84    0.55000   -0.12454    0.45045    0.09955      0.04955
       18.34    0.60000    0.20529    0.58133    0.01867      0.03133
       21.34    0.65000    0.30945    0.62151    0.02849      0.02151
       22.09    0.70000    0.33549    0.63137    0.06863      0.01863
       26.13    0.75000    0.47575    0.68287    0.06713      0.01713
       31.09    0.80000    0.64796    0.74149    0.05851      0.00851
       44.96    0.85000    1.12950    0.87066    0.02066      0.07066
       45.83    0.90000    1.15971    0.87692    0.02308      0.02692
       63.09    0.95000    1.75896    0.96071    0.01071      0.06071
       71.09    1.00000    2.03671    0.97916    0.02084      0.02916
      =====================================================================
      NOTE: OCF = Observed Cumulative Freq, ECF = Expected Cumulative Freq
      n = 20 (20 unique values)
      mean = 12.43
      sd = 28.8029
      m1 = 0.0996 (=max(E))
      m2 = 0.0732 (=max(F))
      M = 0.0996 (=max(m1, m2))
      Test statistic = 0.0996
      Critical value = 0.1920 (alpha .05, n = 20)
      Result: Null hypothesis (normality) is NOT REJECTED
 
With the -t option, the utility outputs a table demonstrating the calculations leading to the test statistic. The test statistic is M = 0.0996. The critical value is retrieved from a lookup table. The value cited (CV = 0.1920) corresponds to an alpha of .05 and a sample size of 20. Since M did not exceed the CV, the null hypothesis is retained, which is to say, not rejected (M = 0.0996, p > .05, n = 20). The conclusion is that the data give no evidence against the assumption that the selection coordinates are normally distributed.
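
The columns of the table can be restated as formulas. This is only a reading of the column headers, with i indexing the sorted values from 1 to n; the symbols for the sample mean, the sample standard deviation s, and the standard normal cumulative distribution function Φ are notation introduced here, and the n - 1 denominator for s is an assumption (it is consistent with the sd reported in the output).

```latex
\begin{align*}
z_i &= \frac{x_i - \bar{x}}{s}, \qquad
\mathrm{OCF}_i = \frac{i}{n}, \qquad
\mathrm{ECF}_i = \Phi(z_i), \\
E_i &= \lvert \mathrm{OCF}_i - \mathrm{ECF}_i \rvert, \qquad
F_i = \lvert \mathrm{OCF}_{i-1} - \mathrm{ECF}_i \rvert, \qquad \mathrm{OCF}_0 = 0, \\
M &= \max\Bigl(\max_i E_i,\ \max_i F_i\Bigr).
\end{align*}
```

For the first row of the table, OCF = 1/20 = 0.05000 and ECF = 0.07318, so E = |0.05000 - 0.07318| = 0.02318 and F = |0 - 0.07318| = 0.07318.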

The calculations and table were modelled after those in Sheskin's Handbook of Parametric and Nonparametric Statistical Procedures (5th ed.), 2011, CRC Press (Test #7, pp. 261-275).

The following image shows the expected (red) and observed (blue) cumulative frequency distributions for the data in the example above. Can you spot the sample point that yields the test statistic?

As another example, consider the following data from a sequence of trials in the same experiment:

-26.64, -24.54, -15.17, -14.61, -13.89, -13.88, -13.51, -12.2, -11.57, -10.69, -7.53, -6.69, -1.52, 1.93, 2.43, 17.29, 21.79, 23.68, 33.07, 36.75
The null hypothesis of normality is tested as follows:


      PROMPT>type fittstouchdata2.txt
      -26.64, -24.54, -15.17, -14.61, -13.89, -13.88, -13.51, -12.2, -11.57, -10.69, -7.53, -6.69, -1.52, 1.93, 2.43, 17.29, 21.79, 23.68, 33.07, 36.75
      
      PROMPT>java Lilliefors fittstouchdata2.txt -t
      File: fittstouchdata2.txt
      =====================================================================
         A        B          C          D          E             F
         x       OCF         z         ECF     |OCFi-ECFi|  |OCFi-1-ECFi|
      ---------------------------------------------------------------------
      -26.64    0.05000   -1.33923    0.09025    0.04025      0.09025
      -24.54    0.10000   -1.22613    0.11008    0.01008      0.06008
      -15.17    0.15000   -0.72146    0.23531    0.08531      0.13531
      -14.61    0.20000   -0.69130    0.24469    0.04469      0.09469
      -13.89    0.25000   -0.65252    0.25703    0.00703      0.05703
      -13.88    0.30000   -0.65198    0.25721    0.04279      0.00721
      -13.51    0.35000   -0.63205    0.26368    0.08632      0.03632
      -12.20    0.40000   -0.56149    0.28723    0.11277      0.06277
      -11.57    0.45000   -0.52756    0.29890    0.15110      0.10110
      -10.69    0.50000   -0.48016    0.31556    0.18444      0.13444
       -7.53    0.55000   -0.30997    0.37829    0.17171      0.12171
       -6.69    0.60000   -0.26472    0.39561    0.20439      0.15439
       -1.52    0.65000    0.01373    0.50548    0.14452      0.09452
        1.93    0.70000    0.19955    0.57908    0.12092      0.07092
        2.43    0.75000    0.22648    0.58959    0.16041      0.11041
       17.29    0.80000    1.02685    0.84775    0.04775      0.09775
       21.79    0.85000    1.26922    0.89782    0.04782      0.09782
       23.68    0.90000    1.37101    0.91481    0.01481      0.06481
       33.07    0.95000    1.87676    0.96972    0.01972      0.06972
       36.75    1.00000    2.07497    0.98101    0.01899      0.03101
      =====================================================================
      NOTE: OCF = Observed Cumulative Freq, ECF = Expected Cumulative Freq
      n = 20 (20 unique values)
      mean = -1.78
      sd = 18.5666
      m1 = 0.2044 (=max(E))
      m2 = 0.1544 (=max(F))
      M = 0.2044 (=max(m1, m2))
      Test statistic = 0.2044
      Critical value = 0.1920 (alpha = .05, n = 20)
      Result: Null hypothesis (normality) is REJECTED
 
For this sequence, the test statistic is M = 0.2044, which exceeds the critical value of 0.1920. Therefore, the null hypothesis is rejected (M = 0.2044, p < .05, n = 20). The anomaly in the distribution is evident in the following chart showing the observed and expected cumulative frequency distributions:

The Lilliefors utility reads sample data from a text file. Each line in the file is assumed to hold the samples from a single data set (comma or space delimited). If the data file contains multiple data sets on multiple lines, each line is processed as in the examples above. Blank lines or lines beginning with "#" are ignored. If the -t option is omitted, the output is a single-line summary for each test. Here's an abbreviated excerpt of one such analysis:


      PROMPT>type dxdata.txt
      # ============
      # File: FittsTilt-P01-S01-B01-G01-C01-1D.sd1
      -13.406849,-14.906796,1.0931833,-9.656806,-25.90678,10.912159,-32.65685,-2.4068189,-34.406815,-14.322453,-20.656765,16.343199,-4.020278,17.343168,6.093225,21.618008,-29.906813,-1.9068128,-19.906826,-3.156789
      11.622174,24.663057,26.898283,23.398296,18.398281,36.920197,41.898308,38.398296,27.14831,38.248386,39.398304,35.1483,12.8982725,29.398277,19.398333,15.648286,18.398277,29.398315,21.148266,11.398319
      -9.203405,21.13783,-8.703413,22.602818,-4.9534187,13.546619,6.296603,41.269993,2.04658,47.79662,-11.061947,18.592436,0.2966088,20.546589,-3.203395,19.700209,5.2965693,-4.703401,29.54656,29.796608
      -6.4067583,10.593187,-10.656808,26.093187,-9.629451,8.093193,6.4638014,-14.383714,22.843225,22.09317,28.593164,43.093147,13.593171,33.958378,-10.906848,0.3431504,-22.40684,-5.906815,-25.656794,22.59317
      14.046619,30.994177,7.796592,28.27775,17.796604,29.796589,1.7965907,58.796597,26.296593,67.79662,4.6070786,36.215733,-8.203432,13.910416,-3.2034361,5.0466084,20.796587,22.296598,-22.436363,21.425077
      9.799696,37.898308,-11.601676,38.148327,1.8753699,33.164165,11.398289,41.126015,3.6483347,32.39829,-0.601696,21.648312,-7.974852,26.99983,-12.1016865,20.445698,-10.101694,27.898289,4.648325,15.898285
      
      # ============
      # File: FittsTilt-P01-S01-B01-G01-C02-2D.sd1
      -0.39720023,26.360846,-10.212834,26.615536,-8.005627,29.986546,8.108524,19.799362,-8.562092,0.2688715,-1.1016916,8.377718,7.031287,-6.7654243,1.4195294,-9.081079,6.5680585,-5.4352174,12.937191,-12.190426
      ...    
      
      PROMPT>java Lilliefors dxdata.txt
      ...
      n= 20,  mean= -4.30,  sd= 26.12,  0.18035 > 0.19200 ?  NOT REJECTED
      n= 20,  mean=-11.44,  sd= 23.16,  0.11793 > 0.19200 ?  NOT REJECTED
      n= 20,  mean= -7.71,  sd= 27.23,  0.14461 > 0.19200 ?  NOT REJECTED
      n= 20,  mean= -5.91,  sd= 17.22,  0.12051 > 0.19200 ?  NOT REJECTED
      n= 20,  mean= 13.63,  sd= 16.95,  0.14221 > 0.19200 ?  NOT REJECTED
      n= 20,  mean=  7.10,  sd= 28.99,  0.19811 > 0.19200 ?  REJECTED
      n= 20,  mean= -6.63,  sd= 20.23,  0.17045 > 0.19200 ?  NOT REJECTED
      n= 20,  mean= 14.61,  sd= 26.60,  0.11878 > 0.19200 ?  NOT REJECTED
      n= 20,  mean=  4.13,  sd= 35.54,  0.16664 > 0.19200 ?  NOT REJECTED
      ...
      n= 20,  mean= -1.77,  sd= 18.57,  0.20444 > 0.19200 ?  REJECTED
      n= 20,  mean=-17.97,  sd= 13.51,  0.12083 > 0.19200 ?  NOT REJECTED
      n= 20,  mean=-13.17,  sd= 14.59,  0.16437 > 0.19200 ?  NOT REJECTED
      n= 20,  mean= -5.84,  sd= 20.82,  0.08892 > 0.19200 ?  NOT REJECTED
      n= 20,  mean=-23.17,  sd= 32.45,  0.11948 > 0.19200 ?  NOT REJECTED
      n= 20,  mean=-15.41,  sd= 24.75,  0.11732 > 0.19200 ?  NOT REJECTED
      Data sets=1080, Rejected= 73, Not_rejected= 1007
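
The line format described above (one data set per line, comma or space delimited, with blank lines and "#" lines ignored) could be read along the following lines. This is a sketch under stated assumptions rather than the utility's own parser: DataLineParser, parseLine, and parseAll are hypothetical names, and trimming leading whitespace before checking for "#" is a choice made here.

```java
import java.util.ArrayList;
import java.util.List;

final class DataLineParser {

    // Returns the values on one line, or null if the line holds no data set.
    static double[] parseLine(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) {
            return null; // blank line or comment line: ignored
        }
        // Split on any run of commas and/or whitespace.
        String[] tokens = s.split("[,\\s]+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]); // throws on malformed input
        }
        return values;
    }

    // Collects one double[] per data line, skipping ignored lines.
    static List<double[]> parseAll(List<String> lines) {
        List<double[]> dataSets = new ArrayList<>();
        for (String line : lines) {
            double[] values = parseLine(line);
            if (values != null) {
                dataSets.add(values);
            }
        }
        return dataSets;
    }
}
```

Each array returned by parseAll corresponds to one line of the summary output, so a file with several data lines yields several independent tests.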
 
For details and discussion on the calculation of the test statistic, consult the source code (and added comments) or the Sheskin reference cited above. One tricky aspect of the calculation is the method for handling repeated values in the raw data. The method is implemented, but the details are not explored here.

Author:
Scott MacKenzie, 2013-2014

Constructor Summary
Lilliefors()
           
 
Method Summary
static boolean isNormal(double[] data)
           
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Lilliefors

public Lilliefors()
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Throws:
java.io.IOException

isNormal

public static boolean isNormal(double[] data)