## Class Lilliefors

```
java.lang.Object
  Lilliefors
```

`public class Lilliefors extends java.lang.Object`

Lilliefors - a Java utility to perform a test for normality on a data set.

The Lilliefors test is an extension of the Kolmogorov-Smirnov test for normality. Both are tests of the null hypothesis that a sample of data points is normally distributed.

The K-S test is used when the population mean and variance are known or assumed. The Lilliefors test is used when the population mean and variance are not known but are estimated from the sample data.

The procedure for calculating the test statistic is the same for both tests, except that the Lilliefors test uses the sample mean and sample standard deviation to standardize the data before the observed (i.e., sample) distribution is compared with the expected (i.e., normal) distribution. Because the normal distribution is fitted to the sample itself, the test statistic tends to be smaller than it would be with known parameters. Judged against K-S critical values, the test would therefore be conservative, with a greater likelihood of retaining the null hypothesis. To compensate for this and to maintain the desired alpha, the Lilliefors test uses a more stringent set of critical values in assessing the test statistic.

As with the chi-square goodness-of-fit test, the Lilliefors test compares observed and expected values. However, the comparisons are not between the counts or frequencies among nominal-scale attributes of people, systems, interaction techniques, or such. Instead, the comparisons are between the observed and expected cumulative frequencies in two distributions. The expected distribution is the normal distribution, as represented in standardized z scores. The observed distribution is developed from the sample data.

In the image below, taken from Wikipedia, the expected cumulative frequency (ECF) distribution is shown in red while the observed cumulative frequency (OCF) distribution is shown in blue.

The test statistic (shown in black) is the largest vertical difference between the ECF and the OCF. If it exceeds a threshold – the critical value – the null hypothesis of normality is rejected. If the test statistic is less than the critical value, the null hypothesis is retained.
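The calculation described above can be sketched in a few lines of Java. This is an illustration only and not the source code of the `Lilliefors` utility: the names `LillieforsSketch`, `normalCdf`, and `testStatistic` are hypothetical, the normal CDF is evaluated with a simple series expansion chosen to keep the example self-contained, and the data are assumed to contain no repeated values. Because the OCF is a step function, the gap to the ECF is checked on both sides of each step (against the OCF at point i and at point i-1) and the larger gap is kept.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of the Lilliefors utility.
class LillieforsSketch {

    // Standard normal CDF via the series
    // Phi(z) = 0.5 + pdf(z) * (z + z^3/3 + z^5/(3*5) + z^7/(3*5*7) + ...).
    static double normalCdf(double z) {
        if (Double.isNaN(z)) return Double.NaN;
        if (z < -8.0) return 0.0;
        if (z > 8.0) return 1.0;
        double sum = z, term = z, previous = 0.0;
        for (int k = 3; sum != previous; k += 2) {
            previous = sum;
            term *= z * z / k;   // next term of the series
            sum += term;
        }
        return 0.5 + sum * Math.exp(-0.5 * z * z) / Math.sqrt(2.0 * Math.PI);
    }

    // Largest vertical gap between the observed and expected cumulative
    // frequencies. Assumption: no repeated values in the data.
    static double testStatistic(double[] data) {
        int n = data.length;
        if (n < 2) throw new IllegalArgumentException("need at least two samples");
        double[] x = data.clone();
        Arrays.sort(x);

        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (n - 1));   // sample standard deviation
        if (sd == 0.0) throw new IllegalArgumentException("all samples are identical");

        double m = 0.0;
        for (int i = 0; i < n; i++) {
            double ecf = normalCdf((x[i] - mean) / sd);   // expected, from the z score
            double ocf = (i + 1) / (double) n;            // observed at this point
            double ocfPrev = i / (double) n;              // observed at the previous point
            m = Math.max(m, Math.max(Math.abs(ocf - ecf), Math.abs(ocfPrev - ecf)));
        }
        return m;
    }

    public static void main(String[] args) {
        double[] sample = {-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83,
                3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09,
                44.96, 45.83, 63.09, 71.09};
        System.out.printf("Test statistic = %.4f%n", testStatistic(sample));
    }
}
```

The sample in `main` is the first set of selection coordinates from the example that follows, so the printed value should land close to the test statistic reported there.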

As an example, consider an experiment where participants interact with a touchscreen device. The experiment is called `FittsTouch` and engages 16 participants to perform touch-based target selection tasks. Each task (aka trial) involves moving a finger toward a target and tapping it. The trials are organized as sequences of 20 selections, with each sequence corresponding to a test condition. There is an assumption that the selection coordinates along the axis of approach to the target are normally distributed, with some selections in or near the center of the target and some selections on either side. For one such sequence, the following selection coordinates are logged:

-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83, 3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09, 44.96, 45.83, 63.09, 71.09

The coordinate system for the data above is adjusted such that a selection in the center of the target is at coordinate 0.0. Selections on the near side are negative, while selections on the far side are positive. Although the data above are sorted in ascending order, during testing the selections may have occurred in any order. To test the hypothesis that the selection coordinates are normally distributed, the `Lilliefors` utility may be used:

``````
PROMPT>type fittstouchdata1.txt
-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83, 3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09, 44.96, 45.83, 63.09, 71.09

PROMPT>java Lilliefors fittstouchdata1.txt -t
File: fittstouchdata1.txt
=====================================================================
   A        B          C          D          E             F
   x       OCF         z         ECF     |OCFi-ECFi|  |OCFi-1-ECFi|
---------------------------------------------------------------------
-29.41    0.05000   -1.45253    0.07318    0.02318      0.07318
-27.16    0.10000   -1.37441    0.08466    0.01534      0.03466
-19.91    0.15000   -1.12270    0.13078    0.01922      0.03078
-16.60    0.20000   -1.00778    0.15678    0.04322      0.00678
-15.71    0.25000   -0.97688    0.16431    0.08569      0.03569
 -8.91    0.30000   -0.74079    0.22941    0.07059      0.02059
 -1.83    0.35000   -0.49499    0.31031    0.03969      0.01031
  3.84    0.40000   -0.29813    0.38280    0.01720      0.03280
  4.34    0.45000   -0.28077    0.38944    0.06056      0.01056
  7.09    0.50000   -0.18529    0.42650    0.07350      0.02350
  8.84    0.55000   -0.12454    0.45045    0.09955      0.04955
18.34    0.60000    0.20529    0.58133    0.01867      0.03133
21.34    0.65000    0.30945    0.62151    0.02849      0.02151
22.09    0.70000    0.33549    0.63137    0.06863      0.01863
26.13    0.75000    0.47575    0.68287    0.06713      0.01713
31.09    0.80000    0.64796    0.74149    0.05851      0.00851
44.96    0.85000    1.12950    0.87066    0.02066      0.07066
45.83    0.90000    1.15971    0.87692    0.02308      0.02692
63.09    0.95000    1.75896    0.96071    0.01071      0.06071
71.09    1.00000    2.03671    0.97916    0.02084      0.02916
=====================================================================
NOTE: OCF = Observed Cumulative Freq, ECF = Expected Cumulative Freq
n = 20 (20 unique values)
mean = 12.43
sd = 28.8029
m1 = 0.0996 (=max(E))
m2 = 0.0732 (=max(F))
M = 0.0996 (=max(m1, m2))
Test statistic = 0.0996
Critical value = 0.1920 (alpha .05, n = 20)
Result: Null hypothesis (normality) is NOT REJECTED
``````
With the `-t` option, the utility outputs a table showing the calculations leading to the test statistic. The test statistic is M = 0.0996. The critical value is retrieved from a lookup table; the value cited (CV = 0.1920) corresponds to an alpha of .05 and a sample size of 20. Since M did not exceed the CV, the null hypothesis is retained, which is to say, not rejected (M = 0.0996, p > .05, n = 20). The conclusion is that the selection coordinates are consistent with a normal distribution.
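The decision step can be sketched as a simple comparison against a stored critical value. The class and method names below are hypothetical, and the lookup holds only the single critical value quoted in this document (0.1920 for alpha .05 and n = 20); the utility's actual lookup table is not reproduced here.

```java
import java.util.Map;

// Hypothetical helper class for illustration; not part of the Lilliefors utility.
class LillieforsDecisionSketch {

    // Assumption: only the one critical value cited in the example output
    // (alpha = .05, n = 20). A complete table would cover other sample sizes.
    static final Map<Integer, Double> CRITICAL_VALUES_05 = Map.of(20, 0.1920);

    // Returns true when the null hypothesis of normality is rejected,
    // i.e., when the test statistic exceeds the critical value.
    static boolean rejectNormality(double statistic, int n) {
        Double criticalValue = CRITICAL_VALUES_05.get(n);
        if (criticalValue == null) {
            throw new IllegalArgumentException("no critical value stored for n = " + n);
        }
        return statistic > criticalValue;
    }

    public static void main(String[] args) {
        System.out.println(rejectNormality(0.0996, 20)); // false: not rejected
        System.out.println(rejectNormality(0.2044, 20)); // true: rejected
    }
}
```

The second call uses the test statistic from the second example further down, where the null hypothesis is rejected.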

The calculations and table were modelled after those in Sheskin's Handbook of Parametric and Nonparametric Statistical Procedures (5th ed.), 2011, CRC Press (Test #7, pp. 261-275).

The following image shows the expected (red) and observed (blue) cumulative frequency distributions for the data in the example above. Can you spot the sample point that yields the test statistic?

As another example, consider the following data from a sequence of trials in the same experiment:

-26.64, -24.54, -15.17, -14.61, -13.89, -13.88, -13.51, -12.2, -11.57, -10.69, -7.53, -6.69, -1.52, 1.93, 2.43, 17.29, 21.79, 23.68, 33.07, 36.75

The null hypothesis of normality is tested as follows:

``````
PROMPT>type fittstouchdata2.txt
-26.64, -24.54, -15.17, -14.61, -13.89, -13.88, -13.51, -12.2, -11.57, -10.69, -7.53, -6.69, -1.52, 1.93, 2.43, 17.29, 21.79, 23.68, 33.07, 36.75

PROMPT>java Lilliefors fittstouchdata2.txt -t
File: fittstouchdata2.txt
=====================================================================
   A        B          C          D          E             F
   x       OCF         z         ECF     |OCFi-ECFi|  |OCFi-1-ECFi|
---------------------------------------------------------------------
-26.64    0.05000   -1.33923    0.09025    0.04025      0.09025
-24.54    0.10000   -1.22613    0.11008    0.01008      0.06008
-15.17    0.15000   -0.72146    0.23531    0.08531      0.13531
-14.61    0.20000   -0.69130    0.24469    0.04469      0.09469
-13.89    0.25000   -0.65252    0.25703    0.00703      0.05703
-13.88    0.30000   -0.65198    0.25721    0.04279      0.00721
-13.51    0.35000   -0.63205    0.26368    0.08632      0.03632
-12.20    0.40000   -0.56149    0.28723    0.11277      0.06277
-11.57    0.45000   -0.52756    0.29890    0.15110      0.10110
-10.69    0.50000   -0.48016    0.31556    0.18444      0.13444
 -7.53    0.55000   -0.30997    0.37829    0.17171      0.12171
 -6.69    0.60000   -0.26472    0.39561    0.20439      0.15439
 -1.52    0.65000    0.01373    0.50548    0.14452      0.09452
  1.93    0.70000    0.19955    0.57908    0.12092      0.07092
  2.43    0.75000    0.22648    0.58959    0.16041      0.11041
17.29    0.80000    1.02685    0.84775    0.04775      0.09775
21.79    0.85000    1.26922    0.89782    0.04782      0.09782
23.68    0.90000    1.37101    0.91481    0.01481      0.06481
33.07    0.95000    1.87676    0.96972    0.01972      0.06972
36.75    1.00000    2.07497    0.98101    0.01899      0.03101
=====================================================================
NOTE: OCF = Observed Cumulative Freq, ECF = Expected Cumulative Freq
n = 20 (20 unique values)
mean = -1.78
sd = 18.5666
m1 = 0.2044 (=max(E))
m2 = 0.1544 (=max(F))
m = 0.2044 (=max(m1, m2))
Test statistic = 0.2044
Critical value = 0.1920 (alpha = .05, n = 20)
Result: Null hypothesis (normality) is REJECTED
``````
For this sequence, the test statistic is M = 0.2044, which exceeds the critical value of 0.1920. Therefore, the null hypothesis is rejected (M = 0.2044, p < .05, n = 20). The anomaly in the distribution is evident in the following chart showing the observed and expected cumulative frequency distributions:

The `Lilliefors` utility reads sample data from a text file. Each line in the file is assumed to hold the samples from a single data set (comma or space delimited). If the data file contains multiple data sets on multiple lines, each line is processed as in the examples above. Blank lines or lines beginning with "#" are ignored. If the `-t` option is omitted, the output is a single-line summary for each test. Here's an abbreviated excerpt of one such analysis:
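The input rules described above (one data set per line, comma or space delimited, blank lines and lines beginning with "#" ignored) can be sketched as follows. This is a minimal illustration under those stated rules, not the utility's own parsing code; `DataSetReaderSketch` and `parseDataSets` are hypothetical names, and malformed tokens are simply left to raise a `NumberFormatException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of the Lilliefors utility.
class DataSetReaderSketch {

    // Converts the lines of a data file into one double[] per data set.
    static List<double[]> parseDataSets(List<String> lines) {
        List<double[]> dataSets = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comment lines are ignored
            }
            String[] tokens = line.split("[,\\s]+"); // comma and/or whitespace delimited
            double[] values = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                values[i] = Double.parseDouble(tokens[i]);
            }
            dataSets.add(values);
        }
        return dataSets;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# ============",
                "# File: example",
                "-13.4,-14.9,1.09",
                "",
                "11.6 24.7 26.9 23.4");
        for (double[] set : parseDataSets(lines)) {
            System.out.println("n = " + set.length + ", first = " + set[0]);
        }
    }
}
```

For a real file, the list of lines could come from `java.nio.file.Files.readAllLines`, with each resulting array then passed to the normality test.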

``````
PROMPT>type dxdata.txt
# ============
# File: FittsTilt-P01-S01-B01-G01-C01-1D.sd1
-13.406849,-14.906796,1.0931833,-9.656806,-25.90678,10.912159,-32.65685,-2.4068189,-34.406815,-14.322453,-20.656765,16.343199,-4.020278,17.343168,6.093225,21.618008,-29.906813,-1.9068128,-19.906826,-3.156789
11.622174,24.663057,26.898283,23.398296,18.398281,36.920197,41.898308,38.398296,27.14831,38.248386,39.398304,35.1483,12.8982725,29.398277,19.398333,15.648286,18.398277,29.398315,21.148266,11.398319
-9.203405,21.13783,-8.703413,22.602818,-4.9534187,13.546619,6.296603,41.269993,2.04658,47.79662,-11.061947,18.592436,0.2966088,20.546589,-3.203395,19.700209,5.2965693,-4.703401,29.54656,29.796608
-6.4067583,10.593187,-10.656808,26.093187,-9.629451,8.093193,6.4638014,-14.383714,22.843225,22.09317,28.593164,43.093147,13.593171,33.958378,-10.906848,0.3431504,-22.40684,-5.906815,-25.656794,22.59317
14.046619,30.994177,7.796592,28.27775,17.796604,29.796589,1.7965907,58.796597,26.296593,67.79662,4.6070786,36.215733,-8.203432,13.910416,-3.2034361,5.0466084,20.796587,22.296598,-22.436363,21.425077
9.799696,37.898308,-11.601676,38.148327,1.8753699,33.164165,11.398289,41.126015,3.6483347,32.39829,-0.601696,21.648312,-7.974852,26.99983,-12.1016865,20.445698,-10.101694,27.898289,4.648325,15.898285

# ============
# File: FittsTilt-P01-S01-B01-G01-C02-2D.sd1
-0.39720023,26.360846,-10.212834,26.615536,-8.005627,29.986546,8.108524,19.799362,-8.562092,0.2688715,-1.1016916,8.377718,7.031287,-6.7654243,1.4195294,-9.081079,6.5680585,-5.4352174,12.937191,-12.190426
...

PROMPT>java Lilliefors dxdata.txt
...
n= 20,  mean= -4.30,  sd= 26.12,  0.18035 > 0.19200 ?  NOT REJECTED
n= 20,  mean=-11.44,  sd= 23.16,  0.11793 > 0.19200 ?  NOT REJECTED
n= 20,  mean= -7.71,  sd= 27.23,  0.14461 > 0.19200 ?  NOT REJECTED
n= 20,  mean= -5.91,  sd= 17.22,  0.12051 > 0.19200 ?  NOT REJECTED
n= 20,  mean= 13.63,  sd= 16.95,  0.14221 > 0.19200 ?  NOT REJECTED
n= 20,  mean=  7.10,  sd= 28.99,  0.19811 > 0.19200 ?  REJECTED
n= 20,  mean= -6.63,  sd= 20.23,  0.17045 > 0.19200 ?  NOT REJECTED
n= 20,  mean= 14.61,  sd= 26.60,  0.11878 > 0.19200 ?  NOT REJECTED
n= 20,  mean=  4.13,  sd= 35.54,  0.16664 > 0.19200 ?  NOT REJECTED
...
n= 20,  mean= -1.77,  sd= 18.57,  0.20444 > 0.19200 ?  REJECTED
n= 20,  mean=-17.97,  sd= 13.51,  0.12083 > 0.19200 ?  NOT REJECTED
n= 20,  mean=-13.17,  sd= 14.59,  0.16437 > 0.19200 ?  NOT REJECTED
n= 20,  mean= -5.84,  sd= 20.82,  0.08892 > 0.19200 ?  NOT REJECTED
n= 20,  mean=-23.17,  sd= 32.45,  0.11948 > 0.19200 ?  NOT REJECTED
n= 20,  mean=-15.41,  sd= 24.75,  0.11732 > 0.19200 ?  NOT REJECTED
Data sets=1080, Rejected= 73, Not_rejected= 1007
``````
For details and discussion on the calculation of the test statistic, consult the source code (and added comments) or the Sheskin reference cited above. One tricky aspect of the calculation is the method for handling repeated values in the raw data. The method is implemented, but the details are not explored here.

Author:
Scott MacKenzie, 2013-2014

Constructor Summary
`Lilliefors()`

Method Summary
`static boolean` `isNormal(double[] data)`

`static void` `main(java.lang.String[] args)`

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

### Lilliefors

`public Lilliefors()`

Method Detail

### main

```
public static void main(java.lang.String[] args)
                 throws java.io.IOException
```
Throws:
`java.io.IOException`

### isNormal

`public static boolean isNormal(double[] data)`