

java.lang.Object Lilliefors
public class Lilliefors
Lilliefors: a Java utility to perform a test for normality on a data set.
The Lilliefors test is an extension of the Kolmogorov-Smirnov test for normality. Both are tests of the null hypothesis that a sample of data points is normally distributed.
The K-S test is used when the population mean and variance are known or assumed. The Lilliefors test is used when the population mean and variance are not known but are estimated from the sample data.
The procedure for calculating the test statistic is the same for both tests, except that the Lilliefors test uses the sample mean and sample standard deviation to standardize the data before the observed (i.e., sample) distribution is compared with the expected (i.e., normal) distribution. Because the parameters are estimated from the same data, the test statistic tends to be smaller than it would be with known parameters, which makes the test more conservative and increases the likelihood of retaining the null hypothesis. To compensate for this and to maintain the desired alpha, the Lilliefors test uses a more stringent (smaller) set of critical values in assessing the test statistic.
As with the chi-square goodness-of-fit test, the Lilliefors test compares observed and expected values. However, the comparisons are not between the counts or frequencies among nominal-scale attributes of people, systems, interaction techniques, or such. Instead, the comparisons are between the observed and expected cumulative frequencies in two distributions. The expected distribution is the normal distribution, as represented in standardized z scores. The observed distribution is developed from the sample data.
In the image below (taken from Wikipedia), the expected cumulative frequency (ECF) distribution is shown in red, while the observed cumulative frequency (OCF) distribution is shown in blue.
The test statistic (shown in black) is the largest vertical difference between the ECF and the OCF. If it exceeds a threshold – the critical value – the null hypothesis of normality is rejected. If the test statistic is less than the critical value, the null hypothesis is retained.
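The comparison described above can be written compactly. The notation below is a sketch consistent with the column labels used in the utility's table output (OCF, z, ECF, m1, m2, M); it assumes the n sample values are sorted in ascending order and contain no repeated values, and the symbols x-bar (sample mean), s (sample standard deviation), and Phi (standard normal cumulative distribution function) are introduced here for illustration.

```latex
\[
\begin{aligned}
z_i &= \frac{x_{(i)} - \bar{x}}{s}, \qquad i = 1, \dots, n \\
\mathrm{ECF}_i &= \Phi(z_i), \qquad \mathrm{OCF}_i = \frac{i}{n}, \qquad \mathrm{OCF}_0 = 0 \\
m_1 &= \max_i \left| \mathrm{OCF}_i - \mathrm{ECF}_i \right| \\
m_2 &= \max_i \left| \mathrm{OCF}_{i-1} - \mathrm{ECF}_i \right| \\
M &= \max(m_1, m_2)
\end{aligned}
\]
```

Two differences are taken at each point because the observed distribution is a step function: the gap to the smooth expected curve can be largest either at the top of a step (m1) or just before the step (m2).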
As an example, consider an experiment where participants interact with a touchscreen device. The
experiment is called FittsTouch and engages 16 participants to perform touch-based target
selection tasks. Each task (aka trial) involves moving a finger toward a target and tapping it.
The trials are organized as sequences of 20 selections, with each sequence corresponding to a
test condition. There is an assumption that the selection coordinates along the axis of approach
to the target are normally distributed, with some selections in or near the center of the target
and some selections on either side. For one such sequence, the following selection coordinates
are logged:
-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83, 3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09, 44.96, 45.83, 63.09, 71.09
The coordinate system for the data above is adjusted such that a selection in the center of the target is at coordinate 0.0. Selections on the near side are negative, while selections on the far side are positive. Although the data above are sorted in ascending order, during testing the selections may have occurred in any order. To test the hypothesis that the selection coordinates are normally distributed, the Lilliefors utility may be used:
PROMPT>type fittstouchdata1.txt
-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83, 3.84, 4.34, 7.09, 8.84, 18.34, 21.34, 22.09, 26.13, 31.09, 44.96, 45.83, 63.09, 71.09
PROMPT>java Lilliefors fittstouchdata1.txt -t
File: fittstouchdata1.txt
=====================================================================
   A         B         C        D          E             F
   x        OCF        z       ECF    |OCFi-ECFi|  |OCFi-1-ECFi|

 -29.41   0.05000  -1.45253  0.07318    0.02318       0.07318
 -27.16   0.10000  -1.37441  0.08466    0.01534       0.03466
 -19.91   0.15000  -1.12270  0.13078    0.01922       0.03078
 -16.60   0.20000  -1.00778  0.15678    0.04322       0.00678
 -15.71   0.25000  -0.97688  0.16431    0.08569       0.03569
  -8.91   0.30000  -0.74079  0.22941    0.07059       0.02059
  -1.83   0.35000  -0.49499  0.31031    0.03969       0.01031
   3.84   0.40000  -0.29813  0.38280    0.01720       0.03280
   4.34   0.45000  -0.28077  0.38944    0.06056       0.01056
   7.09   0.50000  -0.18529  0.42650    0.07350       0.02350
   8.84   0.55000  -0.12454  0.45045    0.09955       0.04955
  18.34   0.60000   0.20529  0.58133    0.01867       0.03133
  21.34   0.65000   0.30945  0.62151    0.02849       0.02151
  22.09   0.70000   0.33549  0.63137    0.06863       0.01863
  26.13   0.75000   0.47575  0.68287    0.06713       0.01713
  31.09   0.80000   0.64796  0.74149    0.05851       0.00851
  44.96   0.85000   1.12950  0.87066    0.02066       0.07066
  45.83   0.90000   1.15971  0.87692    0.02308       0.02692
  63.09   0.95000   1.75896  0.96071    0.01071       0.06071
  71.09   1.00000   2.03671  0.97916    0.02084       0.02916
=====================================================================
NOTE: OCF = Observed Cumulative Freq, ECF = Expected Cumulative Freq
n = 20 (20 unique values)
mean = 12.43
sd = 28.8029
m1 = 0.0996 (=max(E))
m2 = 0.0732 (=max(F))
M = 0.0996 (=max(m1, m2))
Test statistic = 0.0996
Critical value = 0.1920 (alpha .05, n = 20)
Result: Null hypothesis (normality) is NOT REJECTED
With the -t option, the utility outputs a table demonstrating the calculations leading to the
test statistic. The test statistic is M = 0.0996. The critical value is retrieved from a lookup
table. The value cited (CV = 0.1920) corresponds to an alpha of .05 and a sample size of 20.
Since M did not exceed the CV, the null hypothesis is retained, which is to say, not rejected
(M = 0.0996, p > .05, n = 20). The conclusion is that the selection coordinates are consistent
with a normal distribution.
The calculations and table were modelled after those in Sheskin's Handbook of Parametric and Nonparametric Statistical Procedures (5th ed.), 2011, CRC Press (Test #7, pp. 261-275).
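The table calculations can be sketched in a few lines of Java. This is an illustrative sketch, not the utility's source: the class name LillieforsStatisticSketch and its methods are hypothetical, the normal cumulative distribution is approximated with the Abramowitz and Stegun polynomial (formula 26.2.17), the standard deviation uses the n - 1 divisor, and repeated values are assumed not to occur. The data are the 20 coordinates of the first example, with near-side selections entered as negative values.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the source of the Lilliefors utility. */
final class LillieforsStatisticSketch {

    /** Standard normal CDF, Abramowitz and Stegun 26.2.17 (an approximation). */
    static double normalCdf(double z) {
        if (z < 0.0) {
            return 1.0 - normalCdf(-z);
        }
        double t = 1.0 / (1.0 + 0.2316419 * z);
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double density = Math.exp(-0.5 * z * z) / Math.sqrt(2.0 * Math.PI);
        return 1.0 - density * poly;
    }

    /** Largest of |OCF_i - ECF_i| and |OCF_(i-1) - ECF_i|; assumes no repeated values. */
    static double statistic(double[] data) {
        double[] x = data.clone();
        Arrays.sort(x);
        int n = x.length;
        double mean = Arrays.stream(x).average().orElse(Double.NaN);
        double sumSquares = 0.0;
        for (double v : x) {
            sumSquares += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSquares / (n - 1)); // sample standard deviation
        double m = 0.0;
        for (int i = 0; i < n; i++) {
            double ecf = normalCdf((x[i] - mean) / sd); // column D
            double ocf = (i + 1) / (double) n;          // column B
            double ocfPrev = i / (double) n;            // OCF of the previous row
            m = Math.max(m, Math.max(Math.abs(ocf - ecf), Math.abs(ocfPrev - ecf)));
        }
        return m;
    }

    public static void main(String[] args) {
        double[] data = {-29.41, -27.16, -19.91, -16.6, -15.71, -8.91, -1.83, 3.84, 4.34, 7.09,
                8.84, 18.34, 21.34, 22.09, 26.13, 31.09, 44.96, 45.83, 63.09, 71.09};
        System.out.printf("Test statistic = %.4f%n", statistic(data));
    }
}
```

Run on the first example's data, the sketch should give a value close to the 0.0996 reported by the utility; small differences in the last digits are possible because the normal distribution is approximated.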
The following image shows the expected (red) and observed (blue) cumulative frequency distributions for the data in the example above. Can you spot the sample point that yields the test statistic?
As another example, consider the following data from a sequence of trials in the same experiment:
-26.64, -24.54, -15.17, -14.61, -13.89, -13.88, -13.51, -12.2, -11.57, -10.69, -7.53, -6.69, -1.52, 1.93, 2.43, 17.29, 21.79, 23.68, 33.07, 36.75
The null hypothesis of normality is tested as follows:
PROMPT>type fittstouchdata2.txt
-26.64, -24.54, -15.17, -14.61, -13.89, -13.88, -13.51, -12.2, -11.57, -10.69, -7.53, -6.69, -1.52, 1.93, 2.43, 17.29, 21.79, 23.68, 33.07, 36.75
PROMPT>java Lilliefors fittstouchdata2.txt -t
File: fittstouchdata2.txt
=====================================================================
   A         B         C        D          E             F
   x        OCF        z       ECF    |OCFi-ECFi|  |OCFi-1-ECFi|

 -26.64   0.05000  -1.33923  0.09025    0.04025       0.09025
 -24.54   0.10000  -1.22613  0.11008    0.01008       0.06008
 -15.17   0.15000  -0.72146  0.23531    0.08531       0.13531
 -14.61   0.20000  -0.69130  0.24469    0.04469       0.09469
 -13.89   0.25000  -0.65252  0.25703    0.00703       0.05703
 -13.88   0.30000  -0.65198  0.25721    0.04279       0.00721
 -13.51   0.35000  -0.63205  0.26368    0.08632       0.03632
 -12.20   0.40000  -0.56149  0.28723    0.11277       0.06277
 -11.57   0.45000  -0.52756  0.29890    0.15110       0.10110
 -10.69   0.50000  -0.48016  0.31556    0.18444       0.13444
  -7.53   0.55000  -0.30997  0.37829    0.17171       0.12171
  -6.69   0.60000  -0.26472  0.39561    0.20439       0.15439
  -1.52   0.65000   0.01373  0.50548    0.14452       0.09452
   1.93   0.70000   0.19955  0.57908    0.12092       0.07092
   2.43   0.75000   0.22648  0.58959    0.16041       0.11041
  17.29   0.80000   1.02685  0.84775    0.04775       0.09775
  21.79   0.85000   1.26922  0.89782    0.04782       0.09782
  23.68   0.90000   1.37101  0.91481    0.01481       0.06481
  33.07   0.95000   1.87676  0.96972    0.01972       0.06972
  36.75   1.00000   2.07497  0.98101    0.01899       0.03101
=====================================================================
NOTE: OCF = Observed Cumulative Freq, ECF = Expected Cumulative Freq
n = 20 (20 unique values)
mean = -1.78
sd = 18.5666
m1 = 0.2044 (=max(E))
m2 = 0.1544 (=max(F))
m = 0.2044 (=max(m1, m2))
Test statistic = 0.2044
Critical value = 0.1920 (alpha = .05, n = 20)
Result: Null hypothesis (normality) is REJECTED
For this sequence, the test statistic is M = 0.2044 which exceeds the critical value of 0.1920.
Therefore, the null hypothesis is rejected (M = 0.2044, p < .05, n = 20).
The anomaly in the distribution is evident in the following chart showing the observed and
expected cumulative frequency distributions:
The Lilliefors utility reads sample data from a text file. Each line in the file is assumed to
hold the samples from a single data set (comma or space delimited). If the data file contains
multiple data sets on multiple lines, each line is processed as in the examples above. Blank
lines or lines beginning with "#" are ignored. If the -t option is omitted, the output is a
single-line summary for each test. Here's an abbreviated excerpt of one such analysis:
PROMPT>type dxdata.txt
# ============
# File: FittsTiltP01S01B01G01C011D.sd1
13.406849,14.906796,1.0931833,9.656806,25.90678,10.912159,32.65685,2.4068189,34.406815,14.322453,20.656765,16.343199,4.020278,17.343168,6.093225,21.618008,29.906813,1.9068128,19.906826,3.156789
11.622174,24.663057,26.898283,23.398296,18.398281,36.920197,41.898308,38.398296,27.14831,38.248386,39.398304,35.1483,12.8982725,29.398277,19.398333,15.648286,18.398277,29.398315,21.148266,11.398319
9.203405,21.13783,8.703413,22.602818,4.9534187,13.546619,6.296603,41.269993,2.04658,47.79662,11.061947,18.592436,0.2966088,20.546589,3.203395,19.700209,5.2965693,4.703401,29.54656,29.796608
6.4067583,10.593187,10.656808,26.093187,9.629451,8.093193,6.4638014,14.383714,22.843225,22.09317,28.593164,43.093147,13.593171,33.958378,10.906848,0.3431504,22.40684,5.906815,25.656794,22.59317
14.046619,30.994177,7.796592,28.27775,17.796604,29.796589,1.7965907,58.796597,26.296593,67.79662,4.6070786,36.215733,8.203432,13.910416,3.2034361,5.0466084,20.796587,22.296598,22.436363,21.425077
9.799696,37.898308,11.601676,38.148327,1.8753699,33.164165,11.398289,41.126015,3.6483347,32.39829,0.601696,21.648312,7.974852,26.99983,12.1016865,20.445698,10.101694,27.898289,4.648325,15.898285
# ============
# File: FittsTiltP01S01B01G01C022D.sd1
0.39720023,26.360846,10.212834,26.615536,8.005627,29.986546,8.108524,19.799362,8.562092,0.2688715,1.1016916,8.377718,7.031287,6.7654243,1.4195294,9.081079,6.5680585,5.4352174,12.937191,12.190426
...
PROMPT>java Lilliefors dxdata.txt
...
n= 20, mean= 4.30, sd= 26.12, 0.18035 > 0.19200 ? NOT REJECTED
n= 20, mean=11.44, sd= 23.16, 0.11793 > 0.19200 ? NOT REJECTED
n= 20, mean= 7.71, sd= 27.23, 0.14461 > 0.19200 ? NOT REJECTED
n= 20, mean= 5.91, sd= 17.22, 0.12051 > 0.19200 ? NOT REJECTED
n= 20, mean= 13.63, sd= 16.95, 0.14221 > 0.19200 ? NOT REJECTED
n= 20, mean= 7.10, sd= 28.99, 0.19811 > 0.19200 ? REJECTED
n= 20, mean= 6.63, sd= 20.23, 0.17045 > 0.19200 ? NOT REJECTED
n= 20, mean= 14.61, sd= 26.60, 0.11878 > 0.19200 ? NOT REJECTED
n= 20, mean= 4.13, sd= 35.54, 0.16664 > 0.19200 ? NOT REJECTED
...
n= 20, mean= -1.77, sd= 18.57, 0.20444 > 0.19200 ? REJECTED
n= 20, mean=17.97, sd= 13.51, 0.12083 > 0.19200 ? NOT REJECTED
n= 20, mean=13.17, sd= 14.59, 0.16437 > 0.19200 ? NOT REJECTED
n= 20, mean= 5.84, sd= 20.82, 0.08892 > 0.19200 ? NOT REJECTED
n= 20, mean=23.17, sd= 32.45, 0.11948 > 0.19200 ? NOT REJECTED
n= 20, mean=15.41, sd= 24.75, 0.11732 > 0.19200 ? NOT REJECTED
Data sets=1080, Rejected= 73, Not_rejected= 1007
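The input format described above (one data set per line, comma or space delimited, with blank lines and "#" lines skipped) could be parsed along the following lines. This is a minimal sketch under stated assumptions, not the utility's own parsing code: the class name DataFileSketch and the method parse are hypothetical, and a malformed token is assumed to be an error (it raises NumberFormatException).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the source of the Lilliefors utility. */
final class DataFileSketch {

    /** Converts each non-blank, non-comment line into one array of samples. */
    static List<double[]> parse(List<String> lines) {
        List<double[]> dataSets = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // blank lines and comment lines are ignored
            }
            // Samples may be separated by commas, whitespace, or both.
            String[] tokens = trimmed.split("[,\\s]+");
            double[] values = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                values[i] = Double.parseDouble(tokens[i]); // throws on malformed input
            }
            dataSets.add(values);
        }
        return dataSets;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a data file such as the one shown above.
        List<double[]> dataSets = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.println("Data sets=" + dataSets.size());
    }
}
```

Each array returned by parse corresponds to one line of the file and could then be passed to a normality test such as isNormal(double[] data).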
For details and discussion on the calculation of the test statistic, consult the source code (and
added comments) or the Sheskin reference cited above. One tricky aspect of the calculation is the
method for handling repeated values in the raw data. The method is implemented, but the details
are not explored here.
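For readers who want a starting point, one common convention for repeated values is sketched below. It is an assumption for illustration and may differ from what the utility actually does: each distinct value gets a single row, and its OCF is the proportion of all samples less than or equal to that value. The class name TiesSketch and its method are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; the utility's handling of repeated values may differ. */
final class TiesSketch {

    /**
     * Returns two arrays: the distinct values in ascending order, and for each
     * distinct value the proportion of samples less than or equal to it.
     */
    static double[][] observedCumulativeFrequencies(double[] data) {
        double[] x = data.clone();
        Arrays.sort(x);
        int n = x.length;
        double[] unique = new double[n];
        double[] ocf = new double[n];
        int rows = 0;
        for (int i = 0; i < n; i++) {
            // Emit a row only at the last occurrence of each repeated value.
            boolean lastOfRun = (i == n - 1) || (x[i + 1] != x[i]);
            if (lastOfRun) {
                unique[rows] = x[i];
                ocf[rows] = (i + 1) / (double) n;
                rows++;
            }
        }
        return new double[][] {Arrays.copyOf(unique, rows), Arrays.copyOf(ocf, rows)};
    }

    public static void main(String[] args) {
        double[][] result = observedCumulativeFrequencies(new double[] {2, 1, 2, 3});
        System.out.println(Arrays.toString(result[0]));
        System.out.println(Arrays.toString(result[1]));
    }
}
```

Under this convention, the "previous row" used in the second difference is the OCF of the preceding distinct value rather than of the preceding sample.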
Constructor Summary  

Lilliefors()

Method Summary  

static boolean 
isNormal(double[] data)

static void 
main(java.lang.String[] args)

Methods inherited from class java.lang.Object 

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Constructor Detail 

public Lilliefors()
Method Detail 

public static void main(java.lang.String[] args) throws java.io.IOException
Throws: java.io.IOException
public static boolean isNormal(double[] data)

