## How to Report an F-Statistic

I. Scott MacKenzie

Dept. of Electrical Engineering and Computer Science

York University

Toronto, Ontario, Canada M3J 1P3

mack@cse.yorku.ca

Last update: 11/11/2013

## Background

Human-computer interaction research often involves experiments with human participants to test one or more hypotheses. One of the most common statistical tools for hypothesis testing is the analysis of variance (ANOVA). The ANOVA result is reported as an *F*-statistic and its associated degrees of freedom and *p*-value. This research note does not explain the analysis of variance, or even the *F*-statistic itself. Rather, we explain only the proper way to report an *F*-statistic. "Proper way" refers to the formatting of the statistic and to the construction of a dialog to present it. Simple as this seems, *F*-statistics are often improperly formatted and poorly presented in research papers. Let's get to it.

## Discussion

Assume you conducted an experiment with ten participants to compare natural and abstract icons. The independent variable is *Icon Type* with two levels, *natural* and *abstract*. Participants completed a timed task where they associated the meaning of icons with icon images, grouped according to type. The dependent variable is task completion time, in seconds. After conducting the experiment, you have the following data:

Using your favourite statistics program, you run an analysis of variance on the data and obtain the following:

Because *p* is less than .05, the result is statistically significant. This might be conveyed in a research paper as follows:

There was a significant effect of Icon Type on task completion time ($F_{1,9} = 33.4$, $p < .005$).

And that's about it. Of course, there are many ways to craft a dialog to report the result – this is just a simple example. Importantly, the sentence captures the relationship between the independent variable and the dependent variable; i.e., "There was a significant effect of [independent variable] on [dependent variable]".

Note in the sentence above that the ANOVA results are presented in parentheses. Bear in mind that the results of statistical tests are *not* the results per se. Statistical tests play a supporting role. They allow conclusions to be drawn with the weight of probabilities. The results lie in the observations and measurements. The measurements are best articulated in terms of the mean scores across test conditions and the effect size. (The "effect size" is the difference in mean scores.) Given this, a more informative presentation of the results might appear as follows:

The grand mean for task completion time was 724 seconds. By Icon Type, the means were 698 seconds for natural icons and 750 seconds for abstract icons. As evident in the means, the recognition task took 7.5% longer for the abstract icons compared to the natural icons. The difference was statistically significant ($F_{1,9} = 33.4$, $p < .005$).
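The descriptive figures in that paragraph follow from the two condition means alone. As a minimal sketch (the function name `summarize_means` is hypothetical, and the grand mean is taken as the simple average, which assumes equal numbers of observations per condition), the arithmetic looks like this:

```python
def summarize_means(mean_baseline, mean_comparison):
    """Return (grand mean, difference in means, percent difference).

    Assumes both conditions have the same number of observations,
    so the grand mean is the simple average of the two means.
    """
    grand_mean = (mean_baseline + mean_comparison) / 2
    difference = mean_comparison - mean_baseline
    percent = 100 * difference / mean_baseline
    return grand_mean, difference, percent


# Means quoted in the text: 698 s (natural), 750 s (abstract).
grand, diff, pct = summarize_means(698, 750)
print(grand, diff, round(pct, 2))
```

Using the rounded means, the percent difference comes out just under 7.5%; the small gap from the quoted figure is presumably due to rounding of the reported means.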

Of particular importance here is the presentation of the ANOVA result, including the *F*-statistic. The key points are as follows:

- Set in parentheses
- Uppercase for *F*
- Lowercase for *p*
- Italics for *F* and *p*
- *F*-statistic rounded to three (maybe four) significant digits
- *F*-statistic followed by a comma, then a space
- Space on both sides of equal sign and both sides of less than sign
- Degrees of freedom set as subscript, plain, smaller font
- No space following the comma in the degrees of freedom
- Exact value of *p* not reported
- *p* rounded up to a more conservative value from the set {.05, .01, .005, .001, .0005, .0001}
- No zero before the decimal point for *p* (because it is constrained between 0 and 1)

There are a few minor and accepted variations to the formatting points above, but stick to these rules and you're in good shape.

If *p* is above .05, the result is not statistically significant. In this case, there are two possibilities. If *p* is greater than .05 and *F* is greater than 1, report the result something like this:

There was no significant effect of Icon Type on task completion time ($F_{1,9} = 2.34$, $p > .05$).

If *p* is greater than .05 and *F* is less than 1, report the result something like this:

There was no significant effect of Icon Type on task completion time ($F_{1,9} = 0.876$, ns).
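The formatting rules and the three reporting cases can be collected into a small helper. The following is only a sketch under stated assumptions: the names `report_f`, `format_f`, and `strip_leading_zero` are hypothetical, the output is plain text in subscript notation (italics and font size are left to the typesetting tool), and the *p*-values passed in the usage lines are placeholders for illustration, not values computed from any data.

```python
import math

# Conservative reporting thresholds for p, smallest first.
P_THRESHOLDS = (.0001, .0005, .001, .005, .01, .05)


def format_f(f_value, digits=3):
    """Format F with (at least) `digits` significant digits."""
    if f_value < 0:
        raise ValueError("F cannot be negative")
    if f_value == 0:
        return "0.00"
    magnitude = math.floor(math.log10(f_value))
    decimals = max(0, digits - 1 - magnitude)
    return f"{f_value:.{decimals}f}"


def strip_leading_zero(p):
    """Format a probability without the zero before the decimal point."""
    text = f"{p:.4f}".rstrip("0")
    return text[1:] if text.startswith("0.") else text


def report_f(f_value, df_effect, df_error, p_value, alpha=.05):
    """Build the parenthesized report string for one ANOVA effect."""
    stat = f"F_{{{df_effect},{df_error}}} = {format_f(f_value)}"
    if p_value < alpha:
        # Significant: round p up to the next conservative threshold.
        bound = next(t for t in P_THRESHOLDS if p_value < t)
        prob = f"p < {strip_leading_zero(bound)}"
    elif f_value < 1:
        # Not significant and F below 1: report "ns" only.
        prob = "ns"
    else:
        prob = f"p > {strip_leading_zero(alpha)}"
    return f"({stat}, {prob})"


# Placeholder p-values (assumed for illustration only).
print(report_f(33.4, 1, 9, 0.003))   # (F_{1,9} = 33.4, p < .005)
print(report_f(2.34, 1, 9, 0.16))    # (F_{1,9} = 2.34, p > .05)
print(report_f(0.876, 1, 9, 0.37))   # (F_{1,9} = 0.876, ns)
```

The helper never prints the exact *p*-value; it only selects a threshold, which mirrors the rule that the exact value is not reported.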

Statistical significance is impossible if *F* is less than 1; hence, the convention of reporting the probability simply as "ns" for "not significant".

Good luck!

More discussion on the ANOVA and other statistical tools commonly used in HCI research is found in Chapter 6 of *Human-Computer Interaction: An Empirical Research Perspective*.

## Acknowledgement

The motivation to put together this research note came from discussions with Wendy MacKay at CHI 2008 in Florence. The hypothetical experiment and data are from Dix et al.'s *Human Computer Interaction* (Prentice Hall, 2004, 3rd ed., p. 337).