When comparing a continuous outcome between two groups, the \(t\) test is the standard choice. But what do you do when you have more than two groups to compare? A foundational technique for this purpose is analysis of variance (ANOVA). This post is a conceptual overview of ANOVA, leading to example analyses aimed at developing intuitions about how ANOVA works. To get the most out of this post, you should work through a textbook chapter on one-way ANOVA ahead of time.

## New Notation

The following table includes the new notation we need to understand ANOVA. Note that you will sometimes see the \(E\) subscript in place of the \(W\) subscript, because the within-group variation is the part not explained by the model and is therefore captured in the error term in these models. In other words, the sum of squared deviations within groups (\(SS_W\)) is the same as the sum of squared errors (\(SS_E\)). Also note that for now, we are assuming that the number of participants is the same in each group (\(n_j = n\) for all groups).

Symbol | Description |
---|---|
\(k\) | number of groups |
\(n_j\) | number of participants in each group |
\(N\) | total number of participants (\(N = n_j \times k\)) |
\(df_{B}\) | between-group degrees of freedom |
\(df_E\) or \(df_{W}\) | error or within-group degrees of freedom |
\(SS_{B}\) | sum of squares between groups |
\(SS_W\) | sum of squares within groups (error) |
\(SS_T\) | total sum of squares |
\(MS_{B}\) | mean square between groups |
\(MS_W\) | mean square within groups (error) |

## Assumptions of One Way ANOVA

**Normality** - We assume a normal population distribution for the dependent variable for each group. ANOVA is generally robust to violations of normality.

**Equality of Variances** - We assume that the population variance is the same for each group. ANOVA is generally robust to mild violations of this assumption.

**Statistical Independence** - We assume that the scores are independent of each other. ANOVA is NOT robust to violations of independence, so if this assumption is questionable, your results could be very misleading.

**Random Sampling** - We assume that individuals in each group were randomly sampled from their respective populations. ANOVA is also NOT very robust to violations of this assumption. If your sample does not represent your population of interest, you must be very careful with the inferences you make.

## Formulae

This section introduces the formulae needed to calculate the test statistic associated with ANOVA.

### Degrees of freedom

Recall that degrees of freedom are closely related to the number of parameters being estimated, and the number of data points used to estimate these parameters. The degrees of freedom between groups, \(df_B\), is the number of groups, \(k\), minus 1.

\[ df_{B} = k - 1. \]

The within groups degrees of freedom, \(df_W\), is the total sample size, \(N\), minus the number of groups, \(k\). \[ df_W = N - k \]

Add these two together (\(df_B + df_W\)), and you get the total degrees of freedom, which can also be calculated by taking the total number of cases, \(N\), and subtracting 1. \[ df_T = df_{B} + df_W = N - 1 \]
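To make these identities concrete, here is a minimal Python sketch (assuming the \(k = 3\), \(n = 5\) design used in the examples below) computing all three degrees of freedom:

```python
# Degrees of freedom for a one-way ANOVA with k groups of n participants each
k = 3          # number of groups
n = 5          # participants per group (equal group sizes assumed)
N = n * k      # total sample size

df_B = k - 1   # between-group degrees of freedom
df_W = N - k   # within-group (error) degrees of freedom
df_T = N - 1   # total degrees of freedom

print(df_B, df_W, df_T)  # → 2 12 14
```

Note that \(df_B + df_W = 2 + 12 = 14 = df_T\), as the identity requires.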

### \(F\) statistic

When we used the \(t\) test, we used the \(t\) distribution, which, like the normal distribution, spans all values from negative to positive infinity. This makes sense because mean differences, which the \(t\) test evaluates, can be positive or negative. But for ANOVA, we are evaluating the variance of means instead of mean differences, and variances cannot be negative, so we need a different test distribution, namely, the \(F\) distribution. This distribution contains all values from 0 to positive infinity, which includes all possible values for a variance. Just as the shape of the \(t\) distribution depends on the degrees of freedom of the \(t\) test, the shape of the \(F\) distribution depends on the degrees of freedom associated with the \(F\) test.

The \(F\) statistic can be thought of as the ratio of the variance of the means, or the between-group variance, to the typical variability of scores among individuals within the groups, or the within-group variance. We call the variance between groups the mean square between groups, or \(MS_B\), and the variance within groups the mean square within groups, or \(MS_W\). So, conceptually, the \(F\) statistic is given as

\[ F = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{MS_{B}}{MS_W}. \]

To calculate the \(F\) statistic, we first need to calculate \(MS_B\) and \(MS_W\).

### Statistical hypotheses

#### Null

\[ H_0: \sigma^2_{\mu} = 0. \]

#### Alternative

\[
H_a: \sigma^2_\mu > 0,
\] where \(\sigma^2_\mu\) denotes the variance of the group population means. The null hypothesis thus states that all group means are equal, and the alternative states that at least one group mean differs from the others.

## Conceptual Example 1: Substantial Between Group Variance, with Negligible Within Group Variance

group | score | \(\text{score}^2\) |
---|---|---|
Group A | | |

### Hand Calculations

Of course you can calculate ANOVA by hand. Below, I give calculations using the method found in Chapter 12 of Privitera (2015).

### Stage 1: Preliminary Calculations

 | group | n | mean | sd | var | sum.x | sum.x2 |
---|---|---|---|---|---|---|---|
1 | A | 5 | 1.00 | 0.10 | 0.01 | 5.00 | 5.04 |

\(N = n \times k =\) 15

\(df_{B} = k -1 =\) 2

\(df_W = N - k =\) 12

\(df_T = N - 1 =\) 14

\(\Sigma{x_T} =\) 30

\(\Sigma{x^2_T} =\) 70.12

\((\Sigma{x_T})^2 =\) 900

\(\Sigma{\frac{(\Sigma x)^2}{n}}=\) 70

### Stage 2: Intermediate Calculations

#### Correction factor

\([1]\text{correction factor} = \frac{(\Sigma{x_T})^2}{N}= \frac{900}{15}=\) 60

#### Average sum of squares

\([2]\Sigma{\frac{(\Sigma x)^2}{n}}=\) 70

#### Restate the sum of squared scores

\([3]\Sigma{x^2_T}=\) 70.12

### Stage 3: Calculating Sum of Squares (SS)

#### Sum of Squares between group

\(SS_{B} = [2] - [1]= 70 - 60 =\) 10

\(SS_T = [3] - [1] =\) 70.12 - 60 = 10.12

\(SS_W = SS_T - SS_{B} =\) 10.12 - 10 = 0.12

### Stage 4: Completing the \(F\) table

#### Mean square between groups

\(MS_{B} = \frac{SS_{B}}{df_{B}} = \frac{10}{2}=\) 5

\(MS_W = \frac{SS_W}{df_W} = \frac{0.12}{12}=\) 0.01

\(F_{obt} = \frac{MS_{B}}{MS_W}=\frac{5}{0.01} =\) 500

The following table summarizes the results of these calculations. The large \(F\) value of 500 and the very small \(p\) value indicate that the observed variation among the means would be extremely unlikely if the model assumptions are met and the null hypothesis is true.

 | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
group | 2 | 10.00 | 5.00 | 500.00 | 0.0000 |
Residuals | 12 | 0.12 | 0.01 | | |
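These hand calculations can be checked with a short Python sketch that starts from the Stage 1 sums reported above (rather than the raw scores) and rebuilds the \(F\) table:

```python
# Example 1: rebuild the F table from the Stage 1 summary quantities
N, k = 15, 3
sum_x_T = 30.0     # sum of all scores
sum_x2_T = 70.12   # sum of all squared scores, stage [3]
avg_ss = 70.0      # sum of (group sum)^2 / n over groups, stage [2]

correction = sum_x_T ** 2 / N       # stage [1]: 900 / 15 = 60
SS_B = avg_ss - correction          # [2] - [1] = 10
SS_T = sum_x2_T - correction        # [3] - [1] = 10.12
SS_W = SS_T - SS_B                  # 0.12

MS_B = SS_B / (k - 1)               # 10 / 2 = 5
MS_W = SS_W / (N - k)               # 0.12 / 12 = 0.01
F = MS_B / MS_W                     # 5 / 0.01 = 500

print(round(F))  # → 500
```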

### Software Calculations

#### R

It is much easier to calculate ANOVA with statistical software. In R, given a data frame named `d` containing the variables `score` and `group` with the values given in the example data table above, the `aov` function can be used as follows:

`summary(aov(score ~ group, d))`

```
Df Sum Sq Mean Sq F value Pr(>F)
group 2 10.00 5.00 500 2.8e-12 ***
Residuals 12 0.12 0.01
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

This gives the same results as the hand calculations.

#### SPSS

Similarly, in SPSS, with a data set containing the relevant variables, you can calculate an ANOVA with the following code:

`anova score by group.`

which gives the same results.

#### Stata

In Stata you can execute ANOVA as follows:

```
. oneway score group
```

which gives you:

```
Analysis of Variance
Source SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 10 2 5 500.00 0.0000
Within groups .119999914 12 .009999993
------------------------------------------------------------------------
Total 10.1199999 14 .722857137
Bartlett's test for equal variances: chi2(2) = 0.0000 Prob>chi2 = 1.000
```

## Conceptual Example 2: Negligible Between Group Variance, with Substantial Within Group Variance

group | score | \(\text{score}^2\) |
---|---|---|
Group A | | |

### Stage 1: Preliminary Calculations

 | group | n | mean | sd | var | sum.x | sum.x2 |
---|---|---|---|---|---|---|---|
1 | A | 5 | 1.00 | 1.00 | 1.00 | 5.00 | 9.00 |

\(N=\) 15

\(df_{B} = k -1 =\) 2

\(df_W = N - k =\) 12

\(df_T = N - 1 =\) 14

\(\Sigma{x_T} =\) 16.5

\(\Sigma{x^2_T} =\) 30.25

\((\Sigma{x_T})^2 =\) 272.25

\(\Sigma{\frac{(\Sigma x)^2}{n}}=\) 18.25

### Stage 2: Intermediate Calculations

#### Correction factor

\([1]\text{correction factor} = \frac{(\Sigma{x_T})^2}{N}= \frac{272.25}{15}=\) 18.15

#### Average sum of squares

\([2]\Sigma{\frac{(\Sigma x)^2}{n}}=\) 18.25

#### Restate the sum of squared scores

\([3]\Sigma{x^2_T}=\) 30.25

### Stage 3: Calculating Sum of Squares (SS)

#### Sum of Squares between group

\(SS_{B} = [2] - [1]= 18.25 - 18.15 =\) 0.1

\(SS_T = [3] - [1] =\) 30.25 - 18.15 = 12.1

\(SS_W = SS_T - SS_{B} =\) 12.1 - 0.1 = 12

### Stage 4: Completing the \(F\) table

#### Mean square between groups

\(MS_{B} = \frac{SS_{B}}{df_{B}} = \frac{0.1}{2}=\) 0.05

\(MS_W = \frac{SS_W}{df_W} = \frac{12}{12}=\) 1

\(F_{obt} = \frac{MS_{B}}{MS_W}=\frac{0.05}{1} =\) 0.05

 | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
group | 2 | 0.10 | 0.05 | 0.05 | 0.9514 |
Residuals | 12 | 12.00 | 1.00 | | |
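A short Python sketch starting from this example's Stage 1 sums verifies the tiny \(F\) value:

```python
# Example 2: rebuild the F table from the Stage 1 summary quantities
N, k = 15, 3
sum_x_T = 16.5     # sum of all scores
sum_x2_T = 30.25   # sum of all squared scores, stage [3]
avg_ss = 18.25     # sum of (group sum)^2 / n over groups, stage [2]

correction = sum_x_T ** 2 / N            # stage [1]: 272.25 / 15 = 18.15
SS_B = avg_ss - correction               # 0.10
SS_W = (sum_x2_T - correction) - SS_B    # 12.00
F = (SS_B / (k - 1)) / (SS_W / (N - k))  # 0.05 / 1 = 0.05

print(round(F, 2))  # → 0.05
```

Here the variance between the group means is tiny relative to the variance within groups, so \(F\) is far below 1 and the null hypothesis is retained.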

## Conceptual Example 3: Post-Hoc Comparisons

group | score | \(\text{score}^2\) |
---|---|---|
Group A | | |

### Stage 1: Preliminary Calculations

 | group | n | mean | sd | var | sum.x | sum.x2 |
---|---|---|---|---|---|---|---|
1 | A | 5 | 1.00 | 0.10 | 0.01 | 5.00 | 5.04 |

\(N = n \times k =\) 15

\(df_{B} = k -1 =\) 2

\(df_W = N - k =\) 12

\(df_T = N - 1 =\) 14

\(\Sigma{x_T} =\) 20

\(\Sigma{x^2_T} =\) 30.12

\((\Sigma{x_T})^2 =\) 400

\(\Sigma{\frac{(\Sigma x)^2}{n}}=\) 30

### Stage 2: Intermediate Calculations

#### Correction factor

\([1]\text{correction factor} = \frac{(\Sigma{x_T})^2}{N}= \frac{400}{15}=\) 26.67

#### Average sum of squares

\([2]\Sigma{\frac{(\Sigma x)^2}{n}}=\) 30

#### Restate the sum of squared scores

\([3]\Sigma{x^2_T}=\) 30.12

### Stage 3: Calculating Sum of Squares (SS)

#### Sum of Squares between group

\(SS_{B} = [2] - [1]= 30 - 26.67 =\) 3.33

\(SS_T = [3] - [1] =\) 30.12 - 26.67 = 3.45

\(SS_W = SS_T - SS_{B} =\) 3.45 - 3.33 = 0.12

### Stage 4: Completing the \(F\) table

#### Mean square between groups

\(MS_{B} = \frac{SS_{B}}{df_{B}} = \frac{3.33}{2}=\) 1.67

\(MS_W = \frac{SS_W}{df_W} = \frac{0.12}{12}=\) 0.01

\(F_{obt} = \frac{MS_{B}}{MS_W}=\frac{1.67}{0.01} =\) 166.67

 | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
group | 2 | 3.33 | 1.67 | 166.67 | 0.0000 |
Residuals | 12 | 0.12 | 0.01 | | |

## Pairwise Comparisons

When making multiple pairwise comparisons, it is important to control the **experimentwise alpha level** so that the probability of rejecting at least one true null hypothesis across all comparisons is at or below the pre-established alpha level (e.g., \(\alpha = .05\)). I discuss two methods for doing this below.

### Fisher’s Least Significant Difference (LSD) Test

Fisher’s LSD is the most liberal post-hoc test generally accepted in published research.

\[ \text{Fisher's LSD}: t_\alpha \sqrt{MS_W \bigg( \frac{1}{n_1} + \frac{1}{n_2}\bigg)}, \] where \(t_\alpha\) is the critical value for a two-tailed \(t\) test at \(\alpha = .05\).

For our example 3 data, we calculate Fisher’s LSD as follows:

\[ \text{Fisher's LSD} = 1.78 \sqrt{0.01 \bigg(\frac{1}{5} + \frac{1}{5}\bigg)} = 0.11, \] where \(t_\alpha = 1.78\) is the critical \(t\) value evaluated at the error (within-group) degrees of freedom found in the \(F\) table above (\(df_W = 12\)).
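With the values used above (\(MS_W = 0.01\), \(n_1 = n_2 = 5\), and the critical value \(t_\alpha = 1.78\) from the text), the LSD calculation can be verified in Python:

```python
import math

MS_W = 0.01    # mean square within groups, from the F table above
n1 = n2 = 5    # sample sizes of the two groups being compared
t_crit = 1.78  # critical t value used in the text (df_W = 12)

LSD = t_crit * math.sqrt(MS_W * (1 / n1 + 1 / n2))
print(round(LSD, 2))  # → 0.11
```

Any pair of group means differing by more than 0.11 would be declared significantly different by this criterion.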

### Tukey’s Honestly Significant Difference (HSD) Test

Tukey’s HSD is more conservative than Fisher’s LSD.

\[
\text{Tukey's HSD}: q_\alpha \sqrt{\frac{MS_W}{n}},
\] where \(q_\alpha\) is the **studentized range statistic**, which can be found using table B.4 in the appendix of the textbook.

For our example 3 data, Tukey’s HSD is calculated as

\[ \text{Tukey's HSD} = 3.77 \sqrt{\frac{0.01}{5}} = 0.17. \]
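Again assuming the example 3 values (\(MS_W = 0.01\), \(n = 5\), and \(q_\alpha = 3.77\) for \(k = 3\) groups with \(df_W = 12\)), Tukey's HSD can be verified the same way:

```python
import math

MS_W = 0.01    # mean square within groups, from the F table above
n = 5          # per-group sample size
q_crit = 3.77  # studentized range statistic for k = 3, df_W = 12, alpha = .05

HSD = q_crit * math.sqrt(MS_W / n)
print(round(HSD, 2))  # → 0.17
```

The larger critical difference (0.17 versus 0.11 for Fisher's LSD) reflects the HSD's more conservative control of the experimentwise alpha level.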

## References

Privitera, G. J. (2015). *Statistics for the behavioral sciences*. Los Angeles: Sage.