- Homework 3 is out
- For the next two Fridays, the first hour of class will be a lecture
- Read
(Chapter 5: T- and Wilcoxon tests)*Introduction to Statistics with R*

Patrick D. Schloss, PhD (microbialinformatics.github.io)

Department of Microbiology & Immunology

- Homework 3 is out
- For the next two Fridays, the first hour of class will be a lecture
- Read
(Chapter 5: T- and Wilcoxon tests)*Introduction to Statistics with R*

- Human are poor sources of random numbers!
- Can perform simulations using random number generators
- Binomial distribution is a discrete distribution
- Several
`xbinom`

functions in R

- Describe the different types of hypotheses
- How to test associations between tabular data
- How to generate tabular data

- What is a hypothesis?
- Testable
- Falsifiable
- Leads to predictions
- Cannot prove a hypothesis is true!

- What's are null and alternative hypotheses?
- Null hypothesis is the hypothesis you are trying to reject
- Cannot prove a null hypothesis true, can only reject

- Errors
- Type I: Falsely rejecting a null hypothesis
- Type II: Falsely supporting a null hypothesis

- P-value
- The probability that you would see something as extreme or more so if the null hypothesis is true (Type I)
- Generally willing to accept a Type I error of 0.05

- Power
- The ability to reject a null hypothesis if false (Type II)
- Generally desire a power of 0.80

- Single tailed
- Probability that the difference is greater than expected
- Example: Are there more males than females?

- Two-tailed:
- Probability that the difference is greater or less than expected
- Example: Is there a difference in the number of males and females?

- How would we know whether we had a even sampling of M & F in the previous example?
- How to test?
- Binomial distribution

```
pbinom(2, 10, 0.5)
2 * pbinom(2, 10, 0.5)
binom.test(2, 10, 0.5)
binom.test(2, 10, 0.5, alternative = "less")
```

```
metadata <- read.table(file = "wild.metadata.txt", header = T)
rownames(metadata) <- metadata$Group
metadata <- metadata[, -1]
```

- Will convert your observations (individuals by variables) into tables (varaibles by variables)

```
table(metadata$Sex)
table(metadata$Sex, metadata$SP)
table(metadata$Sex, metadata$SP, metadata$Repro)
```

```
sex.sp <- table(metadata$Sex, metadata$SP)
margin.table(sex.sp)
margin.table(sex.sp, 1)
margin.table(sex.sp, 2)
```

```
prop.table(sex.sp)
prop.table(sex.sp, 1)
prop.table(sex.sp, 2)
```

```
##
## PL PMG
## F 24 36
## M 34 17
```

- What would you expect if there was no difference in the ratio?

```
species.sums <- margin.table(sex.sp, 2)
binom.test(species.sums)
```

```
##
## Exact binomial test
##
## data: species.sums
## number of successes = 58, number of trials = 111, p-value = 0.7044
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4256 0.6182
## sample estimates:
## probability of success
## 0.5225
```

```
sex.sums <- margin.table(sex.sp, 1)
binom.test(sex.sums)
```

```
##
## Exact binomial test
##
## data: sex.sums
## number of successes = 60, number of trials = 111, p-value = 0.4478
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4433 0.6355
## sample estimates:
## probability of success
## 0.5405
```

- What would be expected?

```
frac.male <- sex.sums["M"]/sum(sex.sums)
frac.female <- 1 - frac.male
frac.sex <- c(F = frac.female, M = frac.male)
frac.pl <- species.sums["PL"]/sum(species.sums)
frac.pmg <- 1 - frac.pl
frac.species <- c(PL = frac.pl, PMG = frac.pmg)
expected <- frac.sex %*% t(frac.species)
expected <- expected * sum(sex.sp)
expected
```

```
## PL.PL PMG.PL
## [1,] 31.35 28.65
## [2,] 26.65 24.35
```

\[\chi^2 =\sum \frac{(O_{ij}-E_{ij})^2}{E_{ij}}\]

- Test against a ChiSquared distribution with: \[df=(n_{row}-1)(n_{col}-1)\]

- The distribution of a sum of the squares of k indepdentently sampled normal random variables, where k is the degrees of freedom
- Procedure to create an empirical distribution
- Draw k random variables from a normal distribution with mean 0.0 and standard deviation of 1.0
- Square each of them
- Sum them
- Repeat many times and keep track of how many times you see each value

- Interested in the proportion of the distribution larger than our test statistic

```
chi.sq <- sum((expected - sex.sp)^2/expected)
df <- (nrow(sex.sp) - 1) * (ncol(sex.sp) - 1)
plot(seq(0, 20, 0.05), dchisq(seq(0, 20, 0.05), df = df), type = "l", xlab = "ChiSquared Statistic",
ylab = "Probability with 1 degree of freedom")
arrows(x0 = chi.sq, x1 = chi.sq, y0 = 0.4, y1 = 0.05, lwd = 2, col = "red")
```

```
chisq.test(sex.sp)
```

```
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: sex.sp
## X-squared = 6.825, df = 1, p-value = 0.00899
```

- Imagine a 2x2 table and the cells are a, b, c, d going from left to right and top-down
- The probability of seeing a specific table format...

\[P=\frac{(a+b)!(c+d)!(a+c)!(b+d)!}{a!b!c!d!}\]

- Need to then find probability of more extreme tables via permutation
- Can be computationally intensive

```
fisher.test(sex.sp)
```

```
##
## Fisher's Exact Test for Count Data
##
## data: sex.sp
## p-value = 0.007411
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.1418 0.7765
## sample estimates:
## odds ratio
## 0.3368
```

- Null: the fraction of males is the same in both species

```
males <- sex.sp["M", ]
prop.test(males, species.sums)
```

```
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: males out of species.sums
## X-squared = 6.825, df = 1, p-value = 0.00899
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.06891 0.46199
## sample estimates:
## prop 1 prop 2
## 0.5862 0.3208
```

- Generally need at least 5 observations per cell of your table (rule of thumb)
- Test possible with more that two levels per factor, but may be necessary use simulations to build distributions
- Fisher exact test is best approach, but can be computationally demanding