# Microbial Informatics

## Lecture 07

Patrick D. Schloss, PhD (microbialinformatics.github.io)
Department of Microbiology & Immunology

## Announcements

• Will use first hour tomorrow to cover subject material and second hour to help people with assignment
• Emphasis on data analysis
• Due 10/24/2014 (Friday)
• Feel free to come to office hours to discuss project ideas
• I have some ideas for microbial ecology analysis projects

## Review

• Everything in R is some form of a vector - even output
• R has a rich set of descriptive statistics that can be used to simplify datasets

• Histograms
• Box plots
• Bar plots
• Strip charts
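The two review points can be checked directly at the console. This is a small sketch using made-up weights (the `weights` vector is hypothetical, not the course data):

```r
# Made-up weights in grams; not the course data set
weights <- c(21.5, 18.2, 24.9, 19.7, 22.3)

# Even a single number returned by a function is a vector of length 1
avg <- mean(weights)
is.vector(avg)
length(avg)

# summary() reduces the data to six named descriptive statistics
s <- summary(weights)
names(s)
s[["Median"]]
```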

## Histograms

``````
metadata <- read.table(file = "wild.metadata.txt", header = TRUE)
``````
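The plots in this lecture rely on `metadata` having at least a `Weight` and a `Sex` column. A quick way to confirm what `read.table` produced is sketched below; it writes a tiny made-up file to a temporary location so the example runs without the course data (the `toy` data frame and its values are hypothetical):

```r
# Write a small, made-up tab-delimited file so the example is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("Sex\tWeight", "F\t18.5", "M\t22.0", "F\t20.1"), tmp)

# header = TRUE takes the column names from the first line
toy <- read.table(file = tmp, header = TRUE)

str(toy)             # column names and types
table(toy$Sex)       # how many animals of each sex
summary(toy$Weight)  # descriptive statistics for the weights

unlink(tmp)          # remove the temporary file
```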

## Data visualization

• This is a huge area of exploration
• R is tremendously powerful for generating plots and data visualization tools
• Can generally tell someone used MS Excel by how bad the plots look
• Can certainly generate crap in R, but upside is greater
• Numerous packages are available, but we will focus on base graphics until the end of the semester:
• lattice
• ggplot2
• rgl

## Histograms

• Good for summarizing continuous data where you want to break it into discrete classes
• What do these two sets of commands do?
``````
par(mfrow = c(2, 1))  # split the plotting device into two stacked panels

par(mfrow = c(2, 1))
hist(metadata$Weight[metadata$Sex == "F"], breaks = 10, ylim = c(0, 20), xlim = c(0, 30))
hist(metadata$Weight[metadata$Sex == "M"], breaks = 10, ylim = c(0, 20), xlim = c(0, 30))
par(mfrow = c(1, 1))  # return to a single panel
``````
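One detail worth knowing when comparing panels: a single number passed to `breaks` is only a suggestion, so the two histograms can end up with different bins. Passing an explicit vector of break points forces identical bins. The sketch below uses simulated weights (the `toy` data frame is hypothetical, not the course data):

```r
# Simulated weights for two groups; not the course data set
set.seed(1)
toy <- data.frame(
  Sex = rep(c("F", "M"), each = 50),
  Weight = c(rnorm(50, mean = 18, sd = 3), rnorm(50, mean = 21, sd = 3))
)

# One shared set of break points makes the panels directly comparable
bins <- seq(0, 40, by = 2)

old.par <- par(mfrow = c(2, 1))  # two stacked panels; keep the old settings
f.hist <- hist(toy$Weight[toy$Sex == "F"], breaks = bins,
               main = "Females", xlab = "Weight")
m.hist <- hist(toy$Weight[toy$Sex == "M"], breaks = bins,
               main = "Males", xlab = "Weight")
par(old.par)                     # restore the previous layout
```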

## Merging plots with `add=T`

``````
hist(metadata$Weight[metadata$Sex == "F"], breaks = 10, ylim = c(0, 20), xlim = c(0, 30),
    col = "pink")
``````
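The call above draws only the first histogram. A minimal sketch of how a second one might be overlaid with `add = TRUE` follows, using simulated weights (the `toy` data frame, the shared `bins`, and the semi-transparent blue are assumptions for illustration):

```r
# Simulated weights for two groups; not the course data set
set.seed(1)
toy <- data.frame(
  Sex = rep(c("F", "M"), each = 50),
  Weight = c(rnorm(50, mean = 18, sd = 3), rnorm(50, mean = 21, sd = 3))
)
bins <- seq(0, 40, by = 2)

# First call sets up the axes and draws the female weights
f.hist <- hist(toy$Weight[toy$Sex == "F"], breaks = bins, ylim = c(0, 20),
               col = "pink", main = "Weights by sex", xlab = "Weight")

# add = TRUE draws into the existing plot instead of starting a new one;
# a semi-transparent color keeps the first histogram visible underneath
m.hist <- hist(toy$Weight[toy$Sex == "M"], breaks = bins,
               col = rgb(0, 0, 1, alpha = 0.4), add = TRUE)
```

Because the second call reuses the axes from the first, the `xlim` and `ylim` chosen in the first call need to be wide enough for both groups.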

## Universal options

• Axis labels
``````
hist(metadata$Weight, xlab = "Weights of Peromyscus spp.")
``````
• Plot title
``````
hist(metadata$Weight, main = "Distribution of Peromyscus spp. weights")
``````
• Putting it together
``````
hist(metadata$Weight, main = "Distribution of Peromyscus spp. weights", xlab = "Weights of Peromyscus spp.")
box()  # draw a frame around the plotting region
``````

## What is the output from `hist`?

``````
m.hist <- hist(metadata$Weight[metadata$Sex == "F"], breaks = 10, ylim = c(0, 20),
    xlim = c(0, 30), col = "pink")
``````
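To answer the question in the heading, the returned object can be inspected like any other list. The sketch below uses simulated weights (the vector `w` is hypothetical) and `plot = FALSE` so the bins are computed without drawing anything:

```r
# Simulated weights; not the course data set
set.seed(2)
w <- rnorm(40, mean = 20, sd = 3)

# plot = FALSE computes the histogram without drawing it
w.hist <- hist(w, breaks = 10, plot = FALSE)

class(w.hist)   # an object of class "histogram"
names(w.hist)   # the components stored in the object
w.hist$breaks   # bin boundaries
w.hist$counts   # number of observations in each bin
w.hist$mids     # bin midpoints

# The stored object can be drawn later
plot(w.hist, col = "pink", main = "Simulated weights", xlab = "Weight")
```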

## Boxplots

• Multiple histograms get pretty kludgy
• Box plots let you plot the results of the summary command
``````
boxplot(metadata$Weight)  # min, 25th percentile, median, 75th percentile, max, outliers
boxplot(metadata$Weight[metadata$Sex == "F"], metadata$Weight[metadata$Sex == "M"])
``````
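The numbers behind a box plot are close to, but not always identical to, the output of `summary`: the whiskers stop at the most extreme points within 1.5 times the interquartile range of the box, and anything beyond them is drawn as an outlier. The sketch below uses simulated weights (the `toy` data frame is hypothetical) and the formula interface to get one box per sex:

```r
# Simulated weights for two groups; not the course data set
set.seed(3)
toy <- data.frame(
  Sex = rep(c("F", "M"), each = 30),
  Weight = c(rnorm(30, mean = 18, sd = 3), rnorm(30, mean = 21, sd = 3))
)

summary(toy$Weight)

# Weight ~ Sex draws one box per level of Sex
b <- boxplot(Weight ~ Sex, data = toy, ylab = "Weight")

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats
b$out  # points drawn individually as outliers (may be empty)
```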

## Barplots

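``````
# Mean weight for each sex, drawn as one bar per sex
avg.weights <- aggregate(Weight ~ Sex, data = metadata, mean)
barplot(height = avg.weights$Weight, names.arg = avg.weights$Sex)

# counts: a table of counts, e.g. built with table()
counts
barplot(counts)                 # stacked bars
barplot(counts, beside = TRUE)  # side-by-side bars
``````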
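The difference between the last two calls is easiest to see with a two-way table. The sketch below builds one with `table()` from made-up data; the `toy` data frame and its `Species` column are hypothetical and only stand in for whatever `counts` was built from:

```r
# Made-up data; the Species column and its labels are hypothetical
toy <- data.frame(
  Sex     = c("F", "F", "M", "M", "M", "F", "M", "F"),
  Species = c("PL", "PMG", "PL", "PL", "PMG", "PL", "PMG", "PL")
)

# Rows are sexes, columns are species
counts <- table(toy$Sex, toy$Species)
counts

# Default: one bar per column, with the rows stacked inside it
barplot(counts, legend.text = TRUE)

# beside = TRUE: the rows are drawn as separate bars within each column group
mids <- barplot(counts, beside = TRUE, legend.text = TRUE)
```

`barplot` invisibly returns the bar midpoints, which is handy for adding labels or error bars afterwards.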

## Stripcharts

• Perhaps we don't have a ton of points and we want to see all of the data (think of animal experiments)
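``````
stripchart(metadata$Weight)
stripchart(metadata$Weight ~ metadata$Sex, method = "jitter", jitter = 0.02,
    vertical = TRUE)
stripchart(metadata$Weight ~ metadata$Sex, method = "jitter", jitter = 0.02,
    vertical = TRUE, pch = 19)
``````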

## What's happening here?

``````
stripchart(metadata$Weight ~ metadata$Sex, method = "jitter", jitter = 0.02,
    vertical = TRUE, pch = c(18, 19), col = c("red", "blue"))
``````
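One way to work out what is happening is to run the same call on a small simulated data set: the formula splits the weights by sex, the jitter spreads points sideways so they do not overlap, and the `pch` and `col` vectors are applied one element per group. The sketch below uses hypothetical data (`toy`), and the group-mean line segments are an addition, not part of the original call:

```r
# Simulated weights for two groups; not the course data set
set.seed(4)
toy <- data.frame(
  Sex = rep(c("F", "M"), each = 15),
  Weight = c(rnorm(15, mean = 18, sd = 3), rnorm(15, mean = 21, sd = 3))
)

# First elements of pch/col go to the first group (F), second to the next (M)
stripchart(Weight ~ Sex, data = toy, method = "jitter", jitter = 0.02,
           vertical = TRUE, pch = c(18, 19), col = c("red", "blue"),
           ylab = "Weight")

# Groups sit at x = 1, 2, ...; mark each group mean with a short segment
means <- tapply(toy$Weight, toy$Sex, mean)
segments(x0 = c(1, 2) - 0.2, y0 = means, x1 = c(1, 2) + 0.2, y1 = means, lwd = 2)
```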

## For Friday

• Assignment due Friday
• Read Introduction to Statistics with R (Chapter 8)