Microbial Informatics

Lecture 07

Patrick D. Schloss, PhD (microbialinformatics.github.io)
Department of Microbiology & Immunology

Announcements

  • When you upload your assignments, upload both the README.Rmd file and the README.md file that RStudio/knitr generates from it
  • We will use the first hour tomorrow to cover new material and the second hour to help people with the assignment
  • Start thinking about your project:
    • Emphasis on data analysis
    • Due 10/24/2014 (Friday)
    • Feel free to come to office hours to discuss project ideas
    • I have some ideas for microbial ecology analysis projects

Review

  • Everything in R is some form of vector, even output
  • R has a rich set of descriptive statistics that can be used to simplify datasets

Learning objectives

  • Histograms
  • Box plots
  • Bar plots
  • Strip charts

Histograms

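# Read the metadata table; header = TRUE treats the first row as column names
metadata <- read.table(file = "wild.metadata.txt", header = TRUE)
# Use the sample IDs in the Group column as row names...
rownames(metadata) <- metadata$Group
# ...then drop the first column (Group), which is now redundant
metadata <- metadata[, -1]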
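If you want to try these steps without wild.metadata.txt, here is a minimal sketch on a small invented data frame. The column names (Group, Sex, SP, Weight) are inferred from the code in these slides and the values are made up; it is called toy so it does not overwrite your real metadata object.

```r
# Hypothetical stand-in for wild.metadata.txt: invented values, assumed column names
toy <- data.frame(
  Group  = c("M01", "M02", "M03", "M04", "M05", "M06"),
  Sex    = c("F", "M", "F", "M", "F", "M"),
  SP     = c("A", "B", "A", "B", "B", "A"),
  Weight = c(18.5, 21.0, 16.2, 23.4, 19.8, 20.1),
  stringsAsFactors = TRUE
)

# Same two steps as above: Group IDs become row names, then the column is dropped
rownames(toy) <- toy$Group
toy <- toy[, -1]

str(toy)        # 6 obs. of 3 variables: Sex, SP, Weight
toy["M03", ]    # rows can now be looked up by sample ID
```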

Data visualization

  • This is a huge area of exploration
  • R is tremendously powerful for generating plots and data visualization tools
  • You can generally tell someone used MS Excel by how bad their plots look
  • You can certainly generate crap in R too, but the upside is greater
  • Numerous packages are available, but we will focus on base graphics until the end of the semester:
    • lattice
    • ggplot2
    • rgl

Histograms

  • Good for summarizing continuous data where you want to break it into discrete classes
  • What do these two sets of commands do?
par(mfrow = c(2, 1))  # split the plotting window into two panels (2 rows, 1 column)
hist(metadata$Weight[metadata$Sex == "F"])
hist(metadata$Weight[metadata$Sex == "M"])
par(mfrow = c(1, 1))  # return to a single panel

par(mfrow = c(2, 1))
hist(metadata$Weight[metadata$Sex == "F"], breaks = 10,
     ylim = c(0, 20), xlim = c(0, 30))
hist(metadata$Weight[metadata$Sex == "M"], breaks = 10,
     ylim = c(0, 20), xlim = c(0, 30), add = TRUE)
par(mfrow = c(1, 1))
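A related detail: a single number such as breaks = 10 is only a suggestion (R rounds the break points to "pretty" values), so the two sexes can end up with different bin edges. One way to force identical bins, sketched below on invented weights, is to pass an explicit vector of break points; the 3-unit bin width is an arbitrary choice. plot = FALSE returns the binning without drawing anything.

```r
# Invented weights for illustration only
female <- c(12.1, 14.8, 15.5, 17.2, 18.0, 18.9, 21.3, 24.6)
male   <- c(13.4, 16.1, 19.7, 20.2, 22.8, 23.5, 26.9)

# Explicit break points shared by both groups: 0 to 30 in 3-unit bins
shared.breaks <- seq(0, 30, by = 3)

f.bins <- hist(female, breaks = shared.breaks, plot = FALSE)
m.bins <- hist(male,   breaks = shared.breaks, plot = FALSE)

# Same bin edges, so the counts line up bin by bin
rbind(F = f.bins$counts, M = m.bins$counts)

# Two stacked panels with matching bins and y-axis
par(mfrow = c(2, 1))
hist(female, breaks = shared.breaks, ylim = c(0, 5), main = "F")
hist(male,   breaks = shared.breaks, ylim = c(0, 5), main = "M")
par(mfrow = c(1, 1))
```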

Merging plots with add=T

hist(metadata$Weight[metadata$Sex == "F"], breaks = 10,
     ylim = c(0, 20), xlim = c(0, 30), col = "pink")
hist(metadata$Weight[metadata$Sex == "M"], breaks = 10, col = "blue", add = TRUE)
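Where the bars overlap, the blue histogram hides the pink one. A common workaround, sketched here on invented weights, is to make the fill colors semi-transparent with adjustcolor() from base R's grDevices package and to add a legend. The alpha of 0.5 and the shared break points are arbitrary choices, and not every graphics device supports transparency.

```r
# Invented weights for illustration only
female <- c(12.1, 14.8, 15.5, 17.2, 18.0, 18.9, 21.3, 24.6)
male   <- c(13.4, 16.1, 19.7, 20.2, 22.8, 23.5, 26.9)

# Half-transparent versions of the slide's colors
pink.50 <- adjustcolor("pink", alpha.f = 0.5)
blue.50 <- adjustcolor("blue", alpha.f = 0.5)

shared.breaks <- seq(0, 30, by = 3)
hist(female, breaks = shared.breaks, ylim = c(0, 5), col = pink.50,
     main = "Weights by sex", xlab = "Weight")
hist(male, breaks = shared.breaks, col = blue.50, add = TRUE)

# Fill swatches in the legend match the bar colors
legend("topright", legend = c("F", "M"), fill = c(pink.50, blue.50))
```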

Universal options

  • Axis labels
hist(metadata$Weight, xlab = "Weights of Peromyscus spp.")
  • Plot title
hist(metadata$Weight, main = "Distribution of Peromyscus spp. weights")
  • Putting it together
hist(metadata$Weight, main = "Distribution of Peromyscus spp. weights",
     xlab = "Weights of Peromyscus spp.")
box()  # draw a frame around the plotting region
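These labeling arguments are not specific to hist(). The sketch below, on invented weights, shows where the default title comes from and reuses main, xlab, and ylab with the other plot types in this lecture; the label text is illustrative only.

```r
weights <- c(12.1, 14.8, 15.5, 17.2, 18.0, 18.9, 21.3, 24.6)  # invented values

# Without main, hist() builds its title from the name of the data
h <- hist(weights)
h$xname   # "weights", hence the default title "Histogram of weights"

# The same labeling arguments work across base plotting functions
hist(weights, main = "Histogram", xlab = "Weight", ylab = "Number of mice")
boxplot(weights, main = "Box plot", ylab = "Weight")
stripchart(weights, main = "Strip chart", xlab = "Weight")
```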

What is the output from hist?

f.hist <- hist(metadata$Weight[metadata$Sex == "F"], breaks = 10,
               ylim = c(0, 20), xlim = c(0, 30), col = "pink")
m.hist <- hist(metadata$Weight[metadata$Sex == "M"], breaks = 10,
               col = "blue", add = TRUE)
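hist() returns (invisibly) an object of class "histogram", a list that includes breaks, counts, density, and mids. The sketch below inspects those components on invented weights, using plot = FALSE so nothing is drawn; the explicit break points are an arbitrary choice.

```r
female <- c(12.1, 14.8, 15.5, 17.2, 18.0, 18.9, 21.3, 24.6)  # invented values

f.hist <- hist(female, breaks = seq(0, 30, by = 3), plot = FALSE)

class(f.hist)    # "histogram"
names(f.hist)    # "breaks" "counts" "density" "mids" "xname" "equidist"
f.hist$breaks    # bin edges: one more than the number of bins
f.hist$counts    # number of observations in each bin
f.hist$mids      # bin midpoints: 1.5, 4.5, ..., 28.5

# density = counts / (n * bin width), so density times width sums to 1
sum(f.hist$density * diff(f.hist$breaks))
```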

Boxplots

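  • Multiple histograms get pretty kludgey
  • Box plots let you plot roughly the same numbers that the summary command reports
# Box: lower hinge (~25th percentile), median, upper hinge (~75th percentile)
# Whiskers: most extreme points within 1.5 x the box length; points beyond are outliers
boxplot(metadata$Weight)
boxplot(metadata$Weight[metadata$Sex == "F"],
        metadata$Weight[metadata$Sex == "M"])   # one box per vector
boxplot(metadata$Weight ~ metadata$Sex)         # formula: Weight split by Sex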
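To see exactly which numbers boxplot() draws, boxplot.stats() returns them without plotting. The box edges are Tukey's hinges from fivenum(), which can differ slightly from the quartiles that summary() gets from quantile(). The sketch uses invented values with one deliberate outlier.

```r
x <- c(10, 12, 13, 14, 15, 16, 17, 40)  # invented; 40 is a deliberate outlier

b <- boxplot.stats(x)
b$stats   # 10.0 12.5 14.5 16.5 17.0: whisker, hinge, median, hinge, whisker
b$out     # 40: beyond 16.5 + 1.5 * (16.5 - 12.5) = 22.5, so drawn as a point

fivenum(x)                   # 10.0 12.5 14.5 16.5 40.0: hinges match b$stats[2:4]
quantile(x, c(0.25, 0.75))   # 12.75 16.25: the quartiles summary() reports
```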

Barplots

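# Mean weight for each sex, then one bar per sex
avg.weights <- aggregate(Weight ~ Sex, data = metadata, mean)
barplot(height = avg.weights$Weight, names.arg = avg.weights$Sex)

# Cross-tabulate Sex (rows) by SP (columns)
counts <- table(metadata$Sex, metadata$SP)
counts
barplot(counts)                  # stacked: one bar per SP level, split by sex
barplot(counts, beside = TRUE)   # side by side: one bar per sex within each SP level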
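Two optional refinements, sketched on an invented data frame whose Sex and SP columns are assumed to match the slides: tapply() gives the per-sex means as a named vector that barplot() labels automatically, and legend.text = TRUE labels the bars with the table's row names. barplot() also returns the bar midpoints, which can be used to place labels.

```r
toy <- data.frame(
  Sex    = c("F", "M", "F", "M", "F", "M"),
  SP     = c("A", "B", "A", "B", "B", "A"),
  Weight = c(18.5, 21.0, 16.2, 23.4, 19.8, 20.1)
)  # invented values; column names follow the slides

# Mean weight per sex as a named vector; the names become the bar labels
avg.weights <- tapply(toy$Weight, toy$Sex, mean)
barplot(avg.weights, ylab = "Mean weight")

# Rows of the table (Sex) become the side-by-side bars within each SP group
counts <- table(toy$Sex, toy$SP)
mp <- barplot(counts, beside = TRUE, legend.text = TRUE)

# mp holds the x position of every bar; write each count above its bar
text(as.vector(mp), as.vector(counts), labels = as.vector(counts),
     pos = 3, xpd = TRUE)
```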

Stripcharts

  • When we don't have many points and want to see all of the data (think of animal experiments)
stripchart(metadata$Weight)
stripchart(metadata$Weight, method = "jitter")                 # jitter separates tied points
stripchart(metadata$Weight ~ metadata$Sex, method = "jitter")  # one strip per sex
stripchart(metadata$Weight ~ metadata$Sex, method = "jitter", vertical = TRUE)
stripchart(metadata$Weight ~ metadata$Sex, method = "jitter", jitter = 0.02,
           vertical = TRUE)             # less jitter than the default (0.1)
stripchart(metadata$Weight ~ metadata$Sex, method = "jitter", jitter = 0.02,
           vertical = TRUE, pch = 19)   # filled circles
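Because stripchart() places the groups at x = 1, 2, ... by default, it is easy to overlay a summary on the raw points. The sketch below, on invented weights, adds a short horizontal segment at each group mean; showing the mean (rather than, say, the median) and the segment half-width of 0.2 are arbitrary choices.

```r
toy <- data.frame(
  Sex    = c("F", "M", "F", "M", "F", "M"),
  Weight = c(18.5, 21.0, 16.2, 23.4, 19.8, 20.1)
)  # invented values

stripchart(Weight ~ Sex, data = toy, method = "jitter", jitter = 0.02,
           vertical = TRUE, pch = 19)

# Group means, in the same (alphabetical) order stripchart() uses for the groups
group.means <- tapply(toy$Weight, toy$Sex, mean)
positions <- seq_along(group.means)  # 1, 2, ...

segments(positions - 0.2, group.means, positions + 0.2, group.means, lwd = 2)
```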

What's happening here?

stripchart(metadata$Weight ~ metadata$Sex, method = "jitter", jitter = 0.02,
           vertical = TRUE, pch = c(18, 19), col = c("red", "blue"))
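One way to check your answer: stripchart() draws the groups in the order of the factor levels and applies pch and col per group, in that same order. The sketch below, on invented data, makes that mapping explicit by naming the colors and symbols after the levels, so relabeling or reordering the groups cannot silently swap them. The named-vector lookup is an illustrative choice, not something stripchart() requires.

```r
toy <- data.frame(
  Sex    = factor(c("F", "M", "F", "M", "F", "M")),
  Weight = c(18.5, 21.0, 16.2, 23.4, 19.8, 20.1)
)  # invented values

levels(toy$Sex)  # "F" "M": the order in which the groups are drawn

# Look up each group's color and symbol by level name
sex.col <- c(F = "red", M = "blue")
sex.pch <- c(F = 18, M = 19)

stripchart(Weight ~ Sex, data = toy, method = "jitter", jitter = 0.02,
           vertical = TRUE,
           pch = sex.pch[levels(toy$Sex)], col = sex.col[levels(toy$Sex)])
legend("topleft", legend = levels(toy$Sex),
       pch = sex.pch[levels(toy$Sex)], col = sex.col[levels(toy$Sex)])
```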

For Friday

  • Assignment due Friday
  • Read Introductory Statistics with R (Chapter 8)

Questions?