Microbial Informatics

Lecture 01

Patrick D. Schloss, PhD (microbialinformatics.github.io)
Department of Microbiology & Immunology

Learning objectives

  • Understand the concepts of reproducible research
  • Learn what's wrong with how we currently do data analysis
  • Gain exposure to tools that will help us to do reproducible research


  • Need for...

    • electronic/paper trail
    • programming
    • statistics
  • Datasets are not getting any smaller

  • Great demand for numeracy skills

Objectives for semester

  • Develop skills that will foster more reproducible data analysis
  • Learn to adopt the appropriate statistical test for different applications
  • Create scripts that will perform custom data analysis in a reproducible manner

Reproducible data analysis

  • "ultimate product of academic research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research" - Wikipedia
  • Notable problems (see Nature special issue)
    • Inability to replicate analyses
    • Inability to replicate experiments
  • Not claiming any impropriety
  • Ever tried to replicate someone else's analysis?

What would it take to replicate someone's analysis?

  • Raw data
  • Software versions
  • Detailed data processing steps
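
A minimal sketch of how the first two items above could be captured from within R. The file names (raw_counts.csv, analysis_log.txt) and the toy data are hypothetical and exist only so the example runs on its own; sessionInfo() and tools::md5sum() are part of a standard R installation.

```r
# Hypothetical raw data file, written to a temporary directory for illustration.
raw_file <- file.path(tempdir(), "raw_counts.csv")
write.csv(data.frame(sample = c("A", "B"), count = c(10, 20)),
          raw_file, row.names = FALSE)

# Fingerprint the raw data so others can confirm they have the same file.
raw_md5 <- unname(tools::md5sum(raw_file))

# Capture the R version and the versions of loaded packages.
r_version <- R.version.string
session <- sessionInfo()

# Write everything to a plain-text log that can travel with the analysis.
log_file <- file.path(tempdir(), "analysis_log.txt")
writeLines(c(paste("Raw data MD5:", raw_md5),
             paste("R version:", r_version),
             capture.output(print(session))),
           log_file)
```

Keeping a log like this next to the results would let someone check both the input file and the software versions before trying to repeat the processing steps.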

What would be required of you?

  • Making raw data available from a centralized location
    • Laboratory server / Dropbox
    • GenBank
    • FigShare
  • A digital notebook
    • Executable documentation?
    • Provided from a centralized location
    • GitHub

What do you currently do?

  • How well could you tell me how you manipulate data in Excel or Prism?
  • Click buttons without writing anything down
  • Digital notebook? Pfffft.

A digital notebook

Embed source code in documents

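# Draw 20 uniform random values for each axis
x <- runif(20)
y <- runif(20)
# Scatter plot: solid blue points (pch=19), slightly enlarged (cex=1.25)
plot(x, y, xlab="Random X Value", ylab="Random Y Value", main="", col="blue", pch=19, cex=1.25)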

Embed source code in documents

[Figure: scatter plot of Random Y Value against Random X Value, produced by the code chunk above]
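
The chunk above calls runif() without fixing a seed, so each run would draw different points. A minimal sketch, assuming an exactly repeatable figure is wanted: calling set.seed() before the random draws makes them repeat. The seed value below is arbitrary and chosen only for illustration.

```r
# Arbitrary seed, chosen only for illustration.
seed <- 20140101

# First run: fix the seed, then draw the random values.
set.seed(seed)
x1 <- runif(20)
y1 <- runif(20)

# Second run with the same seed reproduces the same values.
set.seed(seed)
x2 <- runif(20)
y2 <- runif(20)

# Both comparisons are TRUE, so a plot of (x1, y1) matches a plot of (x2, y2).
same_x <- identical(x1, x2)
same_y <- identical(y1, y2)
```

Recording the seed in the document alongside the code is one way to let a reader regenerate the same plot.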