Introduction

This is the GitHub repository for a graduate-level course in bioinformatics at the University of Michigan. The course uses the R programming language to teach the following concepts:

  • The practices of reproducible research
  • Statistical analysis
  • Computer programming

The course is offered in the fall of even years.

Course description. Increasingly, microbiologists are generating large and varied datasets that must be integrated with data from traditional approaches to test hypotheses and identify new avenues of research. This course will give microbiologists the background they need to design robust experiments, implement traditional statistical approaches for small and large datasets, and use the R statistical programming software to produce well-documented and reproducible results. The R statistical software language will be used throughout the course to introduce students to statistical techniques and computer programming. R is a powerful open-source programming language to which anyone can contribute code. This has resulted in its widespread use, especially in the fields of biostatistics and bioinformatics. In addition to a comprehensive suite of statistical resources, R is capable of producing highly customizable data visualizations.

Course structure. Microbial Informatics will consist of two hours of lecture per week plus a two-hour hands-on computer session. The lectures will present the theory behind concepts, and the computer sessions will allow students to complete exercises in pairs. The first half of the course will emphasize R’s statistical foundation, and the second half will introduce students to computer programming using R. In each half, students will complete a project. For the first half, students will analyze some element of data from their research using concepts that were covered in class. In the second half, students will create an R-based program to address a biological question, ideally one related to their own research. Students are expected to provide their own laptop computers; both Mac and Windows are acceptable.


Texts

Required:

  • Introductory Statistics with R by Peter Dalgaard [ pdf, Amazon ]
  • The Art of R Programming by Norman Matloff [ pdf, Amazon ]

Supplemental:

  • Bioinformatics Data Skills by Vincent Buffalo [ pdf, O’Reilly ]
  • Pro Git by Scott Chacon [ pdf, Amazon ]

Assignments

The following links go to GitHub repositories that contain the homework assignments. Homework accounts for 50% of the grade:

  • Assignment 01: Exploring GitHub and Markdown (Due 9/12/2014)
  • Assignment 02: Manipulating data structures in R (Due 9/26/2014)
  • Assignment 03: Plotting, randomness, and tabular data (Due 10/10/2014)
  • Assignment 04: Programming, functions, and data structures (Due 10/31/2014)
  • Assignment 05: Functions, simulation, reproducible research, collaboration (Due 11/26/2014)

Projects

The following links go to GitHub repositories that contain the two projects for the course. Each project accounts for 25% of the grade:


Lectures

The following links go to HTML-based slide decks that have embedded links to other resources. They are best viewed in the Chrome browser. To see the slides in “presenter mode”, add ?presentme=true to the end of the URL. To see the presenter notes for a slide, press p on that slide. I’ve also listed the number of slides in each deck. If you don’t see the correct number of slides, please hit refresh until you do. If this does not work, save the slide deck as a PDF. Even if you only see 4 slides in the deck, all of the slides will go into the PDF.

Class Date Topic Reading
1 9/2 Introduction to reproducible research (29 slides)  
2 9/4 Introduction to RStudio, R Markdown, and Git (35 slides) ProGit 1, 2
  9/5 Computer lab: Exploring GitHub and Markdown
3 9/9 Git best practices, knitr, and Introduction to R (31 slides) ISwR 1, 2
4 9/12 Introduction to R: Variables and containers (20 slides) ISwR 1, 2
  9/12 Computer lab  
5 9/16 Introduction to R: Containers (21 slides) ISwR 1, 2
  9/18 Pat traveling  
  9/19 Computer lab: Data structures / Pat traveling  
6 9/23 Descriptive statistics (15 slides) ISwR 4
7 9/25 Basic plotting (16 slides) ISwR 4
8 9/26 Basic plotting and randomness (14 slides) ISwR 3 and 4
  9/26 Computer lab: Plotting, randomness, tabular data  
9 9/30 Randomness (23 slides) ISwR 3 and 4
10 10/2 Tests on tabular data (24 slides) ISwR 8
11 10/3 T- and Wilcoxon tests (28 slides) ISwR 5
  10/3 Computer lab  
12 10/7 Power analysis (27 slides) ISwR 9
13 10/9 ANOVA and Kruskal-Wallis (30 slides) ISwR 7
14 10/10 Regression and correlation (31 slides) ISwR 6
  10/10 Computer lab  
  10/14 Fall Study Period (no class)  
15 10/16 Introduction to programming (30 slides) TAoRP 1
16 10/17 Data structures: vectors and matrices (19 slides) TAoRP 2, 3
  10/17 Computer lab  
  10/21 Pat traveling  
17 10/23 Data structures: lists, data frames, and factors (29 slides) TAoRP 4, 5, 6
  10/24 Computer lab  
18 10/28 Control statements (28 slides) TAoRP 7
19 10/30 Vectorizing (20 slides) TAoRP 7
  10/31 Computer lab [no class]  
20 11/4 Input and output (27 slides) TAoRP 10
21 11/6 String manipulation I (26 slides) TAoRP 11
  11/7 Computer lab  
22 11/11 String manipulation II (22 slides) TAoRP 11
23 11/13 String manipulation III (25 slides) TAoRP 11
  11/14 Computer lab  
24 11/18 Ruzzle cheat (12 slides)  
25 11/20 Ruzzle cheat  
  11/21 Computer lab  
26 11/25 Ruzzle cheat (39 slides)  
  11/27 Thanksgiving (no class)  
  11/28 Thanksgiving (no class)  
27 12/2 Testing (27 slides)  
28 12/4 Testing / Variable scoping (22 slides) TAoRP 7
29 12/5 Variable scoping / Licenses / Conclusion (20 slides) TAoRP 7
30 12/9 No class - work on projects