Introduction
This is the GitHub repository for a graduate-level course in bioinformatics at the University of Michigan. The course uses the R programming language to teach the following concepts:
- The practices of reproducible research
- Statistical analysis
- Computer programming
The course is offered in the fall of even years.
Course description. Increasingly, microbiologists are generating large and varied datasets that must be integrated with data from traditional approaches to test hypotheses and identify new avenues of research. This course will give microbiologists the background they need to design robust experiments, implement traditional statistical approaches for small and large datasets, and use the R statistical programming software to produce well-documented and reproducible results. The R statistical software language will be used throughout the course to introduce students to statistical techniques and computer programming. R is a powerful open-source programming language that anyone can contribute code to. This has resulted in its widespread use, especially in the fields of biostatistics and bioinformatics. In addition to a comprehensive suite of statistical resources, R is capable of producing highly customizable data visualizations.
Course structure. Microbial Informatics will consist of two hours of lecture per week plus a two-hour hands-on computer session. The lectures will present the theory behind concepts, and the computer sessions will allow students to complete exercises in pairs. The first half of the course will emphasize R’s statistical foundation, and the second half will introduce students to computer programming using R. In each half, students will complete a project. For the first half, students will analyze some element of data from their research using concepts that were covered in class. In the second half, students will create an R-based program to address a biological question; ideally, that question will be related to the student’s own research. Students are expected to provide their own laptop computers; both Mac and Windows are acceptable.
Texts
Required:
- Introductory Statistics with R by Peter Dalgaard [ pdf, Amazon ]
- The Art of R Programming by Norman Matloff [ pdf, Amazon ]
Supplemental:
- Bioinformatics Data Skills by Vincent Buffalo [ pdf, O’Reilly ]
- Pro Git by Scott Chacon [ pdf, Amazon ]
Assignments
The following links go to GitHub repositories that contain the homework assignments. Homework accounts for 50% of the grade:
- Assignment 01: Exploring GitHub and Markdown (Due 9/12/2014)
- Assignment 02: Manipulating data structures in R (Due 9/26/2014)
- Assignment 03: Plotting, randomness, and tabular data (Due 10/10/2014)
- Assignment 04: Programming, functions, and data structures (Due 10/31/2014)
- Assignment 05: Functions, simulation, reproducible research, collaboration (Due 11/26/2014)
Projects
The following links go to GitHub repositories that contain the two projects for the course. Each project accounts for 25% of the grade:
- Project 01: Data analysis (Due 10/24/2014)
- Project 02: Software engineering (Due 12/16/2014)
Lectures
The following links go to HTML-based slide decks that have embedded links to other resources. They are best viewed in the Chrome browser. To see the slides in “presenter mode”, add ?presentme=true to the end of the URL. To see the presenter notes for a slide, press p on that slide. I’ve also listed the number of slides in each deck. If you don’t see the correct number of slides, please refresh the page until you do. If that does not work, save the slide deck as a PDF. Even if you only see 4 slides in the deck, all of the slides will go into the PDF.