Microbial Informatics

Lecture 27

Patrick D. Schloss, PhD (microbialinformatics.github.io)
Department of Microbiology & Immunology

Announcements

  • Final project (due 12/16/2014)
    • Email me your idea by Tuesday
    • Should be a program that others can use to do something useful (I have ideas if you need one)
    • Create a public repository with documentation in README file and license
  • Will have class on Friday, but not next Tuesday

Review

  • We've talked a lot about the R programming language and how we can do it to do useful things and help with our analyses
  • The tools you have now will enable you to do many many things
  • Let's spend the next two lectures talking about advanced programming concepts

Learning objectives

  • Define "Test-driven development (TDD)"
  • See how to implement TDD to address HW4 (the paper assignment)

Think about software you know...

  • How often is it released? What type of problems do those fixes solve?
  • When you do an experiment how do you know whether to trust the results?
  • When you work on an assignment/project how do you test the results?

TDD is a software development process where you...

  1. Create a failing test that describes one a realistic problem you might face
  2. Make sure test fails / see which tests fail
  3. Write enough code to make sure test passes
  4. Run all tests to make sure you haven't broken other test
  5. Simplify the code
  6. Repeat

Philosophy

  • Every line of code has a test (extreme)
  • The programmer writes the tests and so they aren't dependent on others getting requirements
  • Don't Repeat Yourself (DRY)

How did you test your paper reading code?

First command...

  • given a file name, create a list variable that contains any necessary information
  • what would a test look like?
words <- readPaper("../../assignment04/mothur.txt")
## Error in eval(expr, envir, enclos): could not find function "readPaper"
is.list(words)
## Error in eval(expr, envir, enclos): object 'words' not found
  • It fails! (yippee!)

Get it to work...

readPaper <- function(file){
    words <- list()
    return(words)
}

words <- readPaper("../../assignment04/mothur.txt")
is.list(words)
## [1] TRUE
  • Um... cute.

Get it to work...

readPaper <- function(file){
    words <- list()
    return(words)
}

words <- readPaper("../../assignment04/mothur.txt")
is.list(words)
length(words[[1]])==2056
## Error in words[[1]]: subscript out of bounds
## [1] TRUE

Get it to work...

readPaper <- function(file){
    words <- list(scan(file, what=""))
    return(words)
}

words <- readPaper("../../assignment04/mothur.txt")
is.list(words)
length(words[[1]])==2056
## [1] TRUE
## [1] TRUE

Great... next command

  • if I supply the output from readPaper and a word (or a vector of words), tell me how many times the word(s) shows up
wordCount("mothur", words) == 25
## Error in eval(expr, envir, enclos): could not find function "wordCount"
wordCount("the", words) == 133
## Error in eval(expr, envir, enclos): could not find function "wordCount"
wordCount(c("mothur", "the"), words) == c(25, 133)
## Error in eval(expr, envir, enclos): could not find function "wordCount"
  • FAIL!

Great... next command

wordCount <- function(word, wordList){
    sum(wordList[[1]] == word)
}

wordCount("mothur", words) == 25
wordCount("the", words) == 133
wordCount(c("mothur", "the"), words) == c(25, 133)
## [1] FALSE
## [1] TRUE
## [1] FALSE FALSE

Punctuation!

readPaper <- function(file){
    words <- scan(file, what="")
    words <- gsub("\\W", "", words)
    words <- tolower(words)
    return(list(words))
}

wordCount <- function(word, wordList){
    sum(wordList[[1]] == tolower(word))
}

Tests

words <- readPaper("../../assignment04/mothur.txt")
is.list(words)
length(words[[1]])==2056
sum(grepl("\\W", words[[1]])) == 0

wordCount("mothur", words) == 25
wordCount("the", words) == 133
wordCount(c("mothur", "the"), words) == c(25, 133)
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE FALSE

Reading in a vector

wordCount <- function(word, wordList){
    word <- tolower(word)
    word.count <- numeric() 
    for(w in word){
        word.count[w] <- sum(wordList[[1]]==w)
    }
    names(word.count) <- NULL
    return(word.count)
}

Tests

words <- readPaper("../../assignment04/mothur.txt")
is.list(words)
length(words[[1]])==2056
sum(grepl("\\W", words[[1]])) == 0

wordCount("mothur", words) == 25
wordCount("the", words) == 133
wordCount(c("mothur", "the"), words) == c(25, 133)
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE TRUE

Introducing: testthat

  • Problems with what we've been doing
    • This process can get tedious
    • It's not automated
  • testthat is an R package for doing testing
    • Put test code into a separate file (test-????.R)
    • Code as normal in your R script file (????.R)

test-pschloss.R

words <- readPaper("../../assignment04/mothur.txt")
expect_that(words, is_a("list"))
expect_that(length(words[[1]]), equals(2056))
expect_that(sum(grepl("\\W", words[[1]])), equals(0))
expect_that(wordCount("mothur", words), equals(25))
expect_that(wordCount("the", words), equals(133))
expect_that(wordCount(c("mothur", "the"), words), equals(c(25, 133)))

pschloss.R

readPaper <- function(file){
    words <- scan(file, what="")
    words <- gsub("\\W", "", words)
    words <- tolower(words)
    return(list(words))
}

wordCount <- function(word, wordList){
    word <- tolower(word)
    word.count <- numeric() 
    for(w in word){
        word.count[w] <- sum(wordList[[1]]==w)
    }
    names(word.count) <- NULL
    return(word.count)
}

How to run...

  • Method #1:
library(testthat)
source("pschloss.R")
source("test-pschloss.R")
  • Method #2:
library(testthat)
source("pschloss.R")
test_dir("./")
## ......

Things to know about...

  • Expects the testing file to be named test_something.R
  • Options to go with expect_that include:
    • is_true: truth
    • is_false: falsehood
    • is_a: inheritance
    • equals: equality with numerical tolerance
    • equals_reference: equality relative to a reference
    • is_equivalent_to: equality ignoring attributes
    • is_identical_to: exact identity
    • matches: string matching

More...

  • Options to go with expect_that include:
    • prints_text: output matching
    • throws_error: error matching
    • gives_warning: warning matching
    • shows_message: message matching
    • takes_less_than: performance

Other elements of testthat

  • Can get fancy by defining your own expectation functions
  • Can establish specific contexts with environmental settings, etc.
  • Can automate testing so that it runs the tests whenever you update the source code

Exercise

  • if I give a word, tell me the frequency of words that follow it

Questions?