Microbial Informatics

Lecture 04

Patrick D. Schloss, PhD (microbialinformatics.github.io)
Department of Microbiology & Immunology

Announcements

  • No class on Thursday (9/18) or Friday (9/19).

Review

  • Comments
    • Use your variable names as comments
    • Comment your code with # (console) or ## (knitr)
  • Variables hold information
    • Numeric/double/integer: counts of things, measurements
    • Characters/strings: DNA sequence, amino acids, names
    • Logical: is something true or not
    • Functions: more complex...

Learning objectives

  • Review different data types
  • Learn how to create and manipulate vectors

Numerical variables

x <- pi
y <- 2
z <- -3

String variables

office <- "1520A MSRB I"
grade <- "A"
genome <- "ATGCATCGTCCCGT"
  • Note: the grade value is in quotes. What happens if it is not in quotes?

Logical values as inputs (T/F; 1/0)

x <- TRUE
y <- FALSE

!x              # NOT operator
x && y          # AND operator (single values)
x & y           # element-wise AND operator (vectors)
x || y          # OR operator (single values)
x | y           # element-wise OR operator (vectors)
x == y          # is equal operator
x != y          # is not equal operator
  • Logical variables will be very useful when selecting subsets of data to work with
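As a minimal sketch of that last point, a logical vector can be used to pick out elements of another vector. The name od and its values are made up for illustration; they are not course data.

```r
# Hypothetical optical density readings (made-up values for illustration)
od <- c(0.12, 0.85, 0.40, 0.93, 0.05)

# An element-wise comparison returns one logical value per element
is_high <- od > 0.5

# A logical vector used as an index keeps only the TRUE positions
high_od <- od[is_high]

# Conditions on whole vectors are combined with the element-wise & operator
mid_od <- od[od > 0.1 & od < 0.9]
```

Here high_od holds the two readings above 0.5, and mid_od holds the three readings between 0.1 and 0.9.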

Logical values as outputs on numbers

x <- 5
y <- 3

x > y          # greater than operator
x >= y         # greater than or equal to operator
x < y          # less than operator
x <= y         # less than or equal to operator
x == y         # is equal to operator
x != y         # is not equal to operator

Logical values as outputs on strings

x <- "ATG"
y <- "CCC"

x > y          # greater than operator
x >= y         # greater than or equal to operator
x < y          # less than operator
x <= y         # less than or equal to operator
x == y         # is equal to operator
x != y         # is not equal to operator
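A short sketch of how these comparisons behave on strings: R compares them character by character in sort (collation) order, which can depend on your locale. The variable names below are introduced only for this example.

```r
x <- "ATG"
y <- "CCC"

# "A" sorts before "C", so the first character decides the result
first_char <- x < y

# The first two characters tie, so the third decides: "G" sorts before "T"
third_char <- "ATG" < "ATT"

# Equality is case-sensitive
same_case <- "ATG" == "atg"
```

With these inputs, first_char and third_char are TRUE and same_case is FALSE.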

Converting

as.numeric(x)
as.logical(x)
as.character(x)
  • There are other conversions that can be done. How would you figure out which converters are out there?
  • Be sure to understand the "side effects" of the conversions
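A few conversions and their side effects, as a sketch. The variable names are chosen only for this example; apropos() is one possible way to list the converters that are currently loaded.

```r
num_from_string <- as.numeric("3.14")   # the string becomes the number 3.14
int_from_double <- as.integer(3.9)      # the fraction is dropped, not rounded
num_from_logical <- as.numeric(TRUE)    # TRUE becomes 1
lgl_from_string <- as.logical("T")      # "T" is read as TRUE

# A string that is not a number becomes NA, and R issues a warning
# (the warning is suppressed here only to keep the output quiet)
bad_number <- suppressWarnings(as.numeric("ATG"))

# List the names of loaded functions that start with "as."
converters <- apropos("^as\\.")
```

The NA from as.numeric("ATG") is the kind of side effect to watch for: the conversion does not stop your script, it quietly replaces the value.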

Types of containers

  • Vectors
  • List
  • Matrix
  • Table
  • Data frame
  • Factors
  • We will go through these in more detail throughout the course, especially in the second half

Vectors

  • One-dimensional sets of values of the same type
  • Everything in R is some form of a vector
  • You can read in vectors from a file or create them on the fly. Four common ways of creating a vector include using c(), :, rep(), seq(). Here are several examples:
19:55                   # list the values from 19 to 55 by ones
c(1,2,3,4)              # concatenate 1, 2, 3, 4 into a vector
rep("red", 5)           # repeat "red" five times
seq(1,10,by=3)          # list the values from 1 to 10 by 3's
seq(1,10,length.out=20) # list 20 evenly spaced elements from 1 to 10
seq(1,10,len=20)        # same thing; argument names can be abbreviated
c(rep("red", 5), rep("white", 5), rep("blue", 5))
rep(c(0,1), 10)
countToTen <- 1:10

Operations act on vectors

countToTen <- 1:10
length(countToTen)
countToTen
countToTen^2
countToTen > 5
typeof(countToTen)
is.vector(countToTen)  
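A small sketch of what "operations act on vectors" means in practice: arithmetic and comparisons are applied to every element, while functions such as sum() collapse the vector to one value. The names doubled, is_even, total and n_even are introduced only for this example.

```r
countToTen <- 1:10

doubled <- countToTen * 2          # every element is multiplied by 2
is_even <- countToTen %% 2 == 0    # one TRUE/FALSE per element
total   <- sum(countToTen)         # adds up all ten values
n_even  <- sum(is_even)            # TRUE counts as 1, so this counts the even values
```

Summing a logical vector is a handy way to count how many elements meet a condition.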

Indexing into vectors

  • Note that in contrast to many programming languages, vectors in R are indexed starting at 1, NOT 0.
code <- c("A", "T", "G", "C")

code[2]             # get the second element
code[0]             # errr... returns an empty vector, character(0)
code[-1]            # remove the first element
code[c(1,2)]        # get the first and second elements
code[code > "M"]    # get any element greater than "M"
  • What does this do?
code[length(code)]

Defining a vector

z <- numeric(5)         #   This creates a numerical vector with 5 zeros
z[3] <- 10
z
z[1:3] <- 5
z
z[10] <- pi             #   NAs are inserted at positions 6 through 9
z[4] <- "R rocks!"      #   everything changes to a character

t <- character(5)
t[4] <- "DNA rocks!"
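The two side effects above (growing with NAs and switching type) can be checked step by step. This is a sketch on a shorter vector; the names type_before, padded and type_after are introduced only for this example.

```r
z <- numeric(3)            # 0 0 0
type_before <- typeof(z)   # numeric vectors are stored as "double"

z[5] <- 1                  # assigning past the end grows the vector
padded <- z                # position 4 was never assigned, so it is NA

z[2] <- "two"              # one string forces every element to become a string
type_after <- typeof(z)
```

After the last assignment z holds "0", "two", "0", NA and "1", so arithmetic on it no longer works without converting back.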

Indexing by characters

  • You can also create vectors that are indexed by character strings
  • In some programming languages these are called hash-maps or look-up tables.
v <- numeric(0)
v["A"] <- 1.23498
v["T"] <- 2.2342
v["C"] <- 3
v["G"] <- 4
v["A"]
v[["A"]]      # returns the value without its name

v2 <- c(A=1.23498,T=2.2342,C=3,G=4)
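One way a named vector can act as a look-up table, sketched with DNA base complements. The names complement, bases and comp are introduced only for this example.

```r
# Look-up table: the name is the base, the value is its complement
complement <- c(A = "T", T = "A", G = "C", C = "G")

# Indexing with a character vector looks up every base at once
bases <- c("A", "T", "G", "G")
comp <- complement[bases]

# The result keeps the names used for the look-up; unname() drops them
comp_plain <- unname(comp)
```

Here comp_plain is "T" "A" "C" "C", the complement of each base in order.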

Naming cells in your vectors

names(v)
names(v) <- c("A", "B", "C", "D")
names(v) <- NULL  # this removes names attribute

Sorting vectors

z <- runif(10)  # generate a vector of 10 random numbers between 0 and 1
z
sort(z)         # return the values in increasing order
order(z)        # return the positions (indices) that would put z in increasing order

# sort a vector, matrix, or data frame using the order command
o <- order(z)
z[o]
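The reason order() is useful is that the same index vector can rearrange several related vectors together. A sketch with made-up genus names and counts (not real data):

```r
# Two parallel vectors: element i of each describes the same organism
genus <- c("Escherichia", "Bacteroides", "Clostridium")
abundance <- c(12, 40, 7)

# Positions that put abundance in decreasing order
o <- order(abundance, decreasing = TRUE)

# Applying the same positions keeps names and counts matched up
genus_sorted <- genus[o]
abundance_sorted <- abundance[o]
```

Using sort(abundance) alone would order the counts but lose track of which genus each count belongs to.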

For Tuesday

  • Start working on new assignment that will be posted this weekend
  • Read Introduction to Statistics with R (Chapters 1 and 2)

Questions?