Back to main page

Lesson 01: Analyzing Patient Data

Slicing (Subsetting) Data

A subsection of a data frame is called a slice. We can take slices of character vectors as well:

animal <- c("m", "o", "n", "k", "e", "y")
# first three characters
animal[1:3]
## [1] "m" "o" "n"
# last three characters
animal[4:6]
## [1] "k" "e" "y"
  1. If the first four characters are selected using the slice animal[1:4], how can we obtain the first four characters in reverse order?

  2. What is animal[-1]? What is animal[-4]? Given those answers, explain what animal[-1:-4] does.

  3. Use a slice of animal to create a new character vector that spells the word “eon”, i.e. c("e", "o", "n").

Plotting Data

Create a plot showing the standard deviation of the inflammation data for each day across all patients.


Lesson 02: Creating Functions

Functions to Create Graphs

Write a function called analyze that takes a filename as a argument and displays the three graphs produced in the previous lesson (average, min and max inflammation over time). analyze("data/inflammation-01.csv") should produce the graphs already shown, while analyze("data/inflammation-02.csv") should produce corresponding graphs for the second data set. Be sure to document your function with comments.


Lesson 03: Analyzing Multiple Data Sets

Printing Numbers

R has a built-in function called seq that creates a list of numbers:

seq(3)
## [1] 1 2 3

Using seq, write a function that prints the first N natural numbers, one per line:

print_N(3)
## [1] 1
## [1] 2
## [1] 3

Summing Values

Write a function called total that calculates the sum of the values in a vector. (R has a built-in function called sum that does this for you. Please don’t use it for this exercise.)

ex_vec <- c(4, 8, 15, 16, 23, 42)
total(ex_vec)
## [1] 108

Exponentiation

Exponentiation is built into R:

2^4
## [1] 16

Write a function called expo that uses a loop to calculate the same result.

expo(2, 4)
## [1] 16

Using Loops to Analyze Multiple Files

Write a function called analyze_all that takes a filename pattern as its sole argument and runs analyze for each file whose name matches the pattern.


Lesson 04: Making Choices

Choosing Plots Based on Data

Write a function plot_dist that plots a boxplot if the length of the vector is greater than a specified threshold and a stripchart otherwise. To do this you’ll use the R functions boxplot and stripchart.

dat <- read.csv("data/inflammation-01.csv", header = FALSE)
plot_dist(dat[, 10], threshold = 10)     # day (column) 10
plot_dist(dat[1:5, 10], threshold = 10)  # samples (rows) 1-5 on day (column) 10

Changing the Behavior of the Plot Command

One of your collaborators asks if you can recreate the figures with lines instead of points. Find the relevant argument to plot by reading the documentation (?plot), update analyze, and then recreate all the figures with analyze_all.

Back to main page