June 29th, 2020
Computational analyses require methods and notes to be recorded the same way you would for wet lab experiments. An excellent way to do this is via R Markdown documents, which combine text, R code, and R output, including figures. They are a great way to produce self-contained and documented statistical analyses.
In this first worksheet, you will learn some basic markdown editing as well as the basic use of variables and functions in R. After you have made a change to the document, press “Knit HTML” in RStudio and see what result you get. Note: You may have to disable pop-ups to get this to work.
Try out basic R Markdown features, as described here. Write some text that is bold, and some that is in italics. Make a numbered list and a bulleted list. Make a nested list. Try the block-quote feature.
This text is bold.
This text is in italics.
A numbered list:
A bulleted list:
A nested list:
Block quote:
“If we knew what it was we were doing, it would not be called research, would it?” — Albert Einstein
R code embedded in R chunks will be executed and the output will be shown.
# R code is embedded into this chunk
x <- 5
y <- 7
z <- x * y
z
## [1] 35
Play around with some basic R code, trying the following:
# assigning numbers to variables
fav_num <- 6
second_fav_num <- 13
some_new_num <- second_fav_num / fav_num
# assigning strings to variables
fav_enzyme <- "cyclooxygenase"
# creating a vector of strings
fav_foods <- c("sashimi", "jambalaya", "tacos", "bao", "wings")
fav_foods
## [1] "sashimi" "jambalaya" "tacos" "bao" "wings"
# creating a vector of numbers
random_nums <- c(6, 13, 21, 51, 63)
random_nums
## [1] 6 13 21 51 63
# combining vectors into a dataframe
new_df <- data.frame(fav_foods, random_nums)
new_df
## fav_foods random_nums
## 1 sashimi 6
## 2 jambalaya 13
## 3 tacos 21
## 4 bao 51
## 5 wings 63
# calling a column in a dataframe
new_df$fav_foods
## [1] sashimi jambalaya tacos bao wings
## Levels: bao jambalaya sashimi tacos wings
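The `Levels:` line above appears because the R version used here stored the strings as a factor. As a minimal sketch, assuming the same two vectors, the chunk below rebuilds the dataframe with `stringsAsFactors = FALSE` (set explicitly so the result does not depend on the R version), inspects it, and extracts one column in three equivalent ways. The names `by_dollar`, `by_name`, and `by_index` are made up for illustration.

```r
# rebuild the example dataframe; stringsAsFactors = FALSE keeps the
# food names as plain character strings regardless of R version
fav_foods <- c("sashimi", "jambalaya", "tacos", "bao", "wings")
random_nums <- c(6, 13, 21, 51, 63)
new_df <- data.frame(fav_foods, random_nums, stringsAsFactors = FALSE)

# inspect the structure: column names, column types, and dimensions
str(new_df)
dim(new_df)   # number of rows, then number of columns

# three equivalent ways to extract the same column as a vector
by_dollar <- new_df$fav_foods
by_name   <- new_df[["fav_foods"]]
by_index  <- new_df[[1]]
identical(by_dollar, by_name)
```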
A function is a statement that is coded internally (i.e., “under the hood”) to perform a specific task. For instance, the head() function displays the first several rows of a dataframe or values in a vector.
R comes with many built-in functions and datasets. Type data() in the console to look at a list of all available datasets. Type ?iris in the console for more information about this specific dataset. You can take a glance at the iris dataset using the head() function. Run the code chunk below to test this.
# preview the first few rows of a dataframe
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
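By default `head()` returns six rows. The short sketch below shows how a second argument changes that number, and how `tail()` does the same from the other end of the dataframe. The variable names `first_three` and `last_three` are only illustrative.

```r
# head() takes an optional second argument: the number of rows to return
first_three <- head(iris, 3)
first_three

# tail() is the counterpart that returns the last rows
last_three <- tail(iris, 3)
last_three

# head() also works on vectors, here the first four sepal lengths
head(iris$Sepal.Length, 4)
```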
You can also use the summary() function to see the summary statistics of a dataset at a glance. Try this now with the iris dataset.
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
You can see the column names of iris from the code output above. Calculate the mean of the Petal.Length column using the mean() function. Calculate the range of the Petal.Width column using the range() function. Hint: call the columns the same way you did in Part 2 of the worksheet.
# calculate the mean of the `Petal.Length` column in the `iris` dataset
mean(iris$Petal.Length)
## [1] 3.758
# calculate the range of the `Petal.Width` column in the `iris` dataset
range(iris$Petal.Width)
## [1] 0.1 2.5
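Other base R summary functions are called the same way as `mean()` and `range()`. The chunk below is a small sketch of a few of them, plus one way to apply `mean()` to all four numeric columns of `iris` at once with `sapply()`. The names `petal_width_range` and `column_means` are made up for this example.

```r
# range() returns the minimum and maximum as a length-2 vector
petal_width_range <- range(iris$Petal.Width)
petal_width_range[2] - petal_width_range[1]   # spread between max and min

# other summary functions follow the same pattern
median(iris$Petal.Length)
sd(iris$Petal.Length)

# apply mean() to every numeric column at once (columns 1 to 4;
# column 5, Species, is a factor and is left out)
column_means <- sapply(iris[, 1:4], mean)
round(column_means, 3)
```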
There are several ways to get data into your R environment. We covered one way in Part 1 of the worksheet: manual entry. However, this is clearly not feasible for big datasets; more often, we want to read in a file containing our data. We also tend to modify dataframes and save them to a new file.
Try the following:
1. Download mushrooms_small.csv from the “Test dataset” link on the class webpage.
2. Use the read_csv() function to read the file, and save it to a dataframe called mushrooms. Important: The filename must be given to the function as a string.
3. Use the head() function to preview the first 10 rows of the new dataframe. Specify the integer as the second argument of the function.
4. Save the output of the head() function to a new dataframe called mushrooms_tiny.
5. Use the write_csv() function to write the dataframe mushrooms_tiny to a new .csv file. Important: The filename must be given to the function as a string.
Note: If you are coding on a local installation of R, you will have to specify a path to the location of the file or move the file to the working directory. Local installations of R do not have an “Upload” function. These concepts are covered at the end of this section.
# load the readr package, which provides read_csv() and write_csv()
library(readr)
# read in the dataset from the working directory
mushrooms <- read_csv("mushrooms_small.csv")
## Parsed with column specification:
## cols(
## class = col_character(),
## cap_shape = col_character(),
## cap_surface = col_character(),
## cap_color = col_character(),
## odor = col_character(),
## gill_spacing = col_character(),
## gill_size = col_character(),
## gill_color = col_character(),
## stalk_shape = col_character(),
## stalk_root = col_character(),
## veil_type = col_character(),
## veil_color = col_character(),
## ring_number = col_double(),
## ring_type = col_character(),
## spore_print_color = col_character(),
## population = col_character(),
## habitat = col_character()
## )
# look at the first 10 rows of that dataset
head(mushrooms, 10)
## # A tibble: 10 x 17
## class cap_shape cap_surface cap_color odor gill_spacing gill_size
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 pois… convex scaly red spicy close narrow
## 2 edib… convex smooth red none close broad
## 3 edib… convex smooth gray none crowded broad
## 4 edib… flat scaly brown almo… close broad
## 5 edib… flat fibrous brown none crowded broad
## 6 pois… convex fibrous yellow foul close broad
## 7 pois… convex smooth brown spicy close narrow
## 8 edib… bell scaly white almo… close broad
## 9 edib… knobbed smooth brown none close broad
## 10 edib… bell smooth white anise close broad
## # … with 10 more variables: gill_color <chr>, stalk_shape <chr>,
## # stalk_root <chr>, veil_type <chr>, veil_color <chr>,
## # ring_number <dbl>, ring_type <chr>, spore_print_color <chr>,
## # population <chr>, habitat <chr>
# save the first 10 rows to a new dataframe
mushrooms_tiny <- head(mushrooms, 10)
# write the new dataframe to a file
write_csv(mushrooms_tiny, "mushrooms_tiny.csv")
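To check that a write-then-read round trip behaves as expected without needing the mushrooms file, the same steps can be sketched with the built-in `iris` dataset. This assumes the readr package is installed. The names `iris_tiny`, `tmp_path`, and `iris_reread` are invented for the example, and `tempfile()` is used so that nothing is left behind in the working directory.

```r
library(readr)  # assumed to be installed; provides read_csv() and write_csv()

# keep the first 10 rows of the built-in iris dataset
iris_tiny <- head(iris, 10)

# write them to a temporary .csv file; the filename is given as a string
tmp_path <- tempfile(fileext = ".csv")
write_csv(iris_tiny, tmp_path)

# read the file back in and confirm the shape matches what was written
iris_reread <- read_csv(tmp_path)
dim(iris_reread)
names(iris_reread)
```

Note that the `Species` column comes back as character strings rather than a factor, because a .csv file stores only text.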
For this class, we are using a computer server where each student has a preset working directory associated with their unique student ID number. Type getwd() to see the file path to your working directory. On a local installation, the output of this function might look something like C:/Users/Rachael/Documents.
# output the file path associated with the current working directory
getwd()
## [1] "/stor/home/student50"
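A relative file name such as "mushrooms_tiny.csv" is interpreted starting from this working directory. The sketch below uses only base R functions to build a relative and an absolute path to the same location and to check whether a file is there. The sub-directory and file names are taken from later in this worksheet and may not exist on your machine, in which case `file.exists()` simply returns `FALSE`.

```r
# the current working directory, as an absolute path
wd <- getwd()

# file.path() joins path components with "/" on every platform
relative_path <- file.path("day1_data", "mushrooms_tiny.csv")
relative_path

# a relative path is interpreted from the working directory, so this
# absolute path refers to the same location
absolute_path <- file.path(wd, relative_path)
absolute_path

# file.exists() checks whether a file is present before you try to read it
file.exists(relative_path)
```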
This directory is where R looks by default when you specify a file to read or write. In real life, we keep all the information we need in folders (aka sub-directories). Perform the following steps to familiarize yourself with file paths and R’s perception of where files are:
1. Select mushrooms_tiny.csv by checking the box.
2. Run list.files() to see all the files in the current working directory.
3. Run list.files("day1_data") to see the files in the new sub-directory.
4. Add full.names = TRUE as the second argument in the function.
# list files in current working directory
list.files()
## [1] "day1_data" "day1_solutions.Rmd" "day1.html"
## [4] "day1.Rmd" "mushrooms_small.csv" "mushrooms_tiny.csv"
## [7] "R"
# list files in the sub-directory called "day1_data"
list.files("day1_data")
## [1] "mushrooms_small.csv" "mushrooms_tiny.csv"
# list the full path to the files in "day1_data"
list.files("day1_data", full.names = TRUE) # this becomes very useful for reading many sub-directory files at once
## [1] "day1_data/mushrooms_small.csv" "day1_data/mushrooms_tiny.csv"
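The comment in the chunk above mentions reading many files at once. One possible way to do that is sketched below. It uses base R's `read.csv()` and `write.csv()` rather than readr so that it runs without extra packages, and it creates its own throwaway directory; `demo_data`, `part1.csv`, and `part2.csv` are made-up names, not files from the class server.

```r
# set up a throwaway sub-directory with two small CSV files
# (the directory and file names below are made up for this example)
demo_dir <- file.path(tempdir(), "demo_data")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(head(iris, 3), file.path(demo_dir, "part1.csv"), row.names = FALSE)
write.csv(tail(iris, 3), file.path(demo_dir, "part2.csv"), row.names = FALSE)

# full.names = TRUE returns paths that can be passed straight to a reader;
# pattern restricts the listing to files ending in .csv
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# read every file into a list of dataframes, then stack them row-wise
all_parts <- lapply(csv_paths, read.csv)
combined <- do.call(rbind, all_parts)
nrow(combined)
```

Without `full.names = TRUE`, `list.files()` would return only the bare file names, and the reader would look for them in the working directory instead of in the sub-directory.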
Clear your global environment (the broom symbol in the top right window). Read the file in the sub-directory “day1_data” using read_csv(). The function needs the path to the file, as shown in the output of the code chunk above.
read_csv("day1_data/mushrooms_tiny.csv")
## Parsed with column specification:
## cols(
## class = col_character(),
## cap_shape = col_character(),
## cap_surface = col_character(),
## cap_color = col_character(),
## odor = col_character(),
## gill_spacing = col_character(),
## gill_size = col_character(),
## gill_color = col_character(),
## stalk_shape = col_character(),
## stalk_root = col_character(),
## veil_type = col_character(),
## veil_color = col_character(),
## ring_number = col_double(),
## ring_type = col_character(),
## spore_print_color = col_character(),
## population = col_character(),
## habitat = col_character()
## )
## # A tibble: 10 x 17
## class cap_shape cap_surface cap_color odor gill_spacing gill_size
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 pois… convex scaly red spicy close narrow
## 2 edib… convex smooth red none close broad
## 3 edib… convex smooth gray none crowded broad
## 4 edib… flat scaly brown almo… close broad
## 5 edib… flat fibrous brown none crowded broad
## 6 pois… convex fibrous yellow foul close broad
## 7 pois… convex smooth brown spicy close narrow
## 8 edib… bell scaly white almo… close broad
## 9 edib… knobbed smooth brown none close broad
## 10 edib… bell smooth white anise close broad
## # … with 10 more variables: gill_color <chr>, stalk_shape <chr>,
## # stalk_root <chr>, veil_type <chr>, veil_color <chr>,
## # ring_number <dbl>, ring_type <chr>, spore_print_color <chr>,
## # population <chr>, habitat <chr>