June 21st, 2021
Computational analyses require methods and notes to be recorded the same way you would for wet lab experiments. An excellent way to do this is via R Markdown documents, which combine text, R code, R code output, and figures. They are a great way to produce self-contained and documented statistical analyses.
In this first worksheet, you will learn how to do some basic markdown editing in addition to the basic use of variables and functions in R. After you have made a change to the document, press “Knit HTML” in RStudio and see what kind of result you get. Note: You may have to disable pop-ups to get this to work.
Try out basic R Markdown features, as described here.
–Try your Markdown syntax here–
R code embedded in R chunks will be executed and the output will be shown.
# R code is embedded into this chunk
x <- 5
y <- 7
z <- x * y
z
## [1] 35
Play around with some basic R code, trying the following:
# your R code here
A function is a statement that is coded internally (i.e., “under the hood”) to perform a specific task. For instance, the head() function displays the first several rows of a data frame or values in a vector.
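As a small illustration of calling a function with and without an extra argument, the sketch below applies head() to a vector. The vector `values` is made up for this example and is not part of the worksheet.

```r
# A hypothetical vector of ten values, used only for illustration
values <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

# With no extra argument, head() returns the first six elements
first_six <- head(values)

# The second argument sets how many elements to return
first_three <- head(values, 3)

first_six
first_three
```

The same pattern applies to data frames, where head() returns rows instead of elements.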
R comes with many built-in functions and data sets. Type data() in the console to look at a list of all available data sets. Type ?iris in the console for more information about this specific data set. You can take a glance at the iris data set using the head() function. Run the code chunk below to test this.
# preview the first few rows of a data frame
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
You can also use the summary() function to see the summary statistics of a data set at a glance. Try this now with the iris data set.
# your R code here
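For comparison, here is a sketch of summary() applied to a different built-in data set, mtcars, so that the iris exercise is left for you. The choice of mtcars and the variable names are assumptions made for illustration.

```r
# Summary statistics for two columns of the built-in mtcars data frame
two_columns <- mtcars[, c("mpg", "hp")]
summary(two_columns)

# summary() also works on a single numeric vector, returning six statistics:
# minimum, first quartile, median, mean, third quartile, and maximum
mpg_summary <- summary(mtcars$mpg)
mpg_summary
```

For a data frame, the output has one block of statistics per column; for factor columns, summary() reports counts per level instead.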
You can see the column names of iris from the code output above. Calculate the mean of the Petal.Length column using the mean() function. Calculate the range of the Petal.Width column using the range() function. Hint: call the columns the same way you did in Part 2 of the worksheet.
# your R code here
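A minimal sketch of the same two calculations on another built-in data set, mtcars, is shown here. It assumes that the `$` operator is the column-access method the hint refers to; the variable names are made up for this example.

```r
# Extract one column of the built-in mtcars data frame as a numeric vector
mpg <- mtcars$mpg

# mean() returns a single number
mpg_mean <- mean(mpg)

# range() returns a vector of two numbers: the minimum and the maximum
mpg_range <- range(mpg)

mpg_mean
mpg_range
```

Because range() returns both ends at once, mpg_range[1] is the minimum and mpg_range[2] is the maximum.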
There are several ways to upload data into your R environment. We covered one way in Part 1 of the worksheet: manual entry. However, this is clearly not feasible for big data sets. More often, we want to read in a file containing our data. We also tend to modify data frames and save them to a new file.
Try the following:
- Download mushrooms_small.csv from the “Test data set” link on the class webpage.
- Use the read_csv() function to read the file, and save it as a data frame called mushrooms. Important: The file name must be given to the function as a string.
- Use the head() function to preview the first 10 rows of the new data frame. Specify the integer as the second argument of the function.
- Save the output of the head() function to a new data frame called mushrooms_tiny.
- Use the write_csv() function to write the data frame mushrooms_tiny to a new .csv file. Important: The file name must be given to the function as a string.
Note: If you are coding on a local installation of R, you will have to specify a path to the location of the file or move the file to the working directory. Local installations of R do not have an “Upload” function. These concepts are covered at the end of this section.
# your R code here
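The read, preview, and write pattern described above can be sketched on stand-in data as follows. This assumes the readr package is installed; the file names `example_small.csv` and `example_tiny.csv` and the use of mtcars as source data are hypothetical, chosen so the sketch runs without the class file.

```r
library(readr)

# Hypothetical stand-in for a downloaded file: 20 rows written to a temporary folder
example_path <- file.path(tempdir(), "example_small.csv")
write_csv(head(mtcars, 20), example_path)

# Read the file; the file name (here, a full path) is passed as a string
example <- read_csv(example_path)

# Keep the first 10 rows; the row count is the second argument of head()
example_tiny <- head(example, 10)

# Write the smaller data frame to a new .csv file, again naming it with a string
tiny_path <- file.path(tempdir(), "example_tiny.csv")
write_csv(example_tiny, tiny_path)
```

read_csv() prints a short message describing the column types it guessed; this is informational and not an error.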
For this class, we are using a computer server where everyone has a preset working directory associated with their unique student ID number. Type getwd() to see the file path to your working directory. On a local installation, the output of this function might look something like C:/Users/Rachael/Documents.
# your R code here
This is the directory R will default to for reading and writing files. Ideally, for real life projects, we keep all the information we need organized into folders (aka sub-directories). More often than not, we have to tell R which sub-directory we want to read a file from or write a file to. Perform the following steps to familiarize yourself with file paths and R’s perception of where files are:
- Select mushrooms_tiny.csv by checking the box.
- Use list.files() to see all the files in the current working directory.
- Use list.files("day1_data") to see the files in the new sub-directory.
- Add full.names = TRUE as the second argument in the function.
# your R code here
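To show how list.files() behaves with and without full.names = TRUE, here is a sketch that works in a temporary folder, so it does not touch your working directory. The sub-directory name `example_data` and the file `example.csv` are hypothetical, and base R's write.csv() and read.csv() are used in place of the readr functions so that no package is needed.

```r
# Create a hypothetical sub-directory inside R's temporary folder
sub_dir <- file.path(tempdir(), "example_data")
dir.create(sub_dir, showWarnings = FALSE)

# Put one small file in it (the first six rows of iris)
write.csv(head(iris), file.path(sub_dir, "example.csv"), row.names = FALSE)

# Without full.names, only the file names are returned
file_names <- list.files(sub_dir)
file_names

# With full.names = TRUE, each name is prefixed with the directory path
file_paths <- list.files(sub_dir, full.names = TRUE)
file_paths

# A path that includes the directory can be handed straight to a read function
example <- read.csv(file_paths[1])
```

A bare file name only works for reading when the file sits in the working directory; the path returned with full.names = TRUE also tells R which sub-directory to look in.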
Clear your global environment (the broom symbol in the top right window). Read the file in the sub-directory “day1_data” using read_csv(). The function will need the full path given by the output from the code chunk above.
# your R code here