November 9th, 2022
Computational analyses require methods and notes to be recorded the same way you would record them for wet lab experiments. An excellent way to do this is with R Markdown documents, which combine text, R code, R code output, and figures. They are a great way to produce self-contained and documented statistical analyses.
In this first worksheet, you will learn how to do some basic markdown editing in addition to the basic use of variables and functions in R. After you have made a change to the document, press “Knit HTML” in RStudio and see what kind of result you get. Note: You may have to disable your browser's pop-up blocker to get this to work; moreover, if the document contains any erroneous code (e.g., typos), you will get an error that prevents the knit. You can resolve this by fixing the code or commenting it out with a #.
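As a minimal sketch of the commenting-out fix, the chunk below contains a deliberately misspelled call (a made-up example, not part of the worksheet). Prefixing the broken line with # lets the rest of the chunk run:

```r
# the next line misspells print(), so it would stop the knit if left uncommented
# pritn("Hello, R Markdown!")

# the corrected call runs normally
greeting <- "Hello, R Markdown!"
print(greeting)
```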
Below I have demonstrated some basic R Markdown features, as described here. In your own work, you can use Markdown syntax to organize your coding notebook.
This text is bold.
This text is in italics.
This is a numbered list:
A bulleted list:
A nested list:
Block quote:
“Science is magic that works.” — Kurt Vonnegut
R code embedded in R chunks will be executed and the output will be shown.
# R code is embedded into this chunk
# when we start a line with '#', that tells R not to interpret the line as code
# this is called commenting code, and documents its purpose
# the code below assigns numbers to the variables 'x' and 'y'
x <- 7
y <- 1029
# we can perform operations with variables
z <- x * y
z
## [1] 7203
# this is a string assigned to the variable 'my_name'
my_name <- "Rachael"
# we can create vectors of numbers and strings
nums <- c(4, 8, 3, 6, 9)
fruits <- c("strawberries", "bananas", "apples", "peaches", "mangos")
# and combine them into a table using the data.frame() function
grocery_list <- data.frame(fruits, nums)
grocery_list
##         fruits nums
## 1 strawberries    4
## 2      bananas    8
## 3       apples    3
## 4      peaches    6
## 5       mangos    9
# there are a number of ways to extract specific information from a table
# for instance, selecting the first column:
grocery_list[1]
##         fruits
## 1 strawberries
## 2      bananas
## 3       apples
## 4      peaches
## 5       mangos
grocery_list['fruits']
##         fruits
## 1 strawberries
## 2      bananas
## 3       apples
## 4      peaches
## 5       mangos
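# select() comes from the dplyr package, so dplyr (or the tidyverse) must be loaded with library() first
select(grocery_list, fruits)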
##         fruits
## 1 strawberries
## 2      bananas
## 3       apples
## 4      peaches
## 5       mangos
# the following code also targets the first column, but extracts the information in a different way
# can you spot the difference?
grocery_list$fruits
## [1] strawberries bananas apples peaches mangos
## Levels: apples bananas mangos peaches strawberries
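One way to check what each extraction method returns is to test whether the result is still a data frame. The sketch below rebuilds the table so that it runs on its own; the names fruits_table and fruits_vector are introduced here only for illustration.

```r
# rebuild the table from the chunk above so this example stands alone
nums <- c(4, 8, 3, 6, 9)
fruits <- c("strawberries", "bananas", "apples", "peaches", "mangos")
grocery_list <- data.frame(fruits, nums)

# single brackets keep the table structure: the result is a one-column data frame
fruits_table <- grocery_list["fruits"]
is.data.frame(fruits_table)

# the $ operator returns the column's values on their own, no longer as a data frame
fruits_vector <- grocery_list$fruits
is.data.frame(fruits_vector)
length(fruits_vector)
```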
Problem Set #1:
# your R code here
A function is a statement that is internally (i.e., “under the hood”) coded to perform a specific task. For instance, the head() function displays the first several rows of a data frame or the first several values in a vector.
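As a quick sketch of head() applied to a vector rather than a data frame, the example below uses letters, a vector of the 26 lowercase letters that is built into R. The name first_three is introduced only for this illustration.

```r
# with no second argument, head() returns the first 6 values
head(letters)

# a second argument sets how many values to return
first_three <- head(letters, 3)
first_three
```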
R comes with many built-in functions and data sets. Type data() in the console to look at a list of all available data sets. Type ?iris in the console for more information about this specific data set. Important: You can ask for help with any built-in data set or function by typing ?<function> in the console; for example, ?head or ?summary.
You can take a glance at the iris data set using the head() function. Run the code chunk below to test this.
# preview the first few rows of a data frame
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
You can also use the summary() function to see the summary statistics of a data set at a glance.
# look at summary statistics for the iris data set
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
You can see the column names of iris from the code output above. We can perform calculations on this data set using a number of functions built into R. See the example below, which calculates the median of the Sepal.Length column.
# calculate median sepal length across all species of iris
median(iris$Sepal.Length)
## [1] 5.8
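Other built-in functions follow the same pattern of passing a column to the function. As a small sketch, min() and max() (two functions not otherwise used in this worksheet) are applied to the same column; the results can be checked against the Min. and Max. rows of the summary() output above. The variable names are introduced only for this example.

```r
# smallest and largest sepal length across all species of iris
shortest_sepal <- min(iris$Sepal.Length)
longest_sepal <- max(iris$Sepal.Length)
shortest_sepal
longest_sepal
```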
Problem Set #2:
1. Calculate the mean of the Petal.Length column using the mean() function.
2. Calculate the range of the Petal.Width column using the range() function.
# your R code here
There are several ways to load data into your R environment. We covered one way in Part 1 of the worksheet: manual entry. However, this is clearly not feasible for big data sets; more often, we want to read in a file containing our data. We also tend to modify data frames and save them to new files.
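The write-then-read round trip can be sketched with base R's write.csv() and read.csv(), used here so that the example runs without loading any packages. They stand in for, and are not the same functions as, the read_csv() and write_csv() functions named in the problem set below. The pets data frame and the temporary file are hypothetical and are not part of the class materials.

```r
# hypothetical example data; not one of the worksheet's files
pets <- data.frame(name = c("Milo", "Luna", "Otis"), age = c(3, 5, 2))

# tempfile() returns a path to a scratch file, so nothing in your working directory is touched
pets_file <- tempfile(fileext = ".csv")

# write the data frame to the file; the file name is given as a string
write.csv(pets, pets_file, row.names = FALSE)

# read the file back in as a new data frame
pets_copy <- read.csv(pets_file)
pets_copy
```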
Problem Set #3:
1. Download mushrooms_small.csv from the “Test data set” link on the class webpage.
2. Use the read_csv() function to read the file, and save it as a data frame called mushrooms. Important: The file name must be given to the function as a string.
3. Use the head() function to preview the first 10 rows of the new data frame. Specify the integer as the second argument of the function.
4. Save the output of the head() function as a new data frame called mushrooms_tiny.
5. Use the write_csv() function to write the data frame mushrooms_tiny to a new .csv file. Important: The file name must be given to the function as a string.
Note: If you are coding on a local installation of R, you will have to specify a path to the location of the file or move the file to the working directory. Local installations of R do not have an “Upload” function. These concepts are covered at the end of this section.
# your R code here
For this class, we are using a computer server where each student has a preset working directory associated with their unique student ID number. Run getwd() to see the file path to your working directory. On a local installation, the output of this function might look something like C:/Users/Rachael/Documents.
# output the file path associated with the current working directory
getwd()
## [1] "/stor/home/student20"
This is the directory R will default to for reading and writing files. For real-life projects, we keep all the information we need organized into folders (also known as sub-directories). More often than not, we have to tell R which sub-directory we want to read a file from or write a file to. Perform the following steps to familiarize yourself with file paths and how R locates files:
1. Select mushrooms_tiny.csv by checking the box.
2. Use list.files() to see all the files in the current working directory.
3. Use list.files("new_data") to see the files in the new sub-directory.
4. Add full.names = TRUE as the second argument in the function.
# your R code here
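The effect of full.names = TRUE can be sketched in a scratch folder that is separate from your working directory. Everything created below (the path_demo folder, its new_data sub-directory, and iris_tiny.csv) is hypothetical and exists only for this illustration. Base R's write.csv() is used so that no packages need to be loaded.

```r
# create a scratch folder containing a sub-directory
scratch <- file.path(tempdir(), "path_demo")
dir.create(file.path(scratch, "new_data"), recursive = TRUE, showWarnings = FALSE)

# put one small file in the sub-directory
write.csv(head(iris), file.path(scratch, "new_data", "iris_tiny.csv"), row.names = FALSE)

# by default, list.files() returns file names only
short_names <- list.files(file.path(scratch, "new_data"))
short_names

# full.names = TRUE prepends the directory path to each file name
long_names <- list.files(file.path(scratch, "new_data"), full.names = TRUE)
long_names
```

The second result is the form that a file-reading function can use directly, because it includes the directory as well as the file name.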
Clear your global environment (the broom symbol in the top right window). Read the file in the sub-directory “new_data” using read_csv(). The function will need the full path given by the output from the code chunk above.
# your R code here
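If you would rather build a path than copy it from earlier output, base R's file.path() joins folder and file names for you. The sketch below only constructs and checks the path; it assumes the file was saved under the name mushrooms_tiny.csv, and the variable name tiny_path is introduced here for illustration.

```r
# file.path() joins folder and file names with "/" to build a path
tiny_path <- file.path("new_data", "mushrooms_tiny.csv")
tiny_path

# file.exists() returns TRUE only if R can find a file at that path
# (relative paths like this one are interpreted from the working directory)
file.exists(tiny_path)
```

If file.exists() returns FALSE, check the output of getwd() and the spelling of the folder and file names before trying to read the file.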