Day 1: Tidying data

In-class worksheet

June 29, 2020

In this worksheet, we will use the tidyverse package:

library(tidyverse)

Tidying data (pivot_longer(), pivot_wider())

1.1 Making tables longer

Consider the following two data sets, male_haireyecolor and female_haireyecolor. The data sets record the occurrence of hair and eye color phenotype combinations in a class of statistics students. Use head() to preview these data sets; are they tidy?

# download male data set
male_haireyecolor <- read_csv("https://rachaelcox.github.io/classes/datasets/male_haireyecolor.csv")
head(male_haireyecolor)
## # A tibble: 4 x 5
##   Hair  Brown  Blue Hazel Green
##   <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Black    32    11    10     3
## 2 Brown    53    50    25    15
## 3 Red      10    10     7     7
## 4 Blond     3    30     5     8
# download female data set
female_haireyecolor <- read_csv("https://rachaelcox.github.io/classes/datasets/female_haireyecolor.csv")
head(female_haireyecolor)
## # A tibble: 4 x 5
##   Hair  Brown  Blue Hazel Green
##   <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Black    36     9     5     2
## 2 Brown    66    34    29    14
## 3 Red      16     7     7     7
## 4 Blond     4    64     5     8

Neither data set is tidy, because the column names Brown, Blue, Hazel, and Green are values of a single variable, eye color, and the counts stored in those columns are values of a second variable. The following versions of the tables are tidy:

# your R code here
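One possible approach, shown as a minimal sketch: pivot_longer() gathers the four eye-color columns into a pair of name/value columns. To keep the sketch runnable without a download, the male table is re-entered by hand from the preview printed above. The output column names EyeColor and Count and the object name male_tidy are assumptions chosen for illustration; any descriptive names would do.

```r
library(tidyverse)

# Inline copy of the male table as printed by head() above
male_haireyecolor <- tribble(
  ~Hair,   ~Brown, ~Blue, ~Hazel, ~Green,
  "Black",     32,    11,     10,      3,
  "Brown",     53,    50,     25,     15,
  "Red",       10,    10,      7,      7,
  "Blond",      3,    30,      5,      8
)

# Gather every column except Hair: the old column names go into EyeColor,
# the cell values go into Count (both names are illustrative assumptions)
male_tidy <- male_haireyecolor %>%
  pivot_longer(
    cols = -Hair,
    names_to = "EyeColor",
    values_to = "Count"
  )

male_tidy
```

The result has one row per hair and eye color combination (4 x 4 = 16 rows) and three columns. The same call should work unchanged on female_haireyecolor, since both tables share the same layout.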

1.2 Making tables wider

Consider the following data set persons, which contains information about the sex, weight, and height of 200 individuals:

persons <- read_csv("https://rachaelcox.github.io/classes/datasets/persons.csv")
head(persons)
## # A tibble: 6 x 3
##   subject indicator value
##     <dbl> <chr>     <chr>
## 1       1 sex       M    
## 2       1 weight    77   
## 3       1 height    182  
## 4       2 sex       F    
## 5       2 weight    58   
## 6       2 height    161

Is this data set tidy? Can you rearrange it so that you have one column for subject, one for sex, one for weight, and one for height?

The data set is not tidy, because neither indicator nor value is a variable: indicator holds variable names, and value holds the corresponding measurements. The variables are subject, sex, weight, and height. The following version of the table is tidy:

# your R code here
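One possible approach, shown as a minimal sketch: pivot_wider() spreads the entries of indicator into column names and fills them from value. To keep the sketch runnable without a download, only the six rows shown in the preview above are re-entered by hand. The object name persons_wide is an assumption, and the conversion of weight and height to numbers is an extra step added here because value is a character column.

```r
library(tidyverse)

# Inline copy of the first six rows of persons as printed by head() above
persons <- tribble(
  ~subject, ~indicator, ~value,
  1,        "sex",      "M",
  1,        "weight",   "77",
  1,        "height",   "182",
  2,        "sex",      "F",
  2,        "weight",   "58",
  2,        "height",   "161"
)

# Spread indicator into column names, taking the cell contents from value
persons_wide <- persons %>%
  pivot_wider(names_from = indicator, values_from = value) %>%
  # value was stored as text, so weight and height arrive as character;
  # convert them to numbers (an added step, not required for tidiness)
  mutate(
    weight = as.numeric(weight),
    height = as.numeric(height)
  )

persons_wide
```

The result has one row per subject and the columns subject, sex, weight, and height.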