ggplot2
This 1-hour webinar focuses on three kinds of visualizations: time-series line graphs, bar charts, and distribution plots. For examples of other kinds of plots (as well as data transformation vignettes), check the class website for the 4-day R course taught by Cory DuPai and me: https://rachaelcox.github.io/classes/IntroR_summer_2020.html
September 30, 2020
We will use the brewmats dataset, which is sourced from the US Alcohol and Tobacco Tax and Trade Bureau (TTB, https://www.ttb.gov/beer/statistics), scraped by the TidyTuesday group and cleaned by me. The brewmats dataset contains the following variables:
data_type = denotes that all amounts of material used are given in pounds (lbs)
material_type = materials belong to one of two categories, grain (e.g., wheat) or non-grain (e.g., sugar)
type = specific material details (e.g., malt, corn, rice, barley, wheat, hops, sugar, etc.)
year = year the amount of materials used was recorded (2008-2015)
month = month the amount of materials used was recorded (1 = January, 2 = February, 3 = March, etc.)
month_usage_by_type = total amount of material usage by type per month (lbs)
month_sum_all_types = total amount of material used for all types per month (lbs)
year_sum_by_type = total amount of material used by type per year (lbs)

library(tidyverse)  # provides read_csv() and ggplot()
# download the `brewmats` dataset
brewmats <- read_csv("https://rachaelcox.github.io/classes/datasets/brewmats.csv")
## Parsed with column specification:
## cols(
## data_type = col_character(),
## material_type = col_character(),
## year = col_double(),
## month = col_double(),
## type = col_character(),
## month_usage_by_type = col_double(),
## month_sum_all_types = col_double(),
## year_sum_by_type = col_double()
## )
Code Along: Plot the total monthly usage of all brewing materials (month_sum_all_types) on the y-axis for every month in the dataset on the x-axis, as a line graph colored by year using geom_line().
# R code here
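One possible solution, sketched under two assumptions: that the tidyverse is installed, and that `month_sum_all_types` holds a single value per year and month (repeated on every material-type row), which is why the sketch collapses the data with `distinct()` first. Mapping `color = factor(year)` rather than `color = year` matters here: `year` is parsed as a double, and a continuous color scale does not split the data into groups, so `geom_line()` would join every year into one zigzagging line. The axis labels and month breaks are illustrative additions.

```r
library(tidyverse)  # loads readr (read_csv) and ggplot2

brewmats <- read_csv("https://rachaelcox.github.io/classes/datasets/brewmats.csv")

# Assumption: month_sum_all_types repeats on every material-type row
# within a month, so keep one row per year/month before plotting
monthly_totals <- brewmats %>%
  distinct(year, month, month_sum_all_types)

# factor(year) makes year discrete, which gives one colored line per year
p_monthly <- ggplot(monthly_totals,
                    aes(x = month, y = month_sum_all_types,
                        color = factor(year))) +
  geom_line() +
  scale_x_continuous(breaks = 1:12) +  # label every month, 1 = January
  labs(x = "Month", y = "Total material usage (lbs)", color = "Year")

p_monthly
```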
Practice: Plot the yearly usage (year_sum_by_type) of each type of brewing material on the y-axis, for every year in the dataset on the x-axis, as a line graph colored by type using geom_line(). Remember to map group = type so that ggplot knows which points to connect into each line.
# R code here
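If you get stuck, here is one way to approach it, assuming `year_sum_by_type` is repeated on each monthly row for a given year and type (so the sketch keeps one row per year/type with `distinct()`). Because `type` is a character column, `color = type` already implies the grouping; `group = type` is spelled out to match the hint above. The extra `geom_point()` layer is an optional addition, not part of the prompt.

```r
library(tidyverse)

brewmats <- read_csv("https://rachaelcox.github.io/classes/datasets/brewmats.csv")

# Assumption: year_sum_by_type repeats on each monthly row for a given
# year and type, so collapse to one row per year/type
yearly_by_type <- brewmats %>%
  distinct(year, type, year_sum_by_type)

# group = type tells geom_line() which points belong to the same line
p_yearly <- ggplot(yearly_by_type,
                   aes(x = year, y = year_sum_by_type,
                       color = type, group = type)) +
  geom_line() +
  geom_point() +  # mark each yearly value
  labs(x = "Year", y = "Yearly material usage (lbs)", color = "Type")

p_yearly
```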
For this section, we will use the mushrooms dataset (obtained from Kaggle and cleaned by me), which contains the following information:
class = whether the mushroom is edible or poisonous
cap_shape = shape of the mushroom cap, e.g., bell, conical, convex, flat, etc.
cap_color = color of the mushroom cap, e.g., brown, buff, cinnamon, gray, etc.
odor = smell of the mushroom (almond, anise, creosote, fishy, foul, musty, pungent, none)
gill_spacing = spacing between the mushroom's gills, found on the underside of the cap (close, crowded or distant)
gill_size = size of mushroom gills (broad or narrow)
gill_color = color of mushroom gills (black, brown, etc.)
stalk_shape = shape of the mushroom stalk (enlarging or tapering)
stalk_root = type of stalk root (bulbous, club, cup, rooted, etc.)
veil_type = type of veil on the mushroom (partial or universal)
veil_color = color of the veil (brown, orange, white or yellow)
ring_number = number of rings on the stalk of the mushroom (0, 1 or 2)
ring_type = description of the ring(s), if any, found on the stalk, e.g., flaring, large, pendant, etc.
spore_print_color = color of spores collected on a sheet of paper as a print (black, brown, yellow, etc.)
population = description of nearby mushrooms of the same species, if any: abundant, clustered, numerous, scattered, several, or solitary
habitat = where the mushroom was found (grasses, leaves, meadows, paths, urban, waste, woods)

# download the `mushrooms` dataset
mushrooms <- read_csv("https://rachaelcox.github.io/classes/datasets/mushrooms.csv")
## Parsed with column specification:
## cols(
## class = col_character(),
## cap_shape = col_character(),
## cap_surface = col_character(),
## cap_color = col_character(),
## odor = col_character(),
## gill_spacing = col_character(),
## gill_size = col_character(),
## gill_color = col_character(),
## stalk_shape = col_character(),
## stalk_root = col_character(),
## veil_type = col_character(),
## veil_color = col_character(),
## ring_number = col_double(),
## ring_type = col_character(),
## spore_print_color = col_character(),
## population = col_character(),
## habitat = col_character()
## )
Code Along: Plot a bar graph for counts of mushrooms found in each habitat, colored by class (edible or poisonous) using geom_bar().
# R code here
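A sketch of one way to build this plot. `geom_bar()` counts the rows in each habitat on its own (its default stat is `"count"`), so only an x aesthetic is needed. For bars, `color` only changes the outline, so mapping `class` to `fill` is usually what "colored by class" means in practice. By default the edible and poisonous counts are stacked within each habitat; the labels are illustrative additions.

```r
library(tidyverse)

mushrooms <- read_csv("https://rachaelcox.github.io/classes/datasets/mushrooms.csv")

# geom_bar() tallies rows per habitat itself, so there is no y aesthetic;
# fill shades each bar by class (stacked by default)
p_habitat <- ggplot(mushrooms, aes(x = habitat, fill = class)) +
  geom_bar() +
  labs(x = "Habitat", y = "Number of mushrooms", fill = "Class")

p_habitat
```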
Practice: Plot a bar graph for counts of mushrooms of each type of odor, colored by class (edible or poisonous) using geom_bar().
# R code here
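One possible answer, with two optional tweaks that go beyond the prompt: `fct_infreq()` (from forcats, which the tidyverse loads) orders the odors from most to least common, and `position = "dodge"` places the edible and poisonous bars side by side instead of stacking them, which makes the two classes easier to compare within each odor.

```r
library(tidyverse)

mushrooms <- read_csv("https://rachaelcox.github.io/classes/datasets/mushrooms.csv")

# fct_infreq() sorts odor categories by how often they occur;
# position = "dodge" puts the class bars next to each other
p_odor <- ggplot(mushrooms, aes(x = fct_infreq(odor), fill = class)) +
  geom_bar(position = "dodge") +
  labs(x = "Odor", y = "Number of mushrooms", fill = "Class")

p_odor
```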
For this section, we will use the wine dataset (obtained from Kaggle and cleaned by me), which contains the following variables:
type: whether the wine is red or white
quality: median score between 0 and 10, as blindly graded by wine experts
quality_grade: quality category assigned to each rating based on the distribution of ratings (low, med, high)
alcohol: the alcohol content of the wine (% by volume)
alcohol_grade: relative amount of alcohol compared to all wines (low, med, high)
pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale
acidity_grade: acidity intensity (low, med, high)
fixed_acidity: most acids involved with wine are fixed, i.e., nonvolatile (they do not evaporate readily) (tartaric acid - g / dm^3)
volatile_acidity: the amount of acetic acid in wine, which at high levels can lead to an unpleasant, vinegary taste (acetic acid - g / dm^3)
citric_acid: found in small quantities, citric acid can add freshness and flavor to wines (g / dm^3)
residual_sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet (g / dm^3)
chlorides: the amount of salt in the wine (sodium chloride - g / dm^3)
free_sulfur_dioxide: the free form of SO2, which exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine (mg / dm^3)
total_sulfur_dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine (mg / dm^3)
density: mass per unit volume (g / cm^3)
sulphates: a wine additive that can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant (potassium sulphate - g / dm^3)

[Figure: Typical pH of Wine Types]
# download the `wine` dataset
wine <- read_csv("https://rachaelcox.github.io/classes/datasets/wine_features.csv")
## Parsed with column specification:
## cols(
## type = col_character(),
## quality = col_double(),
## quality_grade = col_character(),
## alcohol = col_double(),
## alcohol_grade = col_character(),
## pH = col_double(),
## acidity_grade = col_character(),
## fixed_acidity = col_double(),
## volatile_acidity = col_double(),
## citric_acid = col_double(),
## residual_sugar = col_double(),
## chlorides = col_double(),
## free_sulfur_dioxide = col_double(),
## total_sulfur_dioxide = col_double(),
## density = col_double(),
## sulphates = col_double()
## )
Code Along: Plot the distribution of pH for each wine type (i.e., red or white) by mapping pH to the x-axis, coloring by type, and calling geom_density(). Then, use geom_boxplot() to visualize the distribution of pH across quality_grade, again coloring by type.
# R code here
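A sketch of both plots. It maps `type` to `fill` (with `alpha` so overlapping density curves stay visible); `color = type` would work too but only colors the outlines. Because `quality_grade` is read in as text, ggplot would otherwise order the boxes alphabetically (high, low, med), so the sketch sets the level order explicitly. That step assumes the column uses exactly the labels low, med and high listed above; any other spelling would turn into `NA`.

```r
library(tidyverse)

wine <- read_csv("https://rachaelcox.github.io/classes/datasets/wine_features.csv")

# Overlapping density curves of pH, one per wine type;
# alpha makes the fills semi-transparent so both curves stay visible
p_density <- ggplot(wine, aes(x = pH, fill = type)) +
  geom_density(alpha = 0.5) +
  labs(x = "pH", y = "Density", fill = "Wine type")

# Assumption: quality_grade values are exactly "low", "med" and "high";
# fixing the level order keeps the boxes in low -> med -> high order
wine_graded <- wine %>%
  mutate(quality_grade = factor(quality_grade,
                                levels = c("low", "med", "high")))

# One box per quality grade and wine type, dodged side by side
p_box <- ggplot(wine_graded, aes(x = quality_grade, y = pH, fill = type)) +
  geom_boxplot() +
  labs(x = "Quality grade", y = "pH", fill = "Wine type")

p_density
p_box
```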
Practice: Choose a numeric variable you are interested in. Plot its distribution relative to a categorical variable, e.g., type, quality_grade, alcohol_grade or acidity_grade. Use geom_density(), geom_boxplot(), and/or geom_violin().
# R code here
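As one example choice for this practice prompt (any numeric/categorical pair from the list above works the same way), the sketch below plots `alcohol` across `quality_grade`. It layers a narrow boxplot over a violin: the violin shows the full shape of each distribution, while the boxplot adds the median and quartiles. As in the previous sketch, it assumes `quality_grade` uses exactly the labels low, med and high.

```r
library(tidyverse)

wine <- read_csv("https://rachaelcox.github.io/classes/datasets/wine_features.csv")

# Example choice: alcohol content across quality grades,
# with the grades ordered low < med < high
wine_graded <- wine %>%
  mutate(quality_grade = factor(quality_grade,
                                levels = c("low", "med", "high")))

# Violin for the distribution shape; a thin boxplot on top for the
# median and quartiles (outliers hidden to avoid clutter)
p_violin <- ggplot(wine_graded, aes(x = quality_grade, y = alcohol)) +
  geom_violin(aes(fill = quality_grade), show.legend = FALSE) +
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  labs(x = "Quality grade", y = "Alcohol (% by volume)")

p_violin
```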