Introduction to R for Biologists, Summer 2021
This is the homepage for the introductory R course offered by the Big Data in Biology Summer School through the Center for Biomedical Research Support. Lecture slides, coding worksheets, and worksheet solutions will be posted here. Zoom recordings will be distributed to the class via email. More information regarding summer school courses can be found here.
Class information
Zoom meeting link:
- Meeting ID: 946 0259 4373
- https://utexas.zoom.us/j/93659263033
Class compute servers (see email for password):
These compute servers (also known as PODs) are managed by the Biomedical Research Computing Facility. PODs have powerful hardware for handling large data sets, come with many bioinformatics tools pre-installed, are regularly backed up, and feature web-based integrated development environments (IDEs) for both Python and R. You can find more information about setting up a POD for your own research here and here.
Day 1: Introduction to R programming & the Tidyverse
- Slides (R basics): day1.pdf
- Slides (Tidyverse intro): tidy_intro.pdf
- You can download R from here: https://cran.r-project.org/
- You can download RStudio from here: https://www.rstudio.com/products/rstudio/download/
- R Markdown basics: https://rmarkdown.rstudio.com/authoring_basics.html
- Tidyverse website, tidyr vignettes: https://tidyr.tidyverse.org/
- In-class worksheet 1 (R basics):
- In-class worksheet 2 (Tidying data):
- Blank R Markdown project notebook template:
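As a small taste of the data tidying covered on Day 1, the sketch below reshapes a wide table into long ("tidy") form with tidyr and back again. It is not taken from the course worksheets: the table `wide`, its gene names, and its values are made up for illustration.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per time point
wide <- data.frame(
  gene = c("geneA", "geneB"),
  t0   = c(1.2, 0.8),
  t1   = c(2.4, 0.9),
  t2   = c(3.1, 1.1)
)

# pivot_longer(): gather the time-point columns into name/value pairs,
# giving one row per gene-by-timepoint observation
long <- pivot_longer(
  wide,
  cols      = c(t0, t1, t2),
  names_to  = "timepoint",
  values_to = "expression"
)

# pivot_wider(): the reverse operation, spreading timepoints back into columns
wide_again <- pivot_wider(long, names_from = timepoint, values_from = expression)
```

The long form is usually the more convenient shape for ggplot2 and dplyr, since each variable lives in its own column.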
Day 2: Data visualization with ggplot2
- Slides: day2.pdf
- Tidyverse style guide: https://style.tidyverse.org/index.html
- Tidyverse website, ggplot2 vignettes: https://ggplot2.tidyverse.org/
- Guide to all functions available in ggplot2: https://ggplot2.tidyverse.org/reference/
- Default colors that R recognizes: List of all strings with example output
- Guide to interactive plots using ggplotly: https://plot.ly/ggplot2/user-guide/
- Optimize your data viz for your data type: https://serialmentor.com/dataviz/directory-of-visualizations.html
- In-class worksheet:
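For orientation, here is a minimal ggplot2 sketch of the layered approach covered on Day 2. It uses R's built-in `iris` data set rather than any course data, and the choice of columns, transparency, and theme is purely illustrative.

```r
library(ggplot2)

# Map columns of the built-in iris data to x, y, and color aesthetics,
# then add a point layer, axis labels, and a theme
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(alpha = 0.7) +
  labs(
    x     = "Sepal length (cm)",
    y     = "Petal length (cm)",
    color = "Species"
  ) +
  theme_minimal()

# Render the plot (in an interactive session, typing `p` does the same)
print(p)
```

Each `+` adds one layer or setting, so swapping `geom_point()` for another geom changes the plot type without touching the data mapping.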
Day 3: Data manipulation & analysis with dplyr
- Slides: day3.pdf
- Tidyverse website, dplyr vignettes: https://dplyr.tidyverse.org/
- Animated visualizations of different join() functions:
- In-class worksheet:
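To complement the join visualizations listed above, the following sketch compares a few of dplyr's join functions on two tiny tables. The tables `samples` and `measurements` and all of their values are hypothetical and are not course data.

```r
library(dplyr)

# Hypothetical example tables sharing the key column sample_id
samples <- tibble(
  sample_id = c("s1", "s2", "s3"),
  tissue    = c("liver", "brain", "heart")
)
measurements <- tibble(
  sample_id  = c("s2", "s3", "s4"),
  expression = c(5.1, 7.3, 2.8)
)

# inner_join(): keep only sample_ids present in both tables
inner <- inner_join(samples, measurements, by = "sample_id")

# left_join(): keep every row of samples; unmatched rows get NA
left <- left_join(samples, measurements, by = "sample_id")

# full_join(): keep rows from either table
full <- full_join(samples, measurements, by = "sample_id")

# anti_join(): rows of samples that have no match in measurements
unmatched <- anti_join(samples, measurements, by = "sample_id")
```

Comparing `nrow()` across the four results is a quick way to see which rows each join keeps or drops.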
Day 4: Statistics & advanced data analysis
- Slides: day4.pdf
- Hypothesis testing:
  - Choosing a statistical test: http://www.biostathandbook.com/testchoice.html
  - Generalized hypothesis testing using infer/tidymodels packages: https://www.andrewheiss.com/blog/2018/12/05/test-any-hypothesis/
- Principal component analysis (PCA):
  - Good step-by-step walkthrough of PCA calculations: https://builtin.com/data-science/step-step-explanation-principal-component-analysis
  - Interactive visualization of principal component analysis (PCA): http://setosa.io/ev/principal-component-analysis/
  - Interactive visualization of eigenvectors/eigenvalues if you really want to dig in: http://setosa.io/ev/eigenvectors-and-eigenvalues/
- Clustering:
  - Interactive visualization of k-means clustering: https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
  - Towards Data Science article: The 5 Clustering Algorithms Data Scientists Need to Know
- In-class worksheet:
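As a rough illustration of the PCA and k-means topics listed above, this sketch uses only base R functions (`prcomp()` and `kmeans()`) on the built-in `iris` data set. It is not the course's worked example, and the choices of k = 3, the random seed, and `nstart = 25` are assumptions made for the demonstration.

```r
# Use the four numeric measurement columns of the built-in iris data
measurements <- iris[, 1:4]

# PCA on centered, unit-variance columns
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# k-means with k = 3 (an assumption here; iris contains three species)
set.seed(42)  # k-means starts from random centers, so fix the seed
clusters <- kmeans(scale(measurements), centers = 3, nstart = 25)

# Cross-tabulate cluster assignments against the known species labels
table(cluster = clusters$cluster, species = iris$Species)
```

The coordinates of each flower along the principal components are stored in `pca$x`, which can be plotted and colored by `clusters$cluster` to inspect the grouping visually.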