Spring 2025. UT Austin, Department of Statistics and Data Sciences. MWF 9–10am.
The goal of the course is to train you to be comfortable using R for
exploratory data analysis. The first part of the course will focus on
exploratory data analysis with the tidyverse. You will learn
how to wrangle data and create plots that communicate what you have learned
from the data. We will start with conventional tabular data, then cover
web scraping, spatial data, and text data. The second half of the course
will focus on basic machine learning algorithms:
regression (linear and logistic), clustering algorithms (k-means and
hierarchical), PCA, KNN, tree-based methods, and random forests. This is a
hands-on programming course. Most classes have an associated code
repository containing exercises that we will work through together during
class. The class assumes no prior knowledge of programming.
Week | Class | Slides | Exercises |
---|---|---|---|
1 | a | Welcome to the class | |
1 | b | Get to know R Markdown | |
1 | c | The big picture | |
2 | a | Labor Day | |
2 | b | Welcome to the tidyverse and tidy data | |
2 | c | Data wrangling with dplyr: basics I | |
3 | a | Data wrangling with dplyr: basics II | |
3 | b | Data visualization with ggplot2: different components in the grammar of graphics | |
3 | c | Data visualization with ggplot2: distributions | |
4 | a | Data visualization with ggplot2: counts and proportions | |
4 | b | Data visualization with ggplot2: factors and color | |
4 | c | Data visualization with ggplot2: exercise | |
5 | a | Data wrangling with dplyr: joins | |
5 | b | Data wrangling with lubridate: date and time | |
5 | c | Spatial data wrangling and visualization with sf | |
6 | a | Data tidying with tidyr: pivot | |
6 | b | Data tidying with tidyr II: pivot | |
6 | c | Case study: visualizing flight routes on a map | |
7 | a | Project 1 introduction + working day | |
7 | b | Web scraping with rvest | |
7 | c | Case study: visualizing flight arrival and departure patterns | |
I’d like to let you know about a bonus mark opportunity for this class: a 10-minute presentation in Week 13 or 14 on an advanced topic that is related to what we have covered but has not been formally taught.
Think of it as a mini research project where you explore something new, based on what you’ve learnt in the class, with my guidance. If you find a particular topic interesting, or if you’d like to dip your toes into research, this is a low-cost opportunity to try!
What you need to do:
Pick one item from the topic list and email me to register your interest. Topics will be assigned on a first-come, first-served basis, and more topics will be added if more people sign up. You can also propose a topic that interests you but is not covered in the class.
After I confirm your choice, you can begin investigating the problem. I’m happy to meet during the week to discuss and provide guidance, but I can’t walk you through the solution. Since this is meant as a research component, you need to develop it yourself.
Prepare a presentation of your findings to share with the class in Week 13 or 14 (TBD).
Depending on the quality of your investigation, you can earn an additional 3-5 marks toward your final grade.
Topic list:
Project | Description |
---|---|
(taken) Quarto 1 | We have been using R Markdown files throughout the semester, but in recent years, Posit has introduced a new format called Quarto. The two are similar, but Quarto offers additional features. Create some demonstrations to show your classmates what is the same and what is different between Quarto and R Markdown. |
(taken) Quarto 2 | On Wednesday of Week 1, we mentioned that R Markdown/Quarto can be used for many purposes, such as creating slides, building websites, and writing books. Using the official Quarto documentation and other resources, create a simple personal website for yourself and show your classmates how to do so. |
leaflet | We introduced plotting spatial data on Friday of Week 5. Leaflet, originally developed in JavaScript, is another popular choice among news agencies (e.g. The New York Times) for visualizing spatial data. Focusing on the R package leaflet, show your classmates how the mapping grammar works in leaflet and how to use it with sf objects and other spatial data objects. |
Spatial join and filter | We have talked about joining (dplyr::*_join()) and filtering (dplyr::filter()) for tabular data, but how do you perform joins or filters on spatial data? For example, how would you find all the airports in Texas? Create some examples to show your classmates the functionality in sf for spatial joins and spatial filters. |
tidygraph | In the flight case study (Friday of Week 6), we plotted a map with airports as nodes and flight routes as edges. This is a graph structure. In R, there is a package called tidygraph that provides a tidy data interface for working with network data. Create some examples to demonstrate how to wrangle and visualize network data using tidygraph and related packages. |
R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund
ggplot2: Elegant Graphics for Data Analysis (3e) by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen
Fundamentals of Data Visualization by Claus O. Wilke
Statistical Computing using R and Python by Susan Vanderplas