H. Sherry Zhang Department of Statistics and Data Sciences The University of Texas at Austin
Fall 2025
Learning objectives:
Develop a fundamental understanding of the grammar of graphics as implemented in ggplot2, including how to use:
ggplot(): initialize a plot
geom_*(aes()): add geometries and aesthetic mappings
facet_*(): create small multiples
scale_[x/y/color/fill]_*(): modify scales
theme_*() and theme(): customize appearance
Most of the plots we will create today are bad, but they help us understand the components of a ggplot before we modify them to make better plots.
mtcars: relationship between miles per gallon (mpg) and displacement (disp)
Data visualization for communicating information
However, information can be miscommunicated if the graphic is not well made:
some plots are not aesthetically pleasing,
some can be misleading, and
some are uninformative or only partially informative.
Plot the life expectancy (lifeExp) over the years (year) for all countries in the gapminder dataset
This is misleading because the lines look almost flat.
Plot the displacement (disp) for different car models in the mtcars dataset.
This is not informative since it is difficult to tell how displacement relates to car models.
Plot the same bar chart but with colors.
This is also not informative because the colors don't add information to the plot, and whether it is aesthetically pleasing is arguable (we might say it is dazzling).
Plot the same bar chart but with colors representing the number of cylinders (cyl).
This is informative because we can learn from the plot that cars with more cylinders tend to have larger displacements.
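A plot like the one described could be produced roughly as follows. This is a minimal sketch, not the code behind the original figure: the helper data frame `cars`, the ordering of the bars, and the axis labels are assumptions made for illustration.

```r
library(ggplot2)

# mtcars stores the car model as row names; copy it into a regular column
cars <- data.frame(model = rownames(mtcars), mtcars)

# One bar per car model, ordered by displacement, filled by number of cylinders
p_cyl <- ggplot(cars) +
  geom_col(aes(x = disp, y = reorder(model, disp), fill = factor(cyl))) +
  labs(x = "Displacement (cu. in.)", y = "Car model", fill = "Cylinders")
p_cyl
```

Wrapping `cyl` in `factor()` treats it as a category, so each cylinder count gets its own distinct color instead of a continuous gradient.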
Base R plots
plot(-4:4, -4:4, type = "n") # setting up coord. system
points(x = rnorm(200), y = rnorm(200), col = "red")
Why ggplot2?
ggplot2 is a package for data visualization based on The Grammar of Graphics by Leland Wilkinson.
Originally written by Hadley Wickham (part of his PhD dissertation), now maintained by Posit/RStudio.
Main references:
ggplot2: Elegant Graphics for Data Analysis (3e) by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen. https://ggplot2-book.org/
Geometry is actually the most complicated among all the ggplot components. You shouldn't try to memorize all these geometries.
Think about the type of data you have (categorical, continuous, time series, spatial, etc) and the information you would like to communicate (comparison, distribution, relationship, etc) and then decide which geometry to use.
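As a rough illustration of matching the geometry to the message, the sketch below draws three plots from mtcars. The pairing of message and geometry shown here is one reasonable choice, not the only one, and the object names are made up for the example.

```r
library(ggplot2)

# Distribution of one continuous variable: histogram
p_dist <- ggplot(mtcars) +
  geom_histogram(aes(x = mpg), bins = 10)

# Comparison of a continuous variable across categories: boxplot
p_comp <- ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg))

# Relationship between two continuous variables: scatter plot
p_rel <- ggplot(mtcars) +
  geom_point(aes(x = disp, y = mpg))
```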
Can you color the points by continent?
Translate to ggplot language: continent mapped to color aesthetic
gapminder |>
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent))
How many aesthetics are there?
A handful: x, y, color, fill, size, shape, alpha, linetype, group, label, ...
color: the outline or border of a geometry
fill: the interior fill of a geometry
alpha: the transparency of a geometry
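The difference between color, fill, and alpha is easiest to see on a shape that has both an outline and an interior. The sketch below uses point shape 21 (a fillable circle) for that reason; the specific colors and sizes are arbitrary choices for illustration.

```r
library(ggplot2)

p_aes <- ggplot(mtcars) +
  geom_point(
    aes(x = disp, y = mpg),
    shape = 21,       # a circle with both an outline and an interior
    color = "black",  # outline
    fill = "orange",  # interior
    alpha = 0.5,      # 0 = fully transparent, 1 = fully opaque
    size = 4
  )
p_aes
```

Because these values are set outside `aes()`, they are constants applied to every point rather than mappings from a variable.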
Each geom_*() has its required aesthetics and optional aesthetics.
geom_point() requires x and y and understands color, ...
geom_segment() requires x, y, xend or yend and understands color, ...
Do you know how to find the required and optional aesthetics for each geometry?
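One way to answer this, besides reading the "Aesthetics" section of a geom's help page, is to inspect the ggproto object behind the geom. A short sketch, using `GeomPoint` as the example:

```r
library(ggplot2)

# Aesthetics the geom cannot be drawn without
GeomPoint$required_aes

# Every aesthetic the geom understands (required and optional)
GeomPoint$aesthetics()

# The same information is listed under "Aesthetics" in the help page:
# ?geom_point
```

Note that ggplot2 reports the British spelling `colour` internally, while accepting `color` in your code.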
Can we use small multiples for continents?
Translate to ggplot language: facet by continent
gapminder |>
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent)) +
  facet_wrap(vars(continent))
How many facets are there?
You will most likely only interact with two facets: 1) facet_wrap(vars(...), ...) for one variable and 2) facet_grid(... ~ ..., ...) for two variables.
But there are other fancy facets in the wild (ggplot2 extensions):
ggh4x::facet_nested()
geofacet::facet_geo()
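For the two built-in facets, the difference can be sketched with mtcars, which has two convenient categorical variables (`cyl` and `am`). The choice of variables here is an assumption for illustration only.

```r
library(ggplot2)

# One variable: panels are wrapped into a grid, one per value of cyl
p_wrap <- ggplot(mtcars) +
  geom_point(aes(x = disp, y = mpg)) +
  facet_wrap(vars(cyl))

# Two variables: rows by cyl, columns by am
p_grid <- ggplot(mtcars) +
  geom_point(aes(x = disp, y = mpg)) +
  facet_grid(cyl ~ am)
```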
Can we use a different color palette?
Translate to ggplot language:
use scale_[color/fill]_[...](palette = "...") to change the color palette
gapminder |>
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent)) +
  facet_wrap(vars(continent)) +
  scale_color_brewer(palette = "Set1")
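Scales are not limited to color. The same scale_[x/y/color/fill]_*() pattern also controls the position axes. The sketch below uses mtcars so that it runs with ggplot2 alone; the log transformation, axis limits, and palette are arbitrary choices for illustration.

```r
library(ggplot2)

p_scale <- ggplot(mtcars) +
  geom_point(aes(x = disp, y = mpg, color = factor(cyl))) +
  # x axis: log10 transformation with a custom axis title
  scale_x_log10(name = "Displacement (cu. in., log scale)") +
  # y axis: fixed limits
  scale_y_continuous(limits = c(0, 40)) +
  # color: a different ColorBrewer palette and legend title
  scale_color_brewer(palette = "Dark2", name = "Cylinders")
p_scale
```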
gapminder |>
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent)) +
  facet_wrap(vars(continent)) +
  scale_color_brewer(palette = "Set1") +
  theme_bw()
Can we move the legend?
gapminder |>
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent)) +
  facet_wrap(vars(continent)) +
  scale_color_brewer(palette = "Set1") +
  theme_bw() +
  theme(legend.position = "bottom")
How many themes are there?
Please don't memorize all these theme elements!
Instead, put your cursor inside theme() and press the Tab key on your keyboard to bring up a popup list of the available theme elements.
Theme
p1 <- gapminder |>
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent)) +
  facet_wrap(vars(continent)) +
  scale_color_brewer(palette = "Set1") +
  theme_bw() +
  theme(legend.position = "bottom")
p1
p1 +
  # these are some useful ones
  # remove unnecessary reference lines
  theme(panel.grid.minor = element_blank()) +
  # larger text size for presentation
  theme(text = element_text(size = 20)) +
  # now you can free solo
  theme(legend.title = element_text(family = "menlo", size = 30)) +
  theme(panel.background = element_rect(fill = "lightblue", color = "black")) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  theme(panel.grid = element_line(color = "black", linewidth = 2))
We haven't talked about coordinates
gapminder |>
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent)) +
  facet_wrap(vars(continent), nrow = 1) +
  scale_color_brewer(palette = "Set1") +
  theme_bw() +
  theme(legend.position = "bottom") +
  coord_polar()
Most likely you will only use coord_cartesian(), the default, though you may occasionally see coord_flip(), coord_polar(), or coord_sf().
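A common reason to call coord_cartesian() explicitly is to zoom in on part of a plot. The sketch below, using mtcars and arbitrary limits chosen for illustration, zooms the y axis and also shows coord_flip() for comparison.

```r
library(ggplot2)

p_base <- ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg))

# Zoom in on part of the y axis; the boxplot statistics are still
# computed from all of the data
p_zoom <- p_base + coord_cartesian(ylim = c(15, 25))

# Swap the x and y axes
p_flip <- p_base + coord_flip()
```

Zooming with a coordinate system only changes the visible window, whereas setting limits on a scale removes out-of-range observations before the statistics are computed.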
Your time
We will be practicing all these components with geom_line().
geom_line() needs a group aesthetic to tell it how to connect the points.
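# without group: all observations are joined into a single zigzag line
gapminder |>
  ggplot(aes(x = year, y = lifeExp)) +
  geom_line()

# with group = country: one line per country
gapminder |>
  ggplot(aes(x = year, y = lifeExp, group = country)) +
  geom_line()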
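As one possible starting point for the practice, the grouped line plot can be combined with the other components covered earlier. This is a sketch rather than a model answer: the palette, transparency, and layout are arbitrary, and it assumes the gapminder package is installed.

```r
library(ggplot2)
library(gapminder)  # assumed to be installed; provides the gapminder data

p_lines <- gapminder |>
  ggplot(aes(x = year, y = lifeExp, group = country)) +
  # geometry: one line per country, colored by continent
  geom_line(aes(color = continent), alpha = 0.5) +
  # facet: one panel per continent
  facet_wrap(vars(continent), nrow = 1) +
  # scale: a ColorBrewer palette
  scale_color_brewer(palette = "Set1") +
  # theme: a complete theme plus a legend tweak
  theme_bw() +
  theme(legend.position = "bottom")
p_lines
```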