H. Sherry Zhang, Department of Statistics and Data Sciences, The University of Texas at Austin
Fall 2025
Learning objectives
Diagnose what goes wrong with a pivot, and perform pivots in more complicated scenarios.
What is wrong with my pivot?
billboard |> select(artist:wk2)
# A tibble: 317 × 5
artist track date.entered wk1 wk2
<chr> <chr> <date> <dbl> <dbl>
1 2 Pac Baby Don't Cry (Keep... 2000-02-26 87 82
2 2Ge+her The Hardest Part Of ... 2000-09-02 91 87
3 3 Doors Down Kryptonite 2000-04-08 81 70
4 3 Doors Down Loser 2000-10-21 76 76
5 504 Boyz Wobble Wobble 2000-04-15 57 34
6 98^0 Give Me Just One Nig... 2000-08-19 51 39
7 A*Teens Dancing Queen 2000-07-08 97 97
8 Aaliyah I Don't Wanna 2000-01-29 84 62
9 Aaliyah Try Again 2000-03-18 59 53
10 Adams, Yolanda Open My Heart 2000-08-26 76 76
# ℹ 307 more rows
billboard |> pivot_longer(-artist, names_to = "week", values_to = "rank")
Error in pivot_longer():
! Can’t combine track and date.entered.
Run rlang::last_trace() to see where the error occurred.
Short answer: you should also deselect the variables track and date.entered, since they are not the columns you want to reshape along with the weeks (wk1, wk2, …).
Long answer
When you only deselect artist, pivot_longer() creates a new column, week, that stores the old column names (track, date.entered, wk1, …). This part is fine.
When it creates the new column rank, it has to stack the values from all of these old columns (Baby Don't Cry..., 2000-02-26, 87, 82) into a single column.
The error message is complaining that pivot_longer() doesn't know how to combine them: a column can hold only one type, but here you have a mix of character, date, and numeric values.
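To see the mismatch directly, you can check the class of each column that the failing call tried to stack into rank. A minimal sketch, assuming tidyr is installed (it ships the billboard dataset); the object name col_classes is just illustrative:

```r
library(tidyr)  # provides the billboard dataset

# Class of a few columns that pivot_longer(-artist, ...) tried to stack into `rank`
col_classes <- vapply(
  billboard[c("track", "date.entered", "wk1")],
  function(x) class(x)[1],
  character(1)
)
col_classes
```

A character column, a Date column, and a numeric column have no common type, which is exactly what the error reports.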
billboard |> pivot_longer(-c(artist, track, date.entered), names_to = "week", values_to = "rank")
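Instead of listing every column to exclude, you can also select the week columns directly by their shared prefix. A sketch, assuming tidyr is installed; both calls should select wk1 through wk76 in the same order, so the results should match (object names are illustrative):

```r
library(tidyr)  # provides billboard, pivot_longer(), and starts_with()

# Select the week columns by excluding the identifier columns
long_by_exclusion <- billboard |>
  pivot_longer(-c(artist, track, date.entered), names_to = "week", values_to = "rank")

# Select the same week columns by name pattern: every column starting with "wk"
long_by_pattern <- billboard |>
  pivot_longer(starts_with("wk"), names_to = "week", values_to = "rank")

identical(long_by_exclusion, long_by_pattern)
dim(long_by_pattern)
```

The pattern-based selection is less brittle if identifier columns are added or renamed later.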
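Useful pivot_wider() argument, names_vary: when several values_from columns are widened, “fastest” (the default) varies the names_from values fastest, resulting in a column naming scheme of the form value1_name1, value1_name2, value2_name1, value2_name2; “slowest” gives value1_name1, value2_name1, value1_name2, value2_name2.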
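A small sketch of the two orderings on table1, widening by year with two values_from columns. This assumes tidyr 1.2.0 or later (where names_vary is available); the object names are illustrative:

```r
library(tidyr)  # provides table1 and pivot_wider()

# Default names_vary = "fastest": all cases columns first, then all population columns
fastest <- table1 |>
  pivot_wider(names_from = year, values_from = c(cases, population))

# names_vary = "slowest": columns grouped by year instead
slowest <- table1 |>
  pivot_wider(names_from = year, values_from = c(cases, population), names_vary = "slowest")

names(fastest)
# "country" "cases_1999" "cases_2000" "population_1999" "population_2000"
names(slowest)
# "country" "cases_1999" "population_1999" "cases_2000" "population_2000"
```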
A series of datasets about the number of TB cases documented by the World Health Organization in Afghanistan, Brazil, and China between 1999 and 2000.
The data contains values associated with four variables (country, year, cases, and population), but each table organizes the values in a different layout.
We will practice reshaping each of them into the layout of table1.
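As one possible approach for a single layout: table2 (also shipped with tidyr) stores cases and population as rows, with the label in type and the number in count, so pivot_wider() can spread them back into columns. A sketch, assuming tidyr is installed; the checks only compare values against table1, and the object name from_table2 is illustrative:

```r
library(tidyr)  # provides table1, table2, and pivot_wider()

# table2 has one row per country, year, and type; the number lives in `count`
from_table2 <- table2 |>
  pivot_wider(names_from = type, values_from = count)

# Column-by-column comparison with table1
names(from_table2)
all(from_table2$cases == table1$cases)
all(from_table2$population == table1$population)
```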
table1
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
# ℹ 1 more row