pivot_longer() and pivot_wider() functions in the tidyr package
pivot_longer() and pivot_wider()pivot_longer() and pivot_wider() with other tidyr functions such as separate()anscombe example)Artwork by @allison_horst
Artwork by @allison_horst
Artwork by @allison_horst
Artwork by @allison_horst
Artwork by @allison_horst
The Billboard dataset: the date a song first entered the Billboard Top 100
❌: No, because wk1, wk2, … are values, not variables - they should be recorded in cells rather than in column names
✅ Yes, because 1) The variables are: year, artist, time, track, date, week, and rank, 2) The observation is a recorded rank of a song in a particular week
Basic syntax:
Example:
You can use the full billboard dataset from the tidyr package, but we make it easier by consider only 4 columns and 5 rows.
# A tibble: 5 × 4
artist track wk1 wk2
<chr> <chr> <dbl> <dbl>
1 2 Pac Baby Don't Cry (Keep... 87 82
2 2Ge+her The Hardest Part Of ... 91 87
3 3 Doors Down Kryptonite 81 70
4 3 Doors Down Loser 76 76
5 504 Boyz Wobble Wobble 57 34
Fill in the blanks to pivot df longer:
Your result should look like this:
# A tibble: 10 × 4
artist track week rank
<chr> <chr> <chr> <dbl>
1 2 Pac Baby Don't Cry (Keep... wk1 87
2 2 Pac Baby Don't Cry (Keep... wk2 82
3 2Ge+her The Hardest Part Of ... wk1 91
4 2Ge+her The Hardest Part Of ... wk2 87
5 3 Doors Down Kryptonite wk1 81
6 3 Doors Down Kryptonite wk2 70
7 3 Doors Down Loser wk1 76
8 3 Doors Down Loser wk2 76
9 504 Boyz Wobble Wobble wk1 57
10 504 Boyz Wobble Wobble wk2 34
billboard |>
select(artist, track, wk1, wk2) |>
head(5) |>
pivot_longer(cols = c(wk1, wk2), names_to = "week", values_to = "rank")# A tibble: 10 × 4
artist track week rank
<chr> <chr> <chr> <dbl>
1 2 Pac Baby Don't Cry (Keep... wk1 87
2 2 Pac Baby Don't Cry (Keep... wk2 82
3 2Ge+her The Hardest Part Of ... wk1 91
4 2Ge+her The Hardest Part Of ... wk2 87
5 3 Doors Down Kryptonite wk1 81
6 3 Doors Down Kryptonite wk2 70
7 3 Doors Down Loser wk1 76
8 3 Doors Down Loser wk2 76
9 504 Boyz Wobble Wobble wk1 57
10 504 Boyz Wobble Wobble wk2 34
You can do exactly the same with all the rows (remove head(5)).
What would you do if you want to pivot all the week columns (wk1, wk2, …, wk76) longer?
In the full billboard dataset there are 76 weeks: wk1: wk76, we can use starts_with("wk") to select all the columns that start with “wk” (select syntax - week 3 Monday).
# A tibble: 24,092 × 5
artist track date.entered week rank
<chr> <chr> <date> <chr> <dbl>
1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87
2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82
3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72
4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77
5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87
6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94
7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99
8 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk8 NA
9 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk9 NA
10 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk10 NA
# ℹ 24,082 more rows
The inverse of pivot longer
Basic syntax:
Example:
Practice pivot_wider() to get back the original df:
df <- billboard |>
select(artist, track, wk1, wk2) |>
head(5) |>
pivot_longer(
cols = c(wk1, wk2),
names_to = "week",
values_to = "rank"
)
df# A tibble: 10 × 4
artist track week rank
<chr> <chr> <chr> <dbl>
1 2 Pac Baby Don't Cry (Keep... wk1 87
2 2 Pac Baby Don't Cry (Keep... wk2 82
3 2Ge+her The Hardest Part Of ... wk1 91
4 2Ge+her The Hardest Part Of ... wk2 87
5 3 Doors Down Kryptonite wk1 81
6 3 Doors Down Kryptonite wk2 70
7 3 Doors Down Loser wk1 76
8 3 Doors Down Loser wk2 76
9 504 Boyz Wobble Wobble wk1 57
10 504 Boyz Wobble Wobble wk2 34
Your result should look like this:
# A tibble: 5 × 4
artist track wk1 wk2
<chr> <chr> <dbl> <dbl>
1 2 Pac Baby Don't Cry (Keep... 87 82
2 2Ge+her The Hardest Part Of ... 91 87
3 3 Doors Down Kryptonite 81 70
4 3 Doors Down Loser 76 76
5 504 Boyz Wobble Wobble 57 34
billboard |>
select(artist, track, wk1, wk2) |>
head(5) |>
pivot_longer(cols = c(wk1, wk2), names_to = "week", values_to = "rank") |>
pivot_wider(names_from = week, values_from = rank)# A tibble: 5 × 4
artist track wk1 wk2
<chr> <chr> <dbl> <dbl>
1 2 Pac Baby Don't Cry (Keep... 87 82
2 2Ge+her The Hardest Part Of ... 91 87
3 3 Doors Down Kryptonite 81 70
4 3 Doors Down Loser 76 76
5 504 Boyz Wobble Wobble 57 34
Anscombe’s quartet
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.74 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.10 8.84 7.04
7 6 6 6 8 7.24 6.13 6.08 5.25
8 4 4 4 19 4.26 3.10 5.39 12.50
9 12 12 12 8 10.84 9.13 8.15 5.56
10 7 7 7 8 4.82 7.26 6.42 7.91
11 5 5 5 8 5.68 4.74 5.73 6.89
Summary of average for x and y:
# A tibble: 4 × 3
set mean_y mean_x
<chr> <dbl> <dbl>
1 1 7.50 9
2 2 7.50 9
3 3 7.5 9
4 4 7.50 9

If we can structure the data into this format…
# A tibble: 44 × 3
set x y
<chr> <dbl> <dbl>
1 1 10 8.04
2 2 10 9.14
3 3 10 7.46
4 4 8 6.58
5 1 8 6.95
6 2 8 8.14
7 3 8 6.77
8 4 8 5.76
9 1 13 7.58
10 2 13 8.74
# ℹ 34 more rows
then it is easy to map them into x and y axes and facet by set.

Should we use pivot_wider() or pivot_longer() first?
pivot_longer()!
Start from a tibble:
# A tibble: 11 × 9
id x1 x2 x3 x4 y1 y2 y3 y4
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 10 10 10 8 8.04 9.14 7.46 6.58
2 2 8 8 8 8 6.95 8.14 6.77 5.76
3 3 13 13 13 8 7.58 8.74 12.7 7.71
4 4 9 9 9 8 8.81 8.77 7.11 8.84
5 5 11 11 11 8 8.33 9.26 7.81 8.47
6 6 14 14 14 8 9.96 8.1 8.84 7.04
7 7 6 6 6 8 7.24 6.13 6.08 5.25
8 8 4 4 4 19 4.26 3.1 5.39 12.5
9 9 12 12 12 8 10.8 9.13 8.15 5.56
10 10 7 7 7 8 4.82 7.26 6.42 7.91
11 11 5 5 5 8 5.68 4.74 5.73 6.89
Use separate() to split variable into two columns: type (x or y) and set (1, 2, 3, 4).
sep = 1 means split at the first positiondt |>
pivot_longer(cols = x1: y4, names_to = "variable", values_to = "value") |>
separate(variable, into = c("type", "set"), sep = 1)# A tibble: 88 × 4
id type set value
<chr> <chr> <chr> <dbl>
1 1 x 1 10
2 1 x 2 10
3 1 x 3 10
4 1 x 4 8
5 1 y 1 8.04
6 1 y 2 9.14
7 1 y 3 7.46
8 1 y 4 6.58
9 2 x 1 8
10 2 x 2 8
# ℹ 78 more rows
Should we use pivot_wider() or pivot_longer() now?
pivot_wider()!
dt |>
pivot_longer(cols = x1: y4, names_to = "variable", values_to = "value") |>
separate(variable, into = c("type", "set"), sep = 1) |>
pivot_wider(names_from = type, values_from = value)# A tibble: 44 × 4
id set x y
<chr> <chr> <dbl> <dbl>
1 1 1 10 8.04
2 1 2 10 9.14
3 1 3 10 7.46
4 1 4 8 6.58
5 2 1 8 6.95
6 2 2 8 8.14
7 2 3 8 6.77
8 2 4 8 5.76
9 3 1 13 7.58
10 3 2 13 8.74
# ℹ 34 more rows
Now plot the data!
dt2 <- dt |>
pivot_longer(cols = x1: y4, names_to = "variable", values_to = "value") |>
separate(variable, into = c("type", "set"), sep = 1) |>
pivot_wider(names_from = type, values_from = value)
dt2 |>
ggplot(aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = "lm") +
facet_wrap(vars(set), ncol = 2) Let’s plot all the origin and destination airports of all the flights in nycflights13::flights on the US map.