pivot_longer() and pivot_wider() functions in the tidyr package
pivot_longer() and pivot_wider()pivot_longer() and pivot_wider() with other tidyr functions such as separate()anscombe example)Artwork by @allison_horst
Artwork by @allison_horst
Artwork by @allison_horst
Artwork by @allison_horst
Artwork by @allison_horst
The Billboard dataset: the date a song first entered the Billboard Top 100
❌: No, because wk1, wk2, … are values, not variables - they should be recorded in cells rather than in column names
✅ Yes, because 1) The variables are: year, artist, time, track, date, week, and rank, 2) The observation is a recorded rank of a song in a particular week
Basic syntax:
Example:
You can use the full billboard dataset from the tidyr package, but we make it easier by consider only 4 columns and 5 rows.
# A tibble: 5 × 4
  artist       track                     wk1   wk2
  <chr>        <chr>                   <dbl> <dbl>
1 2 Pac        Baby Don't Cry (Keep...    87    82
2 2Ge+her      The Hardest Part Of ...    91    87
3 3 Doors Down Kryptonite                 81    70
4 3 Doors Down Loser                      76    76
5 504 Boyz     Wobble Wobble              57    34Fill in the blanks to pivot df longer:
Your result should look like this:
# A tibble: 10 × 4
   artist       track                   week   rank
   <chr>        <chr>                   <chr> <dbl>
 1 2 Pac        Baby Don't Cry (Keep... wk1      87
 2 2 Pac        Baby Don't Cry (Keep... wk2      82
 3 2Ge+her      The Hardest Part Of ... wk1      91
 4 2Ge+her      The Hardest Part Of ... wk2      87
 5 3 Doors Down Kryptonite              wk1      81
 6 3 Doors Down Kryptonite              wk2      70
 7 3 Doors Down Loser                   wk1      76
 8 3 Doors Down Loser                   wk2      76
 9 504 Boyz     Wobble Wobble           wk1      57
10 504 Boyz     Wobble Wobble           wk2      34billboard |> 
  select(artist, track,  wk1, wk2) |> 
  head(5) |> 
  pivot_longer(cols = c(wk1, wk2), names_to = "week", values_to = "rank")# A tibble: 10 × 4
   artist       track                   week   rank
   <chr>        <chr>                   <chr> <dbl>
 1 2 Pac        Baby Don't Cry (Keep... wk1      87
 2 2 Pac        Baby Don't Cry (Keep... wk2      82
 3 2Ge+her      The Hardest Part Of ... wk1      91
 4 2Ge+her      The Hardest Part Of ... wk2      87
 5 3 Doors Down Kryptonite              wk1      81
 6 3 Doors Down Kryptonite              wk2      70
 7 3 Doors Down Loser                   wk1      76
 8 3 Doors Down Loser                   wk2      76
 9 504 Boyz     Wobble Wobble           wk1      57
10 504 Boyz     Wobble Wobble           wk2      34You can do exactly the same with all the rows (remove head(5)).
What would you do if you want to pivot all the week columns (wk1, wk2, …, wk76) longer?
In the full billboard dataset there are 76 weeks: wk1: wk76, we can use starts_with("wk") to select all the columns that start with “wk” (select syntax - week 3 Monday).
# A tibble: 24,092 × 5
   artist track                   date.entered week   rank
   <chr>  <chr>                   <date>       <chr> <dbl>
 1 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk1      87
 2 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk2      82
 3 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk3      72
 4 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk4      77
 5 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk5      87
 6 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk6      94
 7 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk7      99
 8 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk8      NA
 9 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk9      NA
10 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk10     NA
# ℹ 24,082 more rowsThe inverse of pivot longer
Basic syntax:
Example:
Practice pivot_wider() to get back the original df:
df <- billboard |> 
  select(artist, track,  wk1, wk2) |> 
  head(5) |> 
  pivot_longer(
    cols = c(wk1, wk2), 
    names_to = "week", 
    values_to = "rank"
  )
df# A tibble: 10 × 4
   artist       track                   week   rank
   <chr>        <chr>                   <chr> <dbl>
 1 2 Pac        Baby Don't Cry (Keep... wk1      87
 2 2 Pac        Baby Don't Cry (Keep... wk2      82
 3 2Ge+her      The Hardest Part Of ... wk1      91
 4 2Ge+her      The Hardest Part Of ... wk2      87
 5 3 Doors Down Kryptonite              wk1      81
 6 3 Doors Down Kryptonite              wk2      70
 7 3 Doors Down Loser                   wk1      76
 8 3 Doors Down Loser                   wk2      76
 9 504 Boyz     Wobble Wobble           wk1      57
10 504 Boyz     Wobble Wobble           wk2      34Your result should look like this:
# A tibble: 5 × 4
  artist       track                     wk1   wk2
  <chr>        <chr>                   <dbl> <dbl>
1 2 Pac        Baby Don't Cry (Keep...    87    82
2 2Ge+her      The Hardest Part Of ...    91    87
3 3 Doors Down Kryptonite                 81    70
4 3 Doors Down Loser                      76    76
5 504 Boyz     Wobble Wobble              57    34billboard |> 
  select(artist, track,  wk1, wk2) |> 
  head(5) |> 
  pivot_longer(cols = c(wk1, wk2), names_to = "week", values_to = "rank") |> 
  pivot_wider(names_from = week, values_from = rank)# A tibble: 5 × 4
  artist       track                     wk1   wk2
  <chr>        <chr>                   <dbl> <dbl>
1 2 Pac        Baby Don't Cry (Keep...    87    82
2 2Ge+her      The Hardest Part Of ...    91    87
3 3 Doors Down Kryptonite                 81    70
4 3 Doors Down Loser                      76    76
5 504 Boyz     Wobble Wobble              57    34Anscombe’s quartet
   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89Summary of average for x and y:
# A tibble: 4 × 3
  set   mean_y mean_x
  <chr>  <dbl>  <dbl>
1 1       7.50      9
2 2       7.50      9
3 3       7.5       9
4 4       7.50      9
If we can structure the data into this format…
# A tibble: 44 × 3
   set       x     y
   <chr> <dbl> <dbl>
 1 1        10  8.04
 2 2        10  9.14
 3 3        10  7.46
 4 4         8  6.58
 5 1         8  6.95
 6 2         8  8.14
 7 3         8  6.77
 8 4         8  5.76
 9 1        13  7.58
10 2        13  8.74
# ℹ 34 more rowsthen it is easy to map them into x and y axes and facet by set.

Should we use pivot_wider() or pivot_longer() first?
pivot_longer()!
Start from a tibble:
# A tibble: 11 × 9
   id       x1    x2    x3    x4    y1    y2    y3    y4
   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 1        10    10    10     8  8.04  9.14  7.46  6.58
 2 2         8     8     8     8  6.95  8.14  6.77  5.76
 3 3        13    13    13     8  7.58  8.74 12.7   7.71
 4 4         9     9     9     8  8.81  8.77  7.11  8.84
 5 5        11    11    11     8  8.33  9.26  7.81  8.47
 6 6        14    14    14     8  9.96  8.1   8.84  7.04
 7 7         6     6     6     8  7.24  6.13  6.08  5.25
 8 8         4     4     4    19  4.26  3.1   5.39 12.5 
 9 9        12    12    12     8 10.8   9.13  8.15  5.56
10 10        7     7     7     8  4.82  7.26  6.42  7.91
11 11        5     5     5     8  5.68  4.74  5.73  6.89Use separate() to split variable into two columns: type (x or y) and set (1, 2, 3, 4).
sep = 1 means split at the first positiondt |> 
  pivot_longer(cols = x1: y4, names_to = "variable", values_to = "value") |> 
  separate(variable, into = c("type", "set"), sep = 1)# A tibble: 88 × 4
   id    type  set   value
   <chr> <chr> <chr> <dbl>
 1 1     x     1     10   
 2 1     x     2     10   
 3 1     x     3     10   
 4 1     x     4      8   
 5 1     y     1      8.04
 6 1     y     2      9.14
 7 1     y     3      7.46
 8 1     y     4      6.58
 9 2     x     1      8   
10 2     x     2      8   
# ℹ 78 more rowsShould we use pivot_wider() or pivot_longer() now?
pivot_wider()!
dt |> 
  pivot_longer(cols = x1: y4, names_to = "variable", values_to = "value") |> 
  separate(variable, into = c("type", "set"), sep = 1) |> 
  pivot_wider(names_from = type, values_from = value)# A tibble: 44 × 4
   id    set       x     y
   <chr> <chr> <dbl> <dbl>
 1 1     1        10  8.04
 2 1     2        10  9.14
 3 1     3        10  7.46
 4 1     4         8  6.58
 5 2     1         8  6.95
 6 2     2         8  8.14
 7 2     3         8  6.77
 8 2     4         8  5.76
 9 3     1        13  7.58
10 3     2        13  8.74
# ℹ 34 more rowsNow plot the data!
dt2 <- dt |> 
  pivot_longer(cols = x1: y4, names_to = "variable", values_to = "value") |> 
  separate(variable, into = c("type", "set"), sep = 1) |> 
  pivot_wider(names_from = type, values_from = value)
dt2 |> 
  ggplot(aes(x = x, y = y)) + 
  geom_point() + 
  geom_smooth(method = "lm") + 
  facet_wrap(vars(set), ncol = 2) Let’s plot all the origin and destination airports of all the flights in nycflights13::flights on the US map.