The cubble object

The cubble class is an S3 class built on tibble that allows spatio-temporal data to be wrangled in two forms: a nested/spatial form and a long/temporal form. It consists of two subclasses:

  • a nested/spatial cubble is represented by the class c("spatial_cubble_df", "cubble_df")
  • a long/temporal cubble is represented by the class c("temporal_cubble_df", "cubble_df")

In a nested cubble, spatial variables are organised as columns and temporal variables are nested within a specialised list-column named ts:

cb_nested
#> # cubble:   key: id [3], index: date, nested form
#> # spatial:  [144.8321, -37.98, 145.0964, -37.6655], Missing CRS!
#> # temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
#>   id           long   lat  elev name              wmo_id ts               
#>   <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>           
#> 1 ASN00086038  145. -37.7  78.4 essendon airport   95866 <tibble [10 × 4]>
#> 2 ASN00086077  145. -38.0  12.1 moorabbin airport  94870 <tibble [10 × 4]>
#> 3 ASN00086282  145. -37.7 113.  melbourne airport  94866 <tibble [10 × 4]>
class(cb_nested)
#> [1] "spatial_cubble_df" "cubble_df"         "tbl_df"           
#> [4] "tbl"               "data.frame"

This toy dataset is a subset of a larger dataset, climate_aus, sourced from the Global Historical Climatology Network Daily (GHCND). It records three airport stations located in Melbourne, Australia. The spatial variables are station ID, longitude, latitude, elevation, station name, and World Meteorological Organization (WMO) ID. The temporal variables, which can be read from the cubble header, are precipitation and maximum and minimum temperature.
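As a rough sketch, the nested object can be inspected as follows. It assumes that cb_nested corresponds to the climate_mel data shipped with cubble; the text above does not state how cb_nested was created, so treat that assignment as an assumption.

```r
library(cubble)

# Assumption: climate_mel is the three-station Melbourne subset printed above
cb_nested <- climate_mel

# Both forms share the parent class "cubble_df";
# the nested form adds "spatial_cubble_df" in front of it
inherits(cb_nested, "cubble_df")
inherits(cb_nested, "spatial_cubble_df")

# Each element of the ts list-column holds the temporal data of one station
cb_nested$ts[[1]]
```

Each row of the nested form is one station, so ordinary column operations act on the spatial variables while the time series stay tucked away in ts.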

In a long cubble, the temporal variables are expanded into the long form, while the spatial variables are stored as a data attribute:

cb_long
#> # cubble:   key: id [3], index: date, long form
#> # temporal: 2020-01-01 -- 2020-01-10 [1D], no gaps
#> # spatial:  long [dbl], lat [dbl], elev [dbl], name [chr], wmo_id [dbl]
#>    id          date        prcp  tmax  tmin
#>    <chr>       <date>     <dbl> <dbl> <dbl>
#>  1 ASN00086038 2020-01-01     0  26.8  11  
#>  2 ASN00086038 2020-01-02     0  26.3  12.2
#>  3 ASN00086038 2020-01-03     0  34.5  12.7
#>  4 ASN00086038 2020-01-04     0  29.3  18.8
#>  5 ASN00086038 2020-01-05    18  16.1  12.5
#>  6 ASN00086038 2020-01-06   104  17.5  11.1
#>  7 ASN00086038 2020-01-07    14  20.7  12.1
#>  8 ASN00086038 2020-01-08     0  26.4  16.4
#>  9 ASN00086038 2020-01-09     0  33.1  17.4
#> 10 ASN00086038 2020-01-10     0  34    19.6
#> # ℹ 20 more rows
class(cb_long)
#> [1] "temporal_cubble_df" "cubble_df"          "tbl_df"            
#> [4] "tbl"                "data.frame"

The cubble header now shows the recorded temporal period (2020-01-01 to 2020-01-10), the interval (1 day), and that there are no gaps in the data.
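The two forms can be converted into each other. The sketch below uses face_temporal() and face_spatial() from the cubble package, which are not introduced in this section, and again assumes that climate_mel stands in for cb_nested.

```r
library(cubble)

# Assumption: climate_mel is equivalent to the cb_nested object shown earlier
cb_nested <- climate_mel

# Nested form -> long form: the ts column is expanded into rows
cb_long <- face_temporal(cb_nested)

# Long form -> nested form: the temporal variables are nested back into ts
cb_back <- face_spatial(cb_long)

class(cb_long)[1]
class(cb_back)[1]
```

Because the spatial variables travel with the long form as an attribute, the round trip should not need the original nested object.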

The cubble attributes

A cubble object inherits the attributes from tibble (and its subclasses): class, row.names, and names. Additionally, it has three specialised attributes:

  • key: the spatial identifier
  • index: the temporal identifier
  • coords: a pair of ordered coordinates associated with the location

Readers familiar with the tsibble package will already know the key and index attributes. In cubble, the key attribute identifies a row in the nested cubble and, combined with the index attribute, identifies a row in the long cubble. Currently, cubble supports only one variable as the key. The accepted temporal classes for the index are the base R classes Date, POSIXlt, and POSIXct, as well as the tsibble classes created by tsibble::yearmonth(), tsibble::yearweek(), and tsibble::yearquarter().
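The identification rule can be checked by hand. This is a minimal sketch using base R duplicate checks, assuming climate_mel stands in for cb_nested and that the key and index columns are named id and date as in the printed output.

```r
library(cubble)

# Assumption: climate_mel is equivalent to the cb_nested object shown earlier
cb_nested <- climate_mel
cb_long <- face_temporal(cb_nested)

# Nested form: the key alone identifies a row, so no id may repeat
nested_unique <- anyDuplicated(cb_nested$id) == 0

# Long form: key and index together identify a row
key_index <- data.frame(id = cb_long$id, date = cb_long$date)
long_unique <- anyDuplicated(key_index) == 0

nested_unique
long_unique
```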

The coords attribute represents an ordered pair of coordinates. It can be either an unprojected pair of longitude and latitude or a projected pair of easting and northing. The sf package is used under the hood to calculate the bounding box, which is displayed in the header of a nested cubble, and to perform other spatial operations.
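To illustrate the bounding box, the sketch below computes one directly with sf from the coordinate columns. This is not necessarily how cubble does it internally; it assumes climate_mel stands in for cb_nested and that the coordinate columns are named long and lat.

```r
library(cubble)
library(sf)

# Assumption: climate_mel is equivalent to the cb_nested object shown earlier
cb_nested <- climate_mel

# Build point geometries from the coordinate pair, in (x, y) = (long, lat) order;
# no CRS is set, matching the "Missing CRS!" note in the header
xy <- data.frame(long = cb_nested$long, lat = cb_nested$lat)
pts <- st_as_sf(xy, coords = c("long", "lat"))

# xmin, ymin, xmax, ymax of the three stations
bbox <- st_bbox(pts)
bbox
```

The order of the pair matters: sf reads the first name as x and the second as y, which is why coords is described as an ordered pair.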

The long cubble has a special attribute called spatial to store the spatial variables, which includes all the variables from the nested cubble except for the ts column. Below we print the attributes of the previously shown cb_nested and cb_long objects:

attributes(cb_nested)
#> $class
#> [1] "spatial_cubble_df" "cubble_df"         "tbl_df"           
#> [4] "tbl"               "data.frame"       
#> 
#> $row.names
#> [1] 1 2 3
#> 
#> $names
#> [1] "id"     "long"   "lat"    "elev"   "name"   "wmo_id" "ts"    
#> 
#> $key
#> # A tibble: 3 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00086038         [1]
#> 2 ASN00086077         [1]
#> 3 ASN00086282         [1]
#> 
#> $index
#> [1] "date"
#> 
#> $coords
#> [1] "long" "lat"
attributes(cb_long)
#> $class
#> [1] "temporal_cubble_df" "cubble_df"          "tbl_df"            
#> [4] "tbl"                "data.frame"        
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30
#> 
#> $names
#> [1] "id"   "date" "prcp" "tmax" "tmin"
#> 
#> $key
#> # A tibble: 3 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00086038        [10]
#> 2 ASN00086077        [10]
#> 3 ASN00086282        [10]
#> 
#> $index
#> [1] "date"
#> 
#> $coords
#> [1] "long" "lat" 
#> 
#> $spatial
#> # A tibble: 3 × 6
#>   id           long   lat  elev name              wmo_id
#>   <chr>       <dbl> <dbl> <dbl> <chr>              <dbl>
#> 1 ASN00086038  145. -37.7  78.4 essendon airport   95866
#> 2 ASN00086077  145. -38.0  12.1 moorabbin airport  94870
#> 3 ASN00086282  145. -37.7 113.  melbourne airport  94866

The following shortcut functions are available to extract components from the attributes:

  • key_vars(): the name of the key attribute as a string, i.e. "id",
  • key_data(): the tibble object stored in the key attribute,
  • key(): the name of the key attribute as a symbol in a list, i.e. [[1]] id,
  • index(): the index attribute as a symbol, i.e. date,
  • index_var(): the index attribute as a string, i.e. "date",
  • coords(): a character vector of length two naming the coordinate pair, i.e. "long" "lat", and
  • spatial(): the tibble object for the spatial variables.
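
A short usage sketch of these shortcuts on the long form, assuming the long cubble is obtained from climate_mel with face_temporal() (neither is stated in this section):

```r
library(cubble)

# Assumption: this reproduces the cb_long object shown earlier
cb_long <- face_temporal(climate_mel)

key_vars(cb_long)    # key name as a string
index_var(cb_long)   # index name as a string
coords(cb_long)      # names of the coordinate columns
key_data(cb_long)    # tibble mapping each key value to its row positions
spatial(cb_long)     # tibble of spatial variables, one row per station
```

The string versions, key_vars() and index_var(), are usually the more convenient ones when building column selections programmatically.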