Manipulating Simple features: point to polyline
In this post we are going to learn one of the key skills that one dealing with spatial data need to know. That’s is how to read a file that contain geographical information (longitude and latitude) and associated attribute information for each location. In its native tabular form, the geographical information are of no use at all until are converted and presented as spatial information. While industrial leading software like ArcMAP and QGIS provide the tools necessary for that conversion, but these software requires us to click the function every time we want to use them to do the process.
Unlike ArcMAP and QGIS, R software has tools that allows automate processing and conversion of geographical information into spatial data in an intuitive approach, which offers a possible way to reproduce the same. Before we proceed with our task, we need to load some of the packages whose function are needed for this post. We simply load these packages using require function of R (r-base?). The chunk below load these packages;
Dataset
We are going to use Argo float dataset. The step involved to prepare this dataset are explained in Getting and Processing Satellite Data Made Easier in R Let’s load this dataset. We import this dataset using read_csv function from readr package (Wickham, Hester, and Francois 2017)
argo.surface.tb = read_csv("../data/argo_profile_cast_location.csv") %>% 
  janitor::clean_names()
argo.surface.tb# A tibble: 217 x 8
    scan date       time     longitude latitude depth temperature salinity
   <dbl> <date>     <time>       <dbl>    <dbl> <dbl>       <dbl>    <dbl>
 1   1.5 2008-10-28 01:43:57      64.0    -21.6     0        23.5     35.4
 2   1.5 2008-11-07 02:17:07      64.2    -21.3     0        24.6     35.0
 3   1.5 2008-11-17 02:30:18      64.1    -21.0     0        25.0     35.0
 4   1.5 2008-11-27 02:37:35      63.6    -21.1     0        25.7     35.0
 5   1.5 2008-12-07 05:03:59      63.4    -21.3     0        26.1     35.0
 6   1.5 2008-12-17 03:03:40      63.3    -21.5     0        27.0     35.1
 7   1.5 2008-12-27 05:09:53      63.5    -21.6     0        26.7     35.1
 8   1.5 2009-01-06 02:40:37      63.9    -21.6     0        28.4     35.1
 9   1.5 2009-01-16 04:57:04      64.4    -21.5     0        28.0     35.1
10   1.5 2009-01-26 02:30:53      64.8    -21.2     0        28.1     35.0
# ... with 207 more rowsThe dataset has 217 rows and eight variables with date and time stamp the argo surfaced and pushed profile data into the satellite after roving the water column after ten days. The geographical location (longitude and latitude) of the float when was at the surface with the associated temperature and salinity at the surface represent with zero depth.
Simple feature
The main task of this post is to convert this dataset that contain 217 geographical information into simple feature. Wikipedia contributors (2022) describe simple feature as set of standards that specify a common storage and access model of geographic feature made of mostly two-dimensional geometries used by geographic information systems. In general, simple feature is a model of a non-topological way to store geospatial data in a database. The three common types of simple feature are point, lines and polygon, each represent a particular type of geographical feature on earth. Further, simple features refers a formal standard that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects.
Pebesma (2018) developed a simple feature packages (sf) that is dedicated to deal with simple feature objects in R. The packages has hundred of functions that make dealing with spatial information in R environment much easier than before. Its not the focus of this post to describe the package, but if you wish to dive on this topic you can simply consult book such as geocompuation in R (Lovelace, Nowosad, and Muenchow 2019).
For this post we begin by looking on how we can convert tabular data with geographical information into simple feature. That is achieved using st_as_sf function of sf package (Pebesma 2018) and parse the argument coords = c("longitude", "latitude") that bind the geographical coordinate and argument the datum crs = 4326 to define the datuma and the geographical coordinate system, which is WGS 1984 (epsg = 4326). The chunk below has code of the above description;
argo.surface.sf = argo.surface.tb %>% 
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
argo.surface.sfSimple feature collection with 217 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 39.94 ymin: -22.623 xmax: 65.075 ymax: -1.233
Geodetic CRS:  WGS 84
# A tibble: 217 x 7
    scan date       time     depth temperature salin~1         geometry
 * <dbl> <date>     <time>   <dbl>       <dbl>   <dbl>      <POINT [°]>
 1   1.5 2008-10-28 01:43:57     0        23.5    35.4  (64.03 -21.644)
 2   1.5 2008-11-07 02:17:07     0        24.6    35.0  (64.198 -21.28)
 3   1.5 2008-11-17 02:30:18     0        25.0    35.0  (64.068 -21.01)
 4   1.5 2008-11-27 02:37:35     0        25.7    35.0 (63.607 -21.102)
 5   1.5 2008-12-07 05:03:59     0        26.1    35.0 (63.429 -21.253)
 6   1.5 2008-12-17 03:03:40     0        27.0    35.1 (63.322 -21.458)
 7   1.5 2008-12-27 05:09:53     0        26.7    35.1 (63.472 -21.558)
 8   1.5 2009-01-06 02:40:37     0        28.4    35.1 (63.901 -21.599)
 9   1.5 2009-01-16 04:57:04     0        28.0    35.1 (64.447 -21.493)
10   1.5 2009-01-26 02:30:53     0        28.1    35.0 (64.835 -21.207)
# ... with 207 more rows, and abbreviated variable name 1: salinityA printed object is the metadata of the file with key information that tells us that is a simple feature with 217 simple features (recores) and six variables (column) and the type of the object are points. The metadata also display the bounding extent covered with these points and the datum of WGS84. We can check whether the conversion is successful by simply map the simple feature object we created into an interactive map shown in Figure 1.
argo.surface.sf %>% 
  tm_shape() +
  tm_bubbles(
    size = "temperature", 
    col = "salinity", 
    scale = c(0.3,.8),
    border.col = "black", 
    border.alpha = .5, 
    style="fixed", 
    breaks=c(-Inf, seq(34.8, 35.6, by=.2), Inf),
    palette="-RdYlBu", contrast=1, 
    title.size="Temperature", 
    title.col="Salinity (%)", 
    id="Date",
    popup.vars=c("Temperature: "="temperature", "Salinity: "="salinity"),
    popup.format=list(temperature=list(digits=2))
    ) Trajectory
Looking on Figure 1 we notice that the points are following a certain. This path is commonly known as trajectory – a path that an Argo float follows through in the Indian Ocean over its lifespan. Therefore, we ought to convert the point simple feature into trajectory. Fortunate, a combination of function from dplyr package (Wickham et al. 2019) and sf (Pebesma 2018) has made it possible. Though we can create a trajectory for the entire lifespan it recorded profiles in the area, but for exploration purpose, I first created a year variable and extract year variable from date variable and then use the year variable to group the information by year and then create trajectories that are grouped by year.
argo_traj = argo.surface.sf %>% 
  dplyr::mutate(year = lubridate::year(date) %>% as.factor()) %>% 
  dplyr::group_by(year) %>% 
  dplyr::summarise() %>% 
  sf::st_cast(to = "LINESTRING")
tm_shape(shp = argo_traj)+
  tm_lines(col = "year", lwd = 3, stretch.palette = TRUE)We notice that Figure 2 trajectories are un ordered and very confusing. In fact the trajectories in Figure 2 do not reflect the pathway of the Argo float. The problem like this arise especially when you forget to parse an argument do_uninon = FALSE in the summarise function. But if we simply parse that argument, as the chunk below highlight, we correct the observed error and create a trajectory that reflect the pathwary of the Argo float shown in Figure 3
argo_traj = argo.surface.sf %>% 
  mutate(year = lubridate::year(date) %>% as.factor()) %>% 
  arrange(date) %>% 
  group_by(year) %>% 
  summarise(do_union = FALSE) %>% 
  st_cast(to = "LINESTRING")
tm_shape(shp = argo_traj)+
  tm_lines(col = "year", lwd = 3, stretch.palette = TRUE)