Geospatial Technology and Spatial Analysis in R
Preface
Welcome
Welcome to the world of spatial data and analysis in R! In today’s data-driven era, the importance of spatial information cannot be overstated. From urban planning and environmental management to transportation logistics and public health, the ability to understand and analyze spatial data has become essential for making informed decisions and solving complex problems.
This book serves as your comprehensive guide to harnessing the power of R for working with spatial data. Whether you are a student, researcher, data scientist, or professional in a related field, this resource will equip you with the knowledge and skills to effectively manipulate, visualize, analyze, and model spatial data using R.
R, a widely used programming language for statistical computing and graphics, offers a rich set of packages and tools specifically tailored for spatial data analysis. With its extensive capabilities, R provides a versatile and efficient environment for working with diverse spatial datasets, ranging from point patterns and geospatial polygons to raster data and satellite imagery.
In this book, we will start by introducing the fundamentals of spatial data and its representation in R. You will learn about spatial objects, coordinate reference systems, and the various data structures used to handle spatial information efficiently. We will explore techniques for importing, exporting, and manipulating spatial data, ensuring that you have a solid foundation for subsequent analysis.
The core of this book revolves around spatial analysis techniques in R. We will delve into essential topics such as spatial data visualization, spatial statistics, spatial interpolation, spatial regression, and geostatistics. Through step-by-step tutorials, code examples, and real-world applications, you will gain a deep understanding of how to apply these techniques to uncover patterns, explore relationships, and derive meaningful insights from spatial data.
Additionally, we will explore advanced topics such as spatial data modeling, spatial clustering, and spatial optimization, which will enable you to tackle complex spatial problems and make informed decisions. Throughout the book, we will emphasize the importance of sound statistical principles and best practices for spatial data analysis.
Each chapter is carefully crafted to provide a balance between theoretical concepts, practical demonstrations, and hands-on exercises. The code snippets and reproducible examples provided will help you to grasp the concepts effectively and apply them to your own spatial analysis tasks. We encourage you to actively engage with the material, experiment with the code, and explore additional resources to deepen your understanding.
Finally, we would like to express our sincere appreciation to the open-source community, which has contributed immensely to the development of R and its spatial packages. We are grateful to the countless individuals who have shared their knowledge, expertise, and code, making spatial data analysis in R accessible and empowering users around the world.
We hope that this book will serve as a valuable companion on your journey through the fascinating realm of spatial data and analysis in R. By the end, we aim to equip you with the skills and confidence to unlock the hidden insights in spatial data and contribute to the advancement of knowledge in your respective domains.
Let’s embark on this exciting adventure together and explore the vast possibilities of spatial data analysis in R!
Happy spatial coding!
Masumbuko SEMBA
What is R?
R is a programming language and environment, originally developed for statistical computing and graphics. As of October 2020, there are ~16,000 R packages in the official repository, CRAN.
Notable advantages of R are that it is a full-featured programming language, yet customized for working with data; that it is relatively simple; and that it has a huge collection of over 100,000 functions from various areas of interest.
R's popularity has been steadily increasing in recent years (Figures 0.1–0.3).
Figure 0.1: Stack Overflow Trend for the 'r' question tag (https://insights.stackoverflow.com/trends?tags=r)
Figure 0.2: IEEE Language Rankings 2019 (https://spectrum.ieee.org/computing/software/the-top-programming-languages-2019)
Figure 0.3: Proportion of research papers citing R (https://www.nature.com/news/programming-tools-adventures-with-r-1.16609)
A brief overview of the capabilities and packages for several domains of R use is available in the "CRAN Task Views" (Figure 0.4).
Figure 0.4: CRAN Task Views (http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/)
R and analysis of spatial data
Introduction
Over time, the number of contributed packages for handling and analyzing spatial data in R has grown steadily. Today, spatial analysis is a major functionality in R. As of October 2020, there are at least 185 packages specifically addressing spatial analysis in R.
Figure 0.5: Books on Spatial Data Analysis with R
Some important events in the history of spatial analysis support in R are summarized in Table 0.1.
Table 0.1: Significant events in the history of R-spatial
Year | Event |
pre-2003 | Variable and incomplete approaches (MASS, spatstat, maptools, geoR, splancs, gstat, …) |
2003 | Consensus that a package defining standard data structures should be useful; rgdal released on CRAN |
2005 | sp released on CRAN; sp support in rgdal (Section 7.1.3) |
2008 | Applied Spatial Data Analysis with R, 1st ed. |
2010 | raster released on CRAN (Section 5.3.4) |
2011 | rgeos released on CRAN |
2013 | Applied Spatial Data Analysis with R, 2nd ed. |
2016 | sf released on CRAN (Section 7.1.4) |
2018 | stars released on CRAN (Section 5.3.5) |
2019 | Geocomputation with R (https://geocompr.robinlovelace.net/) |
2021(?) | Spatial Data Science (https://www.r-spatial.org/book/) |
The question that arises here is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely yes. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with graphical user interfaces such as ArcGIS or QGIS.
General advantages of Command Line Interface (CLI) software include:
Automation—Doing otherwise unfeasible repetitive tasks
Reproducibility—Precise control of instructions to the computer
Moreover, specific strengths of R as a GIS are:
R capabilities in data processing and visualization, combined with dedicated packages for spatial data
A single environment encompassing all analysis aspects—acquiring data, computation, statistics, visualization, Web, etc.
Nevertheless, there are situations when other tools are needed:
Interactive editing or georeferencing (but see the mapedit package)
Unique GIS algorithms (3D analysis, label placement, splitting lines at intersections)
Data that cannot fit in RAM (but R can connect to spatial databases and other software for working with big data)
The following sections (0.3.2–0.3.11) highlight some of the capabilities of spatial data analysis packages in R, through short examples. We are going to elaborate on most of these packages later on in the book, at which point many of these examples will become clearer.
Input and output of spatial data
Reading spatial layers from a file into an R data structure, or writing the R data structure into a file, are handled by external libraries:
GDAL/OGR is used for reading/writing vector and raster files, with sf and stars
PROJ is used for handling Coordinate Reference Systems (CRS), in both sf and stars
Working with specialized formats, e.g., NetCDF with ncdf4
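As a minimal sketch of file input and output, the following reads the nc.shp sample Shapefile that ships with the sf package, reprojects it, and writes it to a temporary GeoPackage. The choice of sample file, target CRS (EPSG:4326), and output format are assumptions made for illustration only.

```r
library(sf)

# Path to a sample Shapefile bundled with the sf package
path = system.file("shape/nc.shp", package = "sf")

# Read the layer (file access is delegated to GDAL)
nc = st_read(path, quiet = TRUE)

# Inspect the CRS, then reproject to WGS 84 (the transformation is delegated to PROJ)
st_crs(nc)
nc_wgs84 = st_transform(nc, 4326)

# Write to a temporary GeoPackage file, then read it back
out = tempfile(fileext = ".gpkg")
st_write(nc_wgs84, out, quiet = TRUE)
nc_back = st_read(out, quiet = TRUE)
nrow(nc_back)
```

The output format is inferred from the file extension, so writing a different format is usually a matter of changing the extension of the output path.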
Package sf combined with RPostgreSQL can be used to read from, and write to, a PostGIS spatial database:

    library(sf)
    library(RPostgreSQL)
    # Open a connection to the PostGIS database
    con = dbConnect(
      PostgreSQL(),
      dbname = "gisdb",
      host = "159.89.13.241",
      port = 5432,
      user = "geobgu",
      password = "*******"
    )
    # Read the result of an SQL query as a vector layer
    dat = st_read(con, query = "SELECT name_lat, geometry FROM plants LIMIT 5;")

    dat
    ## Simple feature collection with 5 features and 1 field
    ## Geometry type: POINT
    ## Dimension:     XY
    ## Bounding box:  xmin: 35.1397 ymin: 31.44711 xmax: 35.67976 ymax: 32.77013
    ## Geodetic CRS:  WGS 84
    ##         name_lat                  geometry
    ## 1    Iris haynei POINT (35.67976 32.77013)
    ## 2    Iris haynei   POINT (35.654 32.74137)
    ## 3 Iris atrofusca POINT (35.19337 31.44711)
    ## 4 Iris atrofusca POINT (35.18914 31.51475)
    ## 5  Iris vartanii  POINT (35.1397 31.47415)

sf: Processing Vector Layers
GEOS is used for geometric operations on vector layers with sf:
Numeric operators—Area, Length, Distance…
Logical operators—Contains, Within, Within distance, Crosses, Overlaps, Equals, Intersects, Disjoint, Touches…
Geometry generating operators—Centroid, Buffer, Intersection, Union, Difference, Convex-Hull, Simplification…
Figure 0.6: Buffer function
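The three groups of operators can be illustrated with a small sketch. The two points, the buffer distance, and the planar coordinates without a CRS are all assumptions chosen to keep the example simple; they do not come from a real dataset.

```r
library(sf)

# Two points in planar coordinates (no CRS assigned)
p1 = st_sfc(st_point(c(0, 0)))
p2 = st_sfc(st_point(c(1.5, 0)))

# Geometry generating operator: buffers of radius 1 around each point
b1 = st_buffer(p1, dist = 1)
b2 = st_buffer(p2, dist = 1)

# Numeric operators: area of a buffer and distance between the points
a = st_area(b1)              # slightly below pi, since the circle is a polygon approximation
d = st_distance(p1, p2)      # 1 x 1 distance matrix

# Logical operators: do the buffers intersect? Is p1 within its own buffer?
st_intersects(b1, b2, sparse = FALSE)
st_within(p1, b1, sparse = FALSE)

# Geometry generating operators: intersection of the buffers and its centroid
inter = st_intersection(b1, b2)
ctr = st_coordinates(st_centroid(inter))
```

Because the two buffers are mirror images of each other, the centroid of their intersection falls halfway between the two points.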
stars: Processing Rasters
Geometric operations on rasters can be done with package stars:
Accessing cell values: As matrix / array, Extracting to points / lines / polygons
Raster algebra: Arithmetic (+, -, …), Math (sqrt, log10, …), logical (!, ==, >, …), summary (mean, max, …), Masking
Changing resolution and extent: Cropping, Mosaic, Resampling, Reprojection
Transformations: Raster <-> Points / Contour lines / Polygons
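A few of these operations can be sketched on a tiny raster. The 3 x 4 matrix below is a made-up stand-in for a single-band raster, and the threshold values are arbitrary; a real workflow would typically start from a file read with read_stars.

```r
library(stars)

# A small 3 x 4 matrix standing in for a single-band raster
m = matrix(1:12, nrow = 3, ncol = 4)
r = st_as_stars(m)

# Accessing cell values as a matrix
vals = r[[1]]

# Raster algebra: arithmetic, math and logical operators work cell by cell
doubled = r * 2
root = sqrt(r)
large = r > 6

# Summary of all cell values
mean(r[[1]])

# Masking: set cells below a threshold to NA
masked = r
masked[r < 5] = NA
```

Each result is again a stars object with the same dimensions as the input, so the operations can be chained.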
geosphere: Geometric calculations on longitude/latitude
Package geosphere implements spherical geometry functions for distance- and direction-related calculations on geographic coordinates (lon-lat).
Figure 0.7: Points on Great Circle
Figure 0.8: Visualizing Facebook Friends with geosphere (http://paulbutler.org/archives/visualizing-facebook-friends/)
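The following sketch shows the kind of distance and direction calculations the package offers. The two points are hypothetical and were placed on the equator so that the results are easy to verify by hand.

```r
library(geosphere)

# Two points on the equator, given as (longitude, latitude)
p1 = c(0, 0)
p2 = c(90, 0)

# Great-circle distance in meters (sphere of radius 6378137 m by default)
d = distHaversine(p1, p2)

# Initial bearing from p1 to p2, in degrees clockwise from north
b = bearing(p1, p2)

# 50 intermediate points along the great circle between p1 and p2
gc = gcIntermediate(p1, p2, n = 50)
head(gc)
```

The distance here is a quarter of the equatorial circumference, and the bearing is due east. Points such as those returned by gcIntermediate are what is drawn when plotting great-circle arcs like the ones in Figures 0.7 and 0.8.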
gstat: Geostatistical Modelling
As mentioned above, R was initially developed for statistical computing (Section 0.2). Accordingly, there is an extensive set of R packages for spatial statistics. For example, package gstat provides a comprehensive set of functions for univariate and multivariate geostatistics, mainly for the purpose of spatial interpolation:
Variogram modelling
Ordinary and universal point or block (co)kriging
Cross-validation
Figure 0.9: Predicted Zinc concentration, using Ordinary Kriging
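A minimal sketch of this workflow, using the meuse sample dataset from package sp, might look as follows. The log transformation, the spherical model, and its starting parameter values are assumptions borrowed from common introductory examples, not necessarily the settings used to produce Figure 0.9.

```r
library(sp)
library(gstat)

# Sample point data (zinc concentrations) and prediction grid from package sp
data(meuse)
coordinates(meuse) = ~x+y
data(meuse.grid)
gridded(meuse.grid) = ~x+y

# Variogram modelling: empirical variogram, then a fitted spherical model
v = variogram(log(zinc) ~ 1, meuse)
m = fit.variogram(v, vgm(psill = 1, model = "Sph", range = 900, nugget = 1))

# Ordinary kriging: predictions (var1.pred) and variances (var1.var) on the grid
k = krige(log(zinc) ~ 1, meuse, meuse.grid, model = m)

# 5-fold cross-validation of the kriging model
cv = krige.cv(log(zinc) ~ 1, meuse, model = m, nfold = 5)
```

The formula log(zinc) ~ 1 specifies a constant unknown mean, which is what makes this ordinary rather than universal kriging.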
We are going to learn about the gstat package in Chapter 12. An introduction to the package can also be found in Chapter 8 of Applied Spatial Data Analysis with R (Bivand, Pebesma, and Gomez-Rubio 2013).
spdep: Spatial dependence modelling
Modelling with spatial weights:
Building neighbor lists and spatial weights
Tests for spatial autocorrelation for areal data (e.g., Moran's I)
Spatial regression models (e.g., SAR, CAR)
Figure 0.10: Neighbours list based on regions with contiguous boundaries
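The first two steps can be sketched with the North Carolina counties layer that ships with package sf. The choice of dataset, the SID74 variable, and row-standardized weights are assumptions made for illustration.

```r
library(sf)
library(spdep)

# Sample polygon layer (North Carolina counties) bundled with sf
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Neighbor list: polygons with contiguous boundaries are neighbors
nb = poly2nb(nc)

# Row-standardized spatial weights
lw = nb2listw(nb, style = "W", zero.policy = TRUE)

# Moran's I test for spatial autocorrelation in the SID74 attribute
mt = moran.test(nc$SID74, lw, zero.policy = TRUE)
mt
```

The zero.policy argument only matters if some polygon has no neighbors; it is set here so that the sketch does not stop with an error in that case.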
The spdep package is beyond the scope of this book. An introduction to the package can be found in Chapter 9 of Applied Spatial Data Analysis with R (Bivand, Pebesma, and Gomez-Rubio 2013).
spatstat: Spatial point pattern analysis
Package spatstat provides a comprehensive collection of techniques for statistical analysis of spatial point patterns, such as:
Kernel density estimation
Detection of clustering using Ripley's K-function
Spatial logistic regression
Figure 0.11: Distance map for the Biological Cells point pattern dataset
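As a brief sketch, the following applies two of these techniques, plus the distance map shown in Figure 0.11, to the cells dataset that ships with spatstat. All function arguments are left at their defaults, which is an assumption made for brevity rather than a recommendation.

```r
library(spatstat)

# Point pattern of biological cell centres, shipped with spatstat
data(cells)

# Kernel density estimate of point intensity (a pixel image)
d = density(cells)

# Ripley's K-function, with the theoretical curve under
# complete spatial randomness for comparison
K = Kest(cells)
plot(K)

# Distance map: distance from each pixel to the nearest point
dm = distmap(cells)
plot(dm)
```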
The book Spatial Point Patterns: Methodology and Applications with R (Baddeley, Rubak, and Turner 2015) provides a thorough introduction to the subject of point pattern analysis using the spatstat package. A briefer introduction can also be found in Chapter 7 of Applied Spatial Data Analysis with R (Bivand, Pebesma, and Gomez-Rubio 2013).
osmdata: Access to OpenStreetMap data
Package osmdata gives access to OpenStreetMap (OSM) data, the most extensive open-source map database in the world, using the Overpass API.
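
    library(sf)
    library(osmdata)
    # Query all features tagged "highway" within the Beer-Sheva bounding box
    q = opq(bbox = "Beer-Sheva, Israel")
    q = add_osm_feature(q, key = "highway")
    dat = osmdata_sf(q)
    # Keep only the road type attribute, so that both layers share the same columns
    lines = dat$osm_lines[, "highway"]
    pol = dat$osm_polygons[, "highway"]
    # Convert polygon outlines to lines and combine them with the line features
    pol = st_cast(pol, "MULTILINESTRING")
    pol = st_cast(pol, "LINESTRING")
    lines = rbind(lines, pol)
    # Reproject to UTM zone 36N (EPSG:32636) and plot
    lines = st_transform(lines, 32636)
    plot(lines, key.pos = 4, key.width = lcm(4), main = "")
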
Figure 0.12: Beer-Sheva road types map, using data downloaded from OpenStreetMap (OSM)
ggplot2: Visualization
The ggplot2 package is one of the most popular packages in R. It provides advanced visualization methods through a well-designed and consistent syntax. The package supports visualization of both vector layers and rasters.
The ggplot2 package is highly customizable and capable of producing publication-quality figures and maps as well as original and innovative designs (Figure 0.13). One of its strengths is the easy preparation of "small-multiple" figures, called facets in the terminology of ggplot2 (Figure 0.14).
Figure 0.13: London cycle hire journeys with ggplot2 (http://spatial.ly/2012/02/great-maps-ggplot2/)
Figure 0.14: Crime density by day with ggplot2
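The following sketch shows a vector layer drawn with ggplot2, first as a single map and then as facets. It uses the North Carolina sample layer from package sf; the dataset, the attributes BIR74 and BIR79, and the hypothetical column names births and year are assumptions chosen for illustration and are unrelated to the figures above.

```r
library(sf)
library(ggplot2)

# Sample polygon layer (North Carolina counties) bundled with sf
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# A single choropleth map of one attribute
p1 = ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  scale_fill_viridis_c()

# Stack two attributes into "long" form: one row per county and year
a = st_sf(births = nc$BIR74, year = "1974", geometry = st_geometry(nc))
b = st_sf(births = nc$BIR79, year = "1979", geometry = st_geometry(nc))
nc_long = rbind(a, b)

# Small multiples: one panel per year, sharing a common color scale
p2 = ggplot(nc_long) +
  geom_sf(aes(fill = births)) +
  scale_fill_viridis_c() +
  facet_wrap(~year)

p2
```

Faceting relies on the data being in long form, which is why the two attribute columns are stacked into a single births column before plotting.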
The ggplot2 package is beyond the scope of this book. A good place to start is the book ggplot2: Elegant Graphics for Data Analysis, by the package author (Wickham 2016). The book is available online.
leaflet, mapview: Web mapping
Packages leaflet and mapview provide methods to produce interactive maps using the Leaflet JavaScript library.
Package leaflet gives more low-level control. Package mapview is a wrapper around leaflet, automating addition of useful features:
Commonly used basemaps
Color scales and legends
Labels
Popups
Function mapview produces an interactive map given a spatial object. The zcol parameter is used to specify the attribute used for symbology:

    library(sf)
    library(mapview)
    states = st_read("USA_2_GADM_fips.shp")
    mapview(states, zcol = "NAME_1")

Figure 0.15: Interactive map made with mapview
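For comparison, here is a minimal sketch of the lower-level leaflet interface, in which the basemap, view, and marker are each added explicitly. The coordinates are an approximate location for Beer-Sheva and, like the zoom level and popup text, are assumptions made for illustration.

```r
library(leaflet)

# Approximate longitude and latitude of Beer-Sheva (illustrative values)
lon = 34.79
lat = 31.25

# Build the map step by step
m = leaflet()
m = addTiles(m)                                   # default OpenStreetMap basemap
m = setView(m, lng = lon, lat = lat, zoom = 12)   # initial center and zoom
m = addMarkers(m, lng = lon, lat = lat, popup = "Beer-Sheva")

m
```

With mapview, the basemap, legend, and popups are added automatically; with leaflet, each is a separate function call, which is what gives the finer control.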
Other materials
This section lists some other resources that are relevant for working with spatial data in R.
Books
Model-based Geostatistics (Diggle and Ribeiro 2007)
Spatial Data Analysis in Ecology and Agriculture using R (1st ed. 2012, 2nd ed. 2018) (Plant 2018)
Learning R for Geospatial Analysis (Dorman 2014)
Applied Spatial Data Analysis with R (1st ed. 2008, 2nd ed. 2013) (Bivand, Pebesma, and Gomez-Rubio 2013)
Hierarchical Modeling and Analysis for Spatial Data (1st ed. 2003, 2nd ed. 2014) (Banerjee, Carlin, and Gelfand 2014)
An Introduction to R for Spatial Analysis and Mapping (1st ed. 2015, 2nd ed. 2018) (Brunsdon and Comber 2015)
Spatial Point Patterns: Methodology and Applications with R (2015) (Baddeley, Rubak, and Turner 2015)
Displaying Time Series, Spatial, and Space-Time Data with R (1st ed. 2014, 2nd ed. 2018) (Lamigueiro 2014)
Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019)
Spatial Data Science (2021?)