Chapter 1 Introduction

An R package is a mechanism for extending the basic functionality of R. It is the natural extension of writing functions that each do a specific thing well. When we interact with R, we use functions from either packages or R-base(R Core Team 2020). The use of functions simplifies things for the user because the user no longer needs to be knowledgeable of the details of the underlying code. They only need to understand the inputs and outputs.

Once one has developed many functions, it becomes natural to group them in to collections of functions that are aimed at achieving an overall goal. This collection of functions can be assembled into an R package. R packages represent another level of abstraction, where the interface presented to the user is a set of user-facing functions. These functions provide access to the underlying functionality of the package and simplify the user experience because the one does not need to be concerned with the many other helper functions that are required.

R packages are a much better way to distribute code to others because they provide a clean and uniform user experience for people who want to interact with your code. R packages require documentation in a standardized format, and the various tools that come with R (and RStudio) help to check your packages so that they do not contain inconsistencies or errors. R users are already familiar with how to use R packages, and so they will be able to quickly adopt your code if is presented in this format.

This chapter highlights the key elements of building R packages. The fine details of building a package can be found in the Writing R Extensions manual. The objectives of this section are:

  • Recognize the basic structure and purpose of an R package
  • Recognize the key directives in a NAMESPACE file

1.1 Basic Structure of an R Package

An R package begins life as a directory on your computer. This directory has a specific layout with specific files and sub-directories. The two required sub-directories are

  • R, which contains all of your R code files
  • man, which contains your documentation files.

At the top level of your package directory you will have a DESCRIPTION file and a NAMESPACE file. This represents the minimal requirements for an R package. Other files and sub-directories can be added and will discuss how and why in the sections below.

While RStudio is not required to build R packages, it contains a number of convenient features that make the development process easier and faster. That said, in order to use RStudio for package development, you must setup the environment properly. Details of how to do this can be found in Roger’s RStudio package development pre-flight check list.

1.1.1 DESCRIPTION File

The DESCRIPTION file is an essential part of an R package because it contains key metadata for the package that is used by repositories like CRAN and by R itself. In particular, this file contains the package name, the version number, the author and maintainer contact information, the license information, as well as any dependencies on other packages.

As an example, here is the DESCRIPTION file for the wior package (Semba and Peter 2020). This package provides a function for accessing and downloading oceanographic and meteorological data in tidy format.

Package: wior
Title: Easy Tidy and Process Oceanographic Data
Version: 0.0.0.1
Date: 2020-10-28
Authors@R: 
    c(person(given = "Masumbuko",
           family = "Semba",
           role = c("aut", "cre"),
           email = "first.lugosemba@gmail.com",
           comment = c(ORCID = "https://orcid.org/0000-0002-5002-9747")),
    person(given = "Nyamisi",
           family = "Peter",
           role = "aut",
           email = "first.nyamisip@gmail.com",
           comment = c(ORCID = "https://orcid.org/0000-0002-4376-2588")))
Description: *wior* is a package intended to transform and process large, complex and diverse oceanographic data acquired through satellite or instrument and turn them into a tibble, an easy way to handle and process data format in R.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Imports: 
    broom,dplyr,magrittr,oce,ggplot2,metR,tidyverse,htmltools,htmltools,sf,btb,tibble,patchwork,rerddap,lubridate,stats,tidyr,raster,cowplot,stars,cubelyr,worldmet,stringr
Suggests: knitr,rmarkdown
VignetteBuilder: knitr
Depends: R (>= 2.10)
URL: https://github.com/lugoga/wior

1.1.2 NAMESPACE File

The NAMESPACE file specifies the interface to the package that is presented to the user. This is done via a series of export() statements, which indicate which functions in the package are exported to the user. Functions that are not exported cannot be called directly by the user (although see below). In addition to exports, the NAMESPACE file also specifies what functions or packages are imported by the package. If your package depends on functions from another package, you must import them via the NAMESPACE file.

Below is the NAMESPACE file for the wior package (Semba and Peter 2020).

# Generated by roxygen2: do not edit by hand

export("%>%")
export(add_label)
export(binning_points)
export(cnv_tibble)
export(compute_corr)
export(degree2fahrenheit)
export(fahrenheit2degree)
export(first_cap)
export(get_1daysla)
export(get_chlModis)
export(get_chlSeawif)
export(get_meteo)
export(get_mld)
export(get_oscar)
export(get_ppMODIS)
export(get_rainLand)
export(get_sss)
export(get_sstMODIS)
export(get_sstMUR)
export(get_windAscat)
export(get_windQuikscat)
export(interp_2d)
export(inverse_hyperbolic)
export(measure_location)
export(measure_summary)
export(measure_symetry)
export(measure_variation)
export(monsoon_season)
export(outlier_remove)
export(plot_profile)
export(point_tb)
export(polygon_tb)
export(raster_tb)
export(sf_crop)
export(sf_tibble)
export(tibble_sf)
export(transect)
importFrom(magrittr,"%>%")

Here we can see that all functions are exported from the wior package. We also note that only a single function is importedfrom other packages. importFrom() function takes a package and a series of function names as arguments. This directive allows you to specify exactly which function you need from an external package. For example, this package imports the pipe operator %>% function from the magrittr package (Bache and Wickham 2020).

With respect to exporting functions, it is important to think through carefully which functions you want to export. First and foremost, exported functions must be documented and supported. Users will generally expect exported functions to be there in subsequent iterations of the package. It’s usually best to limit the number of functions that you export (if possible). It’s always possible to export something later if it is needed, but removing an exported function once people have gotten used to having it available can result in upset users. Finally, exporting a long list of functions has the effect of cluttering a user’s namespace with function names that may conflict with functions from other packages. Minimizing the number of exports reduces the chances of a conflict with other packages (using more package-specific function names is another way).

1.1.3 Namespace Function Notation

As you start to use many packages in R, the likelihood of two functions having the same name increases. For example, the commonly used dplyr package has a function named filter(), which is also the name of a function in the stats package. If one has both packages loaded (a more than likely scenario) how can one specific exactly which filter() function they want to call?

In R, every function has a full name, which includes the package namespace as part of the name. This format is along the lines of

::

For example, the filter() function from the dplyr package can be referenced as dplyr::filter(). This way, there is no confusion over which filter() function we are calling. While in principle every function can be referenced in this way, it can be tiresome for interactive work. However, for programming, it is often safer to reference a function using the full name if there is even a chance that there might be confusion.

1.1.4 Loading and Attaching a Package Namespace

When dealing with R packages, it’s useful to understand the distinction between loading a package namespace and attaching it. When package A imports the namespace of package B, package A loads the namespace of package B in order to gain access to the exported functions of package B. However, when the namespace of package B is loaded, it is only available to package A; it is not placed on the search list and is not visible to the user or to other packages.

Attaching a package namespace places that namespace on the search list, making it visible to the user and to other packages. Sometimes this is needed because certain functions need to be made visible to the user and not just to a given package.

1.1.5 The R Sub-directory

The R sub-directory contains all of your R code, either in a single file, or in multiple files. For larger packages it’s usually best to split code up into multiple files that logically group functions together. The names of the R code files do not matter, but generally it’s not a good idea to have spaces in the file names.

1.1.6 The man Sub-directory

The man sub-directory contains the documentation files for all of the exported objects of a package. With older versions of R one had to write the documentation of R objects directly into the man directory using a LaTeX-style notation. However, with the development of the roxygen2 package, we no longer need to do that and can write the documentation directly into the R code files. Therefore, you will likely have little interaction with the man directory as all of the files in there will be auto-generated by the roxygen2 package.

1.1.7 Summary

R packages provide a convenient and standardized mechanism for distributing R code to a wide audience. As part of building an R package you design an interface to a collection of functions that users can access to make use of the functionality you provide. R packages are directories containing R code, documentation files, package metadata, and export/import information. Exported functions are functions that are accessible by the user; imported functions are functions in other packages that are used by your package.

References

Bache, Stefan Milton, and Hadley Wickham. 2020. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Semba, Masumbuko, and Nyamisi Peter. 2020. Wior: Easy Tidy and Process Oceanographic Data.