DATIKA - Importing table files into R

Importing Data

So far, we’ve looked at several dataset in previous chapter and we have also created ourselves some datasets. While you can do all your data entry work in R or Excel, it is much more common to load data from other sources. Local and international organization have been collecting fisheries dependent and fisheries independent data for years. These historical dataset with fisheries information like fish catch, effort, landing sites, fishing ground and critical habitats can be obtained from several databases—some are open and other closed.

Much of the data we download or receive from is either comma-separated value files .csv or and Excel spreadsheets, .xlsx. .csv files are spreadsheets stored as text files - basically Excel files stripped down to the bare minimum - no formatting, no formulas, no macros. You can open and edit them in spreadsheet software like LibreOffice Calc, Google Sheets or Microsoft Excel. Many devices and databases can export data in .csv format, making it a commonly used file format that you are likely to encounter sooner rather than later.

Whether that be a comma separated (csv) or a tab delimited file, there are multiple functions that can read these data into R. We will stick to loading these data from the tidyverse packages (Wickham and Wickham, 2017) but be aware these are not the only methods for doing this. We will use the tidyverse functions just to maintain consistency with everything else we do. The first package in tidyverse we will use is called readr (Wickham et al., 2017), which is a collection of functions to load the tabular data from working directory in our machine into R session. Some of its functions include:

read_csv(): comma separated (CSV) files
read_tsv(): tab separated files
read_delim(): general delimited files
read_fwf(): fixed width files
read_table(): tabular files where columns are separated by white-space.
read_log(): web log files
readxl reads in Excel files.

Before we import the data, we need to load the packages that we will use their functions in this chapter

require(tidyverse)
require(magrittr)

Importing csv files

A CSV file is a type of file where each line contains a single record, and all the columns are separated from each other via a comma. In order to load data from a file into R, you need its path - that is, you need to tell R where to find the file. Unless you specify otherwise, R will look for files in its current working directory. You can read .csv file using read_csv() function of the readr package (Wickham et al., 2017) as shown in the chunk below;

imported.lfq = read_csv("dataset/project/tidy_LFQ_sample_4.csv")

lf4

We imported tidy_LFQ_sample_4.csv from working directory into R using read_csv() and specify the path to the file in your working directory and store as imported.lfq. If you get an error message, it means thattidy_LFQ_sample_4.csvis not in your working directory. Either move the file to the right directory (remember, you can use rungetwd()` to see what your working directory is) or change your working directory.

imported.lfq = read_csv("../data/tidy/tidy_LFQ_sample_4.csv")

If you glimpse the dataframe with glimpse() function, you should see the internal structure of the imported.lfq object we just loaded;

imported.lfq %>% 
  glimpse()

Rows: 6,185
Columns: 6
$ site  <chr> "Mombasa", "Mombasa", "Mombasa", "Mombasa", "Mombasa", "Mombasa"…
$ date  <date> 2019-04-05, 2019-04-05, 2019-04-05, 2019-04-05, 2019-04-05, 201…
$ tl_mm <dbl> 184, 185, 145, 189, 175, 165, 181, 176, 164, 154, 188, 186, 179,…
$ fl_mm <dbl> 169, 169, 134, 173, 161, 153, 165, 163, 148, 142, 173, 173, 164,…
$ wt_gm <dbl> 59.50, 54.71, 24.15, 61.36, 49.31, 38.54, 49.68, 45.27, 36.26, 3…
$ sex   <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M",…

The dataset contains six variables and 6,185 records. The variables site and sex both contain text, and have been imported as character vectors4. The date column has been imported as date format, the variable tl_mm and fl_mm are measured length and have been imported as numeric vector measured in millimeters. The variable wt_gm is the weight of fish measured in grams and also have been imported as numeric vector.

So, what can you do in case you need to import data from a file that is not in your working directory? This is a common problem, as many of us store script files and data files in separate folders (or even on separate drives). One option is to use file.choose, which opens a pop-up window that lets you choose which file to open using a graphical interface:

imported.lfq2 = read_csv(file.choose())

This solution work just fine if you just want to open a single file once. But if you want to reuse your code or run it multiple times, you probably don’t want to have to click and select your file each time. Instead, you can specify the path to your file in the call to read_csv.

Importing Excel files

Commonly our data is stored as a Excel file. There are several packages that can be used to import Excel files to R. I prefer the readxl package (Wickham and Bryan, 2018), so let’s install that:

install.packages("readxl")

The package has read_exel() function that allows us to specify which sheet within the Excel file to read. The function automatically convert the worksheet into a .csv file and read it. Let’s us import the the data in first sheet of the tidy_LFQ_sample_4.xlsx. Is a similar dataset that just imported in the previous section, but is in Excel format. We will use this file to illustrate how to import the excel file into R workspace with readxl package (Wickham and Bryan, 2018).

imported.lfq = readxl::read_excel("../data/tidy/tidy_LFQ_sample_4.xlsx", sheet = 1)

imported.lfq

# A tibble: 6,185 × 6
   site    date                tl_mm fl_mm wt_gm sex  
   <chr>   <dttm>              <dbl> <dbl> <dbl> <chr>
 1 Mombasa 2019-04-05 00:00:00   184   169  59.5 M    
 2 Mombasa 2019-04-05 00:00:00   185   169  54.7 M    
 3 Mombasa 2019-04-05 00:00:00   145   134  24.2 M    
 4 Mombasa 2019-04-05 00:00:00   189   173  61.4 M    
 5 Mombasa 2019-04-05 00:00:00   175   161  49.3 M    
 6 Mombasa 2019-04-05 00:00:00   165   153  38.5 M    
 7 Mombasa 2019-04-05 00:00:00   181   165  49.7 M    
 8 Mombasa 2019-04-05 00:00:00   176   163  45.3 M    
 9 Mombasa 2019-04-05 00:00:00   164   148  36.3 M    
10 Mombasa 2019-04-05 00:00:00   154   142  31.9 M    
# ℹ 6,175 more rows

imported.lfq %>% 
  skimr::skim()

Data summary
Name	Piped data
Number of rows	6185
Number of columns	6
_______________________
Column type frequency:
character	2
numeric	3
POSIXct	1
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
site	0	1	3	7	0	2	0
sex	0	1	1	1	0	2	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
tl_mm	1	170.77	21.08	97.0	157.00	171.00	183.00	269.00	▁▅▇▁▁
fl_mm	1	156.00	19.26	18.1	144.00	156.00	168.00	241.00	▁▁▅▇▁
wt_gm	1	46.03	19.51	7.0	32.77	43.59	55.28	194.18	▇▆▁▁▁

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	2016-03-31	2020-09-11	2020-02-25	42

Saving and exporting your data

In many a case, data manipulation is a huge part of statistical work, and of course you want to be able to save a data frame after manipulating it. There are two options for doing this in R - you can either export the data as e.g. a .csv or a .xlsx file, or save it in R format as an .RData file.

Exporting data

Just as we used the functions read_csv and read_excel to import data, we can use write_csvto export it. The code below saves the bookstore data frame as a .csv file file, which will be created in the current working directory. If you wish to store


imported.lfq %>%  write_csv("assets/fao_paul_dataset/tidy/tidy_lfq.csv")

Saving and loading R data

Being able to export to different spreadsheet formats is very useful, but sometimes you want to save an object that can’t be saved in a spreadsheet format. For instance, you may wish to save a multiple processed data, functions and formula that you’ve created. .RData files can be used to store one or more R objects. To save the objects bookstore and age in a .Rdata file, we can use the save function:


save.image("assets/fao_paul_dataset/tidy/myData.RData")

References

Wickham, H., Bryan, J., 2018. Readxl: Read excel files.

Wickham, H., Hester, J., Francois, R., 2017. Readr: Read rectangular text data.

Wickham, H., Wickham, M.H., 2017. Tidyverse: Easily install and load the ’tidyverse’.

Citation

BibTeX citation:

@online{semba2024,
  author = {Semba, Masumbuko},
  title = {Importing Table Files into {R}},
  date = {2024-02-26},
  url = {https://lugoga.github.io/kitaa/posts/importingData/},
  langid = {en}
}

For attribution, please cite this work as:

Semba, M., 2024. Importing table files into R [WWW Document]. URL https://lugoga.github.io/kitaa/posts/importingData/