Chapter 1 Why Programming?

The benefit of learning to program are numerous. Programming makes data-analysis more efficient, accurate and transparent—opens new doors for new analyses that would not be practical or possible without programming. Most experiments for instance are often carried out on computers, so programming is vital for designing, creating and implementing experiments.

The advent in computer technology plays an important role in research, and there are many kinds of proprietary softwared developed specifically to deal with data ana anlysis in research. For example most statistians and researchers analyse their data with using proprietary software like Microsoft Excel, SPSS, SAS and many others. These kind of canned software are widely used and useful. They are generally user-friendely with graphic user interface (GUI)1 require lillte or no programming knowledge, and you can complete analytical tasks in relatively small amount of time.

Unfortunate the click-and drag interface you interact with in the GUI is a result of code hidden to user with these software. So you can only accomplish the taks based on how the software is program to do and incapable of doing some task that it was not designed for. You will sometimes find that the software is simply incapable of doing a particular kind of statistical anlysis or running a particular kind of experiment. At that juncture, it becomes apparent for researcher and scientists to learn programming, which help to solve solutions to unique and emerging research problems directly.

Learning how to program and create your own code allows control over every detail of analyzing data. This level of control is invaluable for creating flexible and customizable data analysis, and for being confident in the output that decision makers depends. Another advantage of learning to program worth mentioning is time-saving. Becoming fluent in programming enables you to handle research data in relatively short time. This is because with programming language skills, you automate most of the tasks with scripts or code, which save you copious amounts of time in analysing your data and open new way of exploring and visualizing the data you are working on. This make you understand the data in a different way.

Finally, computer programming is a valuable skill in general and may open doors in the larger workforce in this digital age.

1.1 Learning to program

Learning programming is a skill and requires an initial investment. It takes practice, time, and effort. There is no easy way out. There are many layers to individual programming languages, and there are many programming languages out there to learn that could be useful. Befere you begin to learn to code, it is important to recognize that the underlying skill of computer programming is problem-solving—the ability to sovle new problem yourself with computer. From the perspective of applying computer programming techniques to solve problems in science and other fields, theae are three major aspects. These are:

  1. Understaing the issues that need to be addressed
  2. Understanding the tools available at your disposal
  3. Applying the tools to solve the problem

In most cases people know the issues and want to solve them problems, the biggest and vital componet that is often missing are the tools. Recognize the tools is one way but understand how to use the tools to the solve the proleme is the most difficult step. This is what they call the chicken and egg problem. What should you learn first? Should you learn about how organize codes at the beginning or after you have learn some basic aspects of the language? Should you learn random tidbits of programming languages before learning how to apply those tidbits to solve a problem? This is a dilemma for most of us, because we real unsure of the base and where to start.

The major porting of this book will deal with the understanding the tool— the basic building blocks of any programming languages. Fortunate, most programming languages use the same basci concept forming the building bloacks for making all sort of programs. However, this book will specifically focus on R language. We believe that iving examples codes will make you understand the tools available to you and this has spill over effect as it will help you to:

  1. understand the conceptual tools
  2. write codes to implement the tools using the specific syntax2 to solve the problems or address the issues you are facing.

Coding is writing a recipe for solving a problem. More specifically, it is writing the solution to a problem in a highly detailed manner that forces a computer to follow the directions to solve the problem. scientists have dubbed these code-based recipes algorithms3. Learning how to create algorithms involves first ldeterminging the problem and then converting the solution into an ordered steps that solve the problem. These ordered steps of instructions make use of the tools or building blocks of the programming language.

1.2 Programming in R

All programming languages involves the same basic building blocks. This chapter introduced the building blocks in R. Once you master the building block in R, you can switch to other languages with little difficulties. Because our goal is to learn how to code to analyze data and produce graphics, we will begin learning R, which is well-suited to this purpose. First, we will learn how to work with basic programming concepts in R , then we will learn how to handle and analyse data in R.

1.2.1 What is R?

R is primarily a programming language for statistical analysis data manipulation and graphics (R Core Team 2018). It is a powerful language used in mathematical operations, data-processing, analysis and graphical display of data. Xia, Sun, and Chen (2018) pointed out that R is a vehicle for newly developing methods of interactive data analysis. R have rapidly developed and extended its capability to a large collection of packages provided by researchers and volunteers. R can be downloaded from https://www.r-project.org and available for all thre major operating systems—Windows, Mac and Unix/Linux. Like other statistical software, R provides a statistical framework and terminal-based interface for users to parse commands for data ingestion, manipulation and graphics.

The R programming usage has become one-stop solution to data analysis. R was created in 1993 and has evolved into a stable programming language. It has become a de facto standard for data analysis both in academic and industry sectors. R has its roots in the statistics community, being created by statistians for statistics. Many of its core tools are directed toward statistics. Being an open-source language, R has many advantage oover other commercial statistical platform like MATLAB, SAS and SPSS. The big rip for using R is its ecosystem of packages.

R is an interpreted language—the expression specified in this languages executes line by line similar to other languages like python or ruby rather than compiling the source code an executable chunk as in C++. One of the power of R language is its dynamic—infers the data types of the variables based on the context. We do not need to declare variables separately. R is considered as esoteric because its syntax are easily understood by people from different fields.

1.2.2 Installing R

R is available on most computing platform like Windows. Linux and Mac. There are two ways of installing R in the computer. The first is downloading latest version of R Binary and compiling R from the https://cran.r-project.org/ depending on your operating system. The compiled binary executes and install R in your machine. R can also be installed using packages managers. For example, Ubuntu users can install R using the apt-get package manager by running: > sudo apt-get install r-base r-base-dev

Similarly, Mac users can install R using ports > sudo port install R

There is an active community of R developers which regularly releases a new version R. Each version has a name assigned to it. Except for major resease, the version of R are usually backward compatible in terms of their funcitonality. The version and common packages used in this book can be accessed by simply passing a commands devtools::session_info() in the console. you can also check the version of R installed with version() function

version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          4.4                         
year           2018                        
month          03                          
day            15                          
svn rev        74408                       
language       R                           
version.string R version 3.4.4 (2018-03-15)
nickname       Someone to Lean On          

1.2.3 What is RStudio?

Once you have installed R on your computer, you might want to install another program called Rstudio. This program allows the user to run R in a more user-friendly environment (Xia, Sun, and Chen 2018). RStudio is a free and open-source integrated development environment (IDE) for R (RStudio Team 2016). You must have already installed R in your machine before you install Rstudio. You can download the software from the R-studio website at this link http:/www.rstudio.com.

Rstudio work in the four operating systems—Windows, Mac, Ubuntu and Fedora. To obtain the reference infomation for citing RStudio in publication by simply typing RStudio.version() command in the console.

1.2.4 Basic Featurs of RStudio

After the installation of Rstudio, the software is launched either by clicking the shortcult icon on the desktop or in startup menu of windows operating system. The sotware comes up with several window panels (Figure 1.1. The four windows in Rstudio are editor, console, worksapce and plots.`

  • source editor and data viewer: The top left corner contains the script editor. This a simple text editor for writing and editing R scripts and markdown documents. Several tabs can be opened in the editor window at once, with each tab representing a different document. These files can be saved from the editor into the working directory. The editor window is also used to view data frame similar to Excel spreadsheet. Source editor is recommended in most programming work because of its efficient way to manage scripts and Rmarkdown document for reproducible work.

  • environment (workspace) and history: The top right panel contains several tabs but I introduce the workspace and history tabs. The workspace list all the objects that are currently loaded in R’s memory. You can check each object by clicking them or run View() function in console. The history tab provides a record of the recent commands executed in the console, scripts or rmarkdown documents.

  • R consoles: The bottom left window is the command line widely known as console. this window is similar to console in base R. This is used to directly enter commands into R. Once you have entered commands here, press enter to execute the command. The console is useful for entering single lines of code and running them. Oftentime this occurs when you are exploring or testing a certain task with a combinations of functions and objects. But because coding is all about creating scripts that automate the analytical process, you will find that very rare you use the the console.

  • files, plots, packages, help, and viewer: The bottom-right window has five tabs for files, plots, package, help and viewer. The files tab allows browsing of the computers file directory. The plots tab will show recent plots and figures drawn with graphic functions in R. The packages tabl list all packages installed in R and provide tools for download new package and update packages. The help tab is an invaluable tab, it help to search for functions and see examples of how they are used. The viewer tab is like the plot tab that both are used to show recent plots and figures. However, while the plots is for static plot, the viewer is for animations and interactive plots and figures.

The graphical user interface of RStudio

Figure 1.1: The graphical user interface of RStudio

An important concept in R is the current working directory. This is the file folder that R points to by default. By default the software stores all of your R-related files in the startup folder, which is the Document folder. Alternatively, you can create a personal working directory in which to store your R-related files with the setwd() function. This is especially important when reading in data to R. The current working directory should be set to the folder containing the data to be loaded into R.

1.2.5 Rstudio editor

You should be making use of Rstudio’s text edditor to code in R. It is important to practice writing good code that are easy to read and understand what they mean. In short make sure your code is easy to read and understand what it does. This will help you understand your own code later, and help other people understand your code when shared or when they ask you for a help.

1.2.6 Rmarkdown

Rstudio’s text editor allows to create markdown files called rmarkdown. Markdown is a plain text that combine codes and text and ability to ouput different format like word, PDF and HTML format. This ability of combining text and code has made rmarkdown a powerful markup language in R. In general, with rmarkdown you can write document with titles, headings, paragraphs, insert figures, tables and equations.

Another useful feature of rmarkdown document is the option to publish R projects in various file format such as HTML, Latex, PDF EPUB, Word and many others. This feature enables you to share your results with colleagues who may or may not have R software. The published document includes formated plain text in section, heading and paragraphs that comes with code chunk, figures and tables.

1.2.7 Use descriptive names

While programming, you will often declare and assign names of the variables. R usually will not care what name you give to any variables. But declare a descriptive name is informative because you easily remember what it contains. Therefore, its is encouraged to give names that represent the meaning of the data stored in each variable.

1.2.8 Use comments

While working with R script, the hash tag # set a comment and tell the script to skip the line with the hash tag. In rmarkdown, the hash tag are used to set headings and therefore you can specify the comment with hash tage in the chunk code. Comments are useful as they help you and other people about a particular code and what particular aspect it does. Therefore, it is always recommend to write clear and precise comments that can help you in the future when you want to understand what the code does and also can help others follows with little consultation if shared.

References

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Xia, Yinglin, Jun Sun, and Ding-Geng Chen. 2018. “Introduction to R, Rstudio and Ggplot2.” In Statistical Analysis of Microbiome Data with R, 77–127. Springer.

RStudio Team. 2016. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, Inc. http://www.rstudio.com/.


  1. The formal rules of formulating the statements of a computer language

  2. A precise step-by-step plan for a computational procedure that begins with an input value and yield an ouput value in a finite number of steps