The key process for plotting using ggplot2

ggplot2 is a powerful and versatile data visualization framework in R that allows users to create elegant and informative plots through a structured, layered approach.
Data Science
Data Visualization
ggplot
Published

June 8, 2024

Introduction

ggplot2 is a popular graphics framework in R known for its elegant and aesthetically pleasing visualizations (Wickham, 2016). It is well structured and offers a comprehensive approach to creating many types of plots (Kassambara, 2020). This post aims to uncover the underlying structure of ggplot2, providing a foundation for creating any type of ggplot. Note, however, that creating plots in ggplot2 differs significantly from base graphics, so the learning curve can be steep for those used to traditional graphics methods (Wickham and Wickham, 2017).

To fully grasp ggplot2, it’s essential to set aside preconceived notions about base graphics and embrace its layered approach. By following this tutorial, you are just a few steps away from creating your own ggplots. The distinctive feature of the ggplot2 framework is that you build plots by adding layers. The process of making any ggplot is as follows.

We begin by loading the tidyverse ecosystem into our session. This is required because it contains several functions that we are going to use.

library(tidyverse)

Data

We then use the chinook dataset, which has 112 rows and 3 columns, named loc, tl, and w. The loc column is character type and represents location names. The tl and w columns are numeric (double type), representing continuous variables total length and weight, respectively.

chinook <- read_csv("../data/chinook_lw.csv")
chinook |> glimpse()
Rows: 112
Columns: 3
$ tl  <dbl> 120.1, 115.0, 111.2, 110.2, 110.0, 109.7, 105.0, 100.1, 98.0, 92.1…
$ w   <dbl> 17.9, 17.2, 16.8, 15.8, 14.3, 13.8, 12.8, 11.7, 12.8, 14.8, 9.7, 7…
$ loc <chr> "Argentina", "Argentina", "Argentina", "Argentina", "Argentina", "…

The Setup

First, you need to tell ggplot what dataset to use. This is done using the ggplot(df) function, where df is a dataframe that contains all features needed to make the plot. This is the most basic step. Unlike base graphics, ggplot doesn’t take vectors as arguments.

Optionally, you can add the aesthetics you want to apply to your ggplot inside the aes() argument, such as the X and Y axes, by specifying the respective variables from the dataset. The variables that control color, size, shape, and stroke can also be specified here. The aesthetics specified here are inherited by all the geom layers you add subsequently.

If you intend to add more layers later on, maybe a bar chart on top of a line graph, you can specify the respective aesthetics when you add those layers.

Below, I show a few examples of how to set up ggplot using the chinook dataset.

 ggplot(data = chinook, aes(x = tl, y = w))

However, this draws only an empty panel with axes; no data will appear until you add the geom layers.

 ggplot(data = chinook, aes(x = tl, y = w))+
  geom_point()
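The difference between the setup and a plot with a geom can be checked by counting layers. The sketch below uses a small made-up data frame named toy (an assumption standing in for chinook, with the same column names) so that it runs without the CSV file.

```r
library(ggplot2)

# Hypothetical stand-in for chinook: same column names, made-up values
toy <- data.frame(
  tl  = c(60, 70, 80, 90, 100, 110),
  w   = c(3.0, 4.6, 6.8, 9.5, 13.1, 17.0),
  loc = rep(c("A", "B", "C"), times = 2)
)

# Setup only: data and mappings are stored, but there is nothing to draw yet
p_setup <- ggplot(data = toy, aes(x = tl, y = w))
n_setup <- length(p_setup$layers)

# Adding a geom appends one layer to the same plot object
p_points <- p_setup + geom_point()
n_points <- length(p_points$layers)
```

Here n_setup counts the layers of the setup-only plot and n_points counts them after geom_point() is added.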

The aes argument stands for aesthetics. ggplot2 considers the X and Y axis of the plot to be aesthetics as well, along with color, size, shape, fill etc.

 ggplot(data = chinook, aes(x = tl, y = w, color = loc))+
  geom_point()

If you want the color, size, etc. to be fixed (i.e., not vary based on a variable from the dataframe), you need to specify them outside aes(), like this.

 ggplot(data = chinook, aes(x = tl, y = w))+
  geom_point(color = "steelblue", size = 4, shape = 16)
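A common slip is to put the fixed value inside aes(). The sketch below contrasts the two cases on a made-up data frame (toy is an assumption standing in for chinook) and reads back the colors that ggplot2 actually uses with layer_data().

```r
library(ggplot2)

# Hypothetical stand-in for chinook: same column names, made-up values
toy <- data.frame(
  tl  = c(60, 70, 80, 90, 100, 110),
  w   = c(3.0, 4.6, 6.8, 9.5, 13.1, 17.0),
  loc = rep(c("A", "B", "C"), times = 2)
)

# Setting: the color is a fixed value given outside aes()
p_set <- ggplot(toy, aes(x = tl, y = w)) +
  geom_point(color = "steelblue", size = 4)

# Mapping by mistake: inside aes(), "steelblue" is treated as a data value,
# so ggplot2 assigns a palette color to it and adds a legend
p_map <- ggplot(toy, aes(x = tl, y = w)) +
  geom_point(aes(color = "steelblue"), size = 4)

# Colors actually used for the points in each version
set_colors <- unique(layer_data(p_set, 1)$colour)
map_colors <- unique(layer_data(p_map, 1)$colour)
```

Only p_set draws steel blue points; in p_map the string is just a label for a one-level variable.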

The Layers

The layers in ggplot2 are also called geoms. Once the base setup is done, you can append the geoms one on top of the other.

 ggplot(data = chinook, aes(x = tl, y = w))+
  geom_point()+
  geom_smooth()

We have added two layers (geoms) to this plot: geom_point() and geom_smooth(). The same plot can also be built by supplying the data and aesthetics to each geom:

 ggplot()+
  geom_point(data = chinook, aes(x = tl, y = w))+
  geom_smooth(data = chinook, aes(x = tl, y = w))

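In the first version, the X and Y axes were defined in the ggplot() setup, so both layers inherited those aesthetics. In the second, the same aesthetics were specified inside each geom layer. Per-layer aesthetics can also go beyond X and Y, as in the next example, which maps color (and, for the smoother, fill) to loc.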

 ggplot()+
  geom_point(data = chinook, aes(x = tl, y = w, color = loc))+
  geom_smooth(data = chinook, aes(x = tl, y = w, fill = loc, color = loc))

Notice how the color of the points varies with the value of the loc variable, and that the legend was added automatically. Because both layers share the same mappings, they can be moved into ggplot() so that they are written only once:

 ggplot(data = chinook, aes(x = tl, y = w, color = loc, fill = loc))+
  geom_point()+
  geom_smooth()

The points in the plot are colored based on the values of the loc variable, and a legend has been added automatically. However, instead of a separate smoothing line for each level of loc, you may want just one smoothing line that spans all the levels. One way to get this is to map color inside geom_point() only, so that geom_smooth() does not inherit the grouping.
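A minimal sketch of the single smoothing line, assuming a made-up data frame toy with the same column names as chinook and an invented cubic length-weight relationship. The method and formula arguments are spelled out only to silence the message that geom_smooth() prints when it picks them itself.

```r
library(ggplot2)

# Hypothetical stand-in for chinook: 30 made-up fish from three locations
toy <- data.frame(
  tl  = seq(50, 120, length.out = 30),
  loc = rep(c("A", "B", "C"), times = 10)
)
toy$w <- 1e-5 * toy$tl^3  # invented length-weight relationship

# color is mapped only in geom_point(), so geom_smooth() sees a single group
p_one <- ggplot(toy, aes(x = tl, y = w)) +
  geom_point(aes(color = loc)) +
  geom_smooth(method = "loess", formula = y ~ x)

# Number of separate smoothing lines drawn by the second layer
n_lines <- length(unique(layer_data(p_one, 2)$group))
```

An alternative under the same assumptions is to keep color = loc in ggplot() and override the grouping in the smoother with geom_smooth(aes(group = 1), color = "black").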

The Labels

Labels play a crucial role in creating clear and informative plots in ggplot2. They include titles, axis labels, legend labels, and annotations, and can be customized in a variety of ways within ggplot2.

Once you have created the main elements of the plot in ggplot2, you may want to add a title for the overall plot and customize the titles for the x and y axes. This can be accomplished using the labs layer, which is designed for specifying the labels in your ggplot2 visualization.

 ggplot(data = chinook, aes(x = tl, y = w, color = loc))+
  geom_point()+
  geom_smooth()+
  labs(
    title = "The Chinook fish", 
    subtitle = "The length and weight relationship",
    y = "Weight (kg)",
    x = "Length (cm)"
      )
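The legend title is a label too, and labs() can set it through the name of the mapped aesthetic. The sketch below assumes a made-up data frame toy in place of chinook; the title "Location" is only an illustrative choice.

```r
library(ggplot2)

# Hypothetical stand-in for chinook: same column names, made-up values
toy <- data.frame(
  tl  = c(60, 70, 80, 90, 100, 110),
  w   = c(3.0, 4.6, 6.8, 9.5, 13.1, 17.0),
  loc = rep(c("A", "B", "C"), times = 2)
)

# color = "Location" replaces the default legend title "loc"
p_lab <- ggplot(toy, aes(x = tl, y = w, color = loc)) +
  geom_point() +
  labs(
    x = "Length (cm)",
    y = "Weight (kg)",
    color = "Location"
  )
```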

Note

If you are showing a ggplot inside a function, you need to explicitly save it and then print it using print(gg).
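A small sketch of that pattern. The function name plot_lw and the data frame toy are hypothetical; the point is only that the plot object is saved as gg and printed explicitly.

```r
library(ggplot2)

# Hypothetical stand-in for chinook: same column names, made-up values
toy <- data.frame(
  tl  = c(60, 70, 80, 90, 100, 110),
  w   = c(3.0, 4.6, 6.8, 9.5, 13.1, 17.0),
  loc = rep(c("A", "B", "C"), times = 2)
)

# Hypothetical helper: builds the scatterplot, prints it, and returns it
plot_lw <- function(df) {
  gg <- ggplot(df, aes(x = tl, y = w)) +
    geom_point()
  print(gg)      # draws the plot even when the call sits inside a loop or a larger function
  invisible(gg)  # hands the plot back so the caller can modify or save it
}

gg <- plot_lw(toy)
```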

The Theme

Themes in ggplot2 are responsible for controlling the overall look of the plot, which includes elements like text, lines, and background colors. These themes can be tailored to suit the visual requirements of your plot. Below are some important elements and ways to customize themes in ggplot2.

ggplot(data = chinook, aes(x = tl, y = w, color = loc)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "The Chinook fish",
    subtitle = "The length and weight relationship",
    y = "Weight (kg)",
    x = "Length (cm)"
  ) +
  theme(
    # numeric coordinates place the legend inside the plotting area; since
    # ggplot2 3.5.0 the preferred form is legend.position = "inside" together
    # with legend.position.inside = c(.25, .65)
    legend.position = c(.25, .65),
    plot.title = element_text(size = 30, face = "bold"),
    plot.subtitle = element_text(size = 18, face = "plain"),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15),
    axis.title.y = element_text(size = 15),
    legend.title = element_text(size = 13, face = "bold"),
    legend.text = element_text(size = 10, face = "italic"),
    legend.background = element_rect(fill = "grey90", color = "black", linewidth = .1)
  )
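Because a theme() call returns an ordinary object, the settings can be stored once and added to any plot instead of being retyped. The sketch below assumes a made-up data frame toy in place of chinook, and the name theme_chinook is hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for chinook: same column names, made-up values
toy <- data.frame(
  tl  = c(60, 70, 80, 90, 100, 110),
  w   = c(3.0, 4.6, 6.8, 9.5, 13.1, 17.0),
  loc = rep(c("A", "B", "C"), times = 2)
)

# Hypothetical name: theme settings stored once for reuse
theme_chinook <- theme(
  plot.title    = element_text(size = 30, face = "bold"),
  plot.subtitle = element_text(size = 18, face = "plain"),
  axis.text     = element_text(size = 12),  # applies to both axes
  axis.title    = element_text(size = 15)
)

# The stored theme is added like any other layer
p_themed <- ggplot(toy, aes(x = tl, y = w, color = loc)) +
  geom_point() +
  labs(title = "The Chinook fish", x = "Length (cm)", y = "Weight (kg)") +
  theme_chinook
```

Setting axis.text and axis.title covers the x and y variants in one line, since the axis-specific elements inherit from them.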

The Scales

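In ggplot2, scales control the mapping between data and aesthetics. They are used to customize how each aesthetic is displayed, such as axis breaks and labels, colors, sizes, and shapes. The example below sets the axis breaks with scale_x_continuous() and scale_y_continuous() and assigns custom colors with scale_color_manual() and scale_fill_manual().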

ggplot(data = chinook, aes(x = tl, y = w, color = loc, fill = loc)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "The Chinook fish",
    subtitle = "The length and weight relationship",
    y = "Weight (kg)",
    x = "Length (cm)"
  ) +
  theme(
    # numeric legend.position is deprecated since ggplot2 3.5.0 in favor of
    # legend.position = "inside" plus legend.position.inside
    legend.position = c(.25, .65),
    plot.title = element_text(size = 30, face = "bold"),
    plot.subtitle = element_text(size = 18, face = "plain"),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15),
    axis.title.y = element_text(size = 15),
    legend.title = element_text(size = 13, face = "bold"),
    legend.text = element_text(size = 10, face = "italic"),
    legend.background = element_rect(fill = "grey90", color = "black", linewidth = .1)
  ) +
  # axis breaks every 10 cm and every 4 kg
  scale_x_continuous(breaks = seq(20, 125, 10)) +
  scale_y_continuous(breaks = seq(2, 30, 4)) +
  # one color per location, used for both points and smoother ribbons
  scale_color_manual(values = c("firebrick", "steelblue", "orange")) +
  scale_fill_manual(values = c("firebrick", "steelblue", "orange"))
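An unnamed vector of colors is matched to the levels of loc in order. Naming the vector ties each color to a specific level instead. The sketch below assumes a made-up data frame toy whose locations are called "A", "B", and "C"; these names are placeholders for the real values of loc.

```r
library(ggplot2)

# Hypothetical stand-in for chinook: same column names, made-up values
toy <- data.frame(
  tl  = c(60, 70, 80, 90, 100, 110),
  w   = c(3.0, 4.6, 6.8, 9.5, 13.1, 17.0),
  loc = rep(c("A", "B", "C"), times = 2)
)

# Hypothetical level names; the order of the vector no longer matters
loc_colors <- c(C = "orange", A = "firebrick", B = "steelblue")

p_named <- ggplot(toy, aes(x = tl, y = w, color = loc)) +
  geom_point() +
  scale_color_manual(values = loc_colors)

# Color drawn for each group (groups follow the sorted levels A, B, C)
drawn <- unique(layer_data(p_named, 1)[, c("group", "colour")])
```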

The Facets

In the previous chart, the scatterplots for all values of loc were plotted in the same panel. What if you want one panel per location?

ggplot(data = chinook, aes(x = tl, y = w, color = loc, fill = loc)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "The Chinook fish",
    subtitle = "The length and weight relationship",
    y = "Weight (kg)",
    x = "Length (cm)"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 30, face = "bold"),
    plot.subtitle = element_text(size = 18, face = "plain"),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15),
    axis.title.y = element_text(size = 15),
    legend.title = element_text(size = 13, face = "bold"),
    legend.text = element_text(size = 10, face = "italic"),
    legend.background = element_rect(fill = "grey90", color = "black", linewidth = .1)
  ) +
  scale_x_continuous(breaks = seq(20, 125, 10)) +
  scale_y_continuous(breaks = seq(2, 30, 4)) +
  scale_color_manual(values = c("firebrick", "steelblue", "orange")) +
  scale_fill_manual(values = c("firebrick", "steelblue", "orange")) +
  facet_wrap(~loc, nrow = 1)

In facet_wrap(), the scales of the X and Y axes are fixed by default so that every panel accommodates all points. This makes comparisons across panels meaningful because they share the same scale. However, you can let the scales vary between panels, which spreads the points more evenly within each one, by setting the argument scales = "free".

ggplot(data = chinook, aes(x = tl, y = w, color = loc, fill = loc)) +
  geom_point() +
  geom_smooth() +
  labs(
    title = "The Chinook fish",
    subtitle = "The length and weight relationship",
    y = "Weight (kg)",
    x = "Length (cm)"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 30, face = "bold"),
    plot.subtitle = element_text(size = 18, face = "plain"),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 15),
    axis.title.y = element_text(size = 15),
    legend.title = element_text(size = 13, face = "bold"),
    legend.text = element_text(size = 10, face = "italic"),
    legend.background = element_rect(fill = "grey90", color = "black", linewidth = .1)
  ) +
  scale_x_continuous(breaks = seq(20, 125, 10)) +
  scale_y_continuous(breaks = seq(2, 30, 4)) +
  scale_color_manual(values = c("firebrick", "steelblue", "orange")) +
  scale_fill_manual(values = c("firebrick", "steelblue", "orange")) +
  facet_wrap(~loc, nrow = 1, scales = "free")
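Besides "fixed" and "free", facet_wrap() also accepts "free_x" and "free_y" to free one axis only. The sketch below frees the Y axis while keeping a shared X axis, using a made-up data frame toy in place of chinook. Reading the panel layout from ggplot_build() relies on ggplot2 internals and is included only to show the effect.

```r
library(ggplot2)

# Hypothetical stand-in for chinook: same column names, made-up values
toy <- data.frame(
  tl  = c(60, 70, 80, 90, 100, 110),
  w   = c(3.0, 4.6, 6.8, 9.5, 13.1, 17.0),
  loc = rep(c("A", "B", "C"), times = 2)
)

# Y scales vary by panel; the X scale stays common to all panels
p_free_y <- ggplot(toy, aes(x = tl, y = w)) +
  geom_point() +
  facet_wrap(~loc, nrow = 1, scales = "free_y")

# One row per panel; SCALE_X and SCALE_Y index the scale each panel uses
panel_layout <- ggplot_build(p_free_y)$layout$layout
```

With a shared X axis, lengths remain directly comparable across locations while each panel still uses its full height.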

Wind up

ggplot2 is a powerful and versatile graphics framework in R that is renowned for its ability to create elegant and visually appealing visualizations. By following the principles and techniques outlined in this post, you are well on your way to becoming a skilled ggplot2 practitioner, capable of harnessing the power of this popular graphics framework to enhance the impact and clarity of your data-driven insights.

References

Kassambara, A., 2020. Ggpubr: ’ggplot2’ based publication ready plots.
Wickham, H., 2016. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Wickham, H., Wickham, M.H., 2017. Tidyverse: Easily install and load the ’tidyverse’.

Citation

BibTeX citation:
@online{semba2024,
  author = {Semba, Masumbuko},
  title = {The Key Process for Plotting Using Ggplot2},
  date = {2024-06-08},
  url = {https://lugoga.github.io/kitaa/posts/visualize_ggplotP1/},
  langid = {en}
}
For attribution, please cite this work as:
Semba, M., 2024. The key process for plotting using ggplot2 [WWW Document]. URL https://lugoga.github.io/kitaa/posts/visualize_ggplotP1/