Create line plots using ggplot2 to visualize changes in global life expectancy

Use ggplot2, a popular data visualization package in R, to create line plots that visualize changes in global life expectancy over time.

June 2, 2024


This post provides a step-by-step guide to visualizing changes in global life expectancy in R, using World Bank data retrieved with the wbstats package and plotted with ggplot2. It demonstrates several techniques for creating an informative and visually appealing line plot: arranging the order of faceted panels, visualizing summary statistics efficiently, displaying graphic elements that extend beyond the plot boundary, adding annotations to selected faceted panels, and loading custom fonts.

Packages and data cleanup


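library(tidyverse) # ggplot2, dplyr, readr, stringr
library(wbstats)   # client for the World Bank data API

theme_set(theme_minimal(base_size = 15)) # global theme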

Getting data from the World Bank

Using wbstats is simple once you know the codes of the indicators in the World Development Indicators database. Here we download the three indicators used in gapminder: life expectancy, GDP per capita, and total population. We also get the database of countries that we can look at.

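my_indicators <- c(
  life_expectancy = "SP.DYN.LE00.IN", 
  gdp_capita = "NY.GDP.PCAP.CD", 
  pop = "SP.POP.TOTL"
)

# wb() is the older wbstats interface; recent releases provide wb_data() as its successor
data = wb(
  country = "all",
  indicator = my_indicators, 
  startdate = 1880, 
  enddate = 2023,  
  return_wide = TRUE
) |> 
  # rename the three indicator columns by position
  rename(life_expectancy = 6, gdp_capita = 5, pop = 7)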
data |> FSA::headtail()
      iso3c date iso2c  country gdp_capita life_expectancy      pop
1       ABW 1960    AW    Aruba         NA          64.152    54608
2       ABW 1961    AW    Aruba         NA          64.537    55811
3       ABW 1962    AW    Aruba         NA          64.752    56682
16663   ZWE 2020    ZW Zimbabwe   1372.697          61.124 15669666
16664   ZWE 2021    ZW Zimbabwe   1773.920          59.253 15993524
16665   ZWE 2022    ZW Zimbabwe   1676.821          59.391 16320537
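
Renaming by position works for the table above, but it would mislabel columns without warning if their order ever changed. A minimal sketch of renaming by indicator code instead, assuming the wide result names its columns after the codes; `toy_wide` and its values are made up for illustration and are not World Bank data:

```r
library(dplyr)

# Named vector: new column name = World Bank indicator code
my_indicators <- c(
  life_expectancy = "SP.DYN.LE00.IN",
  gdp_capita      = "NY.GDP.PCAP.CD",
  pop             = "SP.POP.TOTL"
)

# Hypothetical stand-in for a wide download (values are invented)
toy_wide <- tibble(
  iso3c          = c("AAA", "BBB"),
  date           = c("1960", "1961"),
  NY.GDP.PCAP.CD = c(NA, 1000),
  SP.DYN.LE00.IN = c(60.1, 61.2),
  SP.POP.TOTL    = c(50000, 51000)
)

# all_of() with a named vector renames old -> new by name, not by position
toy_renamed <- toy_wide |> rename(all_of(my_indicators))
names(toy_renamed)
```

Because the lookup is by name, the call fails loudly if an expected indicator column is missing, rather than renaming the wrong column.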

The current dataset consists of individual countries around the world, lacking information about their respective continents. In order to link each country with its corresponding continent, we require an additional file containing a mapping of countries to continents.

countries_continents = read_csv("../data/countries_continents.csv") |> 
  janitor::clean_names() |> 
  mutate(continent = str_replace(continent, "Middle East & North Africa", "MENA"))

By combining this supplementary file with our original dataset, using country names as the key, we add a continent to every country. This step is what makes continent-level summaries and faceted plots possible.

# `life_data` is a name assumed here for the joined dataset
life_data = data |> 
  left_join(countries_continents) |> 
  # assumed condition: drop rows left without a continent after the join
  filter(!is.na(continent)) |> 
  select(year = date, country, continent, gdp_capita:pop) |> 
  mutate(year = as.integer(year))
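
Joining on country names only works when both tables spell each name the same way. A minimal sketch, on made-up tables (`toy_data` and `toy_map` are hypothetical stand-ins for the two data frames), of how `anti_join()` can list the rows that would end up without a continent:

```r
library(dplyr)

# Hypothetical country-level data, including one aggregate row
toy_data <- tibble(
  country = c("Aruba", "Zimbabwe", "World"),
  life_expectancy = c(64.2, 59.4, 71.0)
)

# Hypothetical country-to-continent mapping
toy_map <- tibble(
  country = c("Aruba", "Zimbabwe"),
  continent = c("Americas", "Africa")
)

# Rows of toy_data that have no match in toy_map
unmatched <- anti_join(toy_data, toy_map, by = "country")
unmatched$country

# After a left join, unmatched rows carry NA in continent and can be dropped
joined <- toy_data |>
  left_join(toy_map, by = "country") |>
  filter(!is.na(continent))
nrow(joined)
```

Inspecting `unmatched` before filtering helps separate rows that should be dropped, such as regional aggregates, from real countries whose names are merely spelled differently in the two files.
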
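
theme_set(theme_classic(base_size = 12))

# `life_data`: name assumed here for the joined dataset with a continent column
f1 = life_data |> 
  ggplot(aes(x = year, y = life_expectancy, 
             color = continent, fill = continent)) +
  # one faint line per country
  geom_line(aes(group = country), alpha = .2) +
  # bold line: mean across countries within each continent
  stat_summary(fun = mean, geom = "line", linewidth = 2) +
  scale_x_continuous(breaks = seq(1960, 2020, 20)) +
  scale_y_continuous(limits = c(30, 85))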
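
The figure caption further down describes a ribbon of one standard deviation around the mean. A minimal sketch of how such a band could be drawn with `stat_summary()`, on made-up data; `mean_sd()` and `toy` are hypothetical helpers, not part of the original code:

```r
library(ggplot2)

# Returns the mean and one standard deviation either side of it,
# in the y / ymin / ymax layout that stat_summary(fun.data = ) expects
mean_sd <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Toy data: three hypothetical countries observed in three years
toy <- data.frame(
  year = rep(c(1960, 1990, 2020), each = 3),
  country = rep(c("A", "B", "C"), times = 3),
  life_expectancy = c(40, 50, 60, 55, 60, 65, 68, 72, 76)
)

p_ribbon <- ggplot(toy, aes(x = year, y = life_expectancy)) +
  # one faint line per country
  geom_line(aes(group = country), alpha = .2) +
  # shaded band: mean +/- 1 SD across countries in each year
  stat_summary(fun.data = mean_sd, geom = "ribbon", alpha = .3) +
  # central line: the mean
  stat_summary(fun = mean, geom = "line", linewidth = 2)
```

Printing `p_ribbon` draws the plot. Computing the band inside `stat_summary()` avoids building a separate summary table just for plotting.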


f2 = f1+
  facet_wrap(~continent, nrow = 1)+
  ggsci::scale_fill_aaas() +
  theme(legend.position = "none")
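
`facet_wrap()` lays panels out in factor level order, and a character column such as `continent` is treated alphabetically. A minimal sketch of one way to control that order, by reordering the levels with `forcats::fct_reorder()`; the `toy` values are invented for illustration:

```r
library(dplyr)
library(forcats)

# Toy data: hypothetical mean life expectancy by continent
toy <- tibble(
  continent = c("Africa", "Americas", "Asia", "Europe"),
  life_expectancy = c(64, 74, 73, 79)
)

# Reorder the factor levels by a summary of another variable
toy_ordered <- toy |>
  mutate(continent = fct_reorder(continent, life_expectancy, .fun = mean))

levels(toy_ordered$continent)
```

Applying the same `mutate()` to the plotting data before calling `ggplot()` would make `facet_wrap(~continent)` arrange the panels from lowest to highest mean.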


f3 = f2 +
  # year 1960
  geom_vline(xintercept = 1960, linetype = "dashed", color = "orange3") +
  # year 2020
  geom_vline(xintercept = 2020, linetype = "dashed", color = "skyblue3") +
  # add text annotation
  # year 1960
  annotate(geom = "text", x = 1960, y = 30, label = " 1960", 
           fontface = "bold", size = 2.8, hjust = 0, color = "orange3") +
  # year 2020
  annotate(geom = "text", x = 2020, y = 30, label = "2020 ", 
           fontface = "bold", size = 2.8, hjust = 1, color = "skyblue3") 


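# mean life expectancy per continent in the first and last reference years
# (`life_data`: name assumed here for the joined dataset)
life.1960_2020 <- life_data %>% 
  filter(year %in% c(1960, 2020)) %>% 
  group_by(continent, year) %>% 
  summarise(life_expectancy = mean(life_expectancy, na.rm = TRUE) %>% round())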

life.1960_2020 |> head()
# A tibble: 6 × 3
# Groups:   continent [3]
  continent  year life_expectancy
  <chr>     <int>           <dbl>
1 Africa     1960              43
2 Africa     2020              64
3 Americas   1960              59
4 Americas   2020              74
5 Asia       1960              51
6 Asia       2020              74
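
# one row per continent, used as in-panel titles
# (`life_data`: name assumed here for the joined dataset)
panel.titles <- life_data |> distinct(continent) |> arrange(continent)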

f3 + 
  # do not clip graphical elements drawn beyond the panel range
  coord_cartesian(clip = "off") + 
  # continent names as in-panel titles (geom_text() assumed here)
  geom_text(
    data = panel.titles,
    aes(x = 1980, y = 85, label = continent),
    size = 4.5, nudge_x = 10, fontface = "bold") +
  # titles
  labs(
    # title = "Steady increase of Human Life Expectancy", 
    caption = "Each line represents one country; central line: the average; \nribbon, one standard deviation around the mean.",
    x = NULL) +
  # hide the facet strips and x axis; keep thin y-axis ticks
  theme(
    strip.text = element_blank(), 
    axis.line.y = element_blank(),
    axis.line.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.length.y = unit(5, "pt"),
    axis.ticks.y = element_line(linetype = "solid", linewidth = .15),
    plot.title = element_text(size = 18, family = "fat"),
    plot.caption = element_text(hjust = 0, size = 10, color = "grey50"),
    plot.background = element_rect(fill = "#d8cfd0"),
    panel.background = element_rect(fill = "#f2f1ef")
  )
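
`annotate()` repeats its label in every facet, which suits the 1960 and 2020 markers. For a label that should appear in one panel only, a common approach is to pass `geom_text()` a small data frame that includes the faceting variable. A minimal sketch; `toy` reuses the Africa and Asia means printed above, while `note`, its position, and its wording are hypothetical:

```r
library(ggplot2)

# Continent means for two continents in the two reference years
toy <- data.frame(
  year = rep(c(1960, 2020), times = 2),
  continent = rep(c("Africa", "Asia"), each = 2),
  life_expectancy = c(43, 64, 51, 74)
)

# One-row data frame that names the facet variable:
# the label is drawn only in the "Africa" panel
note <- data.frame(
  continent = "Africa",
  year = 1975,
  life_expectancy = 43,
  label = "1960 mean: 43"
)

p_note <- ggplot(toy, aes(x = year, y = life_expectancy)) +
  geom_line() +
  # x and y are inherited from the plot; only the label is mapped here
  geom_text(data = note, aes(label = label), size = 3) +
  facet_wrap(~continent, nrow = 1)
```

Adding more rows to `note`, each with its own `continent` value, would place a different label in each selected panel.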


