Rplicate Series: Bold Axis & Character with ggplot2
Dyah Nurlita S

10 minute read

Welcome again to the Rplicate Series! In this 6th article of the series, we will replicate The Economist plot titled “Marvellous”. In the process, we will explore ways to use bold text and characters for our axes. Let’s dive in below!

Load Packages

These are the packages and some set up that we will use.

library(tidyverse) # for data wrangling
library(ggplot2) # for data visualization
library(scales) # to customize axes in plot
library(ggrepel) # add & customize repelled text
library(grid) # create grid & enhance the layouting of plot
library(gridExtra) 
library(png) # import plot to image
library(extrafont) # font library

# load font from local
font_import() # type y when asked to import
#> Importing fonts may take a few minutes, depending on the number of fonts and the speed of the system.
#> Continue? [y/n]
loadfonts(device = "win")

# to prevent R displaying scientific notation
options(scipen = 100) 

Dataset

The plot that we are going to replicate comes from this article. It contains information about sales and market shares of Disney Box Office in United States, Canada, and around the world in the year of 2019.

First Plot (Bar Plot)

Data Wrangling

Before making a visualization, let’s prepare and clean the data:

  # read csv
  box_office_sales <- read_csv("data_input/rplicate6/box office sales 2019.csv")
  head(box_office_sales)
  #> # A tibble: 6 x 6
  #>    Rank Movie `Worldwide Box ~ `Domestic Box O~ `International ~
  #>   <dbl> <chr> <chr>            <chr>            <chr>           
  #> 1     1 Aven~ $2,797,800,564   $858,373,000     $1,939,427,564  
  #> 2     2 The ~ $1,656,313,097   $543,638,043     $1,112,675,054  
  #> 3     3 Froz~ $1,430,769,340   $474,900,703     $955,868,637    
  #> 4     4 Spid~ $1,131,927,996   $390,532,085     $741,395,911    
  #> 5     5 Capt~ $1,129,729,839   $426,829,839     $702,900,000    
  #> 6     6 Toy ~ $1,073,394,813   $434,038,008     $639,356,805    
  #> # ... with 1 more variable: `Domestic Share` <chr>

In this visualization, we only use the first 11 row of data and column Movie, Worldwide Box Office, and International Box Office. Therefore let’s select these data and rename some column.

data_1 <- head(box_office_sales, 11) %>% 
  select(Movie,
         'Domestic Box Office',
         'International Box Office')
data_1
#> # A tibble: 11 x 3
#>    Movie                           `Domestic Box Offic~ `International Box Offi~
#>    <chr>                           <chr>                <chr>                   
#>  1 Avengers: Endgame               $858,373,000         $1,939,427,564          
#>  2 The Lion King                   $543,638,043         $1,112,675,054          
#>  3 Frozen II                       $474,900,703         $955,868,637            
#>  4 Spider-Man: Far From Home       $390,532,085         $741,395,911            
#>  5 Captain Marvel                  $426,829,839         $702,900,000            
#>  6 Toy Story 4                     $434,038,008         $639,356,805            
#>  7 Joker                           $335,251,773         $736,645,941            
#>  8 Star Wars: The Rise of Skywalk~ $511,874,363         $548,577,048            
#>  9 Aladdin                         $355,559,216         $695,400,000            
#> 10 Jumanji: The Next Level         $301,861,286         $468,828,061            
#> 11 Fast & Furious Presents: Hobbs~ $173,956,935         $586,625,355

Since there’s no information about “Jumanji” movie inside the plot, it is best to remove it from the data.

data_box_office <- data_1 %>% filter(Movie !='Jumanji: The Next Level')

Next, we will also perform some string manipulation to replace special character like “$” and “,” into a blank space. This is to make sure that we can convert the columns into its correct data type.

data_box_office[,c(2,3)]<- lapply(data_box_office[,c(2,3)], 
                                  function(x) gsub('\\$', '', x))
data_box_office[,c(2,3)]<- lapply(data_box_office[,c(2,3)], 
                                  function(x) gsub('\\,', '', x))
data_box_office[-1] <- lapply(data_box_office[-1], as.numeric)

head(data_box_office, 3)
#> # A tibble: 3 x 3
#>   Movie             `Domestic Box Office` `International Box Office`
#>   <chr>                             <dbl>                      <dbl>
#> 1 Avengers: Endgame             858373000                 1939427564
#> 2 The Lion King                 543638043                 1112675054
#> 3 Frozen II                     474900703                  955868637

Now, let’s also transform the original wide-format data frame into its long-format for easier plotting.

#> # A tibble: 6 x 3
#>   Movie             revenue_type                  value
#>   <chr>             <chr>                         <dbl>
#> 1 Avengers: Endgame Domestic Box Office       858373000
#> 2 Avengers: Endgame International Box Office 1939427564
#> 3 The Lion King     Domestic Box Office       543638043
#> 4 The Lion King     International Box Office 1112675054
#> 5 Frozen II         Domestic Box Office       474900703
#> 6 Frozen II         International Box Office  955868637

Create Visualization

plot_bar <- ggplot(data = data_plot,
                   aes(x = reorder(Movie, value), y = value)) +
  coord_flip() +
  geom_col(aes(fill = revenue_type),
           position = position_stack(reverse = TRUE),
           width = 0.75) +
  geom_hline(yintercept = 0,
             lwd = 1.75) +
  labs(x = "",
       y = "")
plot_bar

We need to declare specific format for x-axis and y-axis text. Namely, bold character for specific movie and additional “>” character with its red color. We can use function expression(). Here is the code below:

plot_bar <- plot_bar +
   scale_x_discrete(labels = rev(
     c(expression(paste(bold("Avengers:   Endgame"))), # adding bold format
       expression(paste(bold("The Lion King"))),
       expression(paste(bold("Frozen II"))),
       expression(paste("Spider-Man: Far From Home")),
       expression(paste(bold("Captain Marvel"))),
       expression(paste(bold("Toy Story 4"))),
       expression(paste("Joker")),
       expression(paste(bold("Aladdin"))),
       expression(paste(bold("Star Wars: The Rise of Skywalker"))),
       expression(paste("Fast & Furious: Hobbs & Shaw")))
                  )) +
  scale_y_continuous(limits = c(0, 3000000000),
                     labels = c("0","1","2","3"),
                     expand = c(0, 2), # adjust additional space after axis limits
                     position = "right" # makes it on top (because of coord_flip())
                     )

plot_bar

plot_bar <-
  plot_bar +
  labs(title = "Box-office sales, 2019, $bn \n",
       caption_left = " \n") # \n adding empty line in left side of the plot 
                             # to position ">" character later)    

plot_bar

And then we can apply theme into the plot:

# apply theme
plot_bar <- plot_bar +
  
  theme(
    
    legend.title = element_blank(),
    legend.direction = "vertical",
    legend.box = "horizontal",
    legend.position = c(0.7,1.18),
    legend.text = element_text(size = 14),
    
    panel.background = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_line(color = "#B2C2CA", size = 1),
    panel.grid.major.y = element_blank(),
    
    plot.title = element_text(face = "bold", size = 18, hjust = -2.32, vjust = -0.1),

    axis.text = element_text(color = "black"),
    axis.ticks = element_blank(),
    axis.text.y = element_text(hjust = 0, size = 14),
    axis.text.x = element_text(size = 14),
    
    text = element_text(family = "Calibri")
   
  )

# edit legend title and labels
plot_bar <- plot_bar + 
   scale_fill_manual(values = c("#076fa1","#2fc1d3"), # color for the fill
                     name = "\n", # no title, only empty line
                     labels = c("United States and Canada", 
                                "Rest of the world"))

plot_bar

Save Plot

After all of the visualization steps above were done, we can save the plot into a png file. To apply an arrow “>” symbol, because ggplot haven’t yet provide feature to accomodate it, we can apply it using grob text. And here below the result.

# prepare file
png("bar.png", width = 7, height = 5.5, units = "in", res = 300)

# make file (RUN ALL)
plot_bar
grid.rect(x = 0.064, y = 0.98,
          hjust = 1.1, vjust = 0,
          width = 0.05,height = 0.01,
          gp = gpar(fill="#353535",lwd=0))
grid.text("Disney",
          x=0.038, y=0.82, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=14, fontfamily="Calibri", fontface="bold"))
grid.text(">",
          x=0.01, y=0.76, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=12, fontfamily="Calibri", fontface="bold"))
grid.text(">",
          x=0.01, y=0.7, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=12, fontfamily="Calibri", fontface="bold"))
grid.text(">",
          x=0.01, y=0.62, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=12, fontfamily="Calibri", fontface="bold"))
grid.text(">",
          x=0.01, y=0.47, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=12, fontfamily="Calibri", fontface="bold"))
grid.text(">",
          x=0.01, y=0.4, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=12, fontfamily="Calibri", fontface="bold"))
grid.text(">",
          x=0.01, y=0.25, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=12, fontfamily="Calibri", fontface="bold"))
grid.text(">",
          x=0.01, y=0.18, vjust = 0, hjust=0,
          gp=gpar(col="#99404f", fontsize=12, fontfamily="Calibri", fontface="bold"))

# finish
dev.off()

# Second Plot (Line Plot)

## Data Wrangling

Before making a visualization, let’s prepare and clean the data:

# read data
market_shares_disney <- read_csv("data_input/rplicate6/Disney Market Shares.csv")
head(market_shares_disney)
#> # A tibble: 6 x 8
#>    Year `Movies in Rele~ `Market Share` Gross `Tickets Sold` `Inflation-Adju~
#>   <dbl>            <dbl> <chr>          <chr>          <dbl> <chr>           
#> 1  1995               38 19.04%         $1,0~      232651499 $2,119,455,156  
#> 2  1996               37 20.76%         $1,1~      270981385 $2,468,640,417  
#> 3  1997               33 13.93%         $885~      193004183 $1,758,268,107  
#> 4  1998               28 16.38%         $1,1~      236462602 $2,154,174,304  
#> 5  1999               30 16.95%         $1,2~      244888472 $2,230,933,980  
#> 6  2000               28 14.75%         $1,1~      206151531 $1,878,040,447  
#> # ... with 2 more variables: `Top-Grossing Movie` <chr>, `Gross that
#> #   Year` <chr>

We will drop some columns and filter some data:

data_market_shares <- market_shares_disney %>% 
  select(year = Year, 
         market_share = 'Market Share') %>% # rename into easier format for plotting
  filter(year !='2020')

head(data_market_shares, 3)
#> # A tibble: 3 x 2
#>    year market_share
#>   <dbl> <chr>       
#> 1  1995 19.04%      
#> 2  1996 20.76%      
#> 3  1997 13.93%

Now, we will replace special character “%” into a blank space and convert the Market Share data into its correct data type.

data_market_shares[2]<- lapply(data_market_shares[2], function(x) gsub('\\%','',x))
data_market_shares[2]<- lapply(data_market_shares[2], as.numeric)

head(data_market_shares, 3)
#> # A tibble: 3 x 2
#>    year market_share
#>   <dbl>        <dbl>
#> 1  1995         19.0
#> 2  1996         20.8
#> 3  1997         13.9

Create Visualization

# creating plot
plot_line <- ggplot(data = data_market_shares, 
                    aes(x = year, y = market_share, group = 1)) + # group 1 to make 1 line
  geom_line(color = "#99404f", size = 2.2) + labs(x = "", y = "")

plot_line

We need to declare the axis text and add some title and subtitle inside the plot, here is the code below:

plot_line <- plot_line +
  scale_y_continuous(limit = c(0,40),
                    expand = c(0,0))+
  scale_x_continuous(breaks = seq(1995,2019,by=1),
                     labels = c("1995", rep("",4), # rep to add repetitive blank space 4 times
                                "2000", rep("",4), "05", rep("",4), 
                                "10", rep("",4), "15", rep("",3), "19")) +
  labs(title = "United States and Canada, \nDisney films, box-office sales \n\n",
       subtitle = expression(paste("% of total \n\n"))) +
  coord_cartesian(clip = "off")

plot_line

Next, we need to customize y-axis text using grob text, because text needs to be put inside the plot:

label_0 <- grobTree(textGrob("0", x=0.99,y=0.03, hjust=0,
                             gp=gpar(col="black", fontsize=14)))
label_10 <- grobTree(textGrob("10", x=0.975,y=0.302, hjust=0,
                              gp=gpar(col="black", fontsize=14)))
label_20 <- grobTree(textGrob("20", x=0.975,y=0.535, hjust=0,
                              gp=gpar(col="black", fontsize=14)))
label_30 <- grobTree(textGrob("30", x=0.975,y=0.79, hjust=0,
                              gp=gpar(col="black", fontsize=14)))
label_40 <- grobTree(textGrob("40", x=0.975,y=1.049, hjust=0,
                              gp=gpar(col="black", fontsize=14)))

plot_line <- plot_line +
  annotation_custom(label_0) +
  annotation_custom(label_10) +
  annotation_custom(label_20) +
  annotation_custom(label_30) +
  annotation_custom(label_40)

plot_line

Then, we can add theme into the plot:

plot_line <- plot_line +
  
  theme(
    text = element_text(family = "Calibri"),
    axis.text = element_text(size = 14),
    axis.line.x = element_line(color = "black", size = 0.5),
    
    panel.background = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(color = "#B2C2CA", size = 0.8),
    
    axis.ticks.y = element_blank(),
    axis.ticks.x = element_line(size = 0.75),
    axis.ticks.length.x = unit(5, "pt"),
    axis.text.y = element_blank(),
    
    plot.title = element_text(family = "Calibri", face = "bold", 
                              size = 18, hjust = 0, vjust = 1),
    plot.subtitle = element_text(family = "Calibri", size = 14,hjust = 0, vjust = 1)
    
  )

plot_line

Save Plot

After all of the steps were done, we can save the plot into png file:

# prepare file
png("line.png", width = 5.6, height = 5.5, units = "in", res =300)

# make file (RUN ALL)

plot_line
# adding additional box for final touch
grid.rect(x = 0.078, y = 0.99,vjust = 0.2,
          width = 0.05,height = 0.01,
          gp = gpar(fill="#353535",lwd=0))

# finish
dev.off()

Combine Plots

In this finishing step, we will combine two plot into one plot and add plot accessories like caption and header.

# prepare file
png("plot.png", width = 14, height = 8, units = "in", res = 300)

# make plot 
## read previously made png file
bar_plot <- rasterGrob(as.raster(readPNG("bar.png")),interpolate = FALSE)
line_plot <- rasterGrob(as.raster(readPNG("line.png")),interpolate = FALSE)
spacing <- rectGrob(gp = gpar(col = "white")) # prepare space

## arrange plots 
grid.arrange(bar_plot, spacing, line_plot, 
             ncol = 3, # arrange into column wise; 3 columns
             widths = c(0.52,0.025,0.4))

# add accessory rectangle/line
grid.rect(x = 1, y = 0.995,
          hjust = 1, vjust = 0.02,
          height = 0.01,
          gp = gpar(fill = "#E5001c", lwd=0))
grid.rect(x = 0.04, y=  0.98,
          hjust = 1, vjust = 0.01, height = 0.05,
          gp = gpar(fill= "#E5001c", lwd = 0))
# title 
grid.text("Marvellous",
          x = 0.005, y = 0.93, vjust = 0, hjust = 0,
          gp = gpar(col = "black", fontsize = 28, fontfamily = "Calibri",
                    fontface = "bold"))
# caption
grid.text("Source: The Numbers; Box Office Mojo",
          x = 0.01, y = 0.145, vjust = 0, hjust = 0,
          gp = gpar(col = "#5E5E5E", fontsize = 14, fontfamily = "Calibri"))
grid.text("The Economist",
          x = 0.01, y = 0.1, vjust = 0, hjust = 0,
          gp = gpar(col = "#5E5E5E", fontsize = 15,
                    fontfamily = "Calibri", fontface = "bold"))

# finish
dev.off()

Finally, here is the final replicated plot using ggplot2! Looks nice!

comments powered by Disqus