Making gifs in R with `gganimate`

Introduction

Since I’m writing R code to make certain figures for this website, I thought I could go ahead and annotate some of it in R markdown to serve as blog posts.

Package Setup

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0.9000     ✔ purrr   0.2.5     
## ✔ tibble  1.4.2          ✔ dplyr   0.7.6     
## ✔ tidyr   0.8.1          ✔ stringr 1.3.1     
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(rEDM)
library(png)
library(gganimate) # devtools::install_github("thomasp85/gganimate")

# some packages that are installed, but which we'll reference directly
# library(portalr) # devtools::install_github("weecology/portalr")
# library(forecast)
# library(here)

Data

First, we want to load up the data. Luckily the portalr package contains all we need to download and load up the rodent abundance time series we want.

raw_dat <- portalr::abundance(shape = "flat", # return data in long-form
                          time = "all",   # return time in all formats
                          clean = FALSE)  # include data that hasn't been QC'd
## Loading in data version 1.55.0

str(raw_dat)
## 'data.frame':    8799 obs. of  5 variables:
##  $ newmoonnumber: int  28 28 28 28 28 28 28 28 28 28 ...
##  $ period       : num  27 27 27 27 27 27 27 27 27 27 ...
##  $ censusdate   : Date, format: "1979-09-22" "1979-09-22" ...
##  $ species      : Factor w/ 21 levels "BA","DM","DO",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ abundance    : int  0 16 0 11 1 6 12 0 2 2 ...
##  - attr(*, "na.action")= 'omit' Named int  1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "names")= chr  "1" "2" "3" "4" ...

Next, we want to do some initial processing of the data. Here are the steps:

  1. filter for just the Dipodomys merriami (species code = "DM")
  2. linearly interpolate the dates and abundances for missing censuses (which are nearly-monthly)
  3. re-format the columns correctly
dat <- raw_dat %>%
    filter(species == "DM") %>%
    select(-species, -period) %>%
    complete(newmoonnumber = full_seq(newmoonnumber, 1), fill = list(NA)) %>%
    mutate_at(vars(-newmoonnumber), forecast::na.interp) %>%
    mutate(censusdate = as.Date(as.numeric(censusdate), origin = "1970-01-01"), 
           abundance = as.numeric(abundance))

str(dat)
## Classes 'tbl_df', 'tbl' and 'data.frame':    483 obs. of  3 variables:
##  $ newmoonnumber: num  28 29 30 31 32 33 34 35 36 37 ...
##  $ censusdate   : Date, format: "1979-09-22" "1979-10-24" ...
##  $ abundance    : num  16 15 25 30 35 19 23 22 23 21 ...

Here’s an initial plot of the abundance:

ggplot(dat, 
       aes(x = censusdate, y = abundance)) + 
    geom_line() + 
    theme_bw(base_size = 20, base_line_size = 1)
DM abundance through time

Figure 1: DM abundance through time

Forecasting

For our next step, we’re going to use the functions in rEDM to fit some simple time series models to the abundance of DM through time using simple lags. We’ll go through some steps to determine the values for the hyper-parameters, E, and theta.

First, using simplex() and picking the value of E based on the best fit.

dm_count <- dat$abundance # pull out just the numeric vector
simplex_out <- simplex(dm_count, E = 1:24, silent = TRUE)
best_E <- simplex_out$E[which.min(simplex_out$rmse)]

Next, using s_map() with E = 10 and picking the value of theta based on the best fit.

smap_out <- s_map(dm_count, E = best_E, silent = TRUE)
best_theta <- smap_out$theta[which.min(smap_out$rmse)]

Finally, re-running the S-map model again with E = 10 and theta = 0.75 to get a one-step ahead forecast for the future.

out <- s_map(c(dm_count, NA), E = best_E, theta = best_theta, stats_only = FALSE)
## Warning in model$run(): Found overlap between lib and pred. Enabling cross-
## validation with exclusion radius = 0.
forecast <- tail(out$model_output[[1]], 2)[1,] # get second-to-last row of model_output

Generate data to plot

There are a few things we want to eventually plot, so we’ll need to construct the data appropriately.

First, get the last values from the original data, and append the forecast. Here, we want to capture the uncertainty, so compute the standard-deviation as the square-root of the estimated variance.

to_plot <- tail(dat)
to_plot$sd <- 0

to_plot <- rbind(to_plot, 
                 data.frame(newmoonnumber = max(to_plot$newmoonnumber) + 1, 
                            censusdate = max(to_plot$censusdate) + 29, 
                            abundance = forecast$pred, 
                            sd = sqrt(forecast$pred_var)))

Next, generate a sample of forecasts from the distribution, and create a data.frame that includes each forecast as a line from the last observation to the forecast.

set.seed(123)
num_forecasts <- 20
forecast_dist <- rnorm(num_forecasts, 
                       mean = forecast$pred, sd = sqrt(forecast$pred_var))

forecast_df <- map_dfr(seq(num_forecasts), function(idx) {
    temp <- to_plot # copy to_plot
    temp$abundance[NROW(temp)] <- forecast_dist[idx]
    temp$idx <- idx
    return(tail(temp, 2))
})

Finally, read in a silhouette picture of the rodent in question. We downloaded the appropriate image from the PortalPredictions repo, and added transparency and resized accordingly.

dm_image <- readPNG(here::here("static/img/dipodomys_merriami.png"))

Figure

Ok, so we can put this all together now into a single figure.

Because of the background image, we’ll want to do pre-compute the plot limits and use those to position the image to ensure that it’s sized correctly.

xlims <- range(to_plot$censusdate)
ylims <- c(0, 60)

We’ll use ggplot to assemble the figure, with a line for each forecast, adding a geom_ribbon for the +/- 2 SD for forecasts, a plain geom_line for the observations, and the rodent image as a background.

p <- ggplot(forecast_df, 
       aes(x = censusdate, y = abundance)) + 
    
    # plot boundaries
    coord_cartesian(xlim = xlims, ylim = ylims) + 
    
    # background image
    annotation_raster(dm_image * 0.3,
                      xmin = min(xlims),
                      xmax = max(xlims),
                      ymin = 9.5,
                      ymax = 47, interpolate = TRUE) +
    
    # forecasts
    geom_line(size = 2, color = "blue") + 
    
    # observations (exclude last point, which is the forecast)
    geom_line(size = 2, data = to_plot[-NROW(to_plot), ]) + 
    
    # uncertainty area
    geom_ribbon(aes(ymin = abundance - 2 * sd, 
                    ymax = abundance + 2 * sd), 
                data = to_plot, fill = "#BBBBFF", alpha = 0.5) + 
    
    # theming
    theme_bw(base_size = 20, base_line_size = 1) + 
    labs(x = "", y = "DM Abundance")

To animate, we’ll use animate(), with some arguments for how to change each frame and saving the output as a gif.

animate(p + transition_time(idx), width = 400, height = 400)
anim_save(here::here("static/img/forecasting-preview.gif"))

Related