The animint2 Manual by Toby Dylan Hocking


Chapter 3, the showSelected keyword

This chapter explains showSelected, one of the two main keywords that animint introduces for interactive data visualization. After reading this chapter, you will be able to

  • Use the showSelected keyword in your plot sketches to specify geoms for which only a subset of data should be plotted at any time.
  • Use selection menus in animint to change the subset of plotted data.
  • Specify smooth transitions between data subsets using the duration option and key aesthetic.
  • Create animated data visualizations using the time option.

Sketching with showSelected

In this section, we will explain how the showSelected keyword can be used in plot sketches. The showSelected keyword specifies a variable to use for subsetting the data before plotting. Each geom in a data visualization has its own data set, and its own definition of showSelected variables. That means different geoms can specify different data sets and showSelected keywords to show different data subsets.

In fact, we have already used the showSelected keyword, which was automatically created by the interactive legends that we created in the previous two chapters. For example, consider the sketch below of the Keeling Curve data visualization from Chapter 1.

CO2 data viz

The sketch above includes showSelected=month for the geom_point, meaning that it should show the subset of data for the selected months. In contrast, since the geom_line does not include showSelected keywords, it always shows the entire data set (regardless of the selected months).

As another example, consider the sketch below of the first WorldBank data visualization from Chapter 2.

WorldBank data viz with showSelected

The sketch above specifies showSelected=region for the geom_point, meaning that it should show the subset of data for the selected regions.

Note that the code we used in chapter 2 did not explicitly specify showSelected=region. Instead, we specified aes(color=region), and animint automatically assigned a showSelected keyword. In general, animint will assign a showSelected keyword for each variable that is used in a categorical legend.

However, the showSelected keyword is not limited to use with categorical legends. You can use showSelected keywords for any data variables you like, by explicitly specifying the variable names in the showSelected argument of the geom.

Each variable that is used with showSelected is treated by animint as a selection variable. For example, the Keeling Curve data viz has one selection variable (month), and so does the WorldBank data viz (region). For each selection variable, animint keeps track of the currently selected values. When the selection changes, animint updates the subset of data that is shown.

Each of the data visualizations sketched above has only one selection variable. However, a data visualization can have any number of selection variables. In the next section, we will explore a visualization of the World Bank data that has selection variables for region and year.

Selecting data subsets using menus

Consider the following sketch which adds a showSelected variable, and changes the data set.

WorldBank data viz with showSelected

Note that there are two showSelected variables, region and year. Also note that the data is specified as all years (but only one will be shown at a time due to showSelected=year). Below, we translate this sketch into R code.

library(animint2)
data(WorldBank)
scatter <- ggplot()+
  geom_point(aes(
    x=life.expectancy, y=fertility.rate, color=region),
    showSelected="year",
    data=WorldBank)
scatter
## Warning: Removed 1490 rows containing missing values (geom_point).

Note that the ggplot above contains the showSelected argument, one of the two main features introduced in animint2. The showSelected keyword is ignored when rendering the plot using the usual R graphics devices, which produce a scatterplot with one point for every country and year. Note that since color=region was specified, animint also automatically uses region as a showSelected variable.

In constrast, rendering the same ggplot using animint yields the interactive data visualization below.

animint(scatter)

Note that the data viz above has two selection variables: region and year. Each variable has a menu at the bottom of the data viz that can be used to change the current selection. In this data viz, these selection menus are shown by default. They can be hidden by clicking the “Hide selection menus” button, and shown again by clicking the “Show selection menus” button.

Discrete legend variables such as region default to multiple selection, so several values are selected and shown at once. Try changing the selected region in the interactive legend and the selection menu. When you change the selection using either method, both the interactive legend and the selection menu should update to reflect the current selection.

We use the terms “direct manipulation” and “indirect manipulation” to describe these different ways of changing the selection. Direct manipulation typically involves clicking on the objects that you want to change, and is usually easier to understand. In contrast, indirect manipulation techniques such as menus are typically more complicated to understand. In the animint above, you can change the value of the region variable using either the legend or the menu. Using the legend is a more direct manipulation technique, since the legend is drawn closer to the plotted data points that will be updated.

Other selection variables such as year default to single selection, so only one value is selected and shown at any time. Try changing the selected value of the year variable using the selection menu. You should see the points in the scatterplot immediately update to show the fertility rate and life expectancy of all the countries in the year that you selected.

Multi-layer exercise: Add another geom to this interactive scatterplot. As in Chapter 2, you can use a geom_text to show the name of each country (easy), or a geom_text to show the selected year (medium), or a geom_path to show the previous 5 years of data (hard). Hint: make sure to specify showSelected=year for all geoms.

Multi-plot exercise: Add a time series plot to the data viz above. As in Chapter 2, you can use a geom_line to show the fertility rate for each country over all years. Add a geom_vline with showSelected=year to highlight the currently selected year.

Transitions: the duration option and key aesthetic

You may have noticed that there are buttons at the bottom of each data visualization created by animint. Try clicking the “Show animation controls” button above. This table contains a row for each selection variable. The text boxes show the number of milliseconds that are used for transition durations after updating each selection variable. The default transition duration for each selection variable is 0, meaning data will be immediately placed at their new positions after updating each variable.

To illustrate the significance of transition durations, try changing the transition duration of the year variable to 2000. Then, change the selected value of the year variable. You should see the data points move slowly to their new positions, over a duration of 2 seconds.

Some transitions result in points moving only a little bit, to nearby positions (e.g. 1979-1980). Other transitions result in points moving a lot more, to far away locations (e.g. 1980-1981). Why is that?

Smooth transitions only make sense for data points that exist both before and after changing the selection. In the R code below we compute a table of counts of data points that can be plotted in each of these three years.

three.years <- subset(WorldBank, 1979 <= year & year <= 1981)
can.plot <- with(three.years, {
  (!is.na(life.expectancy)) & (!is.na(fertility.rate))
})
table(three.years$year, can.plot)
##       can.plot
##        FALSE TRUE
##   1979    27  187
##   1980    27  187
##   1981    26  188

It is clear from the table above that there are 187 points that can be plotted in 1979 and 1980. However, in 1981 there is one more data point, corresponding to a country for which we did not have data in 1980. Below we show the data for that country, Kosovo.

subset(three.years, country=="Kosovo")
##      iso2c country year fertility.rate life.expectancy population
## 5850    KV  Kosovo 1979             NA              NA    1491000
## 5851    KV  Kosovo 1980             NA              NA    1521000
## 5852    KV  Kosovo 1981         4.5758        65.93268    1552000
##      GDP.per.capita.Current.USD 15.to.25.yr.female.literacy iso3c
## 5850                         NA                          NA   KSV
## 5851                         NA                          NA   KSV
## 5852                         NA                          NA   KSV
##                                         region  capital longitude latitude
## 5850 Europe & Central Asia (all income levels) Pristina    20.926   42.565
## 5851 Europe & Central Asia (all income levels) Pristina    20.926   42.565
## 5852 Europe & Central Asia (all income levels) Pristina    20.926   42.565
##                   income lending
## 5850 Lower middle income     IDA
## 5851 Lower middle income     IDA
## 5852 Lower middle income     IDA

Indeed, the table above shows that fertility rate and life expectancy are missing for Kosovo during 1979-1980. Thus it does not make sense to do a smooth transition for countries such as Kosovo which would not be plotted either before or after the transition. How to specify that in the data visualization? In the code below, we use aes(key=country) to specify that the country variable should be used to match data points before and after changing the selection.

scatter.key <- ggplot()+
  geom_point(aes(
    x=life.expectancy, y=fertility.rate, color=region,
    key=country),
    showSelected="year",
    data=WorldBank)

The key aesthetic in the ggplot above is only meaningful for interactive data visualization, so it ignored when rendering with the usual R graphics devices. However, if we render this ggplot using animint2, the country variable will be used to make sure transtion durations are meaningful. To specify a default transition duration for the year variable, we use the duration option in the data viz below.

(viz.duration <- animint(scatter.key, duration=list(year=2000)))

The duration option must be a named list. Each name should be a selection variable, and each value should specify the number of milliseconds to use for a transition duration when the selected value of that variable is changed.

If you click “Show animation controls” in the data viz above, you will see that the text box for the year variable is 2000, as specified in the R code. If you change the selection from 1980 to 1981, you should see a proper transition.

In general the key aesthetic should be specified for all geoms that use showSelected with a variable that appears in the duration option. In this example, we used the duration option to specify a smooth transition for the year variable. Since we use showSelected=year in the geom_point, we also specified the key aesthetic for this geom.

Animation: the time option

The time option is used to specify a variable to use for animation.

viz.duration.time <- viz.duration
viz.duration.time$time <- list(variable="year", ms=2000)
viz.duration.time

Exercise: make an animated data visualization that does NOT use smooth transitions. Hint: make a list of ggplots that has the time option but no duration option.

Chapter summary and exercises

This chapter explained the showSelected aesthetic, selection menus, transition durations, and animation.

Exercises:

  • Make an improved version of viz.aligned from the previous chapter. Instead of fixing the year at 1975, use showSelected=year so that the user can select a year. Add geoms that show the selected year: a geom_text on the scatterplot, and a geom_vline on the time series.
  • Translate one of the animation package examples to an animint. Hint: in the code for the animation package there is always a for loop over the time variable. Instead of calling a plotting function inside the for loop, use the list of data tables idiom to store the data that should be plotted. Then use those data along with showSelected to create ggplots, and render them using animint.

Next, Chapter 4 explains the clickSelects keyword, which indicates a geom that can be clicked to update a selection variable.