Adding Visual Scaling Information to Trellis Plots with the addScales Package

Bert Gunter

2020-06-09

Why addScales?

In an article entitled “How Much Warmer Was Your City in 2019?”, the New York Times provided its online readers with an interactive graphical display that could show graphs of daily temperatures from 1981 to 2010 for more than 3500 cities. Here is the display for New York City.

Although this display presents historical temperature data, it does not actually allow historical trends in the data to be visually detected. The only temperature trends shown are seasonal variation: winter is colder than summer in New York!

This leads one to consider how such historical temperature trends could be visualized. To explore this, the addScales package includes daily temperature data (highs and lows) for New York, San Francisco, and Chicago, downloaded from the National Climatic Data Center. The New York City record goes back to 1870, and that is the data set discussed here (through the end of 2019). Given the growth of the city from its relatively modest 19th century size to a 21st century megalopolis, with all the associated asphalt, concrete, buildings, air conditioners, cars, and so forth, one would expect to see a gradual warming trend (wholly apart from any possible global warming effects). So the question is: how to show this?

Note: In what follows, code that duplicates the examples in the addScales Help page is (mostly) omitted.

Clearly, plotting averages of the daily highs and lows in sequence isn’t going to be useful. There are, apart from leap years, 365*150 = 54,750 days! A reasonable first attempt might be to simply summarize each year’s temperatures – by their median, say – and plot the 150 summaries versus time. Here’s the result.

library(addScales)
## Loading required package: lattice
data(NYCTemps) ## load the data sets
## Derive the Daily average, Month, and Year from the TMAX, TMIN, and DATE columns
NYC <- within(NYCTemps, {
              Daily <- .5*(TMAX + TMIN)
              Month <- factor(months(as.Date(DATE)), levels = month.name)
              Year <- as.numeric(substring(DATE,1,4))
})
## Summarize by median yearly temperature
yearly <- aggregate(Daily ~ Year, data = NYC, FUN = median)
## Plot, overlaying a smooth 'loess' trend curve
trellis.par.set(plot.symbol = list(col = "darkblue"),
               plot.line = list(col = "darkblue"))
xyplot(Daily ~ Year, data = yearly, 
       type = c("l","g"),
       col = "darkblue",
       panel = function(x,y,...){
           panel.xyplot(x,y,...)
           panel.loess(x,y, col = "maroon", lwd = 1.5,
                       span = .6, degree = 2)
       },
       ylab = "Median Yearly Temp",
       main = "NYC Yearly Median Temperatures Over Time"
)



This plot clearly shows the expected temperature trend, an increase of about 6°F over 150 years. However, there’s a cost: a lot of the fine-grained information in the daily temperature record has been lost. For example, one might ask whether there is seasonal variation in this trend: do summers and winters warm at the same rate? And is the day-to-day variation around this trend the same throughout the year?

As mentioned previously, plotting daily results in one plot as residuals from the yearly trend, say, won’t work. There are simply too many points. An obvious alternative is to plot the historical record in a trellis display with different panels for different times of the year. For example, one might pick one day in the middle of every month and plot the multi-year record of each day’s data in a trellised arrangement of 12 plots.

A slight improvement on this idea is to plot monthly summaries – averages, for example – instead. The monthly averages will be less variable than individual days, but still granular enough to address questions like those above (because NYC temperatures don’t usually vary that much within a month compared to yearly variation). Here is such a plot using the code in the Help page for addScales.
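For readers without the Help page at hand, the aggregate-then-trellis step can be sketched roughly as below. This is not the Help page code: it runs on a synthetic temperature record invented here for illustration, and the names sim, monthly, and simtemps are hypothetical. Only base R and lattice are assumed.

```r
library(lattice)

## Synthetic stand-in for a daily record (NOT the NYCTemps data):
## a seasonal cycle plus a small linear warming trend plus noise
set.seed(1)
dates <- seq(as.Date("1950-01-01"), as.Date("2019-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))   # day of year
yr    <- as.numeric(format(dates, "%Y"))
sim <- data.frame(
    Year  = yr,
    Month = factor(month.name[as.integer(format(dates, "%m"))],
                   levels = month.name),
    Daily = 55 - 22 * cos(2 * pi * (doy - 20) / 365.25) +
            0.03 * (yr - 1950) + rnorm(length(dates), sd = 6)
)

## One average per Year x Month combination
monthly <- aggregate(Daily ~ Year + Month, data = sim, FUN = mean)

## One panel per month, all on lattice's default common y scale
simtemps <- xyplot(Daily ~ Year | Month, data = monthly,
                   type = "l", layout = c(3, 4), as.table = TRUE,
                   ylab = "Average Monthly Temp")
simtemps
```

Saving the result of xyplot in an object, rather than just printing it, is what makes later modification with update possible.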



This clearly doesn’t work either! The default y-axis scaling that was used put all the data on the same scale. In doing so, the large yearly NYC temperature changes ended up squeezing the monthly data into a small portion of their respective panels, so the data got lost in whitespace. In short, not very useful!
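The squeezing can be put in numbers. The sketch below uses made-up, noise-free monthly averages (an assumption for illustration, not the NYC record) and compares the y range a common scale must span with the range each panel's own data need.

```r
## Synthetic monthly averages (NOT the NYCTemps data):
## a seasonal level for each month plus a slow linear trend
years <- 1950:2019
level <- 55 - 22 * cos(2 * pi * (seq_len(12) - 1) / 12)  # Jan coldest, Jul warmest
sim <- expand.grid(Year = years,
                   Month = factor(month.name, levels = month.name))
sim$Temp <- level[as.integer(sim$Month)] + 0.03 * (sim$Year - 1950)

## Range a common y scale has to cover versus the range within each panel
common   <- diff(range(sim$Temp))
perpanel <- tapply(sim$Temp, sim$Month, function(y) diff(range(y)))

## Fraction of each panel's height that its own data occupy
round(perpanel / common, 2)
```

In this artificial case each month's data fill well under a tenth of its panel. Real data are noisier, so the fraction would be larger, but the effect is the same: the seasonal swing sets the scale and the within-month trend is flattened.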

Of course, the lattice software provides options to deal with this, specifically the relation = "free" option of the scales argument, which scales the y-axis of each panel separately to fit only that panel’s data. To avoid repeating the whole call with this change, it is convenient to use lattice’s update method to modify the saved trellis plot object, nyctemps, directly. Note that the prepanel.trim prepanel function has been used to set the panel y limits to hold only the middle 90% of each panel’s y data, trimming the most extreme 5% on either end from the display. See the Further Package Functionality section for why this can be useful. Of course, this will also slightly change the estimate of the historical trend relative to the prior yearly median plot.

nyctemps <- update(nyctemps,
                   prepanel = function(trim.x, trim.y,...)
                       prepanel.trim(trim.x = 0, trim.y = .05,...),
                   scales = list(axs = "i", alternating = 1, tck = c(1,0),
                                 y = list(relation = "free"))
)
nyctemps
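To make the trimming idea concrete, here is a minimal sketch of limits that hold only the middle 90% of a panel’s y values. The helper trimmed_ylim is hypothetical and written only for this illustration; it is not the source of prepanel.trim, whose internals may differ.

```r
## Hypothetical helper illustrating trimmed y limits;
## this is NOT the implementation of addScales::prepanel.trim
trimmed_ylim <- function(y, trim = 0.05) {
    stopifnot(trim >= 0, trim < 0.5)
    ## limits enclosing only the central 1 - 2*trim share of the data
    unname(quantile(y, probs = c(trim, 1 - trim), na.rm = TRUE))
}

## Synthetic panel data with two wild values (not from NYCTemps)
set.seed(2)
y <- c(rnorm(200, mean = 50, sd = 3), 5, 110)

range(y)          # full range, dominated by the two outliers
trimmed_ylim(y)   # limits for the middle 90% only
```

With limits like these, a few extreme values no longer dictate the panel’s scale, so the bulk of the data fills the available space.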



This reveals the plot details that were previously lost. But now there is another problem. Because each panel in the display has been separately scaled to its own data, one has to closely inspect the scales to compare the behavior in different panels, for example, to see whether the increase in December temperatures over 150 years is the same as that for July. This is not easy to do and of course gets even more difficult when there are more panels. In addition, all the separate axis scales in the display occupy a lot of the visual real estate. We need scaling information, but one would prefer that it occupy as little space as possible. After all, it’s the data behavior that’s of interest. This is the dilemma addScales tries to resolve.

To show how, we first remove the scales with another trellis update call and then call addScales – using its defaults – on that. Here’s what you get:

nyctemps <- update(nyctemps,
                   scales = list(axs = "i", alternating = 1, tck = c(1,0),
                                 y = list(relation = "free", draw = FALSE))
)
addScales(nyctemps)