library(downscale)
#> From v5.0.0 objects of class 'raster' and 'sp' are being phased out.
#> They are still supported but we recommend using 'terra' or 'sf' inputs instead.
#> You can convert to 'terra' or 'sf' objects through rast() or st_as_sf()
In order to assess and manage the status of a species we need to know the abundance of individuals in the population(s) and how it changes over time. For the vast majority of species this information is unobtainable; however, one important proxy of true abundance and extinction risk is the area occupied by the species. For example, the area of occupancy (AOO) is one of the measures of conservation status used in the IUCN Red List (IUCN 2014). Although easier to estimate than true abundance, the difficulty in estimating AOO lies in the extensive sampling required across the full range of the species at a grain size fine enough to give meaningful estimates. For the majority of species this is impractical or infeasible at such fine grain sizes, leaving a large number of unsampled cells and therefore false absences. However, as we estimate occupancy at increasing grain sizes we increase our confidence in our presence-absence predictions. Such coarse-grain atlas data, generally generated from opportunistic recording over extended periods of time, are much more widely available. At such coarse grain sizes, though, we also lose resolution in our status estimates, as occupancies at large grain sizes are less closely correlated with true abundance (Hartley and Kunin 2003).
A solution is to employ the occupancy-area relationship (OAR); the increase in the area occupied by a species as grain size increases (Kunin 1998). If the relationship can be described at these coarser grain sizes where confidence is high, then we can extrapolate to predict occupancy at the fine grain sizes more closely related to the true abundance and conservation status.
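To make the idea concrete, here is a minimal sketch of the logic (with made-up occupancies, not package code or package data): a power-law OAR is linear in log-log space, so it can be fitted to coarse-grain occupancies with an ordinary linear model and extrapolated to a finer grain.

```r
## Illustrative only: hypothetical coarse-grain occupancies
areas <- c(100, 400, 1600)    # cell areas (grain sizes)
occ   <- c(0.25, 0.45, 0.80)  # proportion of cells occupied

## power law: log(occ) = log(C) + z * log(area), i.e. linear in log-log space
fit <- lm(log(occ) ~ log(areas))

## extrapolate the fitted line down to a finer grain size (area = 1)
exp(predict(fit, newdata = data.frame(areas = 1)))  # ~0.036
```

The downscale package performs this fitting and extrapolation for you (with ten candidate models rather than one), so this sketch is only meant to show the underlying principle.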
Many models have been proposed to describe this relationship, and it appears that no one model consistently provides the best predictions (Azaele et al. 2012, Barwell et al. 2014, Groom et al. 2018, Marsh et al. 2021). This package provides functions for ten commonly applied models: the Nachman, power law, logistic, Poisson, negative binomial, generalised negative binomial, improved negative binomial, finite negative binomial, Thomas and Hui models. The Hui model (Hui et al. 2006, 2009) is unique in that it only requires data from a single spatial scale, but it does require the spatial relationships (i.e. cell coordinates) of all cells. The other nine models require the occupancies at multiple grain sizes. The package then optimises the model parameters to fit the occupancy data in log-log space. Once each model is parameterised, the relationships are extrapolated to estimate occupancy at finer grain sizes.
As well as the fitting, prediction and plotting of the downscaling models, the package also contains functions for preparing coarse-scale data (“upgraining”; fig. 1). The tutorial vignette “Upgraining atlas data for downscaling” provides an overview of the different methods for achieving this using the upgrain.threshold
function.
vignette("Upgraining", package = "downscale")
For downscaling it is important to check the data for the scale of saturation and endemism. The scale of saturation is the grain size where all cells are occupied (fig. 2a) i.e. the proportion of occupancy = 1. The scale of endemism is the grain size where all presences occur in a single cell (2b). All data above these grain sizes should be discarded as they provide no further information for modelling the occupancy-area curve. The downscale functions will automatically set these occupancies to NA for modelling purposes, but this may lead to insufficient scales (less than three) remaining for modelling.
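As a manual illustration of the saturation check (the downscale functions do this automatically), any scale at which occupancy has reached 1 can be set to NA and the remaining usable scales counted; the data frame here is hypothetical:

```r
## Hypothetical occupancy data with a saturated largest grain size
occ <- data.frame(Cell.area = c(100, 400, 1600, 6400),
                  Occupancy = c(0.23, 0.56, 0.87, 1))

## discard scales at or above saturation (proportion of occupancy = 1)
occ$Occupancy[occ$Occupancy == 1] <- NA

## count the scales left for modelling; at least three are needed
sum(!is.na(occ$Occupancy))  # 3
```

If this count falls below three, more scales of atlas data (or a different upgraining scheme) are needed before the occupancy-area curve can be modelled.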
The general flow of the downscale
package is presented in fig. 3. The package is based around seven user functions: upgrain.threshold
, upgrain
, downscale
, predict
, hui.downscale
, plot
and ensemble.downscale
. Ten downscaling models are available (the Nachman, power law, logistic, Poisson, negative binomial, generalised negative binomial, improved negative binomial, finite negative binomial, Thomas and Hui models). Details of all models can be found in the help files and in the supplementary information of Barwell et al. (2014).
The user may input four types of data:

1. a data frame of cell areas and the proportions of occupancy at each cell area;
2. a data frame of cell coordinates with presence-absence data (presence = 1; absence = 0) for each coordinate;
3. a raster layer of presence-absence data;
4. an object of class sf containing a data frame with presence-absence data (presence = 1; absence = 0) for each coordinate.

To carry out downscaling with the Hui model (Hui et al. 2006, 2009) or upgraining of atlas data (and exploration of upgraining thresholds) the input data must be of type 2 or 3. The table below shows the functions to use to achieve each objective for each input data type. In all cases upgrain.threshold can be used to explore thresholds for upgrain.
Input data type | Objective | Function flow |
---|---|---|
Data frame of cell areas and occupancies | Downscale (excluding Hui model) | \(\texttt{downscale} \Rightarrow\) \(\texttt{predict} \Rightarrow\) \(\texttt{plot}\) |
Data frame of cell coordinates and presence-absence data | Downscale (excluding Hui model) | \(\texttt{upgrain} \Rightarrow\) \(\texttt{downscale} \Rightarrow\) \(\texttt{predict} \Rightarrow\) \(\texttt{plot}\) |
Raster layer of presence-absence data | Downscale (excluding Hui model) | \(\texttt{upgrain} \Rightarrow\) \(\texttt{downscale} \Rightarrow\) \(\texttt{predict} \Rightarrow\) \(\texttt{plot}\) |
Data frame of cell coordinates and presence-absence data | Downscale (including Hui model) | (\(\texttt{upgrain} \Rightarrow\)) \(\texttt{hui.downscale} \Rightarrow\) \(\texttt{plot}\) |
Raster layer of presence-absence data | Downscale (including Hui model) | (\(\texttt{upgrain} \Rightarrow\)) \(\texttt{hui.downscale} \Rightarrow\) \(\texttt{plot}\) |
Data frame of cell areas and occupancies | Ensemble modelling (excluding Hui model) | \(\texttt{ensemble.downscale}\) |
Data frame of cell coordinates and presence-absence data | Ensemble modelling (with or without Hui model) | \(\texttt{upgrain} \Rightarrow\) \(\texttt{ensemble.downscale}\) |
Raster layer of presence-absence data | Ensemble modelling (with or without Hui model) | \(\texttt{upgrain} \Rightarrow\) \(\texttt{ensemble.downscale}\) |
First, we must install the downscale package from CRAN if we have not already done so.
install.packages("downscale")
Then load the library:
library("downscale")
We will also make use of the terra
and sf
libraries throughout the tutorial for manipulating spatial data:
library("sf")
#> Linking to GEOS 3.11.1, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
library("terra")
#> terra 1.7.29
We will start with the simplest example of using the downscaling package, where we already have occupancy data across a number of grain sizes. We first create some imaginary data: a data frame where the first column contains the cell areas (grain sizes) and the second column the proportions of occupancy:
occupancy <- data.frame(Cell.area = c(100, 400, 1600, 6400),
                        Occupancy = c(0.23, 0.56, 0.87, 1))
Now we use downscale to estimate the parameters of the logistic model from the data. Note: for this type of data input we must also specify the extent (the total area over which occupancy has been measured), which is necessary for converting the modelled proportions of occupancy to area of occupancy (AOO) later on. For this imaginary data we will set the extent to be 384,000 km\(^2\):
## fit logistic model to observed data using downscale
logisMod <- downscale(occupancies = occupancy,
                      model = "Logis",
                      extent = 384000)
This creates an object of class 'downscale'
logisMod
#> $model
#> [1] "Logis"
#>
#> $pars
#> C z
#> 0.002014927 1.083725419
#>
#> $observed
#> Cell.area Occupancy
#> 1 100 0.23
#> 2 400 0.56
#> 3 1600 0.87
#> 4 6400 1.00
#>
#> $extent
#> [1] 384000
#>
#> attr(,"class")
#> [1] "downscale"
The downscale function has estimated best-fit parameters of 0.00201 for C
and 1.0837 for z
for the logistic model. We then take these parameters from the 'downscale'
object to extrapolate the fitted logistic function to predict occupancies at finer grain sizes. We will first create a vector of grain sizes (cell area) to predict. If we include the original cell sizes used for modelling we can also observe the model fit.
## new grain sizes to predict
areasPred <- c(1, 2, 5, 25, 100, 400, 1600, 6400)

## predict for the new grain sizes using the downscale object
logisPred <- predict(logisMod,
                     new.areas = areasPred,
                     plot = FALSE)
## this creates an object of class 'predict.downscale'
## occupancy is given as a proportion (Occupancy) and area of occupancy (AOO)
logisPred$predicted
#> Cell.area Occupancy AOO
#> 1 1 0.002010876 772.1762
#> 2 2 0.004252482 1632.9531
#> 3 5 0.011396542 4376.2720
#> 4 25 0.061873398 23759.3847
#> 5 100 0.228564936 87768.9353
#> 6 400 0.570999510 219263.8119
#> 7 1600 0.856717832 328979.6477
#> 8 6400 0.964106833 370217.0239
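The AOO column is simply the predicted proportion of occupancy multiplied by the extent supplied earlier, which is easy to verify by hand for the first predicted row:

```r
## AOO = predicted proportion of occupancy x extent
0.002010876 * 384000  # ~772.2 km^2, matching the first AOO value above
```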
Now we can plot the predictions with log-log axes. Black points are the observed values, red points are the predicted values.
plot(logisPred)
For the majority of cases we will have atlas data at a single scale; for downscaling we will therefore need to upgrain the atlas data. We read in the atlas data for a hypothetical UK species provided in the package. In this case the format is a data frame of sample cell coordinates and presence-absence data, but it could also be a raster layer:
## if it is not already loaded, load in the package
library(downscale)
There is some example atlas data included with the package which we can load for this example. When loading your own data, the data frame must have the column names "x", "y" and "presence" (0 = absence; 1 = presence).
dataFile <- system.file("extdata", "atlas_data.txt", package = "downscale")
atlasData <- read.table(dataFile, header = TRUE)
head(atlasData)
#> x y presence
#> 1 8170 10 0
#> 2 8130 20 0
#> 3 8140 20 0
#> 4 8160 20 0
#> 5 8170 20 0
#> 6 8140 30 0
The first step is to upgrain the atlas data to calculate occupancy at grain sizes larger than the atlas data; this provides the occupancy data points to which the different downscaling models are fitted. If we simply increase the cell sizes of the atlas data then the extent also increases. As the downscaling models require the proportion of occupancy at each grain size, if the proportions are calculated from a different total area in each case we may run into problems. It is therefore important that we fix the extent at all grain sizes to the extent of the largest grain size, but this means compromising between assigning unsampled cells as absences or excluding sampled cells (Groom et al. 2018, Marsh et al. 2018). We do this by applying a threshold to the proportion of a coarse-scale cell that has been sampled at the fine scale. For example, if a 40 km width cell was only sampled by a single 10 km atlas absence cell within it, we may want to discard it because a) at least one of the unsampled cells may actually be a presence (coarse-scale false absence), and b) all the unsampled atlas cells would be assigned as absences even though they have not been sampled (atlas-scale false absences). However, discarding the cell will also lead to some loss of information.
The choice of threshold can have important consequences on the model predictions and so it is highly recommended to read the vignette “Upgraining atlas data for downscaling” and the helpfile for the upgrain.threshold function (?upgrain.threshold
) for more detail on creating your multi-scale standardised atlas data:
vignette("Upgraining", package = "downscale")
The upgrain.threshold
function allows the user to explore these trade-offs through several plots, and provides four recommendations for possible threshold selections: including all the sampled cells ("All_Sampled"
); including only cells at the largest grain size that were completely sampled at the atlas scale ("Sampled_Only"
); a species-specific threshold that retains all species occurrences ("All_Occurrences"
); and an atlas-specific threshold that maintains the same extent as the original atlas data ("Gain_Equals_Loss"
).
## explore thresholds using upgrain.threshold
thresh <- upgrain.threshold(atlas.data = atlasData,
                            cell.width = 10,
                            scales = 3)
This gives two sets of plots. First is a set of four plots that explore the trade-offs between discarding sampled cells and making assumptions about unsampled cells, which are automatically assigned as absences.
The second set of plots (hit return
or click on the plot window to view the second window) are the standardised atlas data generated after applying the four different threshold criteria ("All_Sampled"
, "All_Occurrences"
, "Gain_Equals_Loss"
and "Sampled_Only"
).
We can see the threshold values for the four threshold criteria:
thresh$Thresholds
#> All_Sampled All_Occurrences Gain_Equals_Loss Sampled_Only
#> 1 0 0.04 0.51 1
Once the user has decided on a threshold value (any value between 0 and 1 can be selected) or one of the threshold criteria, the upgrain
function will prepare the atlas data for downscaling. For now we’ll use one of the pre-defined options "All_Occurrences"
which ensures that all occurrence records are retained. This creates an object of class 'upgrain'
which can then be used directly as an input for downscale
, along with plots of the original and standardised atlas data at each scale. We will save the 'upgrain'
object here for subsequent analyses in the next section.
## upgrain data (using All Occurrences threshold)
occupancy <- upgrain(atlas.data = atlasData,
                     cell.width = 10,
                     scales = 3,
                     method = "All_Occurrences",
                     plot = TRUE)
As we can pass our 'upgrain' object directly into the downscale function, we no longer need to specify the extent. Let's try the improved negative binomial model (INB) first:
## Improved Negative Binomial model
(inb <- downscale(occupancies = occupancy,
                  model = "INB"))
#> $model
#> [1] "INB"
#>
#> $pars
#> C gamma b
#> 596.698339 7.023199 1.769302
#>
#> $observed
#> Cell.area Occupancy
#> 1 100 0.2856360
#> 2 400 0.4122807
#> 3 1600 0.5219298
#> 4 6400 0.7017544
#>
#> $extent
#> [1] 364800
#>
#> attr(,"class")
#> [1] "downscale"
The downscaling functions use an optimisation procedure to fit the models to the upgrained occupancy data. Suitable starting values for the model parameters are supplied automatically; however, if the models are not converging it is possible to specify your own starting parameters. The table below shows the default starting parameters.
Model | Full name | Parameter 1 | Parameter 2 | Parameter 3 |
---|---|---|---|---|
Nachman | Nachman | "C" = 0.01 | "z" = 0.01 | |
PL | Power law | "C" = 0.01 | "z" = 0.01 | |
Logis | Logistic | "C" = 0.01 | "z" = 0.01 | |
Poisson | Poisson | "lambda" = 1e-8 | | |
NB | Negative binomial | "C" = 0.01 | "k" = 0.01 | |
GNB | Generalised negative binomial | "C" = 0.00001 | "z" = 1 | "k" = 0.01 |
INB | Improved negative binomial | "C" = 1 | "gamma" = 0.01 | "b" = 0.1 |
FNB | Finite negative binomial | "W" = 10 | "k" = 10 | |
Thomas | Thomas | "rho" = 1e-8 | "mu" = 10 | "sigma" = 1 |
If using your own parameters, they must be in the form of a list with the same parameter names (take particular note of capitals) as the original starting parameters:
### Manually specifying the starting parameters
paramsNew <- list("C" = 0.1, "gamma" = 0.00001, "b" = 0.1)
inbNew <- downscale(occupancies = occupancy,
                    model = "INB",
                    starting_params = paramsNew)
We can visually compare the two to see which has a better fit by extrapolating the modelled curves to finer grain sizes using predict
as before (plotting can be called directly from predict
or through plot
). The first plot is the prediction using the original parameters and the second plot using the new parameters (a much worse fit in this case):
## plot the predictions of the two INB models using predict.downscale
inbPred <- predict(inb,
                   new.areas = c(1, 2, 5, 25, 100, 400, 1600, 6400),
                   plot = TRUE)
inbPredNew <- predict(inbNew,
                      new.areas = c(1, 2, 5, 25, 100, 400, 1600, 6400),
                      plot = TRUE)
The Thomas model (Azaele et al. 2012) involves an integration process that can be time-consuming to run. For this reason the user may alter the tolerance used during integration: the finer the tolerance, the more accurate the prediction but the longer the computation time. It can therefore be a good idea to initially try a larger tolerance value than the default (\(1 \times 10^{-6}\)) in order to ascertain whether the starting parameters are likely to be correct. You can then use those parameter estimates as the starting parameters with a smaller tolerance value.
## Thomas model
thomas <- downscale(occupancies = occupancy,
                    model = "Thomas",
                    tolerance = 1e-3)

## the tolerance can also be set for the predict function
thomas.pred <- predict(thomas,
                       new.areas = c(1, 2, 5, 25, 100, 400, 1600, 6400),
                       tolerance = 1e-6)
When plotting the results we can also change the look of the graphics. For example:
plot(thomas.pred,
col.pred = "green", # change the colour of the prediction
pch = 16, # change point character
lwd.obs = 3) # change line width of the observed data
The Hui model is slightly different from the other downscaling models in that it does not need occupancy from multiple scales (Hui et al. 2006, 2009). Instead, it only takes the coordinates of presence-absence data at the atlas scale and uses this to calculate occupancy at finer grain sizes. For this reason it is implemented using a separate function, hui.downscale
, which in effect runs downscale
and predict.downscale
in a single step.
The input data must be either a presence-absence raster layer of the atlas data or a data frame of cell coordinates and presence-absence data. The function additionally requires the cell width of the input data, the total extent (if using a data frame as input), and the grain sizes (cell areas) for which we wish to predict occupancy; these must be smaller than the cell area of the input data. As with the Thomas model, the tolerance can be adjusted if the results appear inaccurate (set the tolerance to a smaller number) or the computation takes too long (set the tolerance to a larger number).
## Hui model using a data frame as input
hui <- hui.downscale(atlas.data = atlasData,
                     cell.width = 10,
                     extent = 228900,
                     new.areas = c(1, 2, 5, 15, 50))
## the output is a normal 'predict.downscale' object
plot(hui)
Or we can use the 'upgrain' object as input, which will use the extent-standardised data:
huiStand <- hui.downscale(atlas.data = occupancy,
                          cell.width = 10,
                          new.areas = c(1, 2, 5, 15, 50),
                          plot = TRUE)
It is critical to note here that the proportions of occupancy are very different between the two plots (note the difference in y-axis values). This is because the extents differ between the original atlas data (228,900 km\(^2\)) and the standardised atlas data (364,800 km\(^2\)).
For example, here are the predicted proportions of occupancy (column 'Occupancy') and areas of occupancy (column 'AOO', in km\(^2\)) for the original data:
hui$predicted
#> Cell.area Occupancy AOO
#> 1 1 0.2787162 63798.13
#> 2 2 0.2866964 65624.80
#> 3 5 0.3026427 69274.90
#> 4 15 0.3347401 76622.02
#> 5 50 0.3979450 91089.62
And the extent-standardised data:
huiStand$predicted
#> Cell.area Occupancy AOO
#> 1 1 0.1717034 62637.42
#> 2 2 0.1765820 64417.12
#> 3 5 0.1863919 67995.77
#> 4 15 0.2064095 75298.18
#> 5 50 0.2470669 90130.01
If comparing predictions from multiple models it is therefore crucial to use the same standardised data in all cases, or else compare only the converted areas of occupancy (AOO) and not the proportions of occupancy, although even the AOO values will differ slightly depending on the upgraining method.
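The arithmetic behind this is easy to check by hand using the predicted values above: the two proportions of occupancy at the finest grain differ markedly, but multiplying each by its own extent yields comparable AOO values:

```r
## original atlas data: proportion of occupancy x extent
0.2787162 * 228900  # ~63798 km^2

## extent-standardised data: a smaller proportion, but a larger extent
0.1717034 * 364800  # ~62637 km^2
```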
No single model appears to provide the most accurate fine-scale occupancy predictions in all cases (Azaele et al. 2012; Barwell et al. 2014; Groom et al. 2018), and it is difficult to predict which model will perform best in a given situation. The ensemble function models and predicts occupancy for multiple models simultaneously, and also provides a simple model-averaged prediction (the mean of the log occupancies). Some or all of the models can be selected.
Again, let's start with the case where our data are a data frame of occupancies at each grain size:
## hypothetical occupancy data
occupancy <- data.frame(Cell.area = c(100, 400, 1600, 6400),
                        Occupancy = c(0.23, 0.56, 0.87, 1))
## grain sizes (cell areas) to predict
areasPred <- c(1, 2, 5, 25, 100, 400, 1600, 6400)
The ensemble.downscale
function does the modelling and predicting in a single step so we need a few more arguments than when just using downscale
. We also need the cell areas of the fine grain sizes we wish to predict, the total extent, and the models we wish to apply. Also note that with this type of input data we cannot apply the Hui model.
When plotting the results, the model-averaged predictions are shown in grey and the individual model predictions in red.
ensemble <- ensemble.downscale(occupancies = occupancy,
                               new.areas = areasPred,
                               extent = 320000,
                               models = c("PL",
                                          "Logis",
                                          "NB",
                                          "GNB",
                                          "INB"),
                               plot = TRUE)
#> PL model is running... complete
#> Logis model is running... complete
#> NB model is running... complete
#> GNB model is running... complete
#> INB model is running... complete
To print the predicted proportions of occupancy for each model:
ensemble$Occupancy
#> Cell.area PL Logis NB GNB INB
#> 1 1 0.05581996 0.002010876 0.002839782 0.004059485 0.004848122
#> 2 2 0.07113722 0.004252482 0.005665857 0.007645696 0.009010969
#> 3 5 0.09801796 0.011396542 0.014062767 0.017587266 0.020215308
#> 4 25 0.17211624 0.061873398 0.067087022 0.073735529 0.079420529
#> 5 100 0.27953527 0.228564936 0.228552488 0.230188966 0.232513816
#> 6 400 0.45399534 0.570999510 0.566755879 0.558122076 0.549476086
#> 7 1600 0.73733723 0.856717832 0.870210393 0.877387330 0.888668269
#> 8 6400 1.19751493 0.964106833 0.976099466 0.984486989 0.994970905
#> Means
#> 1 0.005747789
#> 2 0.010338052
#> 3 0.022356594
#> 4 0.084006865
#> 5 0.239107440
#> 6 0.537962783
#> 7 0.844134896
#> 8 1.019962271
And to print the predicted areas of occupancy (AOO) for each model:
ensemble$AOO
#> Cell.area PL Logis NB GNB INB Means
#> 1 1 17862.39 643.4802 908.7303 1299.035 1551.399 1839.293
#> 2 2 22763.91 1360.7942 1813.0743 2446.623 2883.510 3308.177
#> 3 5 31365.75 3646.8933 4500.0855 5627.925 6468.898 7154.110
#> 4 25 55077.20 19799.4873 21467.8469 23595.369 25414.569 26882.197
#> 5 100 89451.29 73140.7794 73136.7961 73660.469 74404.421 76514.381
#> 6 400 145278.51 182719.8432 181361.8812 178599.064 175832.348 172148.091
#> 7 1600 235947.91 274149.7064 278467.3259 280763.946 284373.846 270123.167
#> 8 6400 383204.78 308514.1865 312351.8290 315035.836 318390.690 326387.927
Alternatively, the input data may be an object of class 'upgrain'
, which also allows us to run the Hui model as long as we specify the cell width:
dataFile <- system.file("extdata", "atlas_data.txt", package = "downscale")
atlasData <- read.table(dataFile, header = TRUE)
## upgrain data (using "All Occurrences" threshold)
occupancy <- upgrain(atlas.data = atlasData,
                     cell.width = 10,
                     scales = 3,
                     method = "All_Occurrences",
                     plot = FALSE)
## ensemble modelling
ensemble <- ensemble.downscale(occupancies = occupancy,
                               new.areas = areasPred,
                               cell.width = 10,
                               models = c("Nachman",
                                          "PL",
                                          "Logis",
                                          "GNB",
                                          "FNB",
                                          "Hui"),
                               plot = TRUE)
#> Nachman model is running... complete
#> PL model is running... complete
#> Logis model is running... complete
#> GNB model is running... complete
#> FNB model is running... complete
#> Hui model is running... complete
If we want to run all ten models we can specify models = "all"
. Once again, we can set the tolerance values for the modelling (tolerance_mod
) and prediction (tolerance_pred
) of the Thomas model and the Hui model (tolerance_hui
) to improve processing times or accuracy.
ensemble <- ensemble.downscale(occupancies = occupancy,
                               new.areas = areasPred,
                               cell.width = 10,
                               models = "all",
                               tolerance_mod = 1e-3,
                               plot = TRUE)
#> Nachman model is running... complete
#> PL model is running... complete
#> Logis model is running... complete
#> Poisson model is running... complete
#> NB model is running... complete
#> GNB model is running... complete
#> INB model is running... complete
#> FNB model is running... complete
#> Thomas model is running... complete
#> Hui model is running... complete
We can also specify the starting parameters for specific models. For each model the starting parameters should be in the form of a list as before, and each model list is an item in a combined list:
## Specifying starting parameters for Nachman and GNB models
newParams <- list(Nachman = list("C" = 0.1, "z" = 0.01),
                  GNB = list("C" = 0.1, "z" = 1, "k" = 0.01))
newParams
#> $Nachman
#> $Nachman$C
#> [1] 0.1
#>
#> $Nachman$z
#> [1] 0.01
#>
#>
#> $GNB
#> $GNB$C
#> [1] 0.1
#>
#> $GNB$z
#> [1] 1
#>
#> $GNB$k
#> [1] 0.01
ensemble <- ensemble.downscale(occupancies = occupancy,
                               new.areas = areasPred,
                               cell.width = 10,
                               models = "all",
                               tolerance_mod = 1e-3,
                               starting_params = newParams,
                               plot = TRUE)
#> Nachman model is running... complete
#> PL model is running... complete
#> Logis model is running... complete
#> Poisson model is running... complete
#> NB model is running... complete
#> GNB model is running... complete
#> INB model is running... complete
#> FNB model is running... complete
#> Thomas model is running... complete
#> Hui model is running... complete