What is the shortest path from Mombasa, Kenya to Marseille, France by
boat? It is not the straight line between those locations, as it cuts
through land. This package applies graph theory to gridded spatial data,
e.g. deriving shortest paths between places taking barriers or cost
surfaces into account. In the mentioned example, this would be
delineating a path along the East African coast, through the Red Sea,
the Suez Canal, and the Mediterranean. Apart from relating locations on
Earth, `spaths`

can also compute shortest paths more
generally on spheres and planes.

`spaths`

originates from a research project with larger data
and more extensive computations than earlier R packages in this field
could handle. Like other packages, previous versions of
`spaths`

used the `igraph`

package, a wrapper for
a C library, for graph theoretical algorithms. To further optimize
computational performance and tailor the code to the gridded, spatial
use case, `spaths`

dropped the `igraph`

dependency
and now comes with its own C++ graph theory implementation.

Apart from computational performance, `spaths`

is geared
towards user friendliness. Users do not have to understand how
graph-theoretical algorithms, transition functions, or spatial distance
functions work. And via a set of parameters, they can choose an
implementation that fits their machine’s capabilities and the
application’s prerequisites. Another aspect of user friendliness is the
ease of installation. Linux users without administrator rights often
struggle to install geospatial or other packages that build on external
system libraries. To prevent such issues, `spaths`

comes with
few dependencies. Apart from core packages by default installed with R,
users only have to install `Rcpp`

and
`data.table`

. Installing `terra`

is optional and
only required, when using `terra`

objects as inputs.

The package is written to be a foundation upon which other packages
build. As `igraph`

has become a dependency for
graph-theoretical software more broadly, `spaths`

aims to be
the dependency for applications connecting points in grids. The
extensions of `spaths`

can be in the geospatial domain,
computing e.g. sailing vessel routes taking wind patterns and ocean
currents into account, animal migration influenced by terrain topology
and vegetation, or optimal paths for Mars explorations. However, they
can also target biomedical studies or any other question connecting
points in a grid. All you have to do for most extensions is to pass a
custom transition function to the `tr_fun`

parameter that
tells `spaths`

how to compute transitions costs between cells
from one or more grid layers.

Before `spaths`

applies graph-theoretical algorithms, like
shortest paths identifications, it converts the spatial data into a
graph. A graph consists of vertices, also called nodes, that are
connected via edges. In `spaths`

, grid cell centroids are
vertices linked to neighboring cells’ centroids through edges. Which
neighbors are connected depends on the `contiguity`

argument.
Figure 1 illustrates the parameter’s two options: rook’s case contiguity
and queen’s case contiguity. In the former model, a cell is directly
connected to its four horizontally and vertically adjacent neighbors.
With the latter structure, a cell is also directly linked to the four
diagonally adjacent neighbors, implying a total of eight direct
neighbors.

`spaths`

uses queen’s case contiguity by default as it often
produces more appropriate shortest paths than rook’s case contiguity
does. The advantages of rook’s case contiguity are that the fewer edges
imply lower RAM requirements and shorter computational times of shortest
path algorithms than with queen’s case contiguity.

There are other types of contiguity that directly connect a cell to
second order neighbors. This can produce paths that look smoother, as it
permits the algorithm to traverse the grid at a larger set of angles.
Yet, `spaths`

does currently not implement such type of
contiguity because it would allow the algorithm to jump over barriers
that are one cell wide. In an application deriving ship routes, the
algorithm could jump over e.g. islands and peninsulas.

All of the input grid’s, i.e. `rst`

argument’s,
non-`NA`

pixels become part of the graph. In the example of
ship route estimations, you would set all land pixels to
`NA`

, restricting ships to only traverse water cells. The
number of of non-`NA`

cells and thereby size of the graph is
the most important determinant of RAM requirements and execution time.
Crop the grid to the relevant area or set irrelevant pixels to
`NA`

to boost efficiency.

Shortest paths algorithms, like the here used Dijkstra’s (1959) algorithm, minimize the sum of edge
weights along the paths between origins and destinations. By default,
the edge weight in `spaths`

is the geographic distance
between the centroids of the respective neighboring cells. Figure 2
illustrates edge weights as distance in kilometers between a cell at 1°N
1°E an its neighboring cells under queen’s case contiguity in an
unprojected, i.e. lonlat, grid of a one degree resolution. The use of
kilometers is for visualization purposes and deviates from the actual
implementation which uses meters in most cases, as the documentation
states.

In calculating distances between locations distributed around the globe,
it is usually best to use unprojected data. Projections,
i.e. transfering coordinates from a spheroid or ellipsoid onto a two
dimensional plane, necessarily distort the input. These distortions can
severely bias distances between cells. Projections are designed for
specific, often regional, applications. If you are not sure whether a
projection allows for correct distances in your case, use unprojected
data. This phenomenon is not specific to `spaths`

, but holds
for any GIS software. Another caveat of using projections is that
`spaths`

, like `terra`

, does not connect cells
across the projected grid’s edges, even if the data is global. Meaning
it does not connect the left most cells to the right most cells in the
grid. However, when the data is unprojected and global, it does connect
them. This applies to any `rst`

input class, including a
SpatRaster, a RasterLayer, and a matrix or a list of matrices with
`spherical = TRUE`

.

`spaths`

defaults to the fastest ways of deriving geographic
distances. It uses optimized Haversine distance computations for
unprojected data on Earth or other spherical objects and optimized
Euclidean distance computations for planar data. If you want account for
Earth’s ellipsoid shape rather than working with spherical distances and
computational efficiency is not a top priority, set
`dist_comp`

to `"terra"`

, which derives distances
via `terra::distance`

. `terra`

is a great and
performant package. What makes the `"terra"`

option slower
than `"spaths"`

in `dist_comp`

is that it triggers
the function to derive distances between all neighboring cells
separately and export them from R to C++ while the latter choice
computes them in a way tailored to this application in the graph
construction phase in C++. This implementation e.g. leverages the
occurrence of repeated distance values in gridded data.

Edge weights do not have to be geographic straight line distances
between cell centroids. With a custom transition function passed to
`tr_fun`

, you can freely define them in a different way. In
this function, you may use the parameters `d`

(distance
between the pixel centroids), `x1`

(x coordinate or longitude
of the first cell), `x2`

(x coordinate or longitude of the
second cell), `y1`

(y coordinate or latitude of the first
cell), `y2`

(y coordinate or latitude of the second cell),
`v1`

(`rst`

layers’ values from the first cell),
`v2`

(`rst`

layers’ values from the second cell),
and `nc`

(number of CPU cores set by `ncores`

).
The algorithm moves from the first to the neighboring second cell. If
the data is unprojected, i.e. lonlat, or if
`dist_comp = "terra"`

, `d`

is measured in meters.
Otherwise, it uses the units of the CRS. If `rst`

has one
layer, the values are passed to `v1`

and `v2`

as
vectors. Otherwise they are passed as a data table, if
`v_matrix`

is `FALSE`

(default), and as a matrix,
if `v_matrix`

is `TRUE`

. The function has to be
callable from R, but does not have to be written in R. It can e.g. be a
C++ function implemented via the `Rcpp`

package.

A common transition function is Tobler’s (1993) hiking function. It estimates the travel
time between locations conditional on terrain topography. To illustrate
this use case, imagine that the input grid is a digital elevation model.
It consists of one layer and the pixel values document the location’s
elevation in meters. This `tr_fun`

function then expresses
the transition cost in terms of the hours it takes to travel between
cell centroids according to Tobler:
`function(d, v1, v2) d / (6000 * exp(-3.5 * abs((v2 - v1) / d + 0.05)))`

.
It combines the straight line distance in meters `d`

with the
altitude difference, i.e. the value of the second cell `v2`

minus the value of the first cell `v1`

. Tobler’s (1993) hiking function produces asymmetric edge
weights. Walking uphill is not equally fast as walking downhill. This
results in a directed graph, meaning transition cost depends on the
direction in which the algorithm traverses between grid cells. So, the
`tr_directed`

argument has to be `TRUE`

.

With transition functions generating symmetric edge weights,
`tr_directed`

should always be set to `FALSE`

,
improving computational performance. Tobler’s (1993) function would e.g. produce symmetric
transition costs in the absence of the `+ 0.05`

. The
`tr_directed`

parameter only has an effect in the presence of
custom transition functions, i.e. if `tr_fun`

is not
`NULL`

. Otherwise, the graph is always undirected.

As already mentioned, `spaths`

is not restricted to
geographic applications. Here is an arbitrary transition function
combining various layers of `rst`

:
`function(v1, v2) v1[[1L]] * v2[[1L]] + v1[[2L]] * v2[[2L]]`

.
This multiplies the first cell’s value in the first layer with the
second cell’s value in the first layer and adds it to the cells’ second
layer product.

It is recommended to use vectorized R functions like the ones above. It
computes all edge weights without explicitly calling an iterative
structure like `for`

, `*apply`

, or
`foreach`

loops. R loops are slow. If your function requires
a loop, write it in C++. Here is an example with
`v_matrix = TRUE`

, the Armadillo C++ library, C++ 20, and
`...`

as a placeholder for the function body:

```
Rcpp::sourceCpp(code = '
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp20)]]
// [[Rcpp::export]]
Rcpp::NumericVector example_tr_fun(arma::mat &v1, arma::mat &v2, Rcpp::NumericVector &d) {
...
}
')
```

By default, `shortest_paths`

returns distances. These
distances are the sum of edge weights along the shortest paths between
points. Without a custom transition function, it is the geographic
distance between the centroids of the cells on a path between two
points. It is the length of the path in meters, unless you use a
projection with different units. With a custom transition function, the
distance or length of the path is expressed in whatever units the
transition function returns, such as hours of travel time with Tobler’s
(1993) hiking function.

```
set.seed(3L)
input_grid <- terra::rast(crs = "epsg:4326", resolution = 2, vals = sample(c(1L, NA_integer_), 16200L,
TRUE, c(0.8, 0.2)))
origin_pts <- rnd_locations(3L, output_type = "SpatVector")
destination_pts <- rnd_locations(3L, output_type = "SpatVector")
shortest_paths(input_grid, origin_pts)
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
shortest_paths(input_grid, origin_pts, destination_pts)
origins destinations distances
<int> <int> <num>
1: 1 1 13313293
2: 1 2 6158046
3: 1 3 15837664
4: 2 1 9137621
5: 2 2 16130624
6: 2 3 4810903
7: 3 1 15919393
8: 3 2 10787554
9: 3 3 19275995
```

Instead of distances, `shortest_paths`

can output the path
lines or lines and distances jointly.

```
shortest_paths(input_grid, origin_pts, output = "lines")
class : SpatVector
geometry : lines
dimensions : 3, 3 (geometries, attributes)
extent : -179, 179, -63, -1 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
names : origins destinations connected
type : <int> <int> <logical>
values : 1 2 TRUE
1 3 TRUE
2 3 TRUE
shortest_paths(input_grid, origin_pts, output = "both")
class : SpatVector
geometry : lines
dimensions : 3, 3 (geometries, attributes)
extent : -179, 179, -63, -1 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
names : origins destinations distances
type : <int> <int> <num>
values : 1 2 1.963e+07
1 3 7.29e+06
2 3 1.447e+07
```

When no destinations are specified, `shortest_paths`

computes
the paths between all origin combinations. If transition costs are
symmetric, i.e. traveling from cell A to neighboring cell B is as
expensive as traveling from B to A, the function by default only returns
distances in one direction to boost computational efficiency and lower
the RAM requirements of the return object. If you would like the output
to report on both directions, set `bidirectional`

to
`TRUE`

.

```
shortest_paths(input_grid, origin_pts)
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
shortest_paths(input_grid, origin_pts, bidirectional = TRUE)
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
4: 2 1 19627694
5: 3 1 7290325
6: 3 2 14467797
```

The distance from a point to itself is zero. So, irrespective of what
arguments you pass to `shortest_paths`

, the function never
returns paths from points to themselves.

If you do not want to connect all origins to all destinations, specify
`pairwise = TRUE`

. This connects the first origin to the
first destination, the second origin to the second destination, etc. For
computational optimization, the function can change the order of the
results. In the example below, the first row contains the distance
between the second origin and the second destination. Always check the
output’s origins and destinations variables regarding which points an
estimate refers to.

```
shortest_paths(input_grid, origin_pts, destination_pts, pairwise = TRUE)
origins destinations distances
<int> <int> <num>
1: 2 2 16130624
2: 1 1 13313293
3: 3 3 19275995
```

The origins and destinations variables by default refer to the row
numbers in the `origins`

(and `destinations`

)
inputs. You can make `shortest_paths`

to utilize other names
by specifying a column in `origins`

(and
`destinations`

) containing point names. These names can be of
types `character`

, `integer`

, and
`numeric`

.

```
origin_pts$name <- letters[1:3]
shortest_paths(input_grid, origin_pts, origin_names = "name")
origins destinations distances
<char> <char> <num>
1: a b 19627694
2: a c 7290325
3: b c 14467797
```

Unconnected points are marked with `Inf`

in the
`distances`

variable, if `distance_type`

is
`"double"`

or `"float"`

, with `NA`

, if
`distance_type`

is `"int"`

or
`"unsigned short int"`

, and with a `connected`

variable, if distances are not returned. Integers use `NA`

rather than `Inf`

because `Inf`

is a numeric, not
an integer, value.

If `output = "distances"`

, the output is by default returned
as a data table. Data tables are data frames and you can use them in
methods expecting data frames. If you want the result to be a data frame
only, not a data table, set `output_class`

to
`"data.frame"`

.

```
shortest_paths(input_grid, origin_pts, output_class = "data.frame")
origins destinations distances
1 1 2 19627694
2 1 3 7290325
3 2 3 14467797
```

If `output`

is `"lines"`

or `"both"`

,
the the function returns a SpatVector, if `rst`

is a
SpatRaster or a RasterLayer, and a list, if `rst`

is a matrix
or a list of matrices. Explicitly setting `output_class`

to
`"list"`

returns a list in any case.
`output_class = "SpatVector"`

, however, returns a SpatVector
only if `rst`

is a SpatRaster or a RasterLayer.

`NA`

cells in `rst`

act as barriers. They mark the
cells which the algorithm must not travel through. What if these
barriers move? In the example of ships moving between ports, these
moving barriers could be Caribbean hurricanes. Ships do not go through
the storms, but around them.

Assume each hurricane is documented with a separate polygon. You could
mask the `rst`

grid with the different polygons, create one
grid per hurricane, and loop over these grids with
`shortest_paths`

. This would reestimate all shipping routes
in each call of `shortest_paths`

. Even lines not passing
through the Caribbean, e.g. routes between India and Australia, would be
recomputed. On top of that, each iteration would check the inputs and
convert them into the format used by the algorithm. It is an inefficient
strategy.

`shortest_paths`

comes with an efficient solution to the
moving barrier case. You pass the hurricane polygons to
`update_rst`

and the function computes the shortest paths in
a hurricane-free grid and the grids subject to hurricanes. It just
recomputes paths affected by a hurricane and is much more efficient than
looping over `shortest_paths`

is.

```
barrier <- terra::vect("POLYGON ((-179 -25, 100 -25, 100 -26, -179 -26, -179 -25))", crs = "epsg:4326")
shortest_paths(input_grid, origin_pts, update_rst = barrier)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 19627694 0
2: 1 3 7290325 0
3: 2 3 14467797 0
4: 1 2 19627694 1
5: 1 3 13207350 1
6: 2 3 15465933 1
barriers <- list(barrier, terra::vect("POLYGON ((0 20, 1 20, 1 -20, 0 -20, 0 20))", crs = "epsg:4326"))
shortest_paths(input_grid, origin_pts, update_rst = barriers)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 19627694 0
2: 1 3 7290325 0
3: 2 3 14467797 0
4: 1 2 19627694 1
5: 1 3 13207350 1
6: 2 3 15465933 1
7: 1 2 19813077 2
8: 1 3 7290325 2
9: 2 3 14467797 2
```

Layer 0 is the hurricane-free base grid, layer 1 is the hurricane-free
base grid updated with the first polygon, and layer 2 is the
hurricane-free base grid updated with the second polygon. Each layer
updates the unmodified `rst`

, not the grid already updated by
another polygon. `update_rst`

sets cells covered by the
respective polygon to `NA`

. It never sets cells to any other
value than `NA`

.

The actual implementation does not truly update `rst`

or the
graph, but marks the respective cells as blocked in a more efficient
way. The internal strategy obtains the same results, but is much faster
than physically updating the grid would be. So, you should treat
`update_rst`

as a method of setting any `rst`

cells that it intersects with to `NA`

, irrespective of the
optimized implementation.

`spaths`

is not limited to geographic locations on Earth. The
functions can be applied to other scenarios connecting points in grids.
These can be of a geographic nature, like other planets, or
non-geographic subjects.

As an example, we consider astronauts walking on Mars. Non-Earth
applications must provide `rst`

as a matrix or a list of
matrices. The SpatRaster and RasterLayer inputs are restricted to
evaluations on Earth. If `rst`

is a matrix or a list of
matrices, the parameters `spherical`

, `radius`

,
and `extent`

define what that matrix refers to. In our
Martian example, we define `rst`

to be a global, unprojected
grid of a two degree resolution.

Because the data is unprojected, meaning it is expressed in degrees on a
sphere, not points on a plane, we specify `spherical = TRUE`

.
Mars’ radius is `3389500`

meters and the grid’s global nature
implies an `extent`

of `c(-180, 180, -90, 90)`

. It
stretches from -180 to 180 in the x dimension and -90 to 90 in the y
dimension.

When `rst`

is a matrix or a list of matrices,
`origins`

(and `destinations`

) are supplied as a
matrix, data frame, or data table of coordinates, with columns named
`x`

and `y`

. The Martian example uses a data
table.

```
set.seed(3L)
origin_pts <- rnd_locations(3L)
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90))
origins destinations distances
<int> <int> <num>
1: 1 2 8229741
2: 1 3 5372309
3: 2 3 9088755
```

`dist_comp = "terra"`

is not available in non-Earth-based
scenarios. Unless, you compute edge weights through a custom transition
function, they are derived through the `dist_comp = "spaths"`

methods. Custom transition functions work essentially like they do for
Earth. A difference is that grid layers are not passed as layers of a
single SpatRaster, but as matrices in a list. As in the SpatRaster case,
the layers can be accessed by index and name.

```
set.seed(3L)
input_grid <- list(even_terrain = input_grid, temperature = matrix(stats::rnorm(16200L, 20, 5), nrow = 90))
custom_tr <- function(v1, v2) v1[[1L]] * v2[[1L]] + v1[[2L]] * v2[[2L]]
custom_tr <- function(v1, v2) v1[["even_terrain"]] * v2[["even_terrain"]] + v1[["temperature"]] * v2[["temperature"]]
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90), tr_fun = custom_tr)
origins destinations distances
<int> <int> <num>
1: 1 2 15973.37
2: 1 3 10254.19
3: 2 1 15973.37
4: 2 3 17432.90
5: 3 1 10254.19
6: 3 2 Inf
```

Grid updating is not done with a SpatVector, but a vector of cell
numbers, a matrix, or a list of either. The vector’s cell numbers mark
the cells to set to `NA`

. `spaths`

counts cells
like `terra`

does, starting with one in the top left, then
increasing from left to right and afterwards from top to bottom. This
differs from how R base matrices enumerate cells, which iterate first
from top to bottom and second from left to right. If
`update_rst`

is a matrix, it must be of the same dimensions
as `rst`

and marks cells to be updated using `NA`

values. Cells with non-`NA`

values in an
`update_rst`

matrix are not updated.

```
set.seed(3L)
input_grid <- matrix(sample(c(1L, NA_integer_), 16200L, TRUE, c(0.8, 0.2)), nrow = 90)
barrier_vector <- sample.int(16200L, 10L)
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90), update_rst = barrier_vector)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 8229741 0
2: 1 3 5372309 0
3: 2 3 9088755 0
4: 1 2 8229741 1
5: 1 3 5372309 1
6: 2 3 9088755 1
barrier_matrix <- matrix(rep.int(1L, 16200L), nrow = 90)
barrier_matrix[sample.int(16200L, 10L)] <- NA_integer_
shortest_paths(input_grid, origin_pts, spherical = TRUE, radius = 3389500, extent = c(-180, 180, -90, 90), update_rst = barrier_matrix)
origins destinations distances layer
<int> <int> <num> <int>
1: 1 2 8229741 0
2: 1 3 5372309 0
3: 2 3 9088755 0
4: 1 2 8229741 1
5: 1 3 5372309 1
6: 2 3 9088755 1
```

In each of the two cases, `update_rst`

marks ten cells in
`rst`

to be set to `NA`

, once using a vector and
once using a matrix. In either approach, the ten cells do not affect the
paths and accordingly the results are the same for both layers.

Performance optimization is a central design principle of
`spaths`

. `shortest_paths`

’ runs with defaults
that are optimal for the average use case. Nonetheless, there are some
performance considerations that should generally be accounted for and
various parameters that can help in tailoring the function execution to
the application.

The number of non-`NA`

pixels in `rst`

has the
largest influence on both computational time and RAM demand. Crop the
`rst`

to the relevant area and set any cells that the
shortest paths are do surely no pass through to `NA`

. The
quantity of non-`NA`

cells does not only determine the size
of the graph, but the size of multiple intermediate objects, the number
of edge weights to derive, and the number of places to consider in the
shortest path algorithm.

To be more precise, the primary contributor to graph size is the number
of edges: the quantity of links between cells. Of course, the number of
edges increases in the grid cell count. There is an option to influence
this link though. `shortest_paths`

builds on the above
described queen’s case contiguity with up to eight edges per vertex.
Changing `contiguity`

to `"root"`

cuts RAM
consumption and computational time. Yet, it may induce less desireable
paths than `contiguity = "queen"`

does.

You can track the function’s progress with
`show_progress = TRUE`

. This prints messages on stages of
`shortest_paths`

, including a progress bar, if the number of
paths or `update_rst`

elements is less than or equal to
`bar_limit`

. `bar_limit`

defaults to 150. In the
example below, the function prints three `=`

, one per path.
`show_progress = TRUE`

is meant for testing purposes.
Especially printing a progress bar from a parallelized function
execution can prolong runtimes because the program limits writing to
output to one thread at a time. If you choose to print a progress bar,
do not set `bar_limit`

too high, which can cause the output
buffer to overflow and crash the program.

```
shortest_paths(input_grid, origin_pts, show_progress = TRUE)
Checking arguments
Converting spatial inputs
Preparing algoritm inputs
Starting distances calculation
|---|
|===|
Generating output object
origins destinations distances
<int> <int> <num>
1: 1 2 19627694
2: 1 3 7290325
3: 2 3 14467797
```

Deriving lines is computationally more expensive and requires more RAM than deriving distances. Storing the coordinates of 10,000 lines comprised of 500,000 cells on average each, requires 74.5 GB RAM. Storing the distances associated with those 10,000 paths requires 78.1 KB RAM, or 0.0001 percent as much as storing the coordinates. This is, of course, not the only information that the function holds. There are intermediate objects etc. Yet, the computational requirements of assembling path lines should not be underestimated.

By default, `ncores`

is `NULL`

and
`shortest_paths`

parallelizes across all of the machine’s CPU
cores. It is implemented with OpenMP in C++. OpenMP is much more
efficient than R level parallelism and scales quasi linearly in the
number of cores. Each thread runs an iteration of the shortest paths
algorithm. How many iterations there are depends on the number of origin
and destination points. The software utilizes shared memory parallelism.
Hence, objects that are shared across executions of the
graph-theoretical algorithm, like the graph and `update_rst`

,
are not copied. Instead, all threads share the same representation. This
does not mean that the RAM use does not increase in the number of cores.
Each execution of the shortest path algorithm comes with its own
objects, such as the priority queue managing the order of the cells to
be visited, which need allocation. So, explicitly setting the number of
cores to a value below the default is a way of reducing RAM consumption.

OpenMP is commonly not available on MacOS by default. This is not
specific to `spaths`

, but affects many modern R packages that
are geared towards performance. It implies that the package may be much
faster on Windows and Linux than on MacOS.

`shortest_paths`

runs Dijsktra’s algorithm to identify
shortest paths between points. It runs the algorithm once per origin. If
you provide one origin and ten destinations, the function produces ten
paths and runs the algorithm once. If you provide two origins and five
destinations, the function also produces ten paths, but runs the
algorithm twice. The second example, therefore, takes longer than the
first one. If the generated graph is undirected, it does not matter, if
you pass two origins and five destinations or five origins and two
destinations. `shortest_paths`

automatically adjusts the
direction to minimize the frequency with which Dijkstra’s (1959) algorithm is called. In the example, it
would be called twice in either case.

The algorithm by default derives the shortest paths from an origin to
other cells of the same graph until all destination cells have been
visited. This technique allows the algorithm to potentially stop before
having visited each vertex. Yet, it comes at the cost of checking for
each visited cell whether it is in the set of destinations. Hence, the
default `early_stopping = TRUE`

is efficient when point pairs
are close to each other compared to the entirety of cells. If at least
one points pair in an execution of the shortest paths algorithm is far
from each other, the alternative `early_stopping = FALSE`

can
be faster. It derives the distance to all other cells and then picks the
destinations cells from the result, avoiding the check for destination
cells while iterating through the graph.
`early_stopping = TRUE`

and
`early_stopping = FALSE`

produce the same results, but differ
in computational performance.

In a grid, the vector of distances between neighboring cells is made up
of not that many unique and repeating values. That makes it efficient to
precompute edge weights for the entire graph, and is the behavior which
the default `pre = TRUE`

induces. Computing individual edge
weights while Dijkstra’s (1959) algorithm
runs, `pre = FALSE`

, requires less RAM, as the function only
stores one edge weight at a time, but is almost always much slower than
precomputing them all jointly. So, setting `pre`

to
`FALSE`

is one of the last options to consider, when the
machine has insufficient RAM.

When `update_rst`

is a list, there are two potential
dimensions for parallelism. In iterating over the updated grids, the
function could parallelize at the level of point connections, like in
the base grid, or it could parallelize across grids, i.e. list elements
of `update_rst`

. By default the function chooses the latter,
meaning `par_lvl`

is `"update_rst"`

. All
connections in the grid updated with the first element of
`update_rst`

are handled by one core, all connections in the
grid updated with the second element of `update_rst`

are
handled by another core, etc. If `par_lvl`

is
`"points"`

, the function instead calls the former option. It
computes all connections in the grid updated with the first element of
`update_rst`

in parallel, then computes all connections in
the grid updated with the second element of `update_rst`

in
parallel, etc. The `par_lvl`

argument only affects the grids
updated with `update_rst`

list elements. The unupdated base
grid always uses the `"points"`

strategy. Which
`par_lvl`

option is preferred depends on the number of
recomputed paths and the number of `update_rst`

list
elements. Assume you run `shortest_paths`

with 8 CPU cores,
pass `update_rst`

of two elements, and run the shortest paths
algorithm 16 times per updated grid. `par_lvl = "update_rst"`

would utilize two cores and leave six cores idle.
`par_lvl = "points"`

, in contrast, would use all 8 cores,
commonly running the algorithm twice per core. It does not matter how
many connections there are in total and how often Dijkstra’s (1959) algorithm is called in the base grid.
Only paths that are affected by the grid updating, i.e. where
`update_rst`

sets at least one cell on the path to
`NA`

, are recomputed.

Internally, `shortest_paths`

stores paths in the form of cell
numbers. By default, these are four byte signed integers, the same type
R uses for integers. Only if `output`

is `"lines"`

or `"both"`

, are these cell numbers converted to coordinates,
usually two 8 byte double precision floating point numbers per cell,
before the function returns. If `rst`

has less than 65,535
non-`NA`

cells, you can make use of 2 byte unsigned short
integers instead. The `path_type = "unsigned short int"`

option requires half as much RAM to store cell numbers as
`path_type = "int"`

demands, but is slower as it comes with
type conversions from 4 byte signed integers. The results are the same,
irrespective of which option you employ.

Another data type selection regards distances.
`distance_type`

defaults to the fastest and most precise
option: double precision floating point numbers. `"double"`

corresponds to the `numeric`

type in R. It is an 8 byte type.
Alternatively, you can choose 4 byte single precision floating point
numbers (`"float"`

), 4 byte signed integers
(`"int"`

), and 2 byte unsigned short integers
(`"unsigned short int"`

). Unlike `path_type`

,
`distance_type`

changes the results. `"float"`

stores distances between cells with lower precision than
`"double"`

does and `"int"`

and
`"unsigned short int"`

round distances to integers. These
deviations accumulate over a path and can induce marked differences in
the result. Always test the effect in your application before choosing
one of the less RAM demanding options. If `distance_type`

is
`"int"`

, the distance between any cells, i.e. the sum of edge
weights along any path, not just the returned ones, must not exceed
2,147,483,647. With `"unsigned short int"`

that limit is
65,535, making it not applicable in most scenarios.
`shortest_paths`

does not check, if you meet these numerical
constraints. The results are simply wrong, if you violate them.
`"float"`

, `"int"`

, and
`"unsigned short int"`

are slower than `"double"`

because they require type conversions.

Contributions to this package are highly welcome. You can submit them as
a pull request to the `ChristianDueben/spaths`

GitHub
repository or via email to the maintainer email mentioned in the
`DESCRIPTION`

file. You may also build a package on top of
`spaths`

, as `movecost`

and
`leastcostpath`

did with `gdistance`

.

This is because of the `contiguity`

options that
`spaths`

implements: queen’s and rook’s case contiguity. As
clarified above, these rules restrict the algorithm to traverse between
directly adjacent cells only. Rook’s case contiguity provides access to
four neighbors spaced at 90° angles. Queen’s case contiguity provides
access to eight neighbors spaced at 45° angles. Incorporating types of
contiguity that allow for traversing between second order neighbors
would introduce a larger set of angles for the algorithm to choose from.
However, this lets the function to skip over first degree neighbors. In
the ship route example above, the ship could skip over land cells, such
as islands or peninsulas. Hence, the lines are the shortest paths given
the angles the algorithm may travel at.

The recommendation is to use SpatRaster `rst`

, SpatVector
`origins`

(and SpatVector `destinations`

) objects
when handling locations on Earth. The other classes also work, but the
recommended classes tend to be the most appropriate in that they require
little conversion and return output in a convenient format. If the data
do not represent locations on Earth, you need to use a matrix or list of
matrices as `rst`

with a matrix or a data frame as
`origins`

(and `destinations`

). Unlike a matrix, a
data frame accepts `origin_names`

and
`destination_names`

columns of a different type than the
coordinates.

The next features include alternative shortest paths algorithms, beyond Dijkstra’s algorithm, and centrality measures. In the respective categories, the A* algorithm and closeness centrality will be first. How long it will take to publish these updates depends on the maintainer’s academic career prospects.

Dijkstra, E. W. 1959. “A note on two problems
in connexion with graphs.” *Numerische Mathematik*
1 (1): 269–71.

Tobler, Waldo. 1993. “Three Presentations on
Geographical Analysis and Modeling: Non- Isotropic Geographic Modeling,
Speculations on the Geometry of Geography, Global Spatial
Analysis.” Technical Report 93-1. National Center
for Geographic Information and Analysis.