Thank you for your interest in the backbone package! This vignette
illustrates how to use this package to extract the *backbone* of
a network. A backbone is a sparse and unweighted subgraph of a network
that contains only the most â€˜importantâ€™ or â€˜significantâ€™ edges. A
backbone can be useful when the original network is too dense, when edge
weights are not needed, or when edge weights are difficult to
interpret.

The `backbone`

package can be cited as:

**Neal, Z. P. (2022). backbone: An R package to Extract Network
Backbones. PLOS ONE, 17. https://doi.org/10.1371/journal.pone.0269137**

For additional resources on the backbone package, please see https://www.rbackbone.net/.

If you have questions about the backbone package or would like a backbone hex sticker, please contact the maintainer Zachary Neal by email (zpneal@msu.edu) or via Mastodon (@zpneal@mastodon.social). Please report bugs in the backbone package at https://github.com/zpneal/backbone/issues.

The backbone package can be loaded in the usual way:

```
library(backbone)
#> ____ backbone v2.1.4
#> | _ \ Cite: Neal, Z. P., (2022). Backbone: An R package to extract network backbones.
#> |#|_) | PLOS ONE, 17, e0269137. https://doi.org/10.1371/journal.pone.0269137
#> |# _ <
#> |#|_) | Help: type vignette("backbone"); email zpneal@msu.edu; github zpneal/backbone
#> |____/ Beta: type devtools::install_github("zpneal/backbone", ref = "devel")
```

Upon successful loading, a startup message will display that shows the version number, citation, ways to get help, and ways to contact us.

The primary use of the backbone package is backbone extraction, that is, identifying and preserving only the most important edges in a network. Different types of networks require different backbone extraction methods, however the basic workflow using the backbone package is the same for all methods:

You begin with some network data

`dat`

, which may take any of the following forms: an adjacency matrix (as a matrix, Matrix, or Sparse Matrix object), an edge list (as a 2- or 3-column matrix or data frame), or an igraph object.You run an applicable backbone function on these data (e.g.,

`sdsm(dat)`

), which yields an unweighted network of the same class as`dat`

.

The figure illustrates which functions are applicable for which types of network data, and highlights in blue the analytic workflow and function that is recommended for each type.

If you are unsure about your type of network data or which backbone
function to use, you can run `suggest.backbone(dat)`

. This
helper function will examine the data and make a suggestion. If you also
specify a significance level or sparsification parameter `s`

,
then the function will extract the backbone using the suggested approach
and will display text describing what it did. In all cases,
`s`

can range from 0 to 1, with smaller values yielding
sparser backbones. For example:

```
dat <- matrix(runif(100),10,10) #Some data
backbone.suggest(dat) #What should I do?
#> The disparity filter is suggested. Type "?disparity" for more information.
backbone <- backbone.suggest(dat, s = 0.05) #Or, just do it
#> Extracting backbone using: disparity(G, alpha = 0.05, signed = FALSE, mtc = "none", class = "original", narrative = TRUE)
#>
#> === Suggested manuscript text and citations ===
#> We used the backbone package for R (v2.1.4; Neal, 2022) to extract the unweighted backbone of a weighted and directed unipartite network containing 10 nodes. An edge was retained in the backbone if its weight was statistically significant (alpha = 0.05) using the disparity filter (Serrano et al., 2009). This reduced the number of edges by 100%, and reduced the number of connected nodes by 100%.
#>
#> Neal, Z. P. 2022. backbone: An R Package to Extract Network Backbones. PLOS ONE, 17, e0269137. https://doi.org/10.1371/journal.pone.0269137
#>
#> Serrano, M. A., Boguna, M., & Vespignani, A. (2009). Extracting the multiscale backbone of complex weighted networks. Proceedings of the National Academy of Sciences, 106(16), 6483-6488. https://doi.org/10.1073/pnas.0808904106
```

The sections below describe and illustrate the backbone functions applicable to different types of networks, including weighted bipartite projections, non-projection weighted networks, and unweighted networks. Additionally, the package includes several utility functions that are used by the backbone methods, but may also have independent applications.

`narrative`

parameterMost functions in the backbone package include a
`narrative`

parameter. When `narrative = TRUE`

,
narrative text describing the backbone extraction will be displayed with
the relevant citations. This text is suitable for use in manuscripts,
and ensures complete and consistent reporting of the backbone
extraction. You can see an example of this type output in the
`backbone.suggest()`

example above, as well as in some of the
examples shown below.

The functions in the backbone package support many different data formats that can be used to represent a network or graph in R:

**matrix**- A bipartite network can be represented as an incidence matrix, and a unipartite matrix can be represented as an adjacency matrix. An incidence/adjacency matrix can be supplied to a backbone extraction function as a*matrix*object or*Matrix*object created using the Matrix package.**edgelist**- A network can be represented as an edgelist, supplied as a*dataframe*object with two or three columns. When the network is one-mode/unipartite, the first two columns contain the IDs of the sending and receiving nodes, respectively, while the third column (optionally) contains the edge weights. When the netwrok is two-mode/bipartite, the first column contains the IDs of one set of nodes, the second column contains IDs of the other set of nodes, and the third column (optionally) contains the edge weights.**igraph**- A network can be supplied to a backbone extraction function as an*igraph*object from the igraph package.

Note: Earlier versions of backbone also supported *network*
objects created using the statnet package. To reduce package
dependencies, these objects are no longer supported. You can convert a
*network* bipartite graph into an incidence matrix using
`network::as.matrix.network(graph, type = "incidence")`

, and
can convert a *network* unipartite graph into an adjacency matrix
using
`network::as.matrix.network(graph, type = "adjacency")`

. In
either case, if the graph is weighted, the command should also include
`attrname = "weight"`

.

A weighted unipartite network contains one type of node, and pairs of
nodes are connected by edges that have weights. These weights usually
indicate the intensity of the relationship between them (e.g., strength,
intensity, etc.). A weighted unipartite network can be represented by a
matrix **W**, where W_{ij} indicates the weight of
the edge between node *i* and node *j*. Weighted networks
can be undirected (i.e., W_{ij} = W_{ji}) or directed
(i.e., W_{ij} != W_{ji}).

The goal of backbone extraction from a weighted network is to
identify edges in **W** whose weights are statistically
significantly strong (or weak, for signed backbones), and thus should be
preserved in the backbone. One simple possibility is to apply a global
threshold *T*, such that all edges with weights stronger than
*T* are preserved, however this approach tends to systematically
overlook nodes with low degree. Alternative approaches, such as the
disparity filter (Serrano, BogunÃ¡, and Vespignani
2009), involve comparing edge weights to their expected
values under a null model that considers each nodeâ€™s local
neighborhood.

To illustrate extracting the backbone from a weighted network, we
begin by creating a simulated weighted network, stored in a matrix
`W`

. This network has strongly heterogeneous edge weights;
some are quite weak (10) while others are quite strong (100), which
gives the network a multi-scalar character. This network might represent
the number of passengers flying between each of 10 airports, where there
are two hub airports (one large, one small) that each serve their local
areas.

```
W <- matrix(c(0,10,10,10,10,75,0,0,0,0,
10,0,1,1,1,0,0,0,0,0,
10,1,0,1,1,0,0,0,0,0,
10,1,1,0,1,0,0,0,0,0,
10,1,1,1,0,0,0,0,0,0,
75,0,0,0,0,0,100,100,100,100,
0,0,0,0,0,100,0,10,10,10,
0,0,0,0,0,100,10,0,10,10,
0,0,0,0,0,100,10,10,0,10,
0,0,0,0,0,100,10,10,10,0),10)
```

We could simply examine this network as a weighted network, however this yields a cluttered visualization, and many network analytic techniques (e.g., ERGM, SOAM) are not possible with weighted networks:

```
weighted <- igraph::graph_from_adjacency_matrix(W, mode = "undirected", weighted = TRUE, diag = FALSE)
plot(weighted, edge.width = sqrt(igraph::E(weighted)$weight), vertex.label = NA)
```

We could extract a backbone by applying a global threshold using the
`global()`

function. One possibility is to extract a backbone
that preserves all edges with weights greater than 0. However, this
yields a network that is very dense because all edges are preserved:

Instead, we could choose the threshold value based on the edge weights. For example, we could extract a backbone by using the average edge weight as the global threshold, thus obtaining a backbone that preserves stronger-than-average edges. However, this yields a network that is focused only on high-degree nodes (i.e., the large hub airport), while ignoring the low-degree nodes (i.e., the smaller hub airport):

A better option is to extract the backbone using a statistical null model, such as the disparity filter (currently, the only available model). Now we can clearly see the hub-and-spoke structure of the transportation system:

```
bb <- disparity(W, alpha = 0.05, narrative = TRUE, class = "igraph")
#>
#> === Suggested manuscript text and citations ===
#> We used the backbone package for R (v2.1.4; Neal, 2022) to extract the unweighted backbone of a weighted and undirected unipartite network containing 10 nodes. An edge was retained in the backbone if its weight was statistically significant (alpha = 0.05) using the disparity filter (Serrano et al., 2009). This reduced the number of edges by 43.8%, and reduced the number of connected nodes by 0%.
#>
#> Neal, Z. P. 2022. backbone: An R Package to Extract Network Backbones. PLOS ONE, 17, e0269137. https://doi.org/10.1371/journal.pone.0269137
#>
#> Serrano, M. A., Boguna, M., & Vespignani, A. (2009). Extracting the multiscale backbone of complex weighted networks. Proceedings of the National Academy of Sciences, 106(16), 6483-6488. https://doi.org/10.1073/pnas.0808904106
plot(bb, vertex.label = NA)
```

A bipartite network contains two types of nodes, which we call
*agents* and *artifacts*, and edges that exist only
between nodes of different types (i.e., an edge may connect an agent to
an artifact, but never an agent to another agent). A bipartite network
can be represented by a matrix **B**, where B_{ik}
= 1 if agent *i* is connected to artifact *k*, and
otherwise is 0 (e.g., author *i* wrote paper *k*). The row
sums of **B** indicate agent degrees (e.g., the number of
papers written by author *i*), while the column sums of
**B** indicate artifact degrees (e.g., the number of
authors on paper *k*).

A bipartite projection is a network of *agents* that are
connected by the fact that they share *artifacts* in common. A
bipartite projection can be represented by a square symmetric matrix
**P**, which is formed as **P** =
**B** * **B**â€™, where **B**â€™ is
the transpose of **B**. In **P**,
P_{ij} indicates the number of artifacts *k* that are
shared by agents *i* and *j*, and can be viewed as the
weight of the edge connecting them in the network (e.g., the number of
papers co-authored by authors *i* and *j*).

The goal of backbone extraction from bipartite projections is to
identify edges in **P** whose weights are statistically
significantly strong (or weak, for signed backbones), and thus should be
preserved in the backbone. All methods implemented in the backbone
package use the same approach for testing the statistical significance
of edges:

- Compute P
_{ij} - Generate a bipartite network
**B*** that preserves features of**B**, but otherwise is random. All methods preserve**B**â€™s number of rows and columns.`fixedfill()`

preserves the total number of 1s in**B**,`fixedrow()`

preserved the row sums of**B**, and`fixedcol()`

preserved the column sums of**B**. Finally,`fdsm()`

and`sdsm()`

preserve both the row and column sums of**B**; the former preserves them exactly, while the latter preserves them on average. - Compute
**P*** =**B*** ***B***â€™ - Compute and save P*
_{ij} - Repeat steps 2-4
*N*times - Compute the positive p-value associated with edge P
_{ij}as: (number of times P*_{ij}> P_{ij}) /*N*

This approach relies on Monte Carlo simulation, and is used by
`fdsm()`

. However, for the other bipartite backbone null
models, the distribution of P*_{ij} is known and does not
require simulation (Neal, Domagalski, and Sagan
2021), which makes them faster. Additionally, this process
preserves only edges that are statistically significantly
*stronger* than expected under the chosen null model, thus
yielding a binary backbone. However, it is also possible to preserve
edges that are statistically significantly *weaker*, and thus to
obtain a signed backbone.

Because all bipartite backbone functions work similarly, here we will
use the stochastic degree sequence model and `sdsm()`

function as an example, applying it to simulated data.

We begin by creating a simulated bipartite network, stored in a
matrix `B`

. This network contains 30 agents (rows) and 75
artifacts (columns). The agents are connected to the artifacts in a way
that we might expect if the agents form three cohesive communities. For
example, this could represent authors from three distinct disciplines,
and the papers they have written. In practice, `B`

can take
the form of a matrix object (as in this example), but could take other
forms, such as a bipartite network stored in a `statnet`

or
`igraph`

class object.

```
B <- rbind(cbind(matrix(rbinom(250,1,.8),10),
matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.2),10)),
cbind(matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.8),10),
matrix(rbinom(250,1,.2),10)),
cbind(matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.8),10)))
```

Looking at a portion of the bipartite network, we can see that author 1 was an author on papers 2-5, but not paper 1:

```
B[1:5,1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 0 0 1
#> [2,] 1 1 1 1 0
#> [3,] 1 0 1 1 1
#> [4,] 1 1 0 1 1
#> [5,] 1 0 0 1 1
```

Looking at the row and column sums, we can also see that each author wrote about 30 papers, and each paper has about 10 authors, but there is some variation:

```
rowSums(B)
#> [1] 28 29 31 31 27 32 33 32 22 28 30 25 31 33 27 26 34 20 32 25 30 33 29 33 29
#> [26] 28 33 31 30 33
colSums(B)
#> [1] 11 14 12 13 10 11 11 12 14 11 11 9 7 5 16 11 15 12 10 10 11 14 13 14 12
#> [26] 8 9 12 12 14 10 10 9 7 16 12 10 13 15 10 9 7 8 12 13 11 13 17 12 13
#> [51] 14 17 13 11 9 12 13 9 8 15 12 12 8 13 13 14 12 14 14 14 16 13 12 11 15
```

We can construct an ordinary weighted bipartite projection, then plot
it using `igraph`

, but the result is a dense and
uninformative network with no obvious communities:

```
P <- B%*%t(B)
plot(igraph::graph_from_adjacency_matrix(P, mode = "undirected", diag = FALSE, weighted = TRUE), vertex.label = NA)
```

Instead, we can extract the backbone of the weighted bipartite
projection using the stochastic degree sequence model. By choosing the
SDSM, we are comparing the observed edge weights to those that would be
observed under a null model where each of the authors wrote
approximately the same number of papers as in `B`

, and each
of the papers have approximately the same number of authors as in
`B`

, but which authors wrote which papers is random:

```
bb <- sdsm(B, alpha = 0.075, narrative = TRUE, class = "igraph")
#>
#> === Suggested manuscript text and citations ===
#> We used the backbone package for R (v2.1.4; Neal, 2022) to extract the unweighted backbone of the weighted projection of an unweighted bipartite network containing 30 agents and 75 artifacts. An edge was retained in the backbone if its weight was statistically significant (alpha = 0.075) using the stochastic degree sequence model (SDSM; Neal, 2014). This reduced the number of edges by 83.2%, and reduced the number of connected nodes by 3.3%.
#>
#> Neal, Z. P. 2022. backbone: An R Package to Extract Network Backbones. PLOS ONE, 17, e0269137. https://doi.org/10.1371/journal.pone.0269137
#>
#> Neal, Z. P. (2014). The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Social Networks, 39, 84-97. https://doi.org/10.1016/j.socnet.2014.06.001
```

Notice that this function is performed on the original bipartite
network `B`

, not on the bipartite projection `P`

.
We use `alpha = 0.075`

to indicate that edges should be
preserved at the 0.075 level of statistical significance; we use an
unusually large value in this example only because it is a small toy
network. We use `narrative = TRUE`

to request that suggested
manuscript text and citations be displayed. Finally, we use
`class = "igraph"`

to request that the resulting backbone be
returned as an `igraph`

class object, to facilitate further
analysis. When we plot the backbone, the three communities are now
clearly visible:

An unweighted unipartite network contains one type of node, and pairs
of nodes are either connected or not. An unweighted unipartite network
can be represented by a matrix **U** where U_{ij}=1
if nodes *i* and *j* are connected, and otherwise is 0.
Unweighted networks can be undirected (i.e., W_{ij} =
W_{ji}) or directed (i.e., W_{ij} !=
W_{ji}).

The goal of backbone extraction from an unweighted network is to
identify edges in **U** that are important to the networkâ€™s
structure, and thus should be preserved in the backbone. Because the
network is already unweighted, and the goal is simply to reduce the
number of edges, this process is sometimes also called
*sparsification*. The challenge is that, unlike bipartite projections and weighted networks, the edges in unweighted networks
do not have weights that can be used to evaluate their importance.

To overcome this challenge, backbone models designed for unweighted
networks first assign each edge a score that is intended to capture its
importance. Many different possible edge scoring metrics have been
proposed. For example, the *jaccard* coefficient assigns an edge
a score that captures the amount of overlap in its two endpointsâ€™
neighborhoods. In the context of a social network, the jaccard
coefficient of edge U_{ij} would indicate the extent to which
*i*â€™s friends are also *j*â€™s friends. Similarly, the
*triangle count* has also been proposed as an edge scoring
metric, and counts the number of triangles that are completed by an
edge. This metric draws on sociological theory (e.g., Simmel,
Granovetter) to view triangle-completing edges as important because they
form cohesive triads.

Although many sparsification models have been proposed, each one is
defined by four characteristics: * What metric is used for the edge
*score*? * How is the edge score *normalized*? * How are
edges *filtered* based on the (normalized) edge score? * Are
edges in the union of minimum spanning trees (*UMST*) preserved
to ensure connectivity?

The backbone of an unweighted network can be extracted using the
`sparsify()`

function, which includes parameters that answer
each of the four questions that define a sparsification model
(`escore`

, `normalize`

, `filter`

, and
`umst`

), as well as a sparsification parameter `s`

that controls the amount of sparsification.

Specific combinations of these parameters correspond to named
sparsification models that have been proposed in the literature. For
example, the *L-spar* sparsification model (Satuluri,
Parthasarathy, and Ruan 2011) can be obtained using
`sparsify(escore = "jaccard", normalize = "rank", filter = "degree", umst = FALSE)`

.
For this reason, the backbone package includes several wrappers for the
`sparsify()`

function that return specific named
sparsification models. For example, the
`sparsify.with.lspar()`

function is a wrapper that uses the
above `sparsify()`

options.

Different sparsification models are designed to perserve different
structural features believed to be embedded in dense networks. To
illustrate extracting the backbone from an unweighted network believed
to contain communities, we begin by creating a dense unweighted network
`U1`

. This network contains 60 nodes, but is so dense that
the presence of communities is obscured:

```
U.with.communities <- igraph::sbm.game(60, matrix(c(.75,.25,.25,.25,.75,.25,.25,.25,.75),3,3), c(20,20,20))
#> Warning: `sbm.game()` was deprecated in igraph 2.0.0.
#> â„¹ Please use `sample_sbm()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
plot(U.with.communities, vertex.label = NA)
```

However, the backbone of this dense network can be extracted using the L-spar sparsification model, which clearly reveals a three-community structure:

```
bb <- sparsify.with.lspar(U.with.communities, s = 0.6, narrative = TRUE)
#>
#> === Suggested manuscript text and citations ===
#> We used the backbone package for R (v2.1.4; Neal, 2022) to extract the unweighted backbone of an unweighted and undirected unipartite network containing 60 nodes. Specifically, we used Satuluri et al.'s (2011) L-Spar model with a sparsification threshold of 0.6. This reduced the number of edges by 61.8%, and reduced the number of connected nodes by 0%.
#>
#> Neal, Z. P. 2022. backbone: An R Package to Extract Network Backbones. PLOS ONE, 17, e0269137. https://doi.org/10.1371/journal.pone.0269137
#>
#> Satuluri, V., Parthasarathy, S., & Ruan, Y. (2011, June). Local graph sparsification for scalable clustering. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (pp. 721-732). https://doi.org/10.1145/1989323.1989399
plot(bb, vertex.label = NA)
```