This vignette provides an overview of the primary function of the
linkage scenario portion of the phylosamp package: how to estimate the
false discovery rate given a sample size. In the examples provided, we
use the default `assumption` argument (multiple transmissions
and multiple links, `mtml`), though alternative assumptions
can also be specified.

The most basic function of the package is `translink_tdr()`, which
calculates the probability that an
identified link represents a true transmission event. This calculation
relies on the following parameters:

Param | Variable Name | Description |
---|---|---|
\(\eta\) | sensitivity | the sensitivity of the linkage criteria for identifying transmission links |
\(\chi\) | specificity | the specificity of the linkage criteria for identifying transmission links |
\(\rho\) | rho | the proportion of infections sampled |
\(M\) | M | the number of infections sampled |
\(R\) | R | the average reproductive number (also denoted \(R_\text{pop}\), see below) |

`library(phylosamp)`

`translink_tdr(sensitivity=0.99, specificity=0.95, rho=0.75, M=100, R=1)`

`## Calculating true discovery rate assuming multiple-transmission and multiple-linkage`

`## [1] 0.2334906`

In other words, given a sample of 100 infections (representing 75% of the total population) and linkage criteria with a sensitivity of 99% and a specificity of 95% for identifying infections linked by transmission, fewer than 25% of identified pairs will represent true transmission events. Increasing the specificity to 99.5% has a substantial impact on our ability to distinguish linked and unlinked pairs, raising that share to roughly 75%:

`translink_tdr(sensitivity=0.99, specificity=0.995, rho=0.75, M=100, R=1)`

`## Calculating true discovery rate assuming multiple-transmission and multiple-linkage`

`## [1] 0.7528517`

The other core functions are designed to calculate the expected
number of true transmission pairs identified in the sample
(`translink_expected_links_true()`) and the total number of
linkages one can expect to identify given the sensitivity and
specificity of the linkage criteria and a particular sample size and
proportion (`translink_expected_links_obs()`).

`translink_expected_links_true(sensitivity=0.99, rho=0.75, M=100, R=1)`

`## Calculating expected number of links assuming multiple-transmission and multiple-linkage`

`## [1] 74.25`

```
translink_expected_links_obs(sensitivity=0.99, specificity=0.95,
rho=0.75, M=100, R=1)
```

`## Calculating expected number of links assuming multiple-transmission and multiple-linkage`

`## [1] 318`
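As a quick consistency check on the numbers above, the sketch below divides the expected number of true links by the expected number of observed links. The values are copied by hand from the printed outputs rather than recomputed, and the variable names are illustrative only. The ratio reproduces the 0.2334906 reported earlier by `translink_tdr()` for the same inputs, which is consistent with reading the true discovery rate as the share of observed links that are true. This is a check on the outputs shown here, not a description of how the package computes the value internally.

```r
# Values copied from the printed outputs above (not recomputed by the package).
expected_true <- 74.25  # from translink_expected_links_true(...)
expected_obs  <- 318    # from translink_expected_links_obs(...)

# Share of observed links that are true transmission pairs.
tdr <- expected_true / expected_obs

# Complementary share: observed links that are false positives.
fdr <- 1 - tdr

print(round(tdr, 7))
print(round(fdr, 7))
```

The complement, `fdr`, is the false discovery rate: with these inputs, roughly three out of four identified links would not reflect true transmission.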

It is important to recognize that \(R\) in these functions represents the average \(R\) in the sampled population (alternatively denoted \(R_\text{pop}\)). Because any sampling frame contains a finite number of cases, there will always be more cases than infection events (at minimum, all infectees in a transmission chain plus a single index case), so \(R_\text{pop}\leq1\). For outbreaks with a single introduction, \(R_\text{pop}\) is approximately equal to 1; sampling frames containing cases from separate introduction events will have lower values of \(R_\text{pop}\).
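To illustrate the counting argument, here is a minimal sketch under a simplifying assumption: every case in the sampling frame that is not an introduction was infected by another case in the same frame. Under that assumption, a frame with `n_cases` cases and `n_introductions` index cases contains `n_cases - n_introductions` infection events. The helper `r_pop()` is hypothetical and is not part of phylosamp.

```r
# Hypothetical helper (not part of phylosamp): average number of
# within-frame infection events per case, assuming every case that is
# not an introduction was infected by another case in the frame.
r_pop <- function(n_cases, n_introductions) {
  stopifnot(n_introductions >= 1, n_introductions <= n_cases)
  (n_cases - n_introductions) / n_cases
}

r_pop(100, 1)   # single introduction: 99 infection events among 100 cases
r_pop(100, 10)  # ten introductions: 90 infection events among 100 cases
```

With a single introduction the value is 0.99, close to 1; with ten separate introductions among the same 100 cases it falls to 0.9, matching the qualitative statement above.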