Reference intervals play a crucial role in the medical interpretation and statistical evaluation of laboratory results. By definition, reference intervals include the central 95% of results measured in non-diseased reference individuals.

In conventional approaches, reference intervals are determined using
*direct methods*, which involve collecting laboratory results
from a minimum of 120 apparently healthy individuals and calculating the
2.5th and 97.5th percentiles with parametric or non-parametric methods.
Although direct methods are currently the gold standard, their
implementation can be difficult due to cost and time issues, limited
resources, or ethical considerations. In addition, the definition of
“obviously healthy” according to medical criteria is rather vague.

In this article, we present an alternative method that is included in
the *reflimR* package. It is an *indirect method* for
deriving reference limits from mixed populations, which contain an
unknown proportion of sick individuals (usually less than 25 percent).
Here, the definition of “obviously healthy” is based on the statistical
criterion of a homogeneous distribution to which a normal distribution
can be fitted after suitable transformation.

Indirect methods are particularly useful when routine data from a laboratory information system are available and direct sampling of non-diseased individuals is not feasible.

To install the *reflimR* package, open R and enter the
following command in the console:

This command will download the package from CRAN. Once the installation is complete, load the package into your R session using the following command:

The package is now ready for use in your R environment (e.g. in RStudio). To see the documentation of the package with all its help files, please enter

*reflimR* comes with the *livertests*
dataset, which has been specifically prepared for the purpose of
reference interval estimation with direct and indirect methods. The
dataset includes eight different biomarkers (laboratory tests) that have
been measured in healthy controls and in patients with liver diseases.
Here is an example excerpt from the dataset.

```
#> Category Age Sex ALB ALT AST BIL CHE CREA GGT PROT
#> 1 reference 32 f 39.9 22.0 29.8 6.3 8.16 60 4.5 72.5
#> 204 patient 53 f 42.4 20.9 42.4 7.7 6.60 67 14.2 70.9
#> 444 reference 54 m 46.4 54.1 39.6 10.6 6.59 85 73.2 75.2
#> 589 patient 53 m 46.0 34.0 43.0 14.0 8.77 112 203.0 76.0
```

In addition, the corresponding reference intervals are stored in the
*targetvalues* dataset. The reference interval table has been
derived partly from the electronic handbook Clinical
Laboratory Diagnostics published by L. Thomas, and partly from the
manufacturer’s assay sheet.

```
#> analyte unit ll.female ul.female ll.male ul.male
#> 1 ALB g/L 35.0 53.0 35.0 53.0
#> 2 ALT U/L 10.0 35.0 10.0 50.0
#> 3 AST U/L 10.0 35.0 10.0 50.0
#> 4 BIL µmol/L 2.0 21.0 2.0 21.0
#> 5 CHE kU/L 3.9 10.8 4.6 11.5
#> 6 CREA µmol/L 41.0 88.0 50.0 104.0
#> 7 GGT U/L 6.0 40.0 10.0 60.0
#> 8 PROT g/L 66.0 83.0 66.0 83.0
```

To adress a specific target interval, e. g. for albumin in women, you may use the following command:

The *reflimR* package includes a total of ten functions, of
which *reflim(x)* is the main function. It calls the other nine
functions that can be arranged in three groups:

group 1 | group 2 | group 2 |
---|---|---|

ri_hist | lognorm | adjust_digits |

permissible_uncertainty | iboxplot | bowley |

interpretation | truncated_qqplot | conf_int95 |

Group 1 includes the three main functions that provide the user with
the final results of *reflim()* and help to interpret them:
*ri_hist()* creates a graphical output (histogram with reference
interval and density curves), *permissible_uncertainty*
calculates the medical tolerance limits of the results [1, 2], and
*interpretation()* assesses the significance of deviations from
given target values.

Group 2 performs the three underlying statistical operations
*modelling*, *truncation*, and *calculation*, and
group 3 contains auxiliary functions for miscellaneous tasks. Details of
each function are available in the respective help files, which can be
addressed with the *help()* function or a question mark followed
by the function name (e.g. *help(reflim)* or
*?lognorm*).

The *reflim* function can be executed by simply calling
*reflim(x)*, where x is a vector of numeric data to be analyzed.
For example, if the reference interval for bilirubin is to be estimated
from the values included in the *livertests* dataset, the
following two lines of code will suffice:

```
#> $stats
#> meanlog sdlog n.total n.trunc
#> 1.96 0.50 612.00 522.00
#>
#> $lognormal
#> [1] TRUE
#>
#> $limits
#> lower.lim upper.lim lower.lim.low lower.lim.upp upper.lim.low
#> 2.67 18.91 2.34 3.00 17.37
#> upper.lim.upp
#> 20.45
#>
#> $targets
#> lower.lim upper.lim lower.lim.low lower.lim.up upper.lim.low
#> NA NA NA NA NA
#> upper.lim.upp
#> NA
#>
#> $perc.norm
#> [1] 89.8
#>
#> $confidence.int
#> lower.lim.low lower.lim.upp upper.lim.low upper.lim.upp
#> 2.38 3.23 15.65 21.20
#>
#> $interpretation
#> lower.limit upper.limit
#> NA NA
#>
#> $remarks
#> [1] NA
```

Please note that any filtering and cleaning (e. g. for removal of
non-numeric values or duplicate results of a single individual) or
partitioning of the data (for sex, age, etc) is not included in the
*reflimR* package. These steps can easily be performed separately
with base R functions such as *as.numeric()*, *unique()*
or *subset()* to achieve the desired results.

As shown above, the main result of the *reflim()* function is
an illustrative graphic showing a histogram of the original data with a
dotted overall density curve. The estimated reference limits are
displayed as dashed vertical lines and the respective medical tolerance
limits as gray bars. The blue solid density curve represents the
theoretical distribution of the assumed reference population, and the
density curves of potential pathological outliers are shown in red.

Below the graphic, there is a list of statistical results such as the
parameters of the theoretical distribution (*stats*), the assumed
distribution model (*lognormal* = TRUE or FALSE), and the
calculated reference limits with tolerance intervals (*limits*).
On the bottom of the list is an estimate of the 95 percent confidence
intervals for the lower and upper reference limits
(*confidence.int*). The remaining results are set to NA if no
target values were specified.

The *reflim* function comes with nine additional arguments
that default to meaningful values for a quick overview. They are
accessible via the help file (*?reflim*) and can be changed if
needed. For example, if you want the graphic to be nicely labeled and
the targets from the *targetvalues* dataset to be verified, you
can specify the *main*, *xlab*, and *targets*
arguments. And if you want to limit the output of the function to some
essentials, you can specify what you want to see with a $ after the
closing parenthesis.

```
#> lower.limit upper.limit
#> "markedly increased" "slightly decreased"
```

**Interpretation with traffic light colors**

If target values have been specified, the other two functions of group 1 are used to check whether the calculated reference limits are within the permitted tolerance limits and assigns traffic light colors to visually indicate any deviations. Green means that the calculated limit is within the tolerance interval of the target, yellow means that the calculated limit falls outside but the two tolerance intervals overlap, and red means that there is no overlap between the tolerance intervals.

*ri_hist()*, *permissible_uncertainty()* and
*interpretation()* are called by the *reflim* function,
but can also be used independently for the evaluation of reference
limits obtained by other methods (e.g., from the *refineR*
package).

**Three steps to the result**

As already mentioned, *reflimR* performs an indirect
three-step approach. The underlying functions form the group 2 in the
above table. Here is an overview with some practical examples.

**Step 1:** To define an appropriate distribution model,
Bowley’s quartile skewness coefficient [3] is calculated for the
original and the logarithmized data, using the *bowley* function
from group 3. The underlying algorithm is quite robust against
pathological outliers, because it is calculated from the central 50
percent of the data (the “box” of Tukey’s boxplot), where the influence
of pathological values is minimized [4]. If the skewness coefficient is
close to 0, the distribution is symmetric and can be approximated by a
normal distribution (at least in the range of interest between the 2.5th
and 97.5th percentiles). If, however, the skewness coefficient is
clearly positive, the distribution is right-skewed. If the shape of the
curve becomes more symmetric after log-transformation of the data, the
*lognorm* function indicates that the distribution can be better
approximated by a lognormal model.

Here is an example of a symmetric distribution from the albumin data
using the *lognorm* function:

```
#> $lognormal
#> [1] FALSE
#>
#> $BowleySkewness
#> normal lognormal delta
#> 0.019 -0.019 0.038
```

The skewness coefficient of the original values is 0.019 and the
shape of the black curve is not much changed after log-transformation
(blue). The difference of the two skewness coefficients is 0.038, i. e.
below the threshold of 0.05, which has been defined as a fixed value in
the *lognorm* function [4].

In contrast, alanine aminotransferase is an example for an asymmetric distribution:

```
#> $lognormal
#> [1] TRUE
#>
#> $BowleySkewness
#> normal lognormal delta
#> 0.207 0.036 0.171
```

Here, the original distribution curve (black) is clearly right-skewed with a skewness coefficient of 0.207, and becomes more symmetrical after logarithmization of the data (blue). The delta of the skewness coefficients is 0.171 and thus above the threshold of 0.05.

If a lognormal distribution is recommended in step 1, the values are
logarithmized by the *reflim* function so that the normal
distribution assumption can be used in the next two steps.

**Step 2:** Subsequently, the data set is truncated in
an iterative way such that the presumably pathological values are
removed as best as possible and the central 95% of the presumably
non-pathological values remain without substantial losses. To achieve
this, we adopt the *iboxplot()* algorithm, [5], which is again
based on Tukey’s boxplot. Starting with the central 50 percent of the
data (i. e. the values between the first and the third quartile), the
theoretical 2.5th and 97.5th percentiles of a Gaussian distribution are
calculated and values beyond these boundaries are removed. This
algorithm is repeated until no more outliers are removed. After the
first truncation step, the calculation of the 2.5th and 97.5th
percentiles is adapted to the fact that the data vector has already been
truncated [5].

Here is an application of the *iboxplot* function using the
bilirubin dataset as an example:

The upper boxplot (white) shows a considerable amount of dots outside
the whiskers, which are effectively removed with two cycles of the
*iboxplot* algorithm (blue boxplot). Between cycles 2 and 3, no
more values are removed, so the algorithm stops at this point:

```
x1$progress
#> n min max
#> cycle0 612 0.8 209.0
#> cycle1 534 2.9 18.6
#> cycle2 522 2.9 17.4
#> cycle3 522 2.9 17.4
```

The *iboxplot* function derives the estimated percentage of
“normal values” from the length of the truncated vector
*x1$trunc* by dividing it by 0.95:

Notably, the lower and upper truncation points are rough estimates of the reference limits:

**Step 3:** To improve the robustness of this estimate
even further, a normal quantile-quantile plot (Q-Q plot) is created from
the truncated data [4]. The function *truncated_qqplot*
determines the mean and standard deviation from the intercept and the
slope of the regression line and extrapolates the 2.5th and 97.5th
percentiles from the equation *mean ± 1.96 sd*. These percentiles
represent the estimated reference interval, and thus the final
result.

```
#> $result
#> meanlog sdlog lower.lim upper.lim
#> 1.96 0.50 2.67 18.91
#>
#> $lognormal
#> [1] TRUE
```

If the calculation was carried out with logarithmized values, the resulting reference limits are automatically delogarithmized.

As already mentioned in step 2, the truncation points of step 2 are close to the final result of step 3, but they are not identical. The reason is that extrapolating the respective percentiles from the linear Q-Q plot obviously stabilizes the result and makes it more robust against minor overlaps with pathological values at the ends of the regression line. To achieve this, only the central 90 percent of the Q-Q plot are used for the calculation of the regression line.

If you want to visualize the whole process at a glance, you can set
the *plot.all* argument of the *reflim* function to TRUE
to display four graphics. Here is an example of GGT results for
women:

```
ggt.f <- livertests$GGT[livertests$Sex == "f"]
reflim(ggt.f, plot.all = TRUE, n.min = 150,
targets = targetvalues[7, 3 : 4],
main = "GGT (f)", xlab = "U/L")$interpretation
```

```
#> lower.limit upper.limit
#> "slightly increased" "within tolerance"
```

In the present case, the Q-Q plot is quite linear, so the reference interval of 6.8 to 38.7 U/L seems credible. The respective target values are 6 to 40 U/L.

The linearity of the Q-Q plot is a visual measure of the data quality. If you repeat the above calculation with the respective values for men, you will obtain a density curve with a small pathological fraction that partially overlaps with the “box” of the boxplot:

```
ggt.m <- livertests$GGT[livertests$Sex == "m"]
ln <- lognorm(ggt.m, main = "GGT (m)", xlab = "U/L")
arrows(46, 0.015, 46, 0.02, code = 1, length = 0.1, lwd = 2)
```

This fraction cannot be completely removed by the *iboxplot*
algorithm:

```
xtrunc.m <- iboxplot(ggt.m, xlab = "U/L")$trunc
arrows(55, 0.007, 55, 0.012, code = 1, length = 0.1, lwd = 2)
```

As a result, the Q-Q plot shows visible deviations from linearity:

```
qq.m <- truncated_qqplot(xtrunc.m)
arrows(1.5, log(30), 1.5, log(45), code = 2, length = 0.1, lwd = 2)
```

Therefore, the estimated reference interval of 9.4 to 61.4 U/L should
be interpreted with caution. In such cases, we recommend to repeat the
test using a comparative method (e.g. *refineR*) and to consult
the literature.

The *reflimR* package provides a simple and robust three-step
procedure for the estimation of reference intervals from routine
laboratory data that may contain an unknown proportion of pathological
results. Due to the fact that the initial two steps of the algorithm are
based on quartiles (i. e. the central 50 percent of the original
results), this proportion should not exceed 25 percent on one side of
the distribution.

Compared to many other algorithms, the *reflim* function is
extremely fast. This is mainly achieved by the fact that it does not use
elaborate procedures to optimize a theoretical distribution model: For
symmetric distributions, it assumes a normal distribution in the range
between the 2.5th and 97.5th percentiles, and for right-skewed
distributions, the data are logarithmized to approximate a lognormal
distribution in the same range. Left-skewed distributions are assumed to
be mixtures of a symmetric distribution and pathological outliers on the
left side. If the distribution type is unclear, you can assume a
lognormal distribution (i. e. set the *lognorm* argument to TRUE)
without risking large errors in the estimation [7].

Due to an intuitive traffic light color scheme, the estimated reference limits can easily be verified against target values from external sources (e. g., from package inserts or handbooks). The assessment of any deviations is based on the degree of overlap between the corresponding tolerance intervals, which makes the interpretation easy and transparent.

Our method falls into the category of the so-called “modified Hoffmann approaches”, which are derived from the original method published by RG Hoffmann in 1963 [8]. The common element of these methods is that they are based on the evaluation of a regression line obtained by comparing the distribution of observed values with a standard normal distribution. In the original Hoffmann method, a probability-probability plot is generated, whereas most of the modifications including ours use a quantile-quantile plot.

Other modifications concern a) the transformation of the original data to approximate a normal distribution, b) the type of truncation to eliminate potentially pathological outliers, and c) the identification of the linear part of the regression line. In the original Hoffmann method [8] as well as in our own original modification [6], this straight part was identified by visual inspection, whereas the method included in this package is fully automated and independent of any subjective judgment.

Haeckel R et al. Permissible limits for uncertainty of measurement in laboratory medicine. Clin Chem Lab Med 2015;53:1161–71. .

Haeckel R et al. Equivalence limits of reference intervals for partitioning of population data. J Lab Med 2016; 40: 199-205. .

Bowley, AL (1920). Elements of Statistics. London : P.S. King & Son, Ltd.

Klawonn F, Hoffmann G, Orth M. Quantitative laboratory results: normal or lognormal distribution. J Lab Med 2020; 44: 143–50. .

Klawonn F, Hoffmann G. Using fuzzy cluster analysis to find interesting clusters. In: L.A. Garcia-Escuderoet al. (eds.): Building bridges between soft and statistical methodologies for data science. Springer, Cham (2023), 231-239. .

Hoffmann G et al. Simple estimation of reference intervals from routine laboratory data. J Lab Med 2016. .

Haeckel R, Wosniok W. Observed, unknown distributions of clinical chemical quantities should be considered to be log-normal: a proposal. Clin Chem Lab Med. 2010; 48: 1393–6 .

Hoffmann RG. Statistics in the practice of medicine. J Am Med Assoc 1963; 185: 864–73.