reflimR: Estimation of reference limits from routine laboratory results

Georg Hoffmann, Sandra Klawitter & Frank Klawonn

  1. Introduction
  2. Installation
  3. Data
  4. Usage
  5. Quality Assessment
  6. Conclusion
  7. References


Reference intervals play a crucial role in the medical interpretation and statistical evaluation of laboratory results. By definition, reference intervals include the central 95% of results measured in non-diseased reference individuals.

In conventional approaches, reference intervals are determined using direct methods, which involve collecting laboratory results from a minimum of 120 apparently healthy individuals and calculating the 2.5th and 97.5th percentiles with parametric or non-parametric methods. Although direct methods are currently the gold standard, their implementation can be difficult due to cost and time issues, limited resources, or ethical considerations. In addition, the definition of “obviously healthy” according to medical criteria is rather vague.

In this article, we present an alternative method that is included in the reflimR package. It is an indirect method for deriving reference limits from mixed populations, which contain an unknown proportion of sick individuals (usually less than 25 percent). Here, the definition of “obviously healthy” is based on the statistical criterion of a homogeneous distribution to which a normal distribution can be fitted after suitable transformation.

Indirect methods are particularly useful when routine data from a laboratory information system are available and direct sampling of non-diseased individuals is not feasible.


To install the reflimR package, open R and enter the following command in the console:


This command will download the package from CRAN. Once the installation is complete, load the package into your R session using the following command:


The package is now ready for use in your R environment (e.g. in RStudio). To see the documentation of the package with all its help files, please enter

help(package = reflimR)


reflimR comes with the livertests dataset, which has been specifically prepared for the purpose of reference interval estimation with direct and indirect methods. The dataset includes eight different biomarkers (laboratory tests) that have been measured in healthy controls and in patients with liver diseases. Here is an example excerpt from the dataset.

#>      Category Age Sex  ALB  ALT  AST  BIL  CHE CREA   GGT PROT
#> 1   reference  32   f 39.9 22.0 29.8  6.3 8.16   60   4.5 72.5
#> 204   patient  53   f 42.4 20.9 42.4  7.7 6.60   67  14.2 70.9
#> 444 reference  54   m 46.4 54.1 39.6 10.6 6.59   85  73.2 75.2
#> 589   patient  53   m 46.0 34.0 43.0 14.0 8.77  112 203.0 76.0

In addition, the corresponding reference intervals are stored in the targetvalues dataset. The reference interval table has been derived partly from the electronic handbook Clinical Laboratory Diagnostics published by L. Thomas, and partly from the manufacturer’s assay sheet.

#>   analyte   unit ll.female ul.female ll.male ul.male
#> 1     ALB    g/L      35.0      53.0    35.0    53.0
#> 2     ALT    U/L      10.0      35.0    10.0    50.0
#> 3     AST    U/L      10.0      35.0    10.0    50.0
#> 4     BIL µmol/L       2.0      21.0     2.0    21.0
#> 5     CHE   kU/L       3.9      10.8     4.6    11.5
#> 6    CREA µmol/L      41.0      88.0    50.0   104.0
#> 7     GGT    U/L       6.0      40.0    10.0    60.0
#> 8    PROT    g/L      66.0      83.0    66.0    83.0

To adress a specific target interval, e. g. for albumin in women, you may use the following command:

targetvalues[1, 3 : 4]
#>   ll.female ul.female
#> 1        35        53


The reflimR package includes a total of ten functions, of which reflim(x) is the main function. It calls the other nine functions that can be arranged in three groups:

group 1 group 2 group 2
ri_hist lognorm adjust_digits
permissible_uncertainty iboxplot bowley
interpretation truncated_qqplot conf_int95

Group 1 includes the three main functions that provide the user with the final results of reflim() and help to interpret them: ri_hist() creates a graphical output (histogram with reference interval and density curves), permissible_uncertainty calculates the medical tolerance limits of the results [1, 2], and interpretation() assesses the significance of deviations from given target values.

Group 2 performs the three underlying statistical operations modelling, truncation, and calculation, and group 3 contains auxiliary functions for miscellaneous tasks. Details of each function are available in the respective help files, which can be addressed with the help() function or a question mark followed by the function name (e.g. help(reflim) or ?lognorm).

The reflim function can be executed by simply calling reflim(x), where x is a vector of numeric data to be analyzed. For example, if the reference interval for bilirubin is to be estimated from the values included in the livertests dataset, the following two lines of code will suffice:

x <- livertests$BIL

#> $stats
#> meanlog   sdlog n.trunc 
#>    1.96    0.50  612.00  522.00 
#> $lognormal
#> [1] TRUE
#> $limits
#>     lower.lim     upper.lim lower.lim.low lower.lim.upp upper.lim.low 
#>          2.67         18.91          2.34          3.00         17.37 
#> upper.lim.upp 
#>         20.45 
#> $targets
#>     lower.lim     upper.lim lower.lim.low  lower.lim.up upper.lim.low 
#>            NA            NA            NA            NA            NA 
#> upper.lim.upp 
#>            NA 
#> $perc.norm
#> [1] 89.8
#> $
#> lower.lim.low lower.lim.upp upper.lim.low upper.lim.upp 
#>          2.38          3.23         15.65         21.20 
#> $interpretation
#> lower.limit upper.limit 
#>          NA          NA 
#> $remarks
#> [1] NA

Please note that any filtering and cleaning (e. g. for removal of non-numeric values or duplicate results of a single individual) or partitioning of the data (for sex, age, etc) is not included in the reflimR package. These steps can easily be performed separately with base R functions such as as.numeric(), unique() or subset() to achieve the desired results.

As shown above, the main result of the reflim() function is an illustrative graphic showing a histogram of the original data with a dotted overall density curve. The estimated reference limits are displayed as dashed vertical lines and the respective medical tolerance limits as gray bars. The blue solid density curve represents the theoretical distribution of the assumed reference population, and the density curves of potential pathological outliers are shown in red.

Below the graphic, there is a list of statistical results such as the parameters of the theoretical distribution (stats), the assumed distribution model (lognormal = TRUE or FALSE), and the calculated reference limits with tolerance intervals (limits). On the bottom of the list is an estimate of the 95 percent confidence intervals for the lower and upper reference limits ( The remaining results are set to NA if no target values were specified.

The reflim function comes with nine additional arguments that default to meaningful values for a quick overview. They are accessible via the help file (?reflim) and can be changed if needed. For example, if you want the graphic to be nicely labeled and the targets from the targetvalues dataset to be verified, you can specify the main, xlab, and targets arguments. And if you want to limit the output of the function to some essentials, you can specify what you want to see with a $ after the closing parenthesis.

reflim(x, main = "bilirubin", xlab = "µmol/L", targets = targetvalues[4, 5 : 6])$interpretation

#>          lower.limit          upper.limit 
#> "markedly increased" "slightly decreased"

Interpretation with traffic light colors

If target values have been specified, the other two functions of group 1 are used to check whether the calculated reference limits are within the permitted tolerance limits and assigns traffic light colors to visually indicate any deviations. Green means that the calculated limit is within the tolerance interval of the target, yellow means that the calculated limit falls outside but the two tolerance intervals overlap, and red means that there is no overlap between the tolerance intervals.

ri_hist(), permissible_uncertainty() and interpretation() are called by the reflim function, but can also be used independently for the evaluation of reference limits obtained by other methods (e.g., from the refineR package).

Three steps to the result

As already mentioned, reflimR performs an indirect three-step approach. The underlying functions form the group 2 in the above table. Here is an overview with some practical examples.

Step 1: To define an appropriate distribution model, Bowley’s quartile skewness coefficient [3] is calculated for the original and the logarithmized data, using the bowley function from group 3. The underlying algorithm is quite robust against pathological outliers, because it is calculated from the central 50 percent of the data (the “box” of Tukey’s boxplot), where the influence of pathological values is minimized [4]. If the skewness coefficient is close to 0, the distribution is symmetric and can be approximated by a normal distribution (at least in the range of interest between the 2.5th and 97.5th percentiles). If, however, the skewness coefficient is clearly positive, the distribution is right-skewed. If the shape of the curve becomes more symmetric after log-transformation of the data, the lognorm function indicates that the distribution can be better approximated by a lognormal model.

Here is an example of a symmetric distribution from the albumin data using the lognorm function:

lognorm(livertests$ALB, main = "albumin", xlab = "g/L")

#> $lognormal
#> [1] FALSE
#> $BowleySkewness
#>    normal lognormal     delta 
#>     0.019    -0.019     0.038

The skewness coefficient of the original values is 0.019 and the shape of the black curve is not much changed after log-transformation (blue). The difference of the two skewness coefficients is 0.038, i. e. below the threshold of 0.05, which has been defined as a fixed value in the lognorm function [4].

In contrast, alanine aminotransferase is an example for an asymmetric distribution:

lognorm(livertests$ALT, main = "alanine aminotransferase", xlab = "µmol/L")

#> $lognormal
#> [1] TRUE
#> $BowleySkewness
#>    normal lognormal     delta 
#>     0.207     0.036     0.171

Here, the original distribution curve (black) is clearly right-skewed with a skewness coefficient of 0.207, and becomes more symmetrical after logarithmization of the data (blue). The delta of the skewness coefficients is 0.171 and thus above the threshold of 0.05.

If a lognormal distribution is recommended in step 1, the values are logarithmized by the reflim function so that the normal distribution assumption can be used in the next two steps.

Step 2: Subsequently, the data set is truncated in an iterative way such that the presumably pathological values are removed as best as possible and the central 95% of the presumably non-pathological values remain without substantial losses. To achieve this, we adopt the iboxplot() algorithm, [5], which is again based on Tukey’s boxplot. Starting with the central 50 percent of the data (i. e. the values between the first and the third quartile), the theoretical 2.5th and 97.5th percentiles of a Gaussian distribution are calculated and values beyond these boundaries are removed. This algorithm is repeated until no more outliers are removed. After the first truncation step, the calculation of the 2.5th and 97.5th percentiles is adapted to the fact that the data vector has already been truncated [5].

Here is an application of the iboxplot function using the bilirubin dataset as an example:

x <- livertests$BIL
x1 <- iboxplot(x, main = "bilirubin", xlab = "µmol/L")

The upper boxplot (white) shows a considerable amount of dots outside the whiskers, which are effectively removed with two cycles of the iboxplot algorithm (blue boxplot). Between cycles 2 and 3, no more values are removed, so the algorithm stops at this point:

#>          n min   max
#> cycle0 612 0.8 209.0
#> cycle1 534 2.9  18.6
#> cycle2 522 2.9  17.4
#> cycle3 522 2.9  17.4

The iboxplot function derives the estimated percentage of “normal values” from the length of the truncated vector x1$trunc by dividing it by 0.95:

#> [1] 89.8

Notably, the lower and upper truncation points are rough estimates of the reference limits:

#> lower upper 
#>   2.9  17.4

Step 3: To improve the robustness of this estimate even further, a normal quantile-quantile plot (Q-Q plot) is created from the truncated data [4]. The function truncated_qqplot determines the mean and standard deviation from the intercept and the slope of the regression line and extrapolates the 2.5th and 97.5th percentiles from the equation mean ± 1.96 sd. These percentiles represent the estimated reference interval, and thus the final result.


#> $result
#>   meanlog     sdlog lower.lim upper.lim 
#>      1.96      0.50      2.67     18.91 
#> $lognormal
#> [1] TRUE

If the calculation was carried out with logarithmized values, the resulting reference limits are automatically delogarithmized.

As already mentioned in step 2, the truncation points of step 2 are close to the final result of step 3, but they are not identical. The reason is that extrapolating the respective percentiles from the linear Q-Q plot obviously stabilizes the result and makes it more robust against minor overlaps with pathological values at the ends of the regression line. To achieve this, only the central 90 percent of the Q-Q plot are used for the calculation of the regression line.

Quality Assessment

If you want to visualize the whole process at a glance, you can set the plot.all argument of the reflim function to TRUE to display four graphics. Here is an example of GGT results for women:

ggt.f <- livertests$GGT[livertests$Sex == "f"] 
reflim(ggt.f, plot.all = TRUE, n.min = 150,
       targets = targetvalues[7, 3 : 4],
       main = "GGT (f)", xlab = "U/L")$interpretation

#>          lower.limit          upper.limit 
#> "slightly increased"   "within tolerance"

In the present case, the Q-Q plot is quite linear, so the reference interval of 6.8 to 38.7 U/L seems credible. The respective target values are 6 to 40 U/L.

The linearity of the Q-Q plot is a visual measure of the data quality. If you repeat the above calculation with the respective values for men, you will obtain a density curve with a small pathological fraction that partially overlaps with the “box” of the boxplot:

ggt.m <- livertests$GGT[livertests$Sex == "m"] 
ln <- lognorm(ggt.m, main = "GGT (m)", xlab = "U/L")
arrows(46, 0.015, 46, 0.02, code = 1, length = 0.1, lwd = 2)

This fraction cannot be completely removed by the iboxplot algorithm:

xtrunc.m <- iboxplot(ggt.m, xlab = "U/L")$trunc 
arrows(55, 0.007, 55, 0.012, code = 1, length = 0.1, lwd = 2)

As a result, the Q-Q plot shows visible deviations from linearity:

qq.m <- truncated_qqplot(xtrunc.m) 
arrows(1.5, log(30), 1.5, log(45), code = 2, length = 0.1, lwd = 2)

Therefore, the estimated reference interval of 9.4 to 61.4 U/L should be interpreted with caution. In such cases, we recommend to repeat the test using a comparative method (e.g. refineR) and to consult the literature.


The reflimR package provides a simple and robust three-step procedure for the estimation of reference intervals from routine laboratory data that may contain an unknown proportion of pathological results. Due to the fact that the initial two steps of the algorithm are based on quartiles (i. e. the central 50 percent of the original results), this proportion should not exceed 25 percent on one side of the distribution.

Compared to many other algorithms, the reflim function is extremely fast. This is mainly achieved by the fact that it does not use elaborate procedures to optimize a theoretical distribution model: For symmetric distributions, it assumes a normal distribution in the range between the 2.5th and 97.5th percentiles, and for right-skewed distributions, the data are logarithmized to approximate a lognormal distribution in the same range. Left-skewed distributions are assumed to be mixtures of a symmetric distribution and pathological outliers on the left side. If the distribution type is unclear, you can assume a lognormal distribution (i. e. set the lognorm argument to TRUE) without risking large errors in the estimation [7].

Due to an intuitive traffic light color scheme, the estimated reference limits can easily be verified against target values from external sources (e. g., from package inserts or handbooks). The assessment of any deviations is based on the degree of overlap between the corresponding tolerance intervals, which makes the interpretation easy and transparent.

Our method falls into the category of the so-called “modified Hoffmann approaches”, which are derived from the original method published by RG Hoffmann in 1963 [8]. The common element of these methods is that they are based on the evaluation of a regression line obtained by comparing the distribution of observed values with a standard normal distribution. In the original Hoffmann method, a probability-probability plot is generated, whereas most of the modifications including ours use a quantile-quantile plot.

Other modifications concern a) the transformation of the original data to approximate a normal distribution, b) the type of truncation to eliminate potentially pathological outliers, and c) the identification of the linear part of the regression line. In the original Hoffmann method [8] as well as in our own original modification [6], this straight part was identified by visual inspection, whereas the method included in this package is fully automated and independent of any subjective judgment.


  1. Haeckel R et al. Permissible limits for uncertainty of measurement in laboratory medicine. Clin Chem Lab Med 2015;53:1161–71. .

  2. Haeckel R et al. Equivalence limits of reference intervals for partitioning of population data. J Lab Med 2016; 40: 199-205. .

  3. Bowley, AL (1920). Elements of Statistics. London : P.S. King & Son, Ltd.

  4. Klawonn F, Hoffmann G, Orth M. Quantitative laboratory results: normal or lognormal distribution. J Lab Med 2020; 44: 143–50. .

  5. Klawonn F, Hoffmann G. Using fuzzy cluster analysis to find interesting clusters. In: L.A. Garcia-Escuderoet al. (eds.): Building bridges between soft and statistical methodologies for data science. Springer, Cham (2023), 231-239. .

  6. Hoffmann G et al. Simple estimation of reference intervals from routine laboratory data. J Lab Med 2016. .

  7. Haeckel R, Wosniok W. Observed, unknown distributions of clinical chemical quantities should be considered to be log-normal: a proposal. Clin Chem Lab Med. 2010; 48: 1393–6 .

  8. Hoffmann RG. Statistics in the practice of medicine. J Am Med Assoc 1963; 185: 864–73.