The `{LST}` package provides support for the style of R
computing used in the textbook, *Lessons
in Statistical Thinking*. This style seeks to reduce the
cognitive load on students by reducing to a minimum the number of R
functions and the syntax needed to undertake a complete course that
includes (simple) data wrangling, visualization, modeling, and causal
simulation. At the same time, the style supports using statistical
inference in an informal way from the very beginning of the course,
*gradually* formalizing it over the semester.

This document is oriented toward instructors or strongly motivated
students. The *Lessons* textbook and accompanying blog posts provide an
introduction for the typical student. The reader of this document should
already know at least a little about R: basics of data frames as well as
functions and function calls, named arguments, and R “formulas” (such as
`mpg ~ hp + cyl`), which are called **tilde expressions** in *Lessons*.
(Statistics students need to
use mathematical formulas from time to time, so best not confuse math
formulas with an unneeded name for an R syntactical structure.)

R commands in *Lessons* are well exemplified by the following:
generating a plot of two variables in a data frame and then annotating
the plot with a simple linear model.
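The example command itself is not reproduced at this point in the text. A minimal sketch consistent with the description might look like the following. It assumes the built-in `mtcars` data frame and reuses the tilde expression `mpg ~ hp + cyl` mentioned earlier; both choices are assumptions made for illustration, not necessarily the author's original example.

```r
library(LST)

# Assumed example: pipe a data frame into point_plot(), naming the
# response (mpg) and the explanatory variables (hp, cyl) in a tilde expression.
# annot = "model" asks for a model annotation drawn over the points.
p <- mtcars |>
  point_plot(mpg ~ hp + cyl, annot = "model")

p  # printing the object displays the graphic
```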

The command illustrates several features of the style of commands in
*Lessons*:

1. The basic structure involves *piping* a data frame into a function. The pipeline structure is used almost exclusively in *Lessons*. For the reader not acquainted with the R pipe, the object on the left-hand side of the pipe token `|>` becomes the *first* argument to the function call on the right-hand side.
    - The left-hand side of the pipe, the **input end**, will very often be a data frame, but a handful of other types are used in that role in *Lessons*. (More on this later.)
    - The right-hand side, the **output end**, will *always* be a function call.
2. *Variables* in the data frame are referred to by unquoted name. Such variable names are only used in the role of an un-piped argument to the right-hand side function call.
    - The `$` notation is *never* used in any setting.
    - For situations in which there is a *response* variable (as in modeling or graphics), the variable names always appear in *tilde expressions*.
3. Often, a tilde expression will be the *only* thing inside the parentheses that follow the function name. But sometimes additional details for the function will be added within the parentheses. In the above example command, the detail to add a statistical model as an annotation is specified by the argument `annot = "model"`.
4. `point_plot()` is an omnibus graphics command sufficient for teaching an entire statistics course that includes inference and covariation.
    - The output is `{ggplot2}` compatible.
    - Other such omnibus commands from `{LST}` seen in *Lessons* are `model_train()`, `sample()`, and `trials()`.
    - Wrangling functions from `{dplyr}` are also occasionally used, especially `mutate()` and `summarize()`.
    - Summaries of statistical models are made with the `{LST}` function `conf_interval()` and occasionally `R2()`. (Near the end of the course, `regression_summary()` and `anova_summary()` are introduced, but these play only a very minor, optional role in the course.)
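The rule that the left-hand object becomes the first argument of the right-hand function call can be checked in base R. The sketch below uses `head()` and `subset()` only because they ship with R; they are stand-ins chosen for this illustration, not functions featured in *Lessons*.

```r
# The pipe passes its left-hand side as the first argument of the call
# on its right-hand side, so these two commands are equivalent.
piped  <- mtcars |> head(3)
direct <- head(mtcars, 3)
identical(piped, direct)

# Stages can be chained: each stage's output becomes the next stage's input.
chained <- mtcars |> subset(cyl == 4) |> head(3)
nested  <- head(subset(mtcars, cyl == 4), 3)
identical(chained, nested)
```

Both comparisons return `TRUE`. The piped form reads left to right in the order the steps happen, while the nested form has to be read from the inside out.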

I’m using the term **input end** of a pipe to refer to
the object on the left-hand side of the `|>` pipe token.
There are only a handful of types presented to the input end of the
pipe:

- A data frame is by far the most common input type.
- A statistical model is an input type used frequently in the second half of the course.
- A “data simulation” is another kind of input type.
- Optionally, and mainly for enrichment, a data graphic frame (as
  produced by `point_plot()`) can be piped into a command that adds
  labels or another `{ggplot2}` layer.

Just as *input end* refers to the object provided at the
left-hand side of the pipe, the object produced by the function call on
the right side is the pipe’s output. Two essential points about pipe
output-ends:

(a) The R command given on the right-hand side of `|>` will **always** be a function call. No exceptions. A function call consists of the name of a function (e.g., `point_plot` or `model_train`) followed by an open/closed pair of parentheses. Usually, there is something such as a *tilde expression* contained in the parentheses, but there are often additional **named arguments** such as the `annot = "model"` in the example command described earlier.

(b) The function call in (a) produces an R object. For the `{LST}` functions, this object will always be one of the four types presented in the previous section (data frame, model, data simulation, graphics frame).

The object produced by the function call at the output end of the
pipe `|>` can provide the input, *via* another
pipe, to another function call. This technique is often used for data
wrangling or when summarizing a model. For instance, the following
converts fuel “economy” (`mpg`) into fuel
*consumption* (liters per 100 km), which is then used as the
response variable in a model.

```
mtcars |>
  dplyr::mutate(consumption = 235.2 / mpg) |>
  model_train(consumption ~ hp + wt) |>
  conf_interval()
#> # A tibble: 3 × 4
#>   term            .lwr  .coef   .upr
#>   <chr>          <dbl>  <dbl>  <dbl>
#> 1 (Intercept) -0.486   1.48   3.45
#> 2 hp           0.00647 0.0176 0.0287
#> 3 wt           1.92    2.70   3.48
```
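The constant 235.2 in the `mutate()` step comes from unit conversion. As a check in base R, the sketch below uses the standard definitions of the international mile and the US liquid gallon (the `mtcars` documentation gives `mpg` in miles per US gallon). The variable names are chosen for this illustration only.

```r
km_per_mile       <- 1.609344     # international mile, in kilometers
liters_per_gallon <- 3.785411784  # US liquid gallon, in liters

# liters per 100 km = 100 * (liters per gallon) / (km per mile * mpg)
conversion <- 100 * liters_per_gallon / km_per_mile
round(conversion, 1)       # 235.2

# A car rated at 21 mpg uses about 11.2 liters per 100 km.
round(conversion / 21, 1)  # 11.2
```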

For convenience, the `add_plot_labels()` function will
modify the labels in a plot, taking as input a plot (as produced by
`point_plot()`, for instance) and returning as output the
modified plot. (Perhaps of interest to those familiar with
`{ggplot2}`: `add_plot_labels()` is merely a
wrapper on `ggplot2::labs()` that avoids the non-standard
`+` pipe system.)

The flow of computation in a pipeline runs from left to right. The
output object from the last stage of the pipeline will, by default, be
printed. The alternative is to store that output object under a name,
using the “**storage arrow**”, like this:

*storage_name* `<-` *pipeline*
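As a small base-R illustration of the difference between printing and storing, consider the sketch below. The name `six_cyl` and the use of `subset()` and `head()` are choices made for this example only.

```r
# Without a storage arrow, the output of the last stage is printed.
mtcars |> subset(cyl == 6) |> head(2)

# With a storage arrow, the output is kept under a name and nothing is printed.
six_cyl <- mtcars |> subset(cyl == 6) |> head(2)

# Typing the name on its own prints the stored object.
six_cyl
```

Note that the storage name sits at the far left even though it receives the result of the *last* stage of the pipeline.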

In R, the form in which an object is printed is controlled by the programmer. Graphics are typically “printed” by displaying the graphic in an appropriate place. Data frames are typically printed as text.

In addition to data frames and graphics, *Lessons* deals
frequently with two other sorts of objects: data simulations and
models.

By default, models are printed as text. There is a wide variety of
formats corresponding to the large number of people who have communally
put together the modeling systems in R. Rather than the hodge-podge of
printed model formats, I encourage users to print specific summaries of
models such as graphs of the model function (use `model_plot()`).
The numerical model summaries in `{LST}` are always printed in
data-frame format. Most of the
time in *Lessons*, models are summarized with coefficients and
confidence intervals, a format produced by `conf_interval()`.
I strongly recommend that model coefficients *always* be shown in
the context of a confidence interval; `conf_interval()`
imposes this policy. Another sometimes useful format of summary is
provided by `R2()`. Toward the end of the course, the ANOVA
generalization of R² is introduced.
`anova_summary()` is useful for comparing two or more models.
`regression_summary()` shows a standard regression report,
but `conf_interval()` is, I think, a superior format. (If you
feel obliged to show a p-value, use the `show_p = TRUE`
argument to `conf_interval()`. But I recommend focusing on
whether the confidence interval includes zero, using the
`level =` argument if you want a confidence level other than the one
corresponding to 0.05.)
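Reading a confidence interval for whether it includes zero can be sketched with base R's `lm()` and `confint()`. They are used here as stand-ins for `model_train()` and `conf_interval()` so that the example does not depend on `{LST}`; the `level` argument shown belongs to `confint()`, and `includes_zero()` is a hypothetical helper written for this sketch.

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)

# 95% intervals (the confint() default) and wider 99% intervals.
ci_95 <- confint(fit, level = 0.95)
ci_99 <- confint(fit, level = 0.99)

# An interval "includes zero" when its lower end is negative
# and its upper end is positive.
includes_zero <- function(ci) ci[, 1] < 0 & ci[, 2] > 0

includes_zero(ci_95)
includes_zero(ci_99)
```

Raising the confidence level widens every interval, so a coefficient whose interval excludes zero at one level may include it at a higher one.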

Data simulations are printed as text showing the causal formulas relating one variable to the others. Like this:

```
sim_06
#> $names
#> $names[[1]]
#> a
#>
#> $names[[2]]
#> b
#>
#> $names[[3]]
#> c
#>
#> $names[[4]]
#> d
#>
#>
#> $calls
#> $calls[[1]]
#> rnorm(n)
#>
#> $calls[[2]]
#> a + rnorm(n)
#>
#> $calls[[3]]
#> b + rnorm(n)
#>
#> $calls[[4]]
#> c + a + rnorm(n)
#>
#>
#> attr(,"class")
#> [1] "list" "datasim"
```

Instructions for constructing data simulations are given in the
*Simulating data with Directed Acyclic Graphs* vignette of this
package. Many pre-built simulations are provided with this
`{LST}` package. New ones can be constructed using
`datasim_make()`. Except in the most straightforward cases,
such construction is an instructor-level task.

A popular feature of the `{mosaic}` package is the
`do()` function, which provides syntax and logic for
repeating a command multiple times, accumulating the results into a data
frame.
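A sketch of the `{mosaic}` idiom follows. It assumes the `{mosaic}` package is installed; the particular model and the name `boot` are chosen for illustration and are not taken from the original text.

```r
library(mosaic)

# do(5) * <expression> evaluates the expression five times and
# collects the results, one row per repetition, in a data frame.
# resample() draws rows with replacement, so each fit sees a new resample.
boot <- do(5) * coef(lm(mpg ~ hp, data = resample(mtcars)))

boot
```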

`{LST}` has updated this functionality to take advantage
of the R built-in pipe notation and the style of arranging model
summaries as data frames. The functionality is provided by the
`trials()` function. To use it, place `trials()` at the end of a pipeline:

```
mtcars |>
  sample(replace = TRUE) |>       # resampling here!
  model_train(mpg ~ hp) |>
  conf_interval() |>
  dplyr::filter(term == "hp") |>  # namespaced so stats::filter() is not picked up
  trials(5)
#>   .trial term        .lwr       .coef        .upr
#> 1      1   hp -0.05884886 -0.04350439 -0.02815991
#> 2      2   hp -0.07123911 -0.05567061 -0.04010210
#> 3      3   hp -0.10945900 -0.08387168 -0.05828437
#> 4      4   hp -0.11131359 -0.08981114 -0.06830869
#> 5      5   hp -0.08612844 -0.06705820 -0.04798795
```

You can, of course, take the data frame produced by
`trials()` to use for later wrangling or graphics.

It’s natural to think of the pipeline leading up to the
`trials()` stage as creating a single output. But
`trials()` has a seemingly magical ability to grab the whole
pipeline and run it over and over again. (The “magic” is provided by the
“non-standard evaluation” facilities in R, an advanced programming
construct.)

An excellent way to develop the statement to be repeated: write the
pipeline *excluding* the final `trials()` stage. Each
time you run the truncated pipeline, you will receive one object. When
this object has the format you seek, add the final `trials()`
stage back in.
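To make the repetition less mysterious, here is a rough base-R analogue of the resampling pipeline. It is not how `trials()` is implemented: `one_trial()` is a hypothetical helper, and `lm()` and `confint()` stand in for `model_train()` and `conf_interval()`.

```r
set.seed(101)  # only to make the sketch reproducible

# One run of the "truncated pipeline": resample rows, fit a model,
# and summarize the hp coefficient with its confidence interval.
one_trial <- function(data) {
  resampled <- data[sample(nrow(data), replace = TRUE), ]
  fit <- lm(mpg ~ hp, data = resampled)
  ci <- confint(fit)["hp", ]
  data.frame(term = "hp", .lwr = ci[[1]], .coef = coef(fit)[["hp"]], .upr = ci[[2]])
}

# Repeating the run and stacking the one-row results mimics trials(5).
runs <- lapply(1:5, function(i) data.frame(.trial = i, one_trial(mtcars)))
results <- do.call(rbind, runs)

results
```

Each call to `one_trial()` plays the role of one run of the truncated pipeline; the final two lines do the job of the last `trials()` stage.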

`{LST}`

Instructors may feel obliged by convention to introduce the menagerie
of plotting modalities, such as bar charts, line charts, histograms,
etc. *Lessons* was written to use only a single primary graphic
modality: the point plot (a.k.a. “scatter plot”) as produced by
`point_plot()`. Three other modalities (confidence intervals,
confidence bands, and violin plots of density) are provided by the
annotation feature of `point_plot()`. These are requested with
`annot = "violin"` and `annot = "model"`.

The output of `point_plot()` is a `{ggplot2}`
graphical object. Consequently, you can use the various
`{ggplot2}` functions to set a graphics theme, label axes,
and so on. Unfortunately, `{ggplot2}` functions use the
`+` pipe rather than `|>`. This can be
confusing and frustrating for students.

I recommend use of the `{ggformula}` package, which
re-packages the `{ggplot2}` facilities in a way that can be
used with the R pipe `|>` and which employs
tilde expressions for specifying response and explanatory variables.

`{LST}`, `{mosaic}`, and `{ggformula}`

The MOSAIC suite of packages, including `{mosaic}` and
`{ggformula}`, is widely used in teaching statistics with R.
Many of the pedagogical principles behind `{LST}` are shared
with `{mosaic}` and `{ggformula}`. You are of
course welcome to use `{mosaic}` and `{ggformula}`
along with `{LST}`, particularly if you want to teach
graphics modalities other than those provided by
`LST::point_plot()`.

Other than shared pedagogical principles, there is no connection of
`{LST}` with either `{mosaic}` or
`{ggformula}`. One reason for this is the pipe
`|>`, which features prominently in *Lessons*.
`{mosaic}` is not pipe-ready because many functions require a
`data =` argument.
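What “pipe-ready” means can be illustrated with base R's `lm()`, whose first argument is a formula and whose data frame goes in `data =`. The wrapper below, `fit_data_first()`, is a hypothetical helper written for this sketch (it is not part of `{LST}` or `{mosaic}`); it simply puts the data frame first so that the pipe can supply it.

```r
# lm() expects the formula first and the data frame in `data =`, so piping
# a data frame into lm() does not put it where lm() expects it.
direct <- lm(mpg ~ hp, data = mtcars)

# Hypothetical data-first wrapper: the piped data frame fills `data`.
fit_data_first <- function(data, formula) {
  lm(formula, data = data)
}

piped <- mtcars |> fit_data_first(mpg ~ hp)

# Both routes fit the same model.
all.equal(coef(direct), coef(piped))
```

Functions designed with the data frame as their first argument need no such wrapper, which is the design choice the pipe-oriented style relies on.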

Another reason concerns recent developments in deploying R computing
*via* web pages. The system that provides within-a-web-page R
computing is not yet compatible with `{mosaic}` or
`{ggformula}`. For `{LST}` to work with R embedded
in web pages, `{LST}` cannot make any use of
`{mosaic}`/`{ggformula}`. This situation may
change in the future, at which point `{LST}` will be able to
acknowledge its aunts and uncles.