The syntax of `spdesign`

is made to be intuitive and easy,
while at the same time very flexible. To achieve this, we use a
combination of regular expressions and R’s powerful expression parser
“under the hood” to translate the user-specified utility functions into
objects that can be manipulated by the package.

It is impossible to write checks for all possible edge cases of inputs. Therefore, you are strongly encouraged to follow the recommendations in this manual carefully when defining your utility functions.

In `spdesign`

, all parameters *must* have a prior.
This is a deliberate design choice to make it clear that using a 0 prior
is also an assumption in our design. The name of the prior is followed
by a square bracket, which contains the assumed value of the prior. All
priors have to start with `b_`

. This is to separate them from
attributes and to easily allow for non-linear utility functions.

If no prior is specified for a given parameter, an error is returned. If the parameter is generic, i.e. it enters multiple utility expressions, then you only need to specify the prior once. It is strongly recommended that priors for generic parameters are specified the first time they are used (this is a good tip for attributes as well!).

For example, to specify a parameter `b_x1`

with a prior of
`0`

, we would write:

`b_x1[0]`

We can specify a Bayesian prior by specifying the distribution of the
prior. For example, `uniform_p(-1, 1)`

means that we allow
the prior to follow a uniform distribution between -1 and 1.

`"b_x1[uniform_p(-1, 1)]"`

We can use the following distributions:

- Normal -
`normal_p(mean, sd)`

- Log-normal -
`lognormal_p(mean, sd)`

- Triangular -
`triangular_p(location, spread)`

- Uniform -
`uniform_p(min, max)`

All distributions are specified with a mean (location) or standard deviation (spread). In the case of the log-normal distribution we are specifying the mean and standard deviation of the underlying normal distribution.

The syntax for specifying attributes and levels is similar to how we
specify parameters and priors. The attribute name is specified to the
left and levels are specified inside `[]`

. Below, the
attribute `x_1`

can take on the levels 1, 2, 3, 4 and 5.

`"x1[1:5]"`

When we specify the levels, we can make use of R’s built in functions for generating sequences and vectors. For example, the following three specifications are identical:

```
# Specification 1
"x1[1:5]"
# Specification 2
"x1[seq(1, 5, 1)]"
# Specification 3
"x1[c(1, 2, 3, 4, 5)]"
```

However,

`"x1[1, 2, 3, 4, 5]"`

is not a valid specification and will throw an error. The error will most likely be of the form:

```
Error in parse(text = x) : <text>:1:2: unexpected ','
1: 1,
^
```

While we have not written a new intuitive error message for this misspecification, it will be caught by R’s parser and we feel that this is sufficient. As a compromise, we state the most likely error here to help users catch why their specifications fail.

*Important:* Make sure that all attribute levels are numeric.
Even in the cases where you use the `_dummy`

coding syntax it
is **always** safer to use numeric values for the
levels.

We combine the parameter and attribute as in the example below to
start building our utility expression. We recommend that people use the
naming convention `b_x1`

being the parameter for the
attribute `x1`

. It has no practical implication for how the
code works or interprets the utility function, but it makes it much
easier for the user to see which parameter goes with which attribute. It
is also important that the parameter `b_x1`

precedes the
attribute `x1`

. There is currently no checks written to
ensure this, but failing to do so may cause unintended errors in certain
instances. For example, if you are using the dummy-coding shorthand
detailed below.

Below is an example:

`"b_x1[0] * x1[1:5]"`

It is **strongly** recommended to use identical names
for priors and attributes, with the exception of “b_” for priors. This
makes reading of the code easier.

The utility functions take the form of named lists. Let’s say that we would like to create a design comprising two alternatives with three attributes each and a status quo. We want our first attribute to take on the five levels 1, 2, 3, 4, and 5; our second attribute to take on the levels 0 and 1; and our third attribute to take on the values betwween 0 and 1 in .25 increments. One way to specify this would be the following:

```
utility <- list(
alt1 = "b_x1[0.1] * x1[1:5] + b_x2[0.4] * x2[c(0, 1)] + b_x3[-0.2] * x3[seq(0, 1, 0.25)]",
alt2 = "b_x1 * x1 + b_x2 * x2 + b_x3 * x3",
alt3 = "b_sq[0.15] * sq[1]"
)
```

Where the status quo or opt out alternative has a single constant. It is important that when you add constants the level is 1. You cannot currently specify specific levels of the attributes for the SQ. Since these do not vary, this is equivalent to a constant (which can be calculated beforehand, see more details below) because only differences in utility matter for the model.

To generate a design, we need to pass the utility function along with
other options to the `generate_design`

function. Currently,
we can only generate designs for the MNL model. To see the full set of
options for the function: `?generate_design`

.

```
design <- generate_design(utility, rows = 20,
model = "mnl", efficiency_criteria = "d-error",
algorithm = "rsc", draws = "scrambled-sobol",
control = list(
max_iter = 21000
))
```

The list `control`

contains options that can be passed
along to the function to change default values. The default values and
options are:

```
default_control <- list(
cores = 1,
max_iter = 10000,
max_relabel = 10000,
max_swap = 10000,
efficiency_threshold = 0.1,
sample_with_replacement = FALSE
)
```

No default algorithm for generating a design is implemented because
they have different properties with respect to how easy it is to achieve
attribute level balance (see more details below) and how easy it is to
include restrictions/exclusions in the design (see more details below).
Three algorithms are implemented: `rsc`

,
`federov`

, and `random`

.

The `rsc`

algorithm creates an initial design candidate
based on the provided attributes, levels and number of rows. This design
candidate is attribute level balanced by default (if the number of
attribute levels for all attributes are a multiple of the number of
rows) or near attribute level balanced. It will not deviate from this.
It finds new candidates by relabling or swapping attribute levels. The
default is to cycle between relabling and swapping each 10000
candidates. The cycling part of the algorithm is not included because it
is rarely used in practice. You cannot include restrictions or
exclusions in your design when using the `rsc`

algorithm. For
details: `?rsc`

The `federov`

and `random`

algorithms will
systematically or randomly select potential design candidates from the
candidate set. The candidate set can be either supplied or assumed to be
the full factorial. These algorithms completely lets go of attribute
level balance and you do run the risk of some attribute levels not
showing up in the final design. We show syntax below to include level
occurrence restrictions below. You can easily include restrictions in
your design using these to algorithms by excluding attribute level
combinations from the candidate set. That way, they will never be part
of the design.

Note that in general, both restrictions and attribute level balance may lead to less efficient designs and that if your restrictions are too tight and you are trying to force attribute level balance, no designs may be found. Few checks are made to notify the user of this.

Blocking the design is done after the design has been generate and
will add a blocking column to the design. The following will block the
design into 4 trying to minimize the average squared correlation between
the blocking column and each of the design columns. More information:
`?block`

`design <- block(design, 4)`

The design object is stored as a list of class `spdesign`

.
Individual parts can be accessed using the `$`

operator.
Functions such as `summary`

, `coef`

,
`print`

, `vcov`

and `cor`

all work as
expected.

As stated above, the `rsc`

algorithm wil ensure attribute
level balance, whereas the `federov`

and `random`

algorithms will not. However, there is some flexibility in how this is
specified. The following will only work for the `federov`

and
`random`

algorithms and will be ignored without warning if
the `rsc`

algorithm is used.

The implicit restriction in the package is that each attribute level
can occur between 0 and `rows`

number of times. For example,
if you have a design with 20 rows, then the attribute level can occur
between 0 and 20 times. In practice, anywhere between never and always.
The package does provide the user with the ability to specify a narrower
range.

To do this, we would specify a parenthesis behind the attribute levels where we give the range that the attribute levels can take. Assume that our design has 18 rows. The following specification would force attribute level balance by saying that each level has to occur exactly 6 times.

`"b_x1[0] * x1[c(0, 1, 2)](6)"`

As stated, it is not always possible to get attribute level balance
and indeed it can be quite hard with the `federov`

and
`random`

algorithms. Specifying a range may be better. For
example, we could specify that all attribute levels should occur between
5 and 7 times.

`"b_x1[0] * x1[c(0, 1, 2)](5:7)"`

In very special cases, it may be desirable to let some attribute levels occur more frequently than others and we could specify:

`"b_x1[0] * x1[c(0, 1, 2)](3:5, 5:7, 6:9)"`

which would mean that 0 occurs between 3 and 5 times, 1 between 5 and 7, and 2 between 6 and 9 times. Indeed, in the case where we only specify a single range, it is expanded to be equal to the number of attribute levels such that:

`"b_x1[0] * x1[c(0, 1, 2)](5:7)"`

is equivalent to

`"b_x1[0] * x1[c(0, 1, 2)](5:7, 5:7, 5:7)"`

A few important things to note:

- The sum of minimum level occurrences must be less than
`rows`

and the sum of maximum level occurrences must be larger than`rows`

. Otherwise, no design can be found.*No checks*are implemented for this. - You can only specify either a single number or range, or number or
ranges equal to the number of levels. Deviating from this may have
unintended consequences and
*no checks*are implemented. - You can only specify the number of times an attribute should occur in the design at the same time as you specify the number of levels,

If you believe that there are interaction effects between attributes in your model, then these will have to be considered at the design stage. Any design that is not a full factorial assumes that some higher order interaction effects are insignificant. To ensure that you are able to identify the effect of an interaction in your data, this will have to be included in the design. To include an interaction term in the design, we can use the following syntax.

`"b_x1[0] * x1[c(0, 1)] + b_x2 * x2[c(0, 1)] + b_x1x2 * I(x1 * x2)"`

Two things to note about the specification of the utility function
with interactions. First, the levels of the attributes are specified
when the attribute is first introduced and not in the interaction term
itself, and two, for the parser to correctly interpret an interaction
term, it needs to be wrapped in R’s internal function for handling
interactions `I()`

.

Often we have attributes in mind that are categorical in nature,
e.g., comfort or environmental quality. In the design, it is often
desirable to let these be dummy-coded. We can easily specify an
attribute to dummy-coded using the `_dummy`

syntax. Assume
that we have a single categorical attribute that can take on three
levels. We would then specify the utility functions as follows:

```
V = list(
alt1 = "b_x1_dummy[c(0.2, 0.5)] * x1[c(1, 2, 3)]",
alt2 = "b_x1_dummy * x1"
)
```

Here we have specified 2 priors and 3 levels. The dummy coding expands the attribute levels by transforming the attribute into a factor and expanding it using a formula and the model.matrix. The first level is chosen as the baseline and dropped. Similarly, the priors are expanded into separate priors for each level. This also works with Bayesian priors.

It is recommended to use 1, 2, 3, 4 for the levels of a dummy-coded attribute. Note that the levels don’t matter, but this naming convention secures that there is a correspondence between the names for the priors and attributes.

A few things to note about using the `_dummy`

syntax

- The priors and levels must be specified simultaneously.
- You must specify one more level for the attribute than priors for the parameter.
- You cannot use
`_dummy`

as an extension for your attribute name. - If you are using specified levels for the status-quo or opt out alternative, make sure that this corresponds to the base level and is coded as a 0. See more details below.
- If you are using alternative specific dummy-coded attributes, make
sure that you differentiate the names of the priors
*before*the`_dummy`

extension. Placing anything between`_dummmy`

and`[`

will cause the code to fail.

When using the `federov`

or `random`

algorithms, we can pass along a list of exclusions to
`generate_design`

. The exclusions will exclude the specified
combinations from the full factorial prior to generating design
candidates. The exclusions take the form of an unamed list of strings.
You must ensure that the names of the list corresponds to the names of
the expanded attribute levels (see
`?expand_attribute_levels`

).

For example:

```
exclusions = list(
"alt1_x1 == 2 & alt1_x2 == 0 & alt1_x3 == 0",
"alt2_x2 == 1 & alt2_x3 == 1"
)
```

You cannot currently specify fixed SQ levels. Specifying new attributes that can only take on a single level for the SQ leads to a computationally singular Hessian matrix (because there is no variation). To get around this, you can calculate what the size of the SQ constant would be given the defined attribute levels for the SQ and the assumed prior and include this as a single constant for the SQ alternative. This will ensure that the correct utility differences are used in the design.

A fix with additional level restrictions will be included in a future version of the package.

Remember, that when you specify fixed levels for the SQ, you cannot also specify alternative specific constants for the non-SQ alternatives. This is a violation of the J-1 ‘rule’ and will result in a singular Fisher information matrix.

There is no point specifying SQ levels that only take on the value of 0. These have no bearing on the design and may be problematic when inverting the Fisher information matrix.

Related to the above point, in the case where your attribute is dummy-coded the easiest is to define it such that the base level of your dummy-coded attribute corresponds to the level of the SQ. Remember, that in your design, these only take on the values of 1 and 0 with the base level being zero for all non-base-level levels. Defining the base level to be equal to the SQ obviates the need to explicitly specify the SQ beyond a constant.

Creating large designs can be beneficial, however, if you have many attributes and levels, the size of the full factorial can be too large to hold in memory leading to an error of the type: “Error: vector memory exhausted (limit reached)”. There are two possible solutions to fix this:

- You can supply a candidate set that is of a sufficient size
- You can use the
`rsc`

algorithm instead, which will obviate the need to have the full factorial present.

You can supply a candidate set to use with the modified federov and
random design algorithms. There are 2 important things with respect to
the candidate set that `spdesign`

will check for:

- That the names are compliant with the package. This means that the
names of the variables in the supplied candidate set must follow a
specific naming convention. The name consists of two parts: i) the name
of the utility function, which are the names given to each utility
function element in the list of utility functions, and ii) the name of
the attribute. These are combined using an underscore giving the
following format:
_ . For example, if you have the following set of utility functions:

```
V = list(
alt1 = "b_x1_dummy[c(0.2, 0.5)] * x1[c(1, 2, 3)]",
alt2 = "b_x1_dummy * x1"
)
```

Then the column names of the candidate set would be:
`alt1_x1`

and `alt2_x1`

.

- That all attribute levels specified in the utility functions are in the candidate set. A mismatch here will result in an error.
- That there are not more (unneccesasry) columns in the candidate set that are not used by the utility expression. This is caught by an error message.

The candidate set is passed in as a `data.frame`

using the
`candidate_set`

argument to the function
`generate_design`

.