#### Computation of the Transition Probabilities

```
# note:
# all parameters except for maximum noise D and variance V have default values
ptab <- create_cnt_ptable(D = 2, V = 1)
```

At a minimum, the following two parameters have to be specified:

- `D`: the maximum noise/perturbation (a positive integer)
- `V`: the noise or perturbation variance (a positive number; it need
  not be an integer, e.g. `V = 0.9`)

The result of the above function is an object of class “ptable” which
contains the following slots:

```
str(ptab)
#> Formal class 'ptable' [package "ptable"] with 8 slots
#> ..@ tMatrix : num [1:3, 1:5] 1 0.3665 0.0638 0 0.3665 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:3] "0" "1" "2"
#> .. .. ..$ : chr [1:5] "0" "1" "2" "3" ...
#> ..@ pClasses : num [1:3] 0 1 2
#> ..@ pTable :Classes 'data.table' and 'data.frame': 10 obs. of 7 variables:
#> .. ..$ i : num [1:10] 0 1 1 1 1 2 2 2 2 2
#> .. ..$ j : num [1:10] 0 0 1 2 3 0 1 2 3 4
#> .. ..$ p : num [1:10] 1 0.3665 0.3665 0.1676 0.0995 ...
#> .. ..$ v : num [1:10] 0 -1 0 1 2 -2 -1 0 1 2
#> .. ..$ p_int_lb: num [1:10] 0 0 0.366 0.733 0.901 ...
#> .. ..$ p_int_ub: num [1:10] 1 0.366 0.733 0.901 1 ...
#> .. ..$ type : chr [1:10] "all" "all" "all" "all" ...
#> .. ..- attr(*, ".internal.selfref")=<externalptr>
#> .. ..- attr(*, "intervals")= chr "default"
#> ..@ empResults:Classes 'data.table' and 'data.frame': 3 obs. of 6 variables:
#> .. ..$ i : num [1:3] 0 1 2
#> .. ..$ p_mean: num [1:3] 0 0 0
#> .. ..$ p_var : num [1:3] 0 0.932 1
#> .. ..$ p_sum : num [1:3] 1 1 1
#> .. ..$ p_stay: num [1:3] 1 0.366 0.383
#> .. ..$ iter : int [1:3] 0 20 1
#> .. ..- attr(*, ".internal.selfref")=<externalptr>
#> .. ..- attr(*, "sorted")= chr "i"
#> ..@ pParams :Formal class 'ptable_params' [package "ptable"] with 12 slots
#> .. .. ..@ D : int 2
#> .. .. ..@ V : num 1
#> .. .. ..@ js : int 0
#> .. .. ..@ ncat : int 2
#> .. .. ..@ pstay: num [1:2] NA NA
#> .. .. ..@ optim: int [1:2] 1 1
#> .. .. ..@ mono : logi [1:2] TRUE TRUE
#> .. .. ..@ table: chr "cnts"
#> .. .. ..@ icat : int [1:2] 1 2
#> .. .. ..@ step : int 1
#> .. .. ..@ type : chr "all"
#> .. .. ..@ label: chr "D2V100"
#> ..@ tStamp : chr "20230301164637"
#> ..@ type : chr "all"
#> ..@ table : chr "cnts"
```

The most relevant slots of the object are:

- `tMatrix`: A transition matrix that describes the perturbation
  probabilities from one state (original frequency count) to another
  (target frequency count).
- `pTable`: A data table needed for the lookup step of an SDC tool that
  can apply random noise to statistical tables (e.g. the **cellKey**
  package or the software **Tau-Argus**). The following sections explain
  how the tables can be used or exported with auxiliary functions.
- `pParams`: The input parameters that result from the preceding
  function `pt_create_pParams()`.
- `empResults`: A data frame for output checking of the constraints.
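
To illustrate what the lookup step might look like, here is a minimal
base-R sketch. It does not call **ptable** or **cellKey**. The rows for
`i = 1` are typed in by hand from the rounded `str()` output above,
`lookup_noise()` is a hypothetical helper, and treating the intervals as
half-open `[p_int_lb, p_int_ub)` is an assumption made for this sketch.

```r
# Rows of @pTable for the original count i = 1, typed in by hand from the
# rounded values in the str() output (not read from the package object)
pt_i1 <- data.frame(
  v        = c(-1, 0, 1, 2),
  p_int_lb = c(0, 0.366, 0.733, 0.901),
  p_int_ub = c(0.366, 0.733, 0.901, 1)
)

# Hypothetical helper (not part of ptable): return the noise value whose
# interval [p_int_lb, p_int_ub) contains the uniform number u.
# The half-open interval convention is an assumption of this sketch.
lookup_noise <- function(u, pt) {
  stopifnot(all(u >= 0 & u < 1))
  pt$v[findInterval(u, pt$p_int_lb)]
}

# Noise values for three example uniform numbers
noise <- lookup_noise(c(0.10, 0.50, 0.95), pt_i1)
noise
```

In a real SDC tool the uniform number would typically be derived from
the cell itself rather than drawn freshly, so that the same cell always
receives the same noise.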

#### The Transition Matrix

Let’s have a look at the transition matrix (i.e. at the slot `@tMatrix`)
of the object `ptab`:

```
# note: to look at a specific slot, just name the object and add the
# corresponding slot with a leading "@"
ptab@tMatrix
#> 0 1 2 3 4
#> 0 1.00000000 0.0000000 0.0000000 0.00000000 0.00000000
#> 1 0.36648551 0.3664855 0.1675725 0.09945652 0.00000000
#> 2 0.06382714 0.2446915 0.3829628 0.24469145 0.06382714
```

Each row of the transition matrix represents the noise distribution
for an original frequency count. The probability that an original
frequency count of 1 becomes a 3 (i.e. the 1 is perturbed with a noise
value of +2) is 0.0994565.
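
The relation between target counts and noise values can be made
explicit with a small base-R sketch. It does not call the package: the
matrix `tm` is typed in by hand from the printed output, and
`noise_dist()` is a hypothetical helper name introduced only for this
illustration.

```r
# Transition matrix copied by hand from the printed output
# (rows: original counts i, columns: target counts j)
tm <- matrix(
  c(1.00000000, 0.0000000, 0.0000000, 0.00000000, 0.00000000,
    0.36648551, 0.3664855, 0.1675725, 0.09945652, 0.00000000,
    0.06382714, 0.2446915, 0.3829628, 0.24469145, 0.06382714),
  nrow = 3, byrow = TRUE,
  dimnames = list(as.character(0:2), as.character(0:4))
)

# Hypothetical helper: noise distribution for an original count i.
# The noise value is v = j - i; targets with zero probability are dropped.
noise_dist <- function(tm, i) {
  p <- tm[as.character(i), ]
  v <- as.integer(colnames(tm)) - i
  keep <- p > 0
  setNames(p[keep], v[keep])
}

nd1 <- noise_dist(tm, 1)
nd1         # probabilities of the noise values -1, 0, 1, 2
nd1[["2"]]  # P(1 -> 3), i.e. the probability of a noise value of +2
```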

```
diag(ptab@tMatrix)
#> 0 1 2
#> 1.0000000 0.3664855 0.3829628
```

The main diagonal shows the preservation probabilities. These are the
probabilities that the original frequencies remain unchanged. In this
instance, the probability that an original frequency 2 remains unchanged
is 0.3829628.

#### Symmetry - and what does it mean in the context of perturbation tables?

As you may have noticed, the transition matrix has a finite number of
rows (which represent the original frequency counts) and columns (which
represent the target frequency counts).

```
# let's have a look at the number of different original positive frequency
# counts that will be treated
params <- slot(ptab, "pParams")
params@ncat
#> [1] 2
# adding 1 (for the zero count) gives
params@ncat + 1
#> [1] 3
```

This number depends on both the maximum perturbation `D` and the
threshold value `js`. The last row of the transition matrix contains a
symmetric distribution, which is applied to every original frequency
larger than the one in that row.

```
# the object @pClasses shows all original frequencies
# that have their own distribution
ptab@pClasses
#> [1] 0 1 2
# symmetry is achieved for the original frequency count i=...
max(ptab@pClasses)
#> [1] 2
# or
ptab@pClasses[params@ncat + 1]
#> [1] 2
```

In the given example, each frequency count larger than 2 will be
perturbed according to the distribution for i=2. For example, in the
case of i=3 the distribution reads as follows:

```
#> 1 2 3 4 5
#> 0.06382714 0.24469145 0.38296282 0.24469145 0.06382714
```

or in the case of i=12326:

```
#> 12324 12325 12326 12327 12328
#> 0.06382714 0.24469145 0.38296282 0.24469145 0.06382714
```

Given this symmetry, the transition matrix can be displayed in a
reduced form: no rows need to be defined beyond the one at which
symmetry is reached.
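
How the reduced form extends to larger counts can be sketched in base R
by shifting the symmetric row. The values are typed in by hand from the
output above, and `target_dist()` and `i_sym` are hypothetical names
introduced for this sketch, not part of the package.

```r
# Symmetric last row of the transition matrix (i = 2), copied by hand
p_sym <- c(0.06382714, 0.24469145, 0.38296282, 0.24469145, 0.06382714)
D     <- 2L  # maximum noise used in the example
i_sym <- 2L  # original count at which symmetry is reached

# Hypothetical helper: distribution of target counts for any i >= i_sym,
# obtained by centring the symmetric noise distribution on i
target_dist <- function(i, p_sym, D, i_sym) {
  stopifnot(i >= i_sym, length(p_sym) == 2L * D + 1L)
  targets <- as.integer(i) + seq.int(-D, D)
  setNames(p_sym, targets)
}

d3     <- target_dist(3, p_sym, D, i_sym)
d12326 <- target_dist(12326, p_sym, D, i_sym)
d3
d12326
```

Only the column labels change from one count to the next; the
probabilities are always those of the last row.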

#### Output Checking and Troubleshooting

Next, we will check the empirical results that can be used for
troubleshooting:

```
ptab@empResults
#> i p_mean p_var p_sum p_stay iter
#> 1: 0 0 0.00000 1 1.00000 0
#> 2: 1 0 0.93188 1 0.36649 20
#> 3: 2 0 1.00000 1 0.38296 1
```

The table has the following columns:

- `i`: the original frequency to which the remaining columns refer
- `p_mean`: the bias of the perturbation of an original frequency
  (should be zero: unbiasedness)
- `p_var`: the noise variance of an original frequency (**Note:** If
  `p_var` differs from the chosen input parameter `V`, as it does in the
  example above, the fixed variance condition is violated. In that case,
  we have to set a different variance parameter or change other
  parameters!)
- `p_sum`: the sum of the transition probabilities for an original
  frequency count (must equal 1)
- `p_stay`: corresponds to the diagonal of the transition matrix
- `iter`: any value other than 1 points to discrepancies (e.g. violation
  of the fixed variance parameter)
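
Most of these columns can be recomputed from the transition matrix
alone. The following base-R sketch does so for the first example
(`D = 2`, `V = 1`). The matrix `tm` is typed in by hand from the printed
output, and the tolerance of `1e-4` used to flag a variance violation is
an arbitrary choice for this illustration, not a package setting.

```r
# Transition matrix of the first example, copied by hand
tm <- matrix(
  c(1.00000000, 0.0000000, 0.0000000, 0.00000000, 0.00000000,
    0.36648551, 0.3664855, 0.1675725, 0.09945652, 0.00000000,
    0.06382714, 0.2446915, 0.3829628, 0.24469145, 0.06382714),
  nrow = 3, byrow = TRUE,
  dimnames = list(as.character(0:2), as.character(0:4))
)
V <- 1  # variance requested in the first example

i <- as.integer(rownames(tm))
j <- as.integer(colnames(tm))
v <- outer(i, j, function(a, b) b - a)  # noise values v = j - i

p_mean <- rowSums(tm * v)                # bias per original count
p_var  <- rowSums(tm * v^2) - p_mean^2   # noise variance per original count
p_sum  <- rowSums(tm)                    # row sums, should be 1
p_stay <- diag(tm)                       # preservation probabilities

checks <- data.frame(
  i = i,
  p_mean = round(p_mean, 5), p_var = round(p_var, 5),
  p_sum = round(p_sum, 5), p_stay = round(p_stay, 5),
  row.names = NULL
)
checks

# Flag positive counts whose variance misses V (tolerance is an assumption)
violates <- i > 0 & abs(p_var - V) > 1e-4
i[violates]
```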

As can be seen in the example, the preset variance of `V=1` does not
hold for the original frequency `i=1`. To fulfill the condition of a
fixed variance, we re-run the computation with a different variance
parameter. Let’s try a variance parameter `V=0.9`:

```
ptab_new <- create_cnt_ptable(D = 2, V = 0.9)
ptab_new@empResults
#> i p_mean p_var p_sum p_stay iter
#> 1: 0 0 0.0 1 1.00000 0
#> 2: 1 0 0.9 1 0.36250 1
#> 3: 2 0 0.9 1 0.40928 1
```

The new computation with the updated variance parameter results in a
perturbation table that fulfills all conditions. Of course, the
resulting transition matrix now differs from the first one:

```
ptab_new@tMatrix
#> 0 1 2 3 4
#> 0 1.00000000 0.0000000 0.0000000 0.0000000 0.00000000
#> 1 0.36250000 0.3625000 0.1875000 0.0875000 0.00000000
#> 2 0.05154609 0.2438156 0.4092765 0.2438156 0.05154609
```