This is a brief introduction to the package
` fdapace` (Gajardo et al.
2021). For a general overview on functional data analysis (FDA)
see (Wang, Chiou, and Müller 2016) and key
references for the PACE approach and the associated dynamics are (Yao et al. 2003; Yao, Müller, and Wang 2005; Liu and
Müller 2009; Müller and Yao 2010; Li and Hsing 2010; Zhang and Wang
2016; Zhang and Wang 2018). The basic work-flow behind the PACE
approach for sparse functional data is as follows (see e.g. (Yao, Müller, and Wang 2005; Liu and Müller
2009) for more information):

- Calculate the smoothed mean \(\hat{\mu}\) (using local linear smoothing) aggregating all the available readings together.
- Calculate for each curve separately its own raw covariance and then aggregate all these raw covariances to generate the sample raw covariance.
- Use the off-diagonal elements of the sample raw covariance to estimate the smooth covariance.
- Perform eigenanalysis on the smoothed covariance to obtain the estimated eigenfunctions \(\hat{\phi}\) and eigenvalues \(\hat{\lambda}\), then project that smoothed covariance on a positive semi-definite surface (Hall, Müller, and Yao 2008).
- Use Conditional Expectation (PACE step) to estimate the corresponding scores \(\hat{\xi}\). ie. \(\hat{\xi}_{ik} = \hat{E}[\hat{\xi}_{ik}|Y_i] = \hat{\lambda}_k \hat{\phi}_{ik}^T \Sigma_{Y_i}^{-1}(Y_i-\hat{\mu}_i)\).

As a working assumption a dataset is treated as sparse if it has on
average less than 20, potentially irregularly sampled, measurements per
subject. A user can manually change the automatically determined
` dataType` if that is necessary. For densely
observed functional data simplified procedures are available to obtain
the eigencomponents and associated functional principal components
scores (see eg. (Castro, Lawton, and Sylvestre
1986) for more information). In particular in this case we:

- Calculate the cross-sectional mean \(\hat{\mu}\).
- Calculate the cross-sectional covariance surface (which is guaranteed to be positive semi-definite).
- Perform eigenanalysis on the covariance to estimate the eigenfunctions \(\hat{\phi}\) and eigenvalues \(\hat{\lambda}\).
- Use numerical integration to estimate the corresponding scores \(\hat{\xi}\). ie. \(\hat{\xi}_{ik} = \int_0^T [ y(t) - \hat{\mu}(t)] \phi_i(t) dt\)

In the case of sparse FPCA the most computational intensive part is the smoothing of the sample’s raw covariance function. For this, we employ a local weighted bilinear smoother.

A sibling MATLAB package for ` fdapace` can be
found here.

The simplest scenario is that one has two lists
` yList` and

`tList`

`yList`

`tList`

The generated ` FPCAobj` will contain all the
basic information regarding the desired FPCA.

```
library(fdapace)
# Set the number of subjects (N) and the
# number of measurements per subjects (M)
N <- 200;
M <- 100;
set.seed(123)
# Define the continuum
s <- seq(0,10,length.out = M)
# Define the mean and 2 eigencomponents
meanFunct <- function(s) s + 10*exp(-(s-5)^2)
eigFunct1 <- function(s) +cos(2*s*pi/10) / sqrt(5)
eigFunct2 <- function(s) -sin(2*s*pi/10) / sqrt(5)
# Create FPC scores
Ksi <- matrix(rnorm(N*2), ncol=2);
Ksi <- apply(Ksi, 2, scale)
Ksi <- Ksi %*% diag(c(5,2))
# Create Y_true
yTrue <- Ksi %*% t(matrix(c(eigFunct1(s),eigFunct2(s)), ncol=2)) + t(matrix(rep(meanFunct(s),N), nrow=M))
```

```
L3 <- MakeFPCAInputs(IDs = rep(1:N, each=M), tVec=rep(s,N), t(yTrue))
FPCAdense <- FPCA(L3$Ly, L3$Lt)
# Plot the FPCA object
plot(FPCAdense)
```

`## [1] 5.050606 1.999073`

```
# Create sparse sample
# Each subject has one to five readings (median: 3)
set.seed(123)
ySparse <- Sparsify(yTrue, s, sparsity = c(1:5))
# Give your sample a bit of noise
ySparse$yNoisy <- lapply(ySparse$Ly, function(x) x + 0.5*rnorm(length(x)))
# Do FPCA on this sparse sample
# Notice that sparse FPCA will smooth the data internally (Yao et al., 2005)
# Smoothing is the main computational cost behind sparse FPCA
FPCAsparse <- FPCA(ySparse$yNoisy, ySparse$Lt, list(plot = TRUE))
```

` FPCA` calculates the bandwidth utilized by each
smoother using generalised cross-validation or \(k\)-fold cross-validation automatically.
Dense data are not smoothed by default. The argument

`methodMuCovEst`

`smooth`

`cross-sectional`

The bandwidth used for estimating the smoothed mean and the smoothed
covariance are available under ` ...bwMu` and

`bwCov`

Visualising the fitted trajectories is a good way to see if the new bandwidth made any sense:

` FPCA` uses a Gaussian kernel when smoothing
sparse functional data; other kernel types (eg.
Epanechnikov/

`epan`

`?FPCA`

`$optns\$kernel`

`gauss`

`rect`

` FPCAsparseRect <- FPCA(ySparse$yNoisy, ySparse$Lt, optns = list(kernel = 'rect')) # Use rectangular kernel`

` FPCA` returns automatically the smallest number
of components required to explain 99% of a sample’s variance. Using the
function

`selectK`

```
## $K
## [1] 3
##
## $criterion
## [1] 0.9876603
```

```
## $K
## [1] 2
##
## $criterion
## [1] 993.4442
```

When working with functional data (usually not very sparse) the
estimation of derivatives is often of interest. Using
` fitted.FPCA` one can directly obtain numerical
derivatives by defining the appropriate order

`p`

`fdapace`

`p =1`

`2`

`?fitted.FPCA`

`FPCAder`

`FPCA`

```
fittedCurvesP0 <- fitted(FPCAsparse) # equivalent: fitted(FPCAsparse, derOptns=list(p = 0));
# Get first order derivatives of fitted curves, smooth using Epanechnikov kernel
fittedCurcesP1 <- fitted(FPCAsparse, derOptns=list(p = 1, kernelType = 'epan'))
```

```
## Warning in fitted.FPCA(FPCAsparse, derOptns = list(p = 1, kernelType = "epan")): Potentially you use too many components to estimate derivatives.
## Consider using SelectK() to find a more informed estimate for 'K'.
```

We use the ` medfly25` dataset that this available
with

`fdapace`

`FPCA`

`medfly25`

```
# load data
data(medfly25)
# Turn the original data into a list of paired amplitude and timing lists
Flies <- MakeFPCAInputs(medfly25$ID, medfly25$Days, medfly25$nEggs)
fpcaObjFlies <- FPCA(Flies$Ly, Flies$Lt, list(plot = TRUE, methodMuCovEst = 'smooth', userBwCov = 2))
```

Based on the scree-plot we see that the first three components appear
to encapsulate most of the relevant variation. The number of
eigencomponents to reach a 99.99% FVE is \(11\) but just \(3\) eigencomponents are enough to reach a
95.0%. We can easily inspect the following visually, using the
` CreatePathPlot` command.

`## Loading required package: ks`

```
par(mfrow=c(1,2))
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), main = 'K = 11', pch = 4); grid()
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), K = 3, main = 'K = 3', pch = 4) ; grid()
```

One can perform outlier detection (Febrero,
Galeano, and González-Manteiga 2007) as well as visualize data
using a functional box-plot. To achieve these tasks one can use the
functions ` CreateOutliersPlot` and

`CreateFuncBoxPlot`

`KDE`

`bagplot`

`CreateOutliersPlot`

` CreateFuncBoxPlot(fpcaObjFlies, xlab = 'Days', ylab = '# of eggs laid', optns = list(K =3, variant='bagplot'))`

Functional data lend themselves naturally to questions about their rate
of change; their derivatives. As mentioned previously using
` fdapace` one can generate estimates of the sample’s
derivatives (

`fitted.FPCA`

`FPCAder`

`derOptns`

Carey, JR, P Liedo, H-G Müller, J-L Wang, and J-M Chiou. 1998.
“Relationship of Age Patterns of Fecundity to Mortality,
Longevity, and Lifetime Reproduction in a Large Cohort of Mediterranean
Fruit Fly Females.” *The Journals of Gerontology Series A:
Biological Sciences and Medical Sciences* 53 (4): B245–51.

Castro, PE, WH Lawton, and EA Sylvestre. 1986. “Principal Modes of
Variation for Processes with Continuous Sample Curves.”
*Technometrics* 28 (4): 329–37.

Febrero, M, P Galeano, and W González-Manteiga. 2007. “A
Functional Analysis of NOx Levels: Location and Scale Estimation and
Outlier Detection.” *Computational Statistics* 22 (3):
411–27.

Gajardo, Alvaro, Satarupa Bhattacharjee, Cody Carroll, Yaqing Chen,
Xiongtao Dai, Jianing Fan, Pantelis Z. Hadjipantelis, et al. 2021.
*fdapace: Functional Data Analysis and
Empirical Dynamics*. https://CRAN.R-project.org/package=fdapace.

Hall, P, H-G Müller, and F Yao. 2008. “Modelling Sparse
Generalized Longitudinal Observations with Latent Gaussian
Processes.” *Journal of the Royal Statistical Society: Series
B (Statistical Methodology)* 70 (4): 703–23.

Hyndman, RJ, and HL Shang. 2010. “Rainbow Plots, Bagplots, and
Boxplots for Functional Data.” *Journal of Computational and
Graphical Statistics* 19 (1).

Li, Y, and T Hsing. 2010. “Uniform Convergence Rates for
Nonparametric Regression and Principal Component Analysis in
Functional/Longitudinal Data.” *The Annals of Statistics*
38 (6): 3321–51.

Liu, B, and H-G Müller. 2009. “Estimating Derivatives for Samples
of Sparsely Observed Functions, with Application to Online Auction
Dynamics.” *Journal of the American Statistical
Association* 104 (486): 704–17.

Müller, H-G, and F Yao. 2010. “Empirical Dynamics for Longitudinal
Data.” *The Annals of Statistics* 38 (6): 3458–86.

Rousseeuw, PJ, I Ruts, and JW Tukey. 1999. “The Bagplot: A
Bivariate Boxplot.” *The American Statistician* 53 (4):
382–87.

Wang, J-L, J-M Chiou, and H-G Müller. 2016. “Functional Data
Analysis.” *Annual Review of Statistics and Its
Application* 3: 257–95.

Yao, F, H-G Müller, A J Clifford, S R Dueker, J Follett, Y Lin, B A
Buchholz, and JS Vogel. 2003. “Shrinkage Estimation for Functional
Principal Component Scores with Application to the Population Kinetics
of Plasma Folate.” *Biometrics* 59 (3): 676–85.

Yao, F, H-G Müller, and J-L Wang. 2005. “Functional Data Analysis
for Sparse Longitudinal Data.” *Journal of the American
Statistical Association* 100 (470): 577–90.

Zhang, X, and J-L Wang. 2016. “From Sparse to Dense Functional
Data and Beyond.” *The Annals of Statistics* 44 (5):
2281–321.

Zhang, X, and J–L Wang. 2018. “Optimal Weighting Schemes for
Longitudinal and Functional Data.” *Statistics &
Probability Letters* 138: 165–70.