Keywords: ctsem, asymmetric dynamics, sinh-arcsinh transformation, nonlinear measurement model, continuous-time processes, discrete-time processes
# Install ctsem / packages if needed
# ensure your system can compile models (rtools may be needed for windows)
# remotes::install_github("cdriveraus/ctsem")
library(ctsem)
library(ggplot2)
library(data.table)
set.seed(1)
Many psychological, biological, and economic processes change faster when a state is above its equilibrium and slower when it is below (or vice‑versa). This results in asymmetric distributions of the observations, and in heterogeneity such that individuals who tend to be high on the process show more (or less) variance than individuals who tend to be low. There are various ways one could attempt to model such nonlinear dynamics, e.g. by making the dynamic coefficients or variances dependent on the latent state. However, an often simpler approach is to transform the observations – there is usually no ground truth in the specific measurement scale chosen, so we can use a transformation that makes the data more symmetric and the latent relations more linear; e.g. a log or square root transformation might be appropriate for positively skewed data, or a logit transformation for bounded proportions. Transforming the raw data in advance is simple and often effective, but the choice of transformation is somewhat arbitrary and difficult to check or justify. A more sophisticated version of this idea is to parameterize and estimate the link between the latent states and the observations – rather than the typical linear link, we use a nonlinear function of one or more latent states. The optimizer can then find the best match between the nonlinear observation model and the (model implied) linear dynamics, allowing the model to capture the asymmetry in the data without a single specific transformation having to be chosen in advance.
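As a throwaway illustration of the simpler fixed-transformation approach, using simulated skewed data unrelated to the analysis below and only base R functions:

# Illustration only: a positively skewed variable becomes roughly symmetric
# after a log transformation chosen in advance (yskew is a hypothetical variable)
yskew <- rlnorm(1000, meanlog = 0, sdlog = 1)
hist(yskew)        # strongly right-skewed on the raw scale
hist(log(yskew))   # approximately symmetric after the fixed transform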
A useful nonlinear link function is the sinh-arcsinh transformation (Jones, M. C., & Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96(4), 761-780), which allows for the estimation of a stretch / squash parameter controlling the tail width and an asymmetry parameter controlling skew. This link is relatively straightforward to implement in ctsem using a state-dependent parameter approach, shown below.
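To get a feel for the two parameters, the following minimal sketch (my own illustration, not code from ctsem) plots sinh(asymm + stretch * asinh(x)) for a few settings – asymm skews the curve, while stretch expands or squashes the tails:

# Illustration only: shape of the sinh-arcsinh link for a few parameter values
x <- seq(-4, 4, length.out = 200)
shapes <- data.frame(
  x = rep(x, 3),
  y = c(sinh(0 + 1 * asinh(x)),     # asymm = 0, stretch = 1: identity
        sinh(1 + 1 * asinh(x)),     # asymm = 1: positive skew
        sinh(0 + 0.5 * asinh(x))),  # stretch = 0.5: squashed tails
  pars = rep(c('asymm=0, stretch=1', 'asymm=1, stretch=1', 'asymm=0, stretch=0.5'), each = length(x))
)
ggplot(shapes, aes(x = x, y = y, colour = pars)) +
  geom_line() + theme_minimal()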
To test this approach we simulate a simple two‑process model, where the first process is measured with a nonlinear transformation and the second is measured linearly. The measurement model for the first process uses a softplus transformation to create the asymmetry in the data – here we deliberately do not use the same transformation as for fitting, to demonstrate the performance of the sinh-arcsinh transformation in conditions where it is not the ‘true’ model.
## Design
n_subjects <- 50
Tpoints <- 50
## Build continuous‑time model used for simulation
sim_mod <- ctModel(
  type = "omx", # use OpenMx for simulation
  LAMBDA = diag(2),
  Tpoints = Tpoints,
  DRIFT = c(
    -.2, 0,
    .3, -0.2),
  DIFFUSION = diag(1, 2),
  T0MEANS = c(3, 2),
  manifestNames = c('eta1','eta2'),
  T0VAR = diag(1, 2)
)
## Generate data, then manually impose nonlinear measurement
dat <- suppressMessages(
  data.frame(ctGenerate(sim_mod, n.subjects = n_subjects, burnin = 3))
)
#squash / stretch function
sinh_arcsinht <- function(x, asymm = 0, stretch = 1) {
  sinh(asymm + stretch * asinh(x))
}
# apply a softplus transformation to the first latent process (deliberately
# different from the sinh-arcsinh link used later for fitting)
dat$eta1nl <- log1p(exp(dat$eta1))
## Add measurement error
dat$Y1 <- dat$eta1nl + rnorm(nrow(dat), 0, .2)
dat$Y2 <- dat$eta2 + rnorm(nrow(dat), 0, .2)
library(ggplot2)
ggplot(dat, aes(x = eta1, y = eta1nl)) +
geom_smooth()+ggtitle("Relation between linear and nonlinear latent values") +
theme_minimal()
First up we test the nonlinear measurement approach, relying heavily on ctsem defaults (freely estimated order 1 dynamics) for the process model, and using the sinh-arcsinh transformation for the measurement model of the first variable. The asymmetry and stretch parameters are estimated, and the model is fit to the data. Note that the ctModel function allows complex transforms, such as the sinh-arcsinh transformation, to be specified in the appropriate location (we want to predict the observed values of the first variable, so we use the first element of the MANIFESTMEANS matrix), with any parameters used inside the transform declared in the PARS argument. Because the stretch parameter needs to be positive, we apply the log1p_exp (softplus) transformation to it internally (at the optimizer level, all parameters are unconstrained from -Inf to +Inf). The asymmetry parameter (asymm) and the regular intercept parameter (mm1) are not transformed, as they can take any value.
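As a quick aside, the softplus mapping is easy to inspect directly – a minimal sketch with a local helper (not ctsem's internal implementation):

# log1p(exp(x)) maps any unconstrained value to a strictly positive one:
# roughly 0 for very negative x, approximately x for large positive x
softplus <- function(x) log1p(exp(x))  # same functional form as ctsem's log1p_exp
softplus(c(-5, 0, 5))                  # approx 0.0067, 0.693, 5.0067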
mod <- ctModel(type = "ct",
  manifestNames = c("Y1","Y2"),
  LAMBDA = matrix(c(0,0,0,1),2,2),
  MANIFESTMEANS = c(
    'mm1 + sinh(asymm + stretch * asinh(eta1))',
    'mm2'),
  PARS = c( #extra parameters needed for complex transforms
    'mm1|param',                #mean of Y1, no transform
    'asymm|param',              #asymmetry param, no transformation
    'stretch|log1p_exp(param)'  #transform ensures positive for stretch / squash
  )
)
mod$pars$indvarying <- FALSE #disable individual varying parameters
fit <- ctStanFit(datalong = dat, ctstanmodel = mod, cores = 10, plot = 10)
fit <- ctStanGenerateFromFit(fit, 200, cores = 10) #generate data from fit for diagnostics
s <- summary(fit)
print(s$popmeans)
mean sd 2.5% 50% 97.5%
T0m_eta1 0.5062 0.1171 0.2821 0.5044 0.7300
T0m_eta2 1.7500 0.2943 1.1677 1.7455 2.3348
drift_eta1 -0.2476 0.0285 -0.3098 -0.2446 -0.1969
drift_eta1_eta2 0.0120 0.0063 -0.0003 0.0120 0.0245
drift_eta2_eta1 0.6272 0.0727 0.4827 0.6294 0.7671
drift_eta2 -0.2044 0.0122 -0.2296 -0.2043 -0.1819
diff_eta1 0.5679 0.0570 0.4665 0.5643 0.6890
diff_eta2_eta1 0.0103 0.0126 -0.0148 0.0105 0.0335
diff_eta2 1.0505 0.0160 1.0195 1.0501 1.0839
mvarY1 0.1686 0.0201 0.1340 0.1672 0.2109
mvarY2 0.0000 0.0000 0.0000 0.0000 0.0000
mm2 0.6105 0.1853 0.2386 0.6076 0.9792
T0var_eta1 0.7336 0.1041 0.5541 0.7264 0.9563
T0var_eta2_eta1 0.2481 0.0692 0.1120 0.2473 0.3857
T0var_eta2 1.7457 0.1726 1.4340 1.7387 2.0973
mm1 -0.1525 0.1291 -0.3955 -0.1541 0.1034
asymm 1.0104 0.0841 0.8454 1.0133 1.1683
stretch 0.7357 0.0541 0.6332 0.7371 0.8461
Empirical vs simulated lagged auto and cross correlations:
ctFitCovCheck(fit,plot=T)
Kalman filter plots of forward predictions (note the slower changes and reduced uncertainty at low levels):
ctKalman(fit,plot=T,polygonalpha=.4)
Posterior predictive style plots, showing overall density only:
pp <- ctPostPredPlots(fit)
print(pp[[1]])
What if we had fit a simple linear model to this data?
lmod <- ctModel(type = "ct",
  manifestNames = c("Y1","Y2"),
  LAMBDA = diag(1,2)
)
lmod$pars$indvarying <- FALSE #disable individual varying parameters
lfit <- ctStanFit(datalong = dat, ctstanmodel = lmod, cores = 10, plot = 10,
  optimcontrol = list(bootstrapUncertainty = TRUE))
lfit <- ctStanGenerateFromFit(lfit, 200, cores = 10) #generate data from fit for diagnostics
ls <- summary(lfit)
print(ls$popmeans)
mean sd 2.5% 50% 97.5%
T0m_eta1 0.8148 0.1588 0.5037 0.8171 1.1197
T0m_eta2 2.5332 0.3208 1.9155 2.5536 3.1245
drift_eta1 -0.2327 0.0222 -0.2796 -0.2309 -0.1894
drift_eta1_eta2 0.0050 0.0074 -0.0101 0.0048 0.0191
drift_eta2_eta1 0.5842 0.0357 0.5152 0.5831 0.6573
drift_eta2 -0.2039 0.0128 -0.2307 -0.2039 -0.1794
diff_eta1 0.5662 0.0186 0.5310 0.5650 0.6018
diff_eta2_eta1 0.0086 0.0130 -0.0161 0.0085 0.0340
diff_eta2 1.0508 0.0123 1.0274 1.0510 1.0743
mvarY1 0.2328 0.0180 0.1994 0.2319 0.2707
mvarY2 0.0000 0.0000 0.0000 0.0000 0.0000
mm_Y1 0.9013 0.0716 0.7515 0.9013 1.0373
mm_Y2 -0.1510 0.2372 -0.6082 -0.1440 0.3038
T0var_eta1 0.9897 0.1152 0.7794 0.9866 1.2319
T0var_eta2_eta1 0.2535 0.0742 0.0989 0.2554 0.3994
T0var_eta2 1.7429 0.1634 1.4524 1.7342 2.0737
ctFitCovCheck(lfit,plot=T)
ctStanDiscretePars(lfit,plot=T)
ctKalman(lfit,plot=T,polygonalpha=.4)
lpp <- ctPostPredPlots(lfit)
print(lpp[[1]])
What if we model the nonlinearity at the process layer? If the dynamics for one process are state dependent (i.e., vary depending on whether the process is high or low), we need to make all the dynamic parameters related to the first process state dependent, with the exception of the system noise correlation, because correlations are independent of scale. The reason is that if, for instance, the auto-effect changes at different levels of the process, it is highly unlikely (and somewhat incoherent) to assume that the cross-effects stay constant, since the influence of a cross-effect over time also depends on the auto-effect.
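To see why, consider the discrete-time cross regression implied by a lower-triangular continuous-time drift matrix. The sketch below (my own illustration, using a hypothetical helper ct_cross_to_dt rather than anything from ctsem) shows that changing only the auto-effect of eta1 changes the effect eta1 has on eta2 over a given interval, even when the continuous-time cross effect is held fixed:

# For drift [[a11, 0], [a21, a22]], the (2,1) element of expm(drift * dt) is
# a21 * (exp(a11*dt) - exp(a22*dt)) / (a11 - a22), for a11 != a22
ct_cross_to_dt <- function(a11, a21, a22, dt = 1) {
  a21 * (exp(a11 * dt) - exp(a22 * dt)) / (a11 - a22)
}
ct_cross_to_dt(a11 = -0.21, a21 = 0.3, a22 = -0.2)  # weak eta1 auto-effect: ~0.24
ct_cross_to_dt(a11 = -1.00, a21 = 0.3, a22 = -0.2)  # strong eta1 auto-effect: ~0.17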
pmod <- ctModel(type = "ct",
  manifestNames = c("Y1","Y2"),
  LAMBDA = matrix(c(1,0,0,1),2,2),
  DRIFT = c('-log1p_exp(dr11_base + dr11_state * eta1)', 'dr12_base + dr12_state * eta1',
            'dr21_base + dr21_state * eta1', 'dr22_base'),
  DIFFUSION = c('log1p_exp(diff11_base + diff11_state * eta1)', 0,
                'diff21', 'diff22'),
  PARS = c( #extra parameters needed for complex transforms
    'dr11_base|param',    #base drift for eta1 autoeffect
    'dr11_state|param',   #state dependent drift for eta1 autoeffect
    'dr12_base|param',    #base drift for eta1 cross effect from eta2
    'dr12_state|param',   #state dependent drift for eta1 cross effect from eta2
    'dr21_base|param',    #base drift for eta2 cross effect from eta1
    'dr21_state|param',   #state dependent drift for eta2 cross effect from eta1
    'diff11_base|param',  #base diffusion for eta1 autoeffect
    'diff11_state|param'  #state dependent diffusion for eta1 autoeffect
  )
)
pmod$pars$indvarying <- FALSE #disable individual varying parameters
pfit <- ctStanFit(datalong = dat, ctstanmodel = pmod, cores = 10, plot = 10)
pfit <- ctStanGenerateFromFit(pfit, 200, cores = 10) #generate data from fit for diagnostics
ps <- summary(pfit)
print(ps$popmeans)
mean sd 2.5% 50% 97.5%
T0m_eta1 1.1758 0.1783 0.8343 1.1641 1.5303
T0m_eta2 3.6687 0.4832 2.7522 3.6613 4.6515
dr22_base -0.1996 0.0113 -0.2222 -0.1993 -0.1781
diff21 0.0189 0.0129 -0.0072 0.0190 0.0425
diff22 1.0488 0.0164 1.0180 1.0486 1.0803
mvarY1 0.1942 0.0164 0.1642 0.1937 0.2285
mvarY2 0.0000 0.0000 0.0000 0.0000 0.0000
mm_Y1 0.5302 0.1066 0.3215 0.5281 0.7271
mm_Y2 -1.2848 0.4043 -2.0902 -1.2987 -0.4904
T0var_eta1 0.9950 0.1095 0.7989 0.9900 1.2276
T0var_eta2_eta1 0.2549 0.0692 0.1191 0.2553 0.3932
T0var_eta2 1.7511 0.1758 1.4538 1.7347 2.1296
dr11_base -1.9140 0.5484 -2.9998 -1.9343 -0.8642
dr11_state -2.9950 0.5382 -4.0384 -2.9838 -1.9824
dr12_base 0.0117 0.0081 -0.0041 0.0117 0.0273
dr12_state -0.0324 0.0045 -0.0412 -0.0324 -0.0236
dr21_base 0.6486 0.0551 0.5339 0.6493 0.7491
dr21_state -0.0411 0.0209 -0.0809 -0.0412 -0.0005
diff11_base -0.5496 0.0633 -0.6707 -0.5495 -0.4290
diff11_state 0.4552 0.0366 0.3841 0.4550 0.5253
ctFitCovCheck(pfit,plot=T)
ctKalman(pfit,plot=T,polygonalpha=.4)
ppp <- ctPostPredPlots(pfit)
print(ppp[[1]])
AIC will somewhat work in such cases, but cross-validation approaches are much more reliable. Comparing the three models using an approximate leave-one-out cross-validated log likelihood:
fit$loo <- ctLOO(fit, cores = 10, folds = length(unique(dat$id)),
  parallelFolds = TRUE, casewiseApproximation = TRUE)
lfit$loo <- ctLOO(lfit, cores = 10, folds = length(unique(dat$id)),
  parallelFolds = TRUE, casewiseApproximation = TRUE)
pfit$loo <- ctLOO(pfit, cores = 10, folds = length(unique(dat$id)),
  parallelFolds = TRUE, casewiseApproximation = TRUE)
print(c(fit$loo$outsampleLogLik, lfit$loo$outsampleLogLik, pfit$loo$outsampleLogLik))
[1] -5495.912 -5701.475 -5495.426
The nonlinear measurement model has a cross-validated log likelihood of -5495.9, the linear model -5701.5, and the nonlinear dynamics model -5495.4. Both nonlinear models are clearly better than the linear model. Including the nonlinearity at the measurement layer gives approximately the same predictive performance as including it in the dynamics; in this formulation it uses fewer parameters, though it does not capture the overall distribution of the empirical data quite as well.
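For readability, the out-of-sample log likelihoods can also be collected into a small labelled table (same values as printed above):

data.frame(
  model = c('nonlinear measurement', 'linear', 'nonlinear dynamics'),
  outsampleLogLik = c(fit$loo$outsampleLogLik, lfit$loo$outsampleLogLik, pfit$loo$outsampleLogLik)
)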