The ctsem marketing department seems to have done a poor job of getting out the message that all of the dynamic systems features of ctsem – fluctuating / coupled latent processes over time, intervention effects, higher order models, simple and complicated forms of heterogeneity across subjects and / or time, latent interactions / state dependent parameters – are also possible in the computationally simpler discrete time (e.g. vector autoregression / structural equation model) setting.
This post is motivated by a recent question about how to handle accelerated longitudinal designs in ctsem. In these designs, growth over a longer time span is approximated by tracking multiple cohorts (i.e. age ranges) for a shorter time span, which is nice but brings with it worries about cohort differences. The original form of ctsem, now ctsemOMX, contained specific functions for multiple-group modelling, but the improved handling of all manner of heterogeneity in current versions of ctsem makes these unnecessary – groups just need to be included as covariates that moderate system parameters.
It seems to be quite a common understanding that unit level ‘heterogeneity’ (i.e. individual differences) in statistical models reflects some innate characteristic of the individuals. Based on a tiny Twitter discussion, I thought it might be helpful to demonstrate that such apparent ‘heterogeneity’ can arise very easily as a result of model limitations or choices in how data are structured. Let’s observe a group of individuals at two occasions, with a space of 1 year between occasions.
What is a dynamic system? Why are they interesting? How do we take bits of the bumbling buzzing confusion around us and pack them into a statistical model of change to make nuanced predictions and test interesting hypotheses about ‘stuff that happens’ so we can better adjust the buzzing confusion to our tastes? I’m turning the class I recently taught into some blog posts, so for some of my opinion, not intended as rigorous philosophy of science / statistics, but as a start to thinking about systems modelling, read on…
Latent growth curves are a nice, relatively straightforward model for estimating overall patterns of change from multiple, noisy, indicator variables. While the classic formulations of this model can be easily fit in most SEM packages, the model also provides a nice basis for understanding the differential equation formulation of systems, and a good starting point for more complex model development not possible in the SEM framework – as a peek into these possibilities I’ll also show a growth curve model where the measurement error depends on the latent variable, as would be typical of floor or ceiling effects.
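To make the growth curve idea concrete, here is a minimal base R sketch, not the ctsem approach from the post itself. It simulates subjects with their own intercepts and slopes plus measurement noise, then recovers the average trajectory with a pooled linear regression (a deliberate simplification of a proper latent growth curve fit). All values and variable names are assumptions chosen for illustration.

```r
set.seed(42)

n_subjects <- 200
times <- 0:4                                   # five equally spaced occasions (assumed)

# Subject-specific growth parameters (illustrative values)
intercepts <- rnorm(n_subjects, mean = 10, sd = 2)
slopes <- rnorm(n_subjects, mean = 1.5, sd = 0.5)

# Latent trajectories: one row per subject, one column per occasion
latent <- outer(intercepts, rep(1, length(times))) + outer(slopes, times)

# Observed scores = latent trajectory + measurement error
observed <- latent + rnorm(length(latent), mean = 0, sd = 1)

# Long format: as.vector() stacks columns, so occasion varies slowest
long <- data.frame(
  id = rep(seq_len(n_subjects), times = length(times)),
  time = rep(times, each = n_subjects),
  y = as.vector(observed)
)

# Pooled regression estimates the mean intercept and mean slope only;
# it ignores the individual differences a growth curve model would estimate
fit <- lm(y ~ time, data = long)
print(coef(fit))
```

The pooled estimates should land close to the generating means, but the variance of the intercepts and slopes across subjects is exactly what this shortcut throws away and what a latent growth curve model is designed to capture.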
ctsem is R software for statistical modelling using hierarchical state space models, of discrete or continuous time formulations, with possible non-linearities (i.e. state / time dependence) in the parameters. This is a super brief demo to show the basic intuition for Kalman filtering / smoothing, and missing data imputation – for a quick start see https://cdriver.netlify.com/post/ctsem-quick-start/ , and for more details see the current manual at https://github.com/cdriveraus/ctsem/raw/master/vignettes/hierarchicalmanual.pdf
Data
Let’s load ctsem (if you haven’t installed it see the quick start post!
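For readers who want the filtering intuition without any package, the following is a minimal base R sketch of a scalar Kalman filter for a first order autoregressive process. It is not ctsem's implementation. The function name `kalman_filter`, the parameter values and the missingness pattern are all assumptions made for illustration. When an observation is missing the filter simply carries the prediction forward, which is why uncertainty grows over gaps.

```r
set.seed(1)

# Simulate a latent AR(1) process observed with noise (illustrative values)
n <- 50
ar <- 0.8
proc_sd <- 1
meas_sd <- 1
state <- numeric(n)
for (t in 2:n) state[t] <- ar * state[t - 1] + rnorm(1, 0, proc_sd)
y <- state + rnorm(n, 0, meas_sd)
y[c(10:15, 30)] <- NA                          # introduce some missing observations

# Hypothetical helper: scalar Kalman filter with known parameters
kalman_filter <- function(y, ar, proc_var, meas_var, m0 = 0, p0 = 10) {
  n <- length(y)
  m <- numeric(n)                              # filtered state means
  p <- numeric(n)                              # filtered state variances
  m_prev <- m0
  p_prev <- p0
  for (t in seq_len(n)) {
    # Prediction step: push the previous estimate through the dynamics
    m_pred <- ar * m_prev
    p_pred <- ar^2 * p_prev + proc_var
    if (is.na(y[t])) {
      # No data: the prediction is the best available estimate
      m[t] <- m_pred
      p[t] <- p_pred
    } else {
      # Update step: weight the prediction error by the Kalman gain
      gain <- p_pred / (p_pred + meas_var)
      m[t] <- m_pred + gain * (y[t] - m_pred)
      p[t] <- (1 - gain) * p_pred
    }
    m_prev <- m[t]
    p_prev <- p[t]
  }
  data.frame(mean = m, var = p)
}

filt <- kalman_filter(y, ar, proc_sd^2, meas_sd^2)
print(head(filt, 16))
```

The filtered means at the missing occasions act as model based imputations, and the variances there show how quickly confidence in them decays as the gap lengthens.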
ctsem is R software for statistical modelling using hierarchical state space models, of discrete or continuous time formulations, with possible non-linearities in the parameters. In this post I’ll walk through some of the basics of ctsem usage; for more details see the current manual at https://github.com/cdriveraus/ctsem/raw/master/vignettes/hierarchicalmanual.pdf
Installation and loading of ctsem
Within R, for the CRAN version of ctsem simply run:
install.packages('ctsem')
Or for the github version, with extra setup for stan that may be useful if stan is not installed:
Generate some data
Here we specify a bivariate latent process where the 1st process affects the 2nd, and there are stable individual differences in the processes.
library(ctsem)
gm <- ctModel(LAMBDA=diag(2), #diagonal factor loading, 2 latents 2 observables
  Tpoints = 50,
  DRIFT=matrix(c(-1,.5,0,-1),2,2), #temporal dynamics
  TRAITVAR = diag(.5,2), #stable latent intercept variance (cholesky factor)
  DIFFUSION=diag(2)) #within person covariance
ctModelLatex(gm) #to view latex system equations
#when generating data, free pars are set to 0
d <- data.
Representing Dynamic Systems
Building on the dynamic systems intro in a previous post: to build formal models of dynamic systems, we need some way to represent them mathematically. A common approach in the social sciences is the ‘difference equation’, or ‘structural equation model’ (SEM), both of which are forms of a ‘discrete time’ representation. While such an approach is computationally appealing and familiar to most social scientists, it often does not represent the causal system that people think it does, and is problematic for scenarios where certain paths are fixed to zero, as with hypothesis testing or some forms of data driven structure determination.
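One way to see the issue with zero paths is a small base R sketch, under the assumption of a linear continuous time system whose discrete time effects over an interval are given by the matrix exponential of the drift matrix times that interval. The drift matrix below is hypothetical, and `mat_exp` is a helper written here for illustration (a truncated Taylor series with scaling and squaring), not a ctsem function.

```r
# Hypothetical helper: matrix exponential via scaling and squaring
# with a truncated Taylor series (adequate for small, well behaved matrices)
mat_exp <- function(A, squarings = 10, terms = 20) {
  A_scaled <- A / 2^squarings
  result <- diag(nrow(A))
  term <- diag(nrow(A))
  for (k in seq_len(terms)) {
    term <- term %*% A_scaled / k              # k-th Taylor term: A^k / k!
    result <- result + term
  }
  for (i in seq_len(squarings)) result <- result %*% result
  result
}

# Assumed continuous time system: process 1 -> 2 -> 3, with NO direct 1 -> 3 path
drift <- matrix(c(-1,  0,  0,
                  .5, -1,  0,
                   0, .5, -1), nrow = 3, byrow = TRUE)

# Implied discrete time (autoregressive) effects for intervals of 1 and 2
ar_1 <- mat_exp(drift * 1)
ar_2 <- mat_exp(drift * 2)

print(round(ar_1, 3))
print(round(ar_2, 3))
```

Although `drift[3, 1]` is exactly zero, the corresponding entries of `ar_1` and `ar_2` are not, because the effect of process 1 reaches process 3 through process 2 within the interval. Fixing that discrete time path to zero would therefore misrepresent a system in which the direct continuous time path is absent.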