
Dynamic Causal Modelling

Dynamic Causal Modelling, or DCM for short, is a modelling approach for understanding the interactions between different brain regions. The DCM approach was introduced by Karl Friston and colleagues in 2003 (Friston, Harrison, & Penny, 2003b; K. E. Stephan & Roebroeck, 2012a; for a comprehensive tutorial see Kahan & Foltynie, 2013; K. Stephan et al., 2010). Its primary purpose is to estimate effective connectivity, which represents the causal relationships between brain regions. Simply put, it tells us how much one region increases or decreases the activity of another region, or of itself. This is achieved by assuming a certain causal architecture (a network) of the system under study.

How experiments drive neural networks

A network consists of nodes, i.e. brain regions, and edges, i.e. their connections. In DCM the edges are directed and not necessarily reciprocal, meaning that the connection from node A to node B is not the same as the connection from B to A. When a network receives input, such as experimental stimuli, it becomes a system that can assume different states, for example the excitation of a particular region. The main goal of DCM is to find out how such a system changes in response to different stimulus types. At the heart of DCM lies the neural state equation (Equation 1), which formalizes the neural dynamics as a bilinear system. In other words, it describes how much influence (per second) the regions exert on each other's average synaptic activity (L. Lee, Friston, & Horwitz, 2006).

$$\dot{z} = Az + \sum_j u_j B_j z + Cu \qquad (1)$$

The change of neural activity over time, $\dot{z}$, depends on the system's architecture, its inputs, and its sensitivity to those inputs. Three matrices capture this relationship. Matrix $A$, the endogenous connectivity, describes the extent of change that arises from the network's structure alone, without any additional input; at the regional level, this change is determined by how strongly the source region's activity influences the target region. The matrices $B_j$, the modulatory connectivity, describe how much the connections change in response to the experimental manipulations $u_j$. Finally, matrix $C$, the extrinsic connectivity, represents the amount of activity change caused directly by the driving inputs $u$. Together, these matrices describe the total dynamics of the previously defined system.
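As a concrete illustration, Equation 1 can be integrated numerically. The sketch below is a minimal example assuming a hypothetical two-region network with made-up values for $A$, $B$ and $C$ and a single experimental input; it is not part of any DCM toolbox.

```python
import numpy as np

# Hypothetical two-region network (illustrative values, not fitted to data)
A = np.array([[-0.5,  0.0],     # endogenous connectivity (Hz): self-decay and
              [ 0.3, -0.5]])    # a fixed influence of region 1 on region 2
B = np.array([[0.0, 0.0],       # modulatory connectivity for the single input:
              [0.4, 0.0]])      # the input strengthens the 1 -> 2 connection
C = np.array([[1.0],            # driving input enters region 1 only
              [0.0]])

dt, T = 0.01, 30.0              # integration step and duration in seconds
steps = int(T / dt)
u = np.zeros((steps, 1))
u[int(5/dt):int(15/dt), 0] = 1.0   # a 10 s block of stimulation

z = np.zeros(2)                 # neural states of the two regions
trace = np.empty((steps, 2))
for t in range(steps):
    # Equation 1: dz/dt = A z + sum_j u_j B_j z + C u  (one modulatory input here)
    dz = A @ z + u[t, 0] * (B @ z) + C @ u[t]
    z = z + dt * dz             # forward Euler integration
    trace[t] = z

print(trace.max(axis=0))        # peak neural activity per region
```

Note that the modulatory term only acts while the input is switched on, which is exactly how experimental manipulations change connectivity in DCM.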

From neural states to BOLD response and back again

The neural state equation might be the most important one for the user, but it is not the only equation in DCM. DCM consists of several layers of equations that derive the BOLD response (see the fMRI section) from the changes in neural activity. First, the neural activity of a region/node enters the regional hemodynamic state equations, which compute the region's vasodilatory signal and blood flow. These quantities feed into the balloon model, which yields the changes in blood volume and deoxygenated hemoglobin. This information allows for a complete description of the BOLD signal change equation (K. E. Stephan, Weiskopf, Drysdale, Robinson, & Friston, 2007), so that the user obtains the predicted BOLD signal over the time course of the experiment. The only information needed to generate the BOLD signal in this way is the architecture of the system and the regions' neural activity over time.

Unfortunately, researchers usually do not have access to the underlying neural activity. Therefore, DCM applies Bayesian statistics to infer the neural activity from the measured signal. DCM searches for the best parameters of all its equations, from the BOLD signal change equation down to the neural state equation, to find good approximations of the underlying neural activity. In short, DCM finds those parameters $\theta$ that are most probable for a certain model $m$, given the data $y$. It does so using the Expectation-Maximization (EM) algorithm, a standard tool from machine learning (Do & Batzoglou, 2008). By systematically adjusting the parameter values we obtain different outputs from the BOLD signal change equation; the goal is to minimize the difference between this output and the BOLD signal observed in the fMRI experiment by choosing the best-fitting parameters (K. E. Stephan et al., 2007). In Bayesian terms, we retrieve the maximum of the posterior probability: a density distribution over each parameter, built from the parameter's likelihood $p(y|\theta,m)$ and its prior $p(\theta|m)$. While the likelihood tells us which values the parameters can take, the prior determines to what extent they can vary; that is, before any data are observed the parameters already have a prior probability of taking on certain values. How to derive the posterior, i.e. the probability of the parameters given the data and a certain model, $p(\theta|y,m)$, from the likelihood and the prior is formalized by Bayes' rule (Equation 2) (W. Penny, 2015).

$$p(\theta|y,m) = \frac{p(y|\theta,m)\,p(\theta|m)}{p(y|m)} \qquad (2)$$
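To make Equation 2 tangible, the sketch below evaluates a posterior on a grid for a single hypothetical connection-strength parameter, combining a Gaussian prior with a Gaussian summary of the likelihood. DCM itself uses an EM/variational scheme over many parameters at once, so this is only meant to show how prior and likelihood are combined.

```python
import numpy as np
from scipy.stats import norm

theta = np.linspace(-1.0, 1.5, 501)          # grid of candidate parameter values

prior = norm.pdf(theta, loc=0.0, scale=0.5)  # p(theta|m): shrinks values towards 0

# Hypothetical "data": the likelihood p(y|theta,m) summarised as a Gaussian
# centred on the value that best reproduces the observed BOLD signal.
likelihood = norm.pdf(theta, loc=0.6, scale=0.2)

posterior = likelihood * prior               # numerator of Bayes' rule
posterior /= np.trapz(posterior, theta)      # divide by p(y|m) (normalisation)

theta_map = theta[np.argmax(posterior)]      # maximum a posteriori estimate
print(f"MAP estimate: {theta_map:.2f}")
```

The posterior peak lies between the prior mean and the best-fitting value, illustrating how the prior constrains the parameter estimates described above.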

What we call the best model

However, what we are ultimately interested in is the goodness of our model, the model evidence. The model evidence is the probability of observing the data given a certain model, $p(y|m)$. It is difficult to calculate directly but can be approximated in different ways. DCM uses the negative free energy (Friston et al., 2003a; Friston, Li, Daunizeau, & Stephan, 2011) to do so and to find the best trade-off between model accuracy and complexity. Model accuracy describes how well the data can be explained, i.e. how closely the predicted BOLD signal fits the measured one; it corresponds to the likelihood $p(y|\theta,m)$ described above. Model complexity is the difference between prior and posterior. More complex models often have a better chance of explaining the data, but they also tend to overfit the dataset (Lohmann, Erfurth, Müller, & Turner, 2012; K. E. Stephan et al., 2007), meaning the model cannot explain data it has not encountered before. This is why the model evidence penalizes complexity, allowing the results to generalize. Concretely, model complexity is quantified by the Kullback-Leibler (KL) divergence, which measures how far the posterior distribution has moved away from the prior (KL > 0 whenever they differ). Given the approximate evidences of competing models, we can determine the best-fitting model by comparison. One possibility is to divide the model evidences in order to calculate the Bayes factor (BF) (see Eq. 3) (W. Penny, 2012; K. Stephan et al., 2010; W. D. Penny, Stephan, Mechelli, & Friston, 2004).

$$BF = \frac{p(y|m_1)}{p(y|m_2)} \qquad (3)$$

The model favoured by the Bayes factor, i.e. the one with the higher model evidence, is then selected as the best model. This process is called Bayesian Model Selection (BMS) and is often used to answer one's hypotheses (K. E. Stephan, Penny, Daunizeau, Moran, & Friston, 2009).
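A minimal numerical sketch of this accuracy-versus-complexity trade-off and of Bayesian Model Selection is given below. The accuracy values and the Gaussian prior/posterior moments are invented for illustration; real DCM implementations compute the negative free energy over the full multivariate parameter set.

```python
import numpy as np

def kl_gaussian(mu_q, var_q, mu_p, var_p):
    """KL divergence KL[q || p] between two univariate Gaussians (complexity term)."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Two hypothetical models: accuracy plus posterior mean/variance of one parameter,
# relative to a shared shrinkage prior N(0, 0.5**2).
models = {
    "m1 (no modulation)":   dict(accuracy=-120.0, mu=0.05, var=0.04),
    "m2 (with modulation)": dict(accuracy=-110.0, mu=0.60, var=0.04),
}
prior_mu, prior_var = 0.0, 0.25

log_evidence = {}
for name, m in models.items():
    complexity = kl_gaussian(m["mu"], m["var"], prior_mu, prior_var)
    log_evidence[name] = m["accuracy"] - complexity   # evidence ~ accuracy - complexity

# Bayes factor of m2 over m1 (Equation 3), computed on the log scale for stability
log_bf = log_evidence["m2 (with modulation)"] - log_evidence["m1 (no modulation)"]
print(f"log Bayes factor (m2 vs m1): {log_bf:.2f}")
print("Selected model:", max(log_evidence, key=log_evidence.get))
```

In this toy example the more complex model wins because its gain in accuracy outweighs the KL penalty for moving its parameter away from the prior.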

Alternative Model Evidences

As mentioned above, DCM uses the negative free energy to approximate the model evidence. However, other approximations exist in the field as well. Two commonly applied methods are the Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC). Both take model accuracy into account together with a penalty that depends on the number of parameters p (K. E. Stephan et al., 2009). Even though BIC additionally considers the number of data points N, it seldom matches the free-energy approximation used by DCM (W. Penny, 2012).

$$BIC = \text{Accuracy}(m) - \frac{p}{2}\log N \qquad (4)$$

$$AIC = \text{Accuracy}(m) - p \qquad (5)$$
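Following Equations 4 and 5, the short sketch below computes both criteria for the same hypothetical accuracy values used above; in this convention higher scores are better, because the penalty is subtracted from the accuracy.

```python
import numpy as np

def bic(accuracy, p, N):
    # Equation 4: accuracy penalised by the number of parameters p and data points N
    return accuracy - (p / 2.0) * np.log(N)

def aic(accuracy, p):
    # Equation 5: accuracy penalised by the number of parameters only
    return accuracy - p

N = 200  # hypothetical number of fMRI scans
for name, accuracy, p in [("m1", -120.0, 4), ("m2", -110.0, 6)]:
    print(name, "BIC:", round(bic(accuracy, p, N), 1),
                "AIC:", round(aic(accuracy, p), 1))
```

Because these penalties depend only on parameter counts (and data points), not on how far the estimated parameters deviate from their priors, BIC and AIC can rank models differently from the free-energy approach, as discussed next.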

Penny (2012) has shown that BIC and AIC cannot take parameter magnitude into account, which can lead to incorrect model evidence estimates. The free energy, on the other hand, assigns higher complexity to parameters that diverge more from their prior values and thereby emphasizes the use of appropriate priors (W. Penny, 2012). Overall, BIC and AIC are both biased measures (especially at low signal-to-noise ratio, SNR), which can result in different model selections compared to the free-energy-based BMS approach.

Resources

Do, C. B., & Batzoglou, S. (2008, August). What is the expectation maximization algorithm? Nature Biotechnology, 26(8), 897–899. Retrieved 2024-02-09, from https://www.nature.com/articles/nbt1406 doi: 10.1038/nbt1406

Friston, K. J. (Ed.). (2007). Statistical parametric mapping: the analysis of functional brain images (1st ed.). Amsterdam; Boston: Elsevier/Academic Press.

Friston, K. J., Harrison, L., & Penny, W. (2003a, August). Dynamic causal modelling. NeuroImage, 19(4), 1273–1302. Retrieved 2023-12-28, from https://www.sciencedirect.com/science/article/pii/S1053811903002027 doi: 10.1016/S1053-8119(03)00202-7

Friston, K. J., Harrison, L., & Penny, W. (2003b, August). Dynamic causal modelling. NeuroImage, 19(4), 1273–1302. Retrieved 2023-12-28, from https://www.sciencedirect.com/science/article/pii/S1053811903002027 doi: 10.1016/S1053-8119(03)00202-7

Friston, K. J., Li, B., Daunizeau, J., & Stephan, K. E. (2011, June). Network discovery with DCM. NeuroImage, 56(3), 1202–1221. Retrieved 2023-09-19, from https://linkinghub.elsevier.com/retrieve/pii/S105381191001623X doi: 10 .1016/j.neuroimage.2010.12.039

Lee, L., Friston, K., & Horwitz, B. (2006, May). Large-scale neural models and dynamic causal modelling. NeuroImage, 30(4), 1243–1254. Retrieved 2023-12-05, from https://linkinghub.elsevier.com/retrieve/pii/S1053811905024596 doi: 10.1016/j.neuroimage.2005.11.007

Lohmann, G., Erfurth, K., Müller, K., & Turner, R. (2012, February). Critical comments on dynamic causal modelling. NeuroImage, 59(3), 2322–2329. Retrieved 2024-01-30, from https://linkinghub.elsevier.com/retrieve/pii/S1053811911010718 doi: 10.1016/j.neuroimage.2011.09.025

Penny, W. (2012, January). Comparing Dynamic Causal Models using AIC, BIC and Free Energy. NeuroImage, 59(1), 319–330. Retrieved 2024-01-08, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3200437/ doi: 10.1016/j.neuroimage.2011.07.039

Penny, W. (2015). Bayesian Models in Neuroscience. In International Encyclopedia of the Social & Behavioral Sciences (pp. 368–372). Elsevier. Retrieved 2024-02-08, from https://linkinghub.elsevier.com/retrieve/pii/B9780080970868560358 doi: 10.1016/B978-0-08-097086-8.56035-8

Penny, W. D., Stephan, K. E., Mechelli, A., & Friston, K. J. (2004, July). Comparing dynamic causal models. NeuroImage, 22(3), 1157–1172. doi: 10.1016/j.neuroimage.2004.03.026

Stephan, K., Penny, W., Moran, R., den Ouden, H., Daunizeau, J., & Friston, K. (2010, February). Ten simple rules for dynamic causal modeling. NeuroImage, 49(4), 3099–3109. Retrieved 2023-09-16, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2825373/ doi: 10.1016/j.neuroimage.2009.11.015

Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J., & Friston, K. J. (2009, July). Bayesian model selection for group studies. NeuroImage, 46(4), 1004–1017. Retrieved 2023-12-28, from https://www.sciencedirect.com/science/article/pii/S1053811909002638 doi: 10.1016/j.neuroimage.2009.03.025

Stephan, K. E., & Roebroeck, A. (2012a, August). A short history of causal modeling of fMRI data. NeuroImage, 62(2), 856–863. Retrieved 2023-12-02, from https://linkinghub.elsevier.com/retrieve/pii/S1053811912000511 doi: 10.1016/j.neuroimage.2012.01.034

Stephan, K. E., Weiskopf, N., Drysdale, P. M., Robinson, P. A., & Friston, K. J. (2007, November). Comparing hemodynamic models with DCM. NeuroImage, 38(3), 387–401. Retrieved 2024-02-09, from https://www.sciencedirect.com/science/article/pii/S1053811907006489 doi: 10.1016/j.neuroimage.2007.07.040