This paper demonstrates the use of the gold standard method in behavioral genetics, the Cholesky decomposition method, to estimate unique and overlapping genetic and environmental influences on different variables in order to answer longitudinally motivated research questions.
The Cholesky decomposition method is the gold standard used in the field of behavioral genetics. The method is popular because it is easy to program and solve. Using this method, researchers can explore individual differences in longitudinal relations of different variables across multiple time points. The method allows investigators to decompose variance into (1) unique genetic, shared and non-shared environmental effects that arise at specific time points as well as (2) overlapping genetic, shared and non-shared environmental effects that carry over from one time point to another. However, the method does not identify the mechanisms or origins underlying these effects. The current report focuses on application of the Cholesky decomposition method in the field of educational psychology. Specifically, it discusses individual differences in longitudinal relations between kindergarten letter knowledge, kindergarten phonological awareness, first grade word-level reading skills, and seventh grade reading comprehension.
Becoming a skilled reader with the ability to fluently read and comprehend text is important for children’s school outcomes. To prevent the development of reading problems, it is vital to understand the extent to which different reading skills predict reading comprehension. Existing research has shown that pre-reading and word-level reading skills in elementary school longitudinally predict reading comprehension in middle school1,2. Individual differences in these predictions mostly point to underlying genetic (and to some extent, environmental) factors from kindergarten up to grade four3,4. However, there is a need to explore whether these same genetic and environmental factors continue to influence these predictions up to middle school grades.
One way to gain a better understanding of individual differences underlying the associations between elementary and middle school reading skills is to use behavioral genetic methodology, specifically the Cholesky decomposition method. The Cholesky decomposition method is considered one of the gold standard analyses in behavioral genetics. This method is easy to program and solve and allows for the decomposition of variance and covariance into (A) genetic, (C) shared environmental, and (E) non-shared environmental influences, usually in a sample of twins. An example of a univariate (one variable) Cholesky decomposition is shown in Figure 1. The A latent factor refers to genetic effects, which are genetic influences inherited from parents. The C latent factor refers to shared environmental effects, which are aspects of the environment that serve to make twins more similar, such as home and school environments. Lastly, the E latent factor refers to non-shared environmental effects, which are environmental influences that are unique to each twin and contribute to differences between twins, such as each twin’s own experiences. The E factor also captures measurement error.
Figure 1: Decomposition into (A) genetic, (C) shared environmental, and (E) non-shared environmental influences.
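As a rough illustration of the univariate model in Figure 1, the sketch below computes the variance and twin covariances implied by a set of A, C, and E paths. It assumes the standard twin-design expectations, which are not spelled out in this report: identical (MZ) twins share all additive genetic effects, fraternal (DZ) twins share half on average, and both share all of C. The function name and the path values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def ace_expectations(a, c, e):
    """Return model-implied statistics for one variable under an ACE model.

    a, c, e are path coefficients from the A, C, and E latent factors,
    each of which is assumed to have unit variance.
    """
    var_a, var_c, var_e = a ** 2, c ** 2, e ** 2
    total = var_a + var_c + var_e
    return {
        "total_variance": total,
        # MZ twins are assumed to share all of A and all of C.
        "cov_mz": var_a + var_c,
        # DZ twins are assumed to share half of A (on average) and all of C.
        "cov_dz": 0.5 * var_a + var_c,
        # Proportion of variance attributable to each source.
        "prop_a": var_a / total,
        "prop_c": var_c / total,
        "prop_e": var_e / total,
    }


# Hypothetical standardized paths (squared values sum to 1).
result = ace_expectations(a=math.sqrt(0.5), c=math.sqrt(0.3), e=math.sqrt(0.2))
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Because E is not shared by either type of twin, it appears in the total variance but in neither covariance, which is why it also absorbs measurement error.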
The A, C, and E factors in Figure 1 estimate the extent to which genes and environments influence one (reading) variable. Still, to investigate individual differences underlying longitudinal associations between more than one reading skill from elementary to middle school, longitudinal analysis is necessary. To answer longitudinally motivated research questions, a multivariate Cholesky decomposition method is used here5. Conceptually, the multivariate Cholesky decomposition method is similar to hierarchical multiple regression, in that the independent contribution of each set of genetic and environmental factors is assessed after the contributions of previous factors have been taken into account.
For instance, in a multivariate Cholesky decomposition with longitudinal data at four time points (see Figure 2), the first set of factors [genetic (A1), shared environmental (C1), and non-shared environmental (E1)] contributes to the variance of all variables, represented as paths a11, a21, a31, a41, c11, c21, …, e11, etc., from A1, C1, E1 factors to each variable. The second set of factors (A2, C2, E2) contributes to the variance of the second and subsequent variables after controlling for the first time point. The second set of factors is represented as paths a22, a32, a42, c22, c32, …, e22, etc. Then, influences of the third set of factors (A3, C3, E3) are estimated for the third and fourth variables after controlling for the previous two time points. They are represented as paths a33, a43, c33, c43, e33, e43. Finally, influences of the fourth set of factors (A4, C4, E4) are measured for the final time point after controlling for all previous time points. They are represented as paths a44, c44, e44.
Figure 2: Multivariate Cholesky decomposition model for four time points.
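The path layout in Figure 2 can be thought of as a lower-triangular matrix, with one row per measured variable and one column per latent factor. The sketch below is a minimal illustration of that layout for the genetic paths only: it prints the path names and computes the genetic covariance matrix implied by multiplying the path matrix by its transpose. The numeric path values and the helper names are hypothetical, not estimates or code from this study.

```python
def path_labels(prefix, n_variables):
    """Lower-triangular layout of path names (a11, a21, a22, ...).

    Row = measured variable, column = latent factor; "0" marks a path
    that is fixed to zero in the Cholesky model.
    """
    return [
        [f"{prefix}{row}{col}" if col <= row else "0"
         for col in range(1, n_variables + 1)]
        for row in range(1, n_variables + 1)
    ]


def implied_covariance(paths):
    """Return paths * transpose(paths) for a square path matrix."""
    n = len(paths)
    return [
        [sum(paths[i][k] * paths[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


labels = path_labels("a", 4)
for row in labels:
    print("  ".join(f"{name:>3}" for name in row))

# Hypothetical genetic paths for four time points (illustrative only).
a_paths = [
    [0.6, 0.0, 0.0, 0.0],
    [0.3, 0.5, 0.0, 0.0],
    [0.4, 0.2, 0.4, 0.0],
    [0.3, 0.2, 0.3, 0.5],
]
genetic_cov = implied_covariance(a_paths)
for row in genetic_cov:
    print("  ".join(f"{value:.2f}" for value in row))
```

The same layout applies to the C and E paths; under the model, the three implied matrices together make up the within-person variance-covariance matrix.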
In this longitudinal application of the multivariate Cholesky decomposition method, genetic and environmental influences at each time point are estimated after the effects of previous time points have been controlled for. As such, this method allows determination of the extent to which unique genetic and environmental influences come online at each particular time point, independent of influences from previous time points (these effects are estimated by paths a11, a22, a33, a44, c11, c22, …, e11, e22, etc.). In addition, the method enables examination of the degree to which the same (overlapping) genetic and environmental influences are shared between time points. In other words, it can be determined to what extent genetic and environmental influences carry over from one time point to another (these effects are estimated by paths a21, a31, a41, a32, a42, a43, c21, c31, …, e21, etc.). It should be noted that paths a11, c11, and e11 represent all possible genetic and environmental influences up to and including the first time point, which can be either unique to the first time point or overlapping with earlier time points. However, time points prior to the first time point are not measured; hence, it cannot be determined whether these paths represent unique or overlapping influences. For simplicity, they are treated as unique influences in the current report.
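The split between unique and overlapping influences can be sketched numerically. Assuming standardized path estimates (so that each squared path is a proportion of that variable's total variance), the unique share for a variable is the squared diagonal path and the overlapping share is the sum of the squared paths from earlier factors. The path values and the function name below are hypothetical and serve only to show the arithmetic.

```python
def unique_and_overlapping(paths):
    """Split each variable's variance share into unique and overlapping parts.

    paths is a lower-triangular matrix of standardized paths for one source
    (A, C, or E). For variable i, the diagonal path is the factor that first
    appears at time point i; paths to the left come from earlier factors.
    """
    shares = []
    for i, row in enumerate(paths):
        unique = row[i] ** 2
        overlapping = sum(row[j] ** 2 for j in range(i))
        shares.append((unique, overlapping))
    return shares


# Hypothetical standardized genetic paths (illustrative only).
a_paths = [
    [0.6, 0.0, 0.0, 0.0],
    [0.3, 0.5, 0.0, 0.0],
    [0.4, 0.2, 0.4, 0.0],
    [0.3, 0.2, 0.3, 0.5],
]
shares = unique_and_overlapping(a_paths)
for time_point, (unique, overlapping) in enumerate(shares, start=1):
    print(f"time {time_point}: unique {unique:.0%}, overlapping {overlapping:.0%}")
```

In this sketch the first variable has no overlapping share by construction, which mirrors the simplification described above for paths a11, c11, and e11.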
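The order in which measured variables are entered into a Cholesky decomposition is statistically arbitrary: any ordering reproduces the observed variances and covariances equally well. The order does, however, determine which influences are interpreted as unique and which as overlapping, so it is usually driven by a theoretical perspective. This is also the case in the current study, in which the order was based on the development of reading skills, such that reading skills in elementary school are predictive of reading comprehension in middle school.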
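The role of variable order can be illustrated with a small numerical sketch. It factors a hypothetical genetic covariance matrix for three variables in two different orders using a plain Cholesky factorization. This is a simplification: it factors a known matrix rather than fitting a twin model, and the matrix values and helper names are assumptions made for illustration.

```python
import math


def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def reorder(matrix, order):
    """Permute rows and columns so variables appear in the given order."""
    return [[matrix[r][c] for c in order] for r in order]


def rebuild(lower):
    """Multiply a lower-triangular factor by its transpose."""
    n = len(lower)
    return [[sum(lower[i][k] * lower[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Hypothetical genetic covariance matrix for three variables.
cov = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
original = cholesky(cov)
swapped = cholesky(reorder(cov, [2, 1, 0]))  # enter the variables in reverse

# Unique share of the third variable: entered last versus entered first.
unique_when_last = original[2][2] ** 2
unique_when_first = swapped[0][0] ** 2
print(f"unique share when entered last:  {unique_when_last:.2f}")
print(f"unique share when entered first: {unique_when_first:.2f}")
```

Both factorizations rebuild their covariance matrix exactly, yet the share labeled unique for the same variable changes with its position, which is why the ordering needs a theoretical justification.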
There are several reports in the literature investigating genetic and environmental factors underlying longitudinal associations of reading skills utilizing the Cholesky decomposition method. These prior studies mostly focused on investigating relations between reading skills among elementary schoolers6,7. There is only one published study examining individual differences associated with reading from elementary grades into middle school grades using the multivariate Cholesky decomposition method8. This protocol details the multivariate Cholesky decomposition method from that specific report to explore individual differences in longitudinal relations between kindergarten letter knowledge, kindergarten phonological awareness, first grade word-level reading skills, and seventh grade reading comprehension.
The study findings focus on using the multivariate Cholesky decomposition method to distinguish between two types of genetic and environmental influences. First, it is shown how to estimate genetic and environmental influences that carry over (overlap) from elementary to middle school reading (e.g., estimating paths a43, c43, and e43, which are genetic and environmental influences on word-level reading skills from first grade that affect reading comprehension in seventh grade). Second, it is demonstrated how to estimate unique genetic and environmental influences that come online at each particular grade (e.g., estimating paths a33, c33, and e33, which are unique genetic and environmental influences on word-level reading skills that arise in first grade).
The steps below describe the process of estimating individual differences underlying longitudinal associations between elementary and middle school reading skills into (A) genetic, (C) shared environmental, and (E) non-shared environmental factors using a statistical modeling program, word processor, and software with a graphical user interface (GUI). This study has been approved by the Institutional Review Board at Florida State University.
1. Preparing data for the statistical modeling program
2. Reading data into the statistical modeling program, running the script, and estimating the effects
3. Creating a table with generated estimates
Figure 3: Multivariate Cholesky decomposition modeling standardized path estimates of genetic and environmental influences.
4. Plotting genetic, shared environmental, and non-shared environmental influences
Figure 4: Entering of estimates into the software with a GUI.
Figure 5: Illustration of steps 4.3 and 4.4.
Figure 6: Illustration of steps 4.5–4.8.
Standardized estimates for genetic, shared environmental, and non-shared environmental influences from the multivariate Cholesky decomposition model are depicted in Figure 7. In general, results revealed that individual differences in kindergarten pre-reading and first grade word-level reading skills accounted for a large proportion of the variance of genetic (40%) as well as shared environmental (39%) influences on seventh grade reading comprehension. In addition, the results pointed to some unique sources of influence coming into play for each reading skill at each grade.
Figure 7: Full multivariate Cholesky decomposition model with standardized path estimates of genetic and environmental influences. Measured variables are depicted as rectangles, and a latent variable as an oval. LNF = kindergarten letter naming fluency, PSF = kindergarten phoneme segmentation fluency, WLRS = first grade word-level reading skills, RC = seventh grade reading comprehension.
As indicated in Figure 8, it appears there was a large share of unique genetic influences (dark green) on letter naming fluency in kindergarten (36%), phoneme segmentation fluency in kindergarten (40%), and reading comprehension in seventh grade (30%). In contrast, word-level reading skills were to a lesser extent associated with unique genetic influences that arise in first grade (20%). Genetic influences on word-level reading skills were mostly overlapping (light green) with genetic influences on letter naming fluency and phoneme segmentation fluency (40%).
Figure 8: Percentage of unique and overlapping genetic influences on each reading skill.
Focusing on the shared environmental influences (see Figure 9), the results implied that overlapping (light blue) shared environment influenced letter naming fluency and phoneme segmentation fluency in kindergarten (9%). Similarly, overlapping shared environmental effects were reflected in word-level reading skills in first grade (15%) and reading comprehension in seventh grade (39%) that were also shared with kindergarten reading skills. Unique shared environmental factors (dark blue) were found for first grade word-level reading skills (15%). These influences were independent of shared environmental influences in kindergarten.
Figure 9: Percentage of unique and overlapping shared environmental influences on each reading skill.
For the non-shared environmental influences (see Figure 10), the results suggested very little overlap between factors (light yellow). Most non-shared environmental influences indicated unique influences (dark yellow) at each individual time point (i.e., grade).
Figure 10: Percentage of unique and overlapping non-shared environmental influences on each reading skill.
The general representation of genetic and environmental factors underlying reading skills from elementary to middle school is shown in Figure 11. In general, it was shown that reading skills appear to be influenced by both genetic and environmental factors across this developmental period.
Figure 11: Total percentage of genetic, shared environmental, and non-shared environmental influences on each reading skill.
The objective of this study was to demonstrate how a well-established method within behavioral genetics, the multivariate Cholesky decomposition method, can be used effectively to understand relations across variables in a temporal context. Specifically, this method allows estimation of the extent to which unique genetic and environmental influences arise at particular time points (e.g., school grades), as well as of the overlap of genetic and environmental influences across multiple time points.
There is one critical step in the protocol, which is estimating the multivariate Cholesky decomposition model. Occasionally, the statistical modeling program script requires adjustments in starting values based on inputted data. Researchers can use different starting values, suggested by the output of the statistical modeling program, to enable smoother iteration processes in generating genetic and environmental estimates.
Modification to the protocol (i.e., the statistical modeling program script) may be necessary if latent variables replace measured variables. In Mplus, a latent variable is defined using a “BY” statement with two or more measured variables. For example, in this annotated input script, the latent variable for each twin is defined in the section “MODEL:” as “FLU0 BY nneworf0* nnewnwf0 (3-4);” for the first twin in the twin pair, and “FLU1 BY nneworf1* nnewnwf1 (3-4);” for the second twin in the twin pair. Another modification refers to the graphical representation of results. Results may be plotted using alternative techniques, such as pie charts. Stacked columns were used here since this is the typical visualization technique in the field of behavioral genetics.
There are limitations to the multivariate Cholesky decomposition method. The multivariate Cholesky decomposition is the gold standard analysis in behavioral genetics if there is interest in exploring individual differences in longitudinal relations among different variables at different time points. If instead there is interest in testing developmentally driven questions aimed at exploring individual differences in development of one (the same) variable across multiple time points, then a simplex model (depicted in Figure 12) can be used. In the simplex model, A1, C1, and E1 factors present at time 1 partially persist until time 2 (the regression paths from A1 to A2, A2 to A3, …, C1 to C2, …, etc.), at which time new factors may enter (the residuals, labeled “Res.” in Figure 12).
The multivariate Cholesky decomposition model and the simplex model produce exactly the same variance-covariance matrix if there are two time points, but any additional time points separate the two models. With more than two time points, the Cholesky decomposition method estimates relations between all pairs of time points. This can result in estimation of relations that are not as developmentally meaningful (e.g., genetic and environmental influences from grade 1 on grade 7). The simplex model, on the other hand, estimates only relations between adjacent time points, which are developmentally relevant (e.g., genetic and environmental influences from grade 1 to grade 2, grade 2 to grade 3, etc.). The latter pattern reflects the natural trajectory of children through school, where they progress from one grade to the next. A more thorough description of the simplex model is available elsewhere10.
Figure 12: The simplex model.
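The relation between the two models can be sketched for a single source of influence (e.g., A). The example below assumes a simplified simplex model as described above, with one path at the first time point plus one transmission path and one residual for each later time point, and without the separate measurement error terms that fuller simplex specifications often include. All function names and numeric values are hypothetical.

```python
def cholesky_path_count(n_time_points):
    """Free paths per source in a Cholesky model: a full lower triangle."""
    return n_time_points * (n_time_points + 1) // 2


def simplex_path_count(n_time_points):
    """Free paths per source in the simplified simplex model assumed here:
    one initial path, then a transmission path and a residual per later time.
    """
    return 1 + 2 * (n_time_points - 1)


def simplex_cov_two_points(first_path, transmission, residual_path):
    """Implied 2x2 covariance when factor 2 = transmission * factor 1 + residual."""
    var1 = first_path ** 2
    cov12 = transmission * var1
    var2 = transmission ** 2 * var1 + residual_path ** 2
    return [[var1, cov12], [cov12, var2]]


def cholesky_cov_two_points(p11, p21, p22):
    """Implied 2x2 covariance from lower-triangular paths p11, p21, p22."""
    return [[p11 ** 2, p11 * p21], [p11 * p21, p21 ** 2 + p22 ** 2]]


# Hypothetical simplex values and the Cholesky paths they map onto.
first_path, transmission, residual_path = 0.7, 0.8, 0.5
simplex_cov = simplex_cov_two_points(first_path, transmission, residual_path)
cholesky_cov = cholesky_cov_two_points(
    p11=first_path,
    p21=transmission * first_path,
    p22=residual_path,
)

for n in (2, 3, 4):
    print(n, "time points:", cholesky_path_count(n), "Cholesky paths vs",
          simplex_path_count(n), "simplex paths")
```

With two time points both models have three free paths per source and imply the same matrix; with four time points the Cholesky model has ten paths per source against seven for this simplified simplex, which is where the two models begin to differ.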
An additional limitation of the multivariate Cholesky decomposition method is that while it enables quantification of genetic and environmental influences, it does not identify them. Therefore, it cannot be determined specifically which genes influence the traits measured by the variables used in the analysis. Similarly, it can only be surmised which specific environments contributed to the shared environmental or non-shared environmental influences. In the current study, there was some empirical evidence about which potential specific environments may have been at play. For example, classroom reading environment in elementary school has been shown to have longitudinal effects on reading comprehension in middle school (study findings are based on this report)11. However, further progress must rely on additional analyses to identify other environments.
Despite these limitations, the Cholesky decomposition method is a popular approach to longitudinal, multivariate, behavioral genetic research questions. The technique is easy to program and solve. It offers a unique perspective into decomposing genetic and environmental relations among time points, thereby quantifying influences that are time point-specific while distinguishing them from influences that are overlapping across multiple time points.
The authors have nothing to disclose.
This research was supported in part by a grant from the National Institute of Child Health and Human Development (P50 HD052120). Views expressed herein are those of the authors and have neither been reviewed nor approved by the granting agencies.
Microsoft Office Excel | Microsoft
Microsoft Office PowerPoint | Microsoft
Microsoft Office Visio | Microsoft
Microsoft Office Word | Microsoft
Mplus Statistical Program | Mplus