Using Cholesky Decomposition to Explore Individual Differences in Longitudinal Relations between Reading Skills

Published: September 17, 2019
doi: 10.3791/60061

Summary

This paper demonstrates the use of the Cholesky decomposition method, the gold standard in behavioral genetics, to estimate unique and overlapping genetic and environmental influences on different variables in order to answer longitudinally motivated research questions.

Abstract

The Cholesky decomposition method is the gold standard used in the field of behavioral genetics. The method is popular because it is easy to program and solve. Using this method, researchers can explore individual differences in longitudinal relations of different variables across multiple time points. The method allows investigators to decompose variance into (1) unique genetic, shared and non-shared environmental effects that arise at specific time points as well as (2) overlapping genetic, shared and non-shared environmental effects that carry over from one time point to another. However, the method does not identify the mechanisms or origins underlying these effects. The current report focuses on application of the Cholesky decomposition method in the field of educational psychology. Specifically, it discusses individual differences in longitudinal relations between kindergarten letter knowledge, kindergarten phonological awareness, first grade word-level reading skills, and seventh grade reading comprehension.

Introduction

Becoming a skilled reader with the ability to fluently read and comprehend text is important for children’s school outcomes. To prevent the development of reading problems, it is vital to understand the extent to which different reading skills predict reading comprehension. Existing research has shown that pre-reading and word-level reading skills in elementary school longitudinally predict reading comprehension in middle school [1,2]. Individual differences in these predictions mostly point to underlying genetic (and to some extent, environmental) factors from kindergarten up to grade four [3,4]. However, there is a need to explore whether these same genetic and environmental factors continue to influence these predictions up to middle school grades.

One way to gain a better understanding of individual differences underlying the associations between elementary and middle school reading skills is to use behavioral genetic methodology, specifically the Cholesky decomposition method. The Cholesky decomposition method is considered one of the gold standard analyses in behavioral genetics. This method is easy to program and solve and allows for the decomposition of variance and covariance into (A) genetic, (C) shared environmental, and (E) non-shared environmental influences, usually in a sample of twins. An example of a univariate (one variable) Cholesky decomposition is shown in Figure 1. The A latent factor refers to genetic effects, which are genetic influences inherited from parents. The C latent factor refers to shared environmental effects, which are aspects of the environment that serve to make twins more similar, such as home and school environments. Lastly, the E latent factor refers to non-shared environmental effects, which are environmental influences that are unique to each twin and contribute to differences between twins, such as each twin’s own experiences. The E factor also captures measurement error.

Figure 1
Figure 1: Decomposition into (A) genetic, (C) shared environmental, and (E) non-shared environmental influences.
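
As a rough illustration of the univariate model in Figure 1, the sketch below computes the variance and twin covariances implied by a set of ACE path estimates. It relies on the standard twin-design assumption, which the text above does not spell out, that identical (MZ) twins share all of their additive genetic effects and fraternal (DZ) twins share half on average. The function name and the path values are hypothetical.

```python
def ace_expectations(a, c, e):
    """Return the variance and twin covariances implied by ACE path estimates.

    Assumes the standard twin design: MZ twins share all additive genetic
    effects (A), DZ twins share half on average, both twin types share C
    completely, and E is never shared.
    """
    var_a, var_c, var_e = a ** 2, c ** 2, e ** 2
    return {
        "variance": var_a + var_c + var_e,   # total variance of the measure
        "cov_mz": var_a + var_c,             # expected MZ twin covariance
        "cov_dz": 0.5 * var_a + var_c,       # expected DZ twin covariance
    }


# Hypothetical standardized path estimates, not values from the study.
expected = ace_expectations(a=0.60, c=0.48, e=0.64)
for name, value in expected.items():
    print(f"{name}: {value:.4f}")
```

With standardized paths, the squared estimates sum to 1, so each squared path can be read directly as a proportion of variance.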

The A, C, and E factors in Figure 1 estimate the extent to which genes and environments influence one (reading) variable. Still, to investigate individual differences underlying longitudinal associations between more than one reading skill from elementary to middle school, longitudinal analysis is necessary. To answer longitudinally motivated research questions, a multivariate Cholesky decomposition method is used here [5]. Conceptually, the multivariate Cholesky decomposition method is similar to hierarchical multiple regression, in that the independent contribution of each set of genetic and environmental factors is assessed after the contributions of previous factors have been taken into account.

For instance, in a multivariate Cholesky decomposition with longitudinal data at four time points (see Figure 2), the first set of factors [genetic (A1), shared environmental (C1), and non-shared environmental (E1)] contributes to the variance of all variables, represented as paths a11, a21, a31, a41, c11, c21, …, e11, etc., from A1, C1, E1 factors to each variable. The second set of factors (A2, C2, E2) contributes to the variance of the second and subsequent variables after controlling for the first time point. The second set of factors is represented as paths a22, a32, a42, c22, c32, …, e22, etc. Then, influences of the third set of factors (A3, C3, E3) are estimated for the third and fourth variables after controlling for the previous two time points. They are represented as paths a33, a43, c33, c43, e33, e43. Finally, influences of the fourth set of factors (A4, C4, E4) are measured for the final time point after controlling for all previous time points. They are represented as paths a44, c44, e44.

Figure 2
Figure 2: Multivariate Cholesky decomposition model for four time points.
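
The path structure in Figure 2 can be written as one lower-triangular matrix per component, with row i holding the paths into variable i. The sketch below multiplies such a matrix by its transpose to obtain the implied genetic covariance matrix, so that each covariance is the sum of products of paths from the factors that two variables share. The genetic path values and the helper name are hypothetical.

```python
def implied_covariance(paths):
    """Multiply a lower-triangular path matrix by its transpose (L times L')."""
    n = len(paths)
    return [
        [sum(paths[i][k] * paths[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical genetic paths; row i, column k holds path a(i+1)(k+1).
# Zeros above the diagonal encode that later factors (e.g., A3) do not
# load on earlier variables.
a_paths = [
    [0.6, 0.0, 0.0, 0.0],   # a11
    [0.3, 0.7, 0.0, 0.0],   # a21, a22
    [0.5, 0.3, 0.4, 0.0],   # a31, a32, a33
    [0.2, 0.3, 0.4, 0.5],   # a41, a42, a43, a44
]

genetic_cov = implied_covariance(a_paths)

# Variables 1 and 4 share only factor A1, so their genetic covariance is a11 * a41.
print(round(genetic_cov[0][3], 4))
# Variables 3 and 4 share A1, A2, and A3: a31*a41 + a32*a42 + a33*a43.
print(round(genetic_cov[2][3], 4))
```

Applying the same product to the C and E path matrices gives the shared and non-shared environmental covariance matrices; the three matrices sum to the total covariance among the measures within one twin.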

In this longitudinal application of the multivariate Cholesky decomposition method, genetic and environmental influences at each time point are estimated after the effects of previous time points have been controlled for. As such, this method allows determination of the extent to which unique genetic and environmental influences come online at each particular time point, independent of influences from previous time points (these effects are estimated by paths a11, a22, a33, a44, c11, c22, …, e11, e22, etc.). In addition, the method enables examination of the degree to which the same (overlapping) genetic and environmental influences are shared between time points. In other words, it can be determined to what extent genetic and environmental influences carry over from one time point to another (these effects are estimated by paths a21, a31, a41, a32, a42, a43, c21, c31, …, e21, etc.). It should be noted that paths a11, c11, and e11 represent all possible genetic and environmental influences up to and including the first time point, which can be either unique or overlapping with previous time points. However, time points prior to the first time point are not estimated; hence, it cannot be accurately determined whether these paths represent unique or overlapping influences. For simplicity, they are treated as unique influences in the current report.

The order of the measured variables entered into a Cholesky decomposition is arbitrary. However, the order is usually driven by a theoretical perspective. This is also the case in the current study, in which the order was based on the development of reading skills, such that reading skills in elementary school are predictive of reading comprehension in middle school.

There are several reports in the literature investigating genetic and environmental factors underlying longitudinal associations of reading skills utilizing the Cholesky decomposition method. These prior studies mostly focused on investigating relations between reading skills among elementary schoolers [6,7]. There is only one published study examining individual differences associated with reading from elementary grades into middle school grades using the multivariate Cholesky decomposition method [8]. This protocol details the multivariate Cholesky decomposition method from that specific report to explore individual differences in longitudinal relations between kindergarten letter knowledge, kindergarten phonological awareness, first grade word-level reading skills, and seventh grade reading comprehension.

The study findings focus on using the multivariate Cholesky decomposition method to distinguish between two types of genetic and environmental influences. First, it is shown how to estimate genetic and environmental influences that carry over (overlap) from elementary to middle school reading (e.g., estimating paths a43, c43, and e43, which are genetic and environmental influences on word-level reading skills from first grade that affect reading comprehension in seventh grade). Second, it is demonstrated how to estimate unique genetic and environmental influences that come online at each particular grade (e.g., estimating paths a33, c33, and e33, which are unique genetic and environmental influences on word-level reading skills that arise in first grade).

Protocol

The steps below describe the process of decomposing individual differences underlying longitudinal associations between elementary and middle school reading skills into (A) genetic, (C) shared environmental, and (E) non-shared environmental factors using a statistical modeling program, word processor, and software with a graphical user interface (GUI). This study has been approved by the Institutional Review Board at Florida State University.

1. Preparing data for the statistical modeling program

  1. Prepare the data in a format that can be read by the statistical modeling program of choice. Popular statistical modeling programs include Mx, OpenMx in the platform R, and Mplus [9]. Mx can read data files in .vl or .dat formats, OpenMx can read any data format, and Mplus reads the .dat format. The example demonstrated here is executed in Mplus [9].
    NOTE: A sample data file in .dat format for six randomly chosen participants is available in the supplemental files. The variables in the sample data file correspond to the variables used in the input coding file.

2. Reading data into the statistical modeling program, running the script, and estimating the effects

  1. Open the statistical modeling program.
  2. Locate the relevant data file to be read into the statistical modeling program by typing “File is [insert location of your data file on your computer]”.
  3. Click the RUN icon on the ribbon of the statistical modeling program to obtain estimates for genetic, shared environmental, and non-shared environmental influences from the multivariate Cholesky decomposition method. The annotated input script for the multivariate Cholesky decomposition model for four time points, as well as its Mplus output, can be found in the supplemental coding files.
  4. Once the statistical modeling program generates estimates for genetic, shared environmental, and non-shared environmental influences, locate the estimates in the output file under stx11 for path a11, stx21 for path a21, …, sty11 for path c11, sty21 for path c21, …, stz11 for path e11, stz21 for path e21, etc.
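
When many estimates need to be located, the label-to-path correspondence described in step 2.4 can be captured in a small helper. This is only a sketch: the function name is hypothetical, and it assumes that the output labels follow exactly the stx/sty/stz pattern named above for a four-time-point model.

```python
import re

# Label prefixes from step 2.4 mapped to the path letter they report.
PREFIX_TO_PATH = {"stx": "a", "sty": "c", "stz": "e"}


def label_to_path(label, n_timepoints=4):
    """Translate an output label such as 'stx21' into a path name such as 'a21'."""
    match = re.fullmatch(r"(stx|sty|stz)(\d)(\d)", label.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    prefix, row, col = match.group(1), int(match.group(2)), int(match.group(3))
    # A Cholesky model only has paths from factor k to variables k and later.
    if not (1 <= col <= row <= n_timepoints):
        raise ValueError(f"not a valid Cholesky path: {label!r}")
    return f"{PREFIX_TO_PATH[prefix]}{row}{col}"


for label in ["stx11", "stx21", "sty21", "stz44"]:
    print(label, "->", label_to_path(label))
```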

3. Creating a table with generated estimates

  1. Open the word processor.
  2. Copy the generated estimates into a table in a word processor. The table can be created in a format as indicated in Figure 3. For example, in this case, the estimates for the paths a11, a21, a31, and a41 have values of 0.60, 0.24, 0.63, and 0.18, respectively.

Figure 3
Figure 3: Multivariate Cholesky decomposition modeling standardized path estimates of genetic and environmental influences.

4. Plotting genetic, shared environmental, and non-shared environmental influences

  1. Open the software with a GUI.
  2. Enter the estimates from the created table into cells F3-F16, G4-G16, H5-H16, and I6-I16. A screenshot from the software with a GUI is depicted in Figure 4.

Figure 4
Figure 4: Entering of estimates into the software with a GUI.

  3. Calculate the variance of genetic, shared environmental, and non-shared environmental influences by squaring the estimates in cells F3-F16, G4-G16, H5-H16, and I6-I16. Type the squared values in cells J3-J16, K4-K16, L5-L16, and M6-M16.
  4. Calculate the percentage variance by multiplying values in cells J3-J16, K4-K16, L5-L16, and M6-M16 by 100. Type the percentage values in cells N3-N16, O4-O16, P5-P16, and Q6-Q16. Steps 4.3 and 4.4 are depicted in Figure 5.

Figure 5
Figure 5: Illustration of steps 4.3 and 4.4.

  5. Calculate the extent to which genetic influences carry over (overlap) from elementary to middle school.
    1. In cell R3, type “0”.
    2. In cell R4, type “=N4”. This is the extent to which genetic influences from the first time point carry over to the second time point. In this case, it indicates genetic influences from letter naming fluency in kindergarten carrying over to phoneme segmentation fluency in kindergarten.
    3. In cell R5, type “= N5+O5”. This is the degree to which genetic influences from the first two time points carry over to the third time point. In this case, it indicates genetic influences from letter naming fluency in kindergarten and phoneme segmentation fluency in kindergarten carrying over to word-level reading skills in grade 1.
    4. In cell R6, type “= N6+O6+P6”. This is the extent to which genetic influences from the first three time points carry over to the fourth time point. In this case, it indicates genetic influences from letter naming fluency in kindergarten, phoneme segmentation fluency in kindergarten, and word-level reading skills in grade 1 carrying over to reading comprehension in grade 7.
  6. Calculate the extent to which shared environmental and non-shared environmental influences carry over (overlap) from elementary to middle school in the same way as in step 4.5.
  7. Calculate the extent to which unique genetic, shared environmental, and non-shared environmental factors come online at each particular time point (i.e., grade).
    1. Copy the percentages from cells N3, O4, P5, and Q6 into cells S3, S4, S5, and S6, respectively, to obtain the extent to which unique genetic factors come online at each grade.
    2. Copy the percentages from cells N8, O9, P10, and Q11 into cells U3, U4, U5, and U6, respectively, to obtain the extent to which unique shared environmental factors come online at each grade.
    3. Copy the percentages from cells N13, O14, P15, and Q16 into cells W3, W4, W5, and W6, respectively, to obtain the extent to which unique non-shared environmental factors come online at each grade.
  8. To ensure all calculations are correct, the values in cells R3-W3, R4-W4, R5-W5, and R6-W6 should each add up to 100. Steps 4.5–4.7 are depicted in Figure 6.

Figure 6
Figure 6: Illustration of steps 4.5–4.8.
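
The spreadsheet arithmetic in steps 4.3 through 4.8 can also be sketched in code. The example below squares each standardized path estimate, converts it to a percentage, sums the off-diagonal entries of each row as overlapping influences, takes the diagonal entry as the unique influence, and checks that each variable's percentages total 100. All path values and names are hypothetical and were chosen only so that the totals come out to 100.

```python
def unique_and_overlapping(paths):
    """Split each variable's squared paths into (overlapping %, unique %)."""
    result = []
    for i, row in enumerate(paths):
        percent = [100 * value ** 2 for value in row[: i + 1]]
        overlapping = sum(percent[:i])   # carried over from earlier time points
        unique = percent[i]              # new at this time point (diagonal path)
        result.append((overlapping, unique))
    return result


# Hypothetical standardized paths (lower-triangular rows), not study estimates.
a_paths = [[0.6], [0.3, 0.7], [0.5, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]]
c_paths = [[0.48], [0.3, 0.4], [0.3, 0.2, 0.4], [0.2, 0.2, 0.3, 0.2]]
e_paths = [[0.64], [0.1, 0.4], [0.1, 0.2, 0.4], [0.1, 0.2, 0.2, 0.4]]

genetic = unique_and_overlapping(a_paths)
shared = unique_and_overlapping(c_paths)
nonshared = unique_and_overlapping(e_paths)

for i in range(4):
    total = sum(genetic[i]) + sum(shared[i]) + sum(nonshared[i])
    # Step 4.8: every variable's percentages should add up to 100.
    assert abs(total - 100) < 1e-6, f"variable {i + 1} sums to {total}"
    print(f"variable {i + 1}: genetic overlap {genetic[i][0]:.0f}%, "
          f"genetic unique {genetic[i][1]:.0f}%")
```

The first variable's overlapping share is 0 by construction, which matches the "0" entered in cell R3.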

  9. Plot genetic overlapping as well as genetic unique influences by clicking and dragging the mouse over cells R2–R6 and S2–S6 to highlight the data.
  10. Click on the Insert menu.
  11. Click on Charts > Stacked Column.
  12. Repeat steps 4.9–4.11 for shared environmental and non-shared environmental overlapping as well as unique influences. Choose cells T2–T6 and U2–U6 to plot shared environmental influences, and choose cells V2–V6 and W2–W6 for non-shared environmental influences.

Representative Results

Standardized estimates for genetic, shared environmental, and non-shared environmental influences from the multivariate Cholesky decomposition model are depicted in Figure 7. In general, results revealed that genetic and shared environmental influences on kindergarten pre-reading and first grade word-level reading skills carried over to seventh grade reading comprehension, accounting for a large proportion of its variance (40% genetic and 39% shared environmental). In addition, results suggested that some unique sources of influence come into play for each individual reading skill at each grade.

Figure 7
Figure 7: Full multivariate Cholesky decomposition model with standardized path estimates of genetic and environmental influences. Measured variables are depicted as rectangles, and a latent variable as an oval. LNF = kindergarten letter naming fluency, PSF = kindergarten phoneme segmentation fluency, WLRS = first grade word-level reading skills, RC = seventh grade reading comprehension.

As indicated in Figure 8, it appears there was a large share of unique genetic influences (dark green) on letter naming fluency in kindergarten (36%), phoneme segmentation fluency in kindergarten (40%), and reading comprehension in seventh grade (30%). In contrast, word-level reading skills were to a lesser extent associated with unique genetic influences that arise in first grade (20%). Genetic influences on word-level reading skills were mostly overlapping (light green) with genetic influences on letter naming fluency and phoneme segmentation fluency (40%).

Figure 8
Figure 8: Percentage of unique and overlapping genetic influences on each reading skill.

Focusing on the shared environmental influences (see Figure 9), the results implied that overlapping (light blue) shared environment influenced letter naming fluency and phoneme segmentation fluency in kindergarten (9%). Similarly, overlapping shared environmental effects were reflected in word-level reading skills in first grade (15%) and reading comprehension in seventh grade (39%) that were also shared with kindergarten reading skills. Unique shared environmental factors (dark blue) were found for first grade word-level reading skills (15%). These influences were independent of shared environmental influences in kindergarten.

Figure 9
Figure 9: Percentage of unique and overlapping shared environmental influences on each reading skill.

For the non-shared environmental influences (see Figure 10), the results suggested very little overlap between factors (light yellow). Most non-shared environmental influences indicated unique influences (dark yellow) at each individual time point (i.e., grade).

Figure 10
Figure 10: Percentage of unique and overlapping non-shared environmental influences on each reading skill.

The general representation of genetic and environmental factors underlying reading skills from elementary to middle school is shown in Figure 11. In general, it was shown that reading skills appear to be influenced by both genetic and environmental factors across this developmental period.

Figure 11
Figure 11: Total percentage of genetic, shared environmental, and non-shared environmental influences on each reading skill.

Discussion

The objective of this study was to demonstrate how a well-established method within behavioral genetics, the multivariate Cholesky decomposition method, can be used effectively to understand relations across variables in a temporal context. Specifically, this method allows estimation of the extent to which unique genetic and environmental influences arise at particular time points (e.g., school grades), as well as the extent to which genetic and environmental influences overlap across multiple time points.

There is one critical step in the protocol, which is estimating the multivariate Cholesky decomposition model. Occasionally, the statistical modeling program script requires adjustments in starting values based on inputted data. Researchers can use different starting values, suggested by the output of the statistical modeling program, to enable smoother iteration processes in generating genetic and environmental estimates.

Modification to the protocol (i.e., the statistical modeling program script) may be necessary if latent variables replace measured variables. In Mplus, a latent variable is defined using a “BY” statement with two or more measured variables. For example, in this annotated input script, the latent variable for each twin is defined in the section “MODEL:” as “FLU0 BY nneworf0* nnewnwf0 (3-4);” for the first twin in the twin pair, and “FLU1 BY nneworf1* nnewnwf1 (3-4);” for the second twin in the twin pair. Another possible modification concerns the graphical representation of results. Results may be plotted using alternative techniques, such as pie charts. Stacked columns were used here since this is the typical visualization technique in the field of behavior genetics.

There are limitations to the multivariate Cholesky decomposition method. The multivariate Cholesky decomposition is the gold standard analysis in behavior genetics if there is interest in exploring individual differences in longitudinal relations among different variables at different time points. If instead there is interest in testing developmentally driven questions aimed at exploring individual differences in development of one (the same) variable across multiple time points, then a simplex model (depicted in Figure 12) can be used. In the simplex model, A1, C1, and E1 factors present at time 1 partially persist until time 2 (the regression paths from A1 to A2, A2 to A3, …, C1 to C2, …, etc.), at which time new factors may enter (the residuals, labeled “Res.” in Figure 12).

The multivariate Cholesky decomposition model and simplex model produce exactly the same variance-covariance matrix if there are two time points, but any additional time points separate the two models. With more than two time points, the Cholesky decomposition method produces estimates between all time points. This can result in estimation of relations that are not as developmentally meaningful (e.g., genetic and environmental influences from grade 1 on grade 7). The simplex model, on the other hand, estimates relations that are developmentally relevant (e.g., genetic and environmental influences from grade 1 to grade 2, grade 2 to grade 3, etc.). The latter pattern reflects the natural trajectory of children through school, where they progress from one grade to the next. The simplex model has been described more thoroughly elsewhere [10].

Figure 12
Figure 12: The simplex model.

An additional limitation of the multivariate Cholesky decomposition method is that while it enables quantification of genetic and environmental influences, it does not identify them. Therefore, it cannot be determined specifically which genes influence the traits measured by the variables used in the analysis. Similarly, it can only be surmised which specific environments contributed to the shared environmental or non-shared environmental influences. In the current study, there was some empirical evidence about which potential specific environments may have been at play. For example, classroom reading environment in elementary school has been shown to have longitudinal effects on reading comprehension in middle school (study findings are based on this report) [11]. However, further progress must rely on additional analyses to identify other environments.

Despite these limitations, the Cholesky decomposition method is a popular approach to longitudinal, multivariate, behavioral genetic research questions. The technique is easy to program and solve. It offers a unique perspective into decomposing genetic and environmental relations among time points, thereby quantifying influences that are time point-specific while distinguishing them from influences that are overlapping across multiple time points.

Disclosures

The authors have nothing to disclose.

Acknowledgements

This research was supported in part by a grant from the National Institute of Child Health and Human Development (P50 HD052120). Views expressed herein are those of the authors and have neither been reviewed nor approved by the granting agencies.

Materials

Name | Company
Microsoft Office Excel | Microsoft
Microsoft Office Powerpoint | Microsoft
Microsoft Office Visio | Microsoft
Microsoft Office Word | Microsoft
Mplus Statistical Program | Mplus

References

  1. Muter, V., Hulme, C., Snowling, M. J., Stevenson, J. Phonemes, rimes, vocabulary and grammatical skills as foundations of early reading development: Evidence from a longitudinal study. Developmental Psychology. 40 (5), 665-681 (2004).
  2. Schatschneider, C., Fletcher, J. M., Francis, D. J., Carlson, C. D., Foorman, B. R. Kindergarten prediction of reading skills: A longitudinal comparative analysis. Journal of Educational Psychology. 96 (2), 265-282 (2004).
  3. Byrne, B., et al. Longitudinal twin study of early literacy development: Preschool and kindergarten phases. Scientific Studies of Reading. 9 (3), 219-235 (2005).
  4. Christopher, M. E., et al. Genetic and environmental etiologies of the longitudinal relations between prereading skills and. Child Development. 86 (2), 342-361 (2015).
  5. Neale, M. C., Cardon, L. R. Methodology for Genetic Studies of Twins and Families. (1992).
  6. Byrne, B., et al. Genetic and environmental influences on early literacy. Journal of Research in Reading. 29 (1), 33-49 (2006).
  7. Byrne, B., et al. Genetic and environmental influences on aspects of literacy and language in early childhood: Continuity and change from preschool to grade 2. Journal of Neurolinguistics. 22 (3), 219-236 (2009).
  8. Erbeli, F., Hart, S. A., Taylor, J. Longitudinal associations among reading related skills and reading comprehension: A twin study. Child Development. 89 (6), e480-e493 (2018).
  9. Muthén, L. K., Muthén, B. O. Mplus. The comprehensive modeling program for applied researchers: User’s guide. (2012).
  10. Hart, S. A., et al. Exploring how nature and nurture affect the development of reading: An analysis of the Florida Twin Project on Reading. Developmental Psychology. 49 (10), 1971-1981 (2013).
  11. Taylor, J., Roehrig, A. D., Hensler, B. S., Connor, C. M., Schatschneider, C. Teacher quality moderates the genetic effects on early reading. Science. 328 (5977), 512-514 (2010).

Cite This Article
Erbeli, F., Campbell, A. R., Hart, S. A. Using Cholesky Decomposition to Explore Individual Differences in Longitudinal Relations between Reading Skills. J. Vis. Exp. (151), e60061, doi:10.3791/60061 (2019).
