Detecting gene–gene interactions that underlie human diseases

Cordell, Heather J

doi:10.1038/nrg2579

Review Article
Published: June 2009

Nature Reviews Genetics volume 10, pages 392–404 (2009)

Key Points

Interactions between genetic loci might reduce the power to detect genetic effects in genetic association studies, if these interactions are not allowed for.
Statistical interaction corresponds to a departure from the additive effects of two or more variables in a linear model describing the relationship between an outcome and predictor variables.
A variety of methods can be used to test for statistical interaction between predictor variables that encode the genotype and an outcome variable corresponding to the disease phenotype.
Logistic regression is one method that can be used either to test for interaction, or to test for association while allowing for interaction.
Given genome-wide data, an exhaustive search is feasible for two-way interactions (that is, all pairwise combinations of loci) but not for higher-order interactions.
Filtering approaches allow one to reduce the number of loci considered and thus the number of interaction tests performed.
Data-mining or machine-learning methods, such as random forests and Multifactor Dimensionality Reduction (MDR), can allow one to search through the space of possible interactions.
Bayesian model selection offers an alternative way of searching through the space of possible interactions.
The biological interpretation of statistical interactions is complex. The degree to which statistical interaction implies interaction or synergism in a causal sense might be extremely limited.
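
The key points on statistical interaction and logistic regression can be made concrete with a small sketch. It assumes a deliberately simplified setting that is not taken from the article: each locus is coded as a binary carrier indicator (0 or 1) rather than a three-level genotype, so the saturated logistic model has a single interaction coefficient. In that special case the maximum-likelihood estimate of the interaction term is the log of the ratio of odds ratios, and its large-sample variance is the sum of the reciprocals of the eight cell counts. The function name and the counts are hypothetical.

```python
import math

def interaction_wald_test(counts):
    """Wald test for interaction on the log-odds scale.

    counts[(a, b)] = (cases, controls) for binary indicators a, b in {0, 1}.
    Assumes every cell count is positive. Returns (beta_interaction, z, p).
    """
    # Log odds of disease within each of the four genotype classes.
    log_odds = {
        key: math.log(cases / controls)
        for key, (cases, controls) in counts.items()
    }
    # Departure from additivity of the two loci on the log-odds scale.
    beta = (log_odds[(1, 1)] - log_odds[(1, 0)]
            - log_odds[(0, 1)] + log_odds[(0, 0)])
    # Large-sample variance: sum of reciprocal cell counts (all 8 cells).
    variance = sum(1.0 / n for pair in counts.values() for n in pair)
    z = beta / math.sqrt(variance)
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, z, p_value

# Hypothetical case/control counts, for illustration only.
example_counts = {
    (0, 0): (100, 200),
    (1, 0): (120, 160),
    (0, 1): (110, 170),
    (1, 1): (200, 100),
}
beta, z, p_value = interaction_wald_test(example_counts)
print(f"interaction log-odds = {beta:.3f}, z = {z:.2f}, p = {p_value:.3g}")
```

A coefficient near zero means the joint effect of the two loci is close to the sum of their separate effects on the log-odds scale, which is what "no statistical interaction" means under this model. With full three-level genotypes the interaction has up to four degrees of freedom and would normally be tested by comparing nested logistic regression models.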

Abstract

Following the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, interest is now focusing on the detection of effects that, owing to their interaction with other genetic or environmental factors, might not be identified by using standard single-locus tests. In addition to increasing the power to detect associations, it is hoped that detecting interactions between loci will allow us to elucidate the biological and biochemical pathways that underpin disease. Here I provide a critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease. I also discuss the difficulties in determining the biological relevance of statistical interactions.

**Figure 1: Semi-exhaustive search of pairwise interactions between 89,294 SNPs.**

**Figure 2: Random Jungle analysis of 89,294 SNPs.**

**Figure 3: Multifactor Dimensionality Reduction (MDR) and Tuned ReliefF (TuRF) analysis of 6,113 SNPs.**

**Figure 4: Bayesian Epistasis Association Mapping (BEAM) analysis of 47,727 SNPs.**
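
To give a sense of scale for the SNP numbers quoted in these captions, the size of an exhaustive interaction search can be counted directly. The sketch below is illustrative only: the function name, the 0.05 family-wise significance level and the use of a simple divide-by-number-of-tests threshold are assumptions, not values or procedures taken from the article.

```python
import math

def interaction_search_size(n_snps, order=2, alpha=0.05):
    """Number of distinct SNP combinations of a given order, and the
    per-test threshold if alpha is divided equally among them."""
    n_tests = math.comb(n_snps, order)
    return n_tests, alpha / n_tests

pairs, pair_threshold = interaction_search_size(89_294, order=2)
triples, triple_threshold = interaction_search_size(89_294, order=3)

print(f"pairwise tests:  {pairs:,} (per-test threshold {pair_threshold:.2e})")
print(f"three-way tests: {triples:,} (per-test threshold {triple_threshold:.2e})")
```

The jump from pairs to triples multiplies the number of tests by roughly a third of the number of SNPs, which is why exhaustive searches beyond two-way interactions quickly become impractical.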

Acknowledgements

Support for this work was provided by the Wellcome Trust (Grant reference 074524). I thank J. Barrett for assistance with interpretation of the WTCCC Crohn's results, and the WTCCC for making their data freely available. I also thank J. Moore for useful discussions of data-mining methods in general and MDR in particular, and K. Keen for pointing out the origins of the term epistasis.

Author information

Authors and Affiliations

Institute of Human Genetics, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne NE1 3BZ, UK
Heather J Cordell

Glossary

Data mining: The process of extracting hidden patterns and potentially useful information from large amounts of data.
Machine learning: The ability of a program to learn from experience, that is, to modify its execution on the basis of newly acquired information. A major focus of machine-learning research is to automatically produce models (rules and patterns) from data.
Bayesian model selection: A statistical approach for selecting models by incorporating both prior distributions for parameters of the models and the observed experimental data.
Maximum likelihood: A statistical approach that is used to make inferences about the combination of parameter values that gives the greatest probability of obtaining the observed data.
Saturated: A term for a statistical model that is as full as possible (saturated) with parameters. Such a model is sometimes useful as it serves as a benchmark to quantify how well a simpler model (one with fewer parameters) fits the data.
Penetrance: The probability of displaying a particular phenotype (for example, succumbing to a disease) given that one has a specific genotype.
Marginal effects: The average effects (for example, penetrances) of a single variable, averaged over the possible values taken by other variables. These could be calculated for one locus of a two-locus system as the average of the two-locus penetrances, averaged over the three possible genotypes at the other locus.
Logistic regression model: A statistical model that is used when the outcome is binary. It relates the log odds of the probability of an event to a linear combination of the predictor variables.
Multinomial regression: A statistical approach, similar to logistic regression, which is used when the outcome takes one of several possible categorical values.
Confounding: A phenomenon whereby the measure of association between two variables is distorted because other variables, associated with both variables of interest, are not controlled for in the calculation.
Empirical Bayes procedure: A hierarchical model in which the hyperparameter is not a random variable but is estimated by another (often classical) method.
Information theory: A branch of applied mathematics involving the quantification of information.
Entropy: A key measure used in information theory that quantifies the uncertainty associated with a random variable. For example, a variable indicating the outcome from a toss of a coin will have less entropy than a variable indicating the outcome from a roll of a die (two versus six equally likely outcomes).
Permutation: This method is often used in hypothesis testing. An empirical distribution of a test statistic is obtained by permuting the original sample many times and recalculating the value of the test statistic in each permuted data set. Each permuted sample is considered to be a sample of the population under the null hypothesis.
Multiple testing: An analysis in which multiple independent hypotheses are tested. If a large number of tests are performed, the significance level (p value) of any particular test must be interpreted in light of this fact, as the overall combined probability of making a type I error will increase.
Bonferroni correction: The simplest correction of individual p values for multiple hypothesis testing can be calculated using p_corrected = 1 – (1 – p_uncorrected)ⁿ, in which n is the number of hypotheses tested. This formula assumes that the hypotheses are all independent, and simplifies to p_corrected = n × p_uncorrected when n × p_uncorrected ≪ 1.
Q–Q plot: A quantile–quantile plot is a diagnostic plot that can be used to compare the distribution of observed test statistics with the distribution expected under the null hypothesis. Those tests that lie significantly above the line of equality between observed and expected quantiles are considered significant in the context of the number of tests performed.
High-dimensional data: Data that contain information on a large number of variables, albeit possibly measured in a small number of subjects or replicates.
Cross-validation: This approach involves partitioning a data set into smaller subsamples, performing an analysis in one subsample and using the other subsample to measure or validate how well the analysis has performed. To reduce variability, multiple rounds of cross-validation are often performed using different partitions of the data and the validation results are averaged over the rounds.
Overfitting: The phenomenon in which a complex model might provide a good fit to the current data set but is overfitted to the random quirks present in that particular data set and therefore cannot be generalized to future data sets in the way that a simpler model might be.
Bootstrap samples: These are data sets obtained by taking a random sample of the original data, usually with replacement. One then applies the same analysis as was applied to the real data. This is repeated many times, allowing one to assess the variability in results incurred owing to random sampling.
Frequentist: A statistical approach for testing hypotheses by assessing the strength of evidence for the hypothesis provided by the data.
Burn-in period: In Markov chain Monte Carlo analysis, a period at the start of the computation in which the values taken by the parameters are ignored when constructing the posterior distribution.
Compositional epistasis: The blocking of one allelic effect by an allele at another locus.
Statistical epistasis: The average effect of substitution of alleles at combinations of loci, with respect to the average genetic background of the population.
Functional epistasis: The molecular interactions that proteins and other genetic elements have with one another.
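
Two of the glossary entries above lend themselves to a short numerical check: the coin-versus-die comparison in the entropy entry, and the multiple-testing formula together with its small-p approximation. The sketch below is a minimal illustration; the function names and the example values (a p value of 1e-8 across one million tests) are assumptions chosen for the demonstration. The exact form 1 – (1 – p)ⁿ is often referred to elsewhere as the Šidák correction, with n × p being the form usually labelled Bonferroni.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def corrected_p(p_uncorrected, n_tests):
    """Exact form 1 - (1 - p)^n for independent tests (requires 0 <= p < 1).

    Written with log1p/expm1 so that very small p values do not lose precision.
    """
    return -math.expm1(n_tests * math.log1p(-p_uncorrected))

def corrected_p_approx(p_uncorrected, n_tests):
    """Approximation n * p, capped at 1, valid when n * p is much less than 1."""
    return min(1.0, n_tests * p_uncorrected)

coin = entropy_bits([0.5, 0.5])    # two equally likely outcomes
die = entropy_bits([1 / 6] * 6)    # six equally likely outcomes
print(f"coin entropy = {coin:.3f} bits, die entropy = {die:.3f} bits")

exact = corrected_p(1e-8, 1_000_000)
approx = corrected_p_approx(1e-8, 1_000_000)
print(f"exact correction = {exact:.6f}, n*p approximation = {approx:.6f}")
```

The approximation is always at least as large as the exact value, so using n × p is slightly conservative.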

About this article

Cite this article

Cordell, H. Detecting gene–gene interactions that underlie human diseases. Nat Rev Genet 10, 392–404 (2009). https://doi.org/10.1038/nrg2579

Issue Date: June 2009
DOI: https://doi.org/10.1038/nrg2579
