Performance of epistasis detection methods in semi-simulated GWAS - Sorbonne Université
Journal Articles BMC Bioinformatics Year : 2018

Clément Chatelain
Guillermo Durand
Vincent Thuillier
Franck Augé

Abstract

Background: Part of the missing heritability in Genome Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performance in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real-scale GWAS data, including epistasis and realistic LD structure. We evaluate five exhaustive bivariate interaction methods: fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty-four different disease scenarios are considered in extensive simulations. We report the performance of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally, we compare the results of each method on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium (WTCCC).

Results: GBOOST, SHEsisEpi and DSS allow satisfactory control of the false positive rate. fastepi and IndOR show an increased false positive rate in the presence of LD between causal SNPs, with our definition of epistasis. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6 × 10^5 SNPs and 15,000 samples in a couple of hours using a GPU.

Conclusion: This study confirms that computation time is no longer a limiting factor for performing an exhaustive search of epistasis in large GWAS. For this task, using DSS on SNP pairs with limited LD seems to be a good strategy to achieve the best statistical performance. A combination approach using both DSS and GBOOST is supported by the simulation results, and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods when GWAS of a few tens of thousands of cases and controls are available.
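
To give a sense of the "number of tests required" by an exhaustive bivariate scan, the short sketch below counts the unordered SNP pairs for the roughly 600,000 SNPs mentioned in the abstract. The Bonferroni correction and the 0.05 family-wise error level are assumptions chosen for illustration only; the abstract does not state which multiple-testing correction the authors apply, and the function names are hypothetical.

```python
from math import comb


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs tested in an exhaustive bivariate scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction.

    Assumption: Bonferroni and alpha = 0.05 are illustrative choices,
    not taken from the paper.
    """
    return alpha / n_tests


n_snps = 600_000  # SNP count quoted in the abstract
n_pairs = pair_count(n_snps)
threshold = bonferroni_threshold(n_pairs)

print(f"{n_pairs:.3e} SNP pairs, per-test threshold {threshold:.2e}")
```

With about 1.8 × 10^11 pairs, a Bonferroni-style threshold falls below 10^-12, which illustrates why both computation time and statistical power are central concerns for exhaustive epistasis searches.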
Main file
s12859-018-2229-8.pdf (1.32 MB)
Origin: Publication funded by an institution

Dates and versions

hal-01832976, version 1 (09-07-2018)

Cite

Clément Chatelain, Guillermo Durand, Vincent Thuillier, Franck Augé. Performance of epistasis detection methods in semi-simulated GWAS. BMC Bioinformatics, 2018, 19, pp.231. ⟨10.1186/s12859-018-2229-8⟩. ⟨hal-01832976⟩