Performance of epistasis detection methods in semi-simulated GWAS - Sorbonne Université
Journal article, BMC Bioinformatics, Year: 2018


Authors: Clément Chatelain, Guillermo Durand, Vincent Thuillier, Franck Augé

Abstract

Background: Part of the missing heritability in Genome-Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performance in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real-scale GWAS data, including epistasis and a realistic LD structure. We evaluate five exhaustive bivariate interaction methods: fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty-four disease scenarios are considered in extensive simulations. We report the performance of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally, we compare the results of the methods on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium (WTCCC).

Results: GBOOST, SHEsisEpi and DSS provide satisfactory control of the false positive rate. fastepi and IndOR show an inflated false positive rate in the presence of LD between causal SNPs, under our definition of epistasis. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6 × 10^5 SNPs and 15,000 samples in a couple of hours using a GPU.

Conclusion: This study confirms that computation time is no longer a limiting factor for performing an exhaustive search for epistasis in large GWAS. For this task, using DSS on SNP pairs with limited LD seems to be a good strategy for achieving the best statistical performance. An approach combining DSS and GBOOST is supported by the simulation results, and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods once GWAS of a few tens of thousands of cases and controls are available.
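To give a sense of the multiple-testing burden the abstract refers to, the short Python sketch below counts the SNP pairs examined in an exhaustive bivariate scan of 6 × 10^5 SNPs and derives a per-test significance threshold. The Bonferroni correction at a family-wise level of 0.05 is an illustrative assumption, not necessarily the correction used in the paper, and the function names are hypothetical.

```python
from math import comb


def exhaustive_pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs in an exhaustive bivariate scan: n(n-1)/2."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test level keeping the family-wise error rate at alpha.

    Assumption: Bonferroni is used here only for illustration; it is not
    claimed to be the correction applied by any of the evaluated methods.
    """
    return alpha / n_tests


if __name__ == "__main__":
    n_pairs = exhaustive_pair_count(600_000)  # 6 x 10^5 SNPs, as in the abstract
    print(f"{n_pairs:,} pairs")               # 179,999,700,000 pairs
    print(f"{bonferroni_threshold(n_pairs):.2e}")  # 2.78e-13
```

With about 1.8 × 10^11 pairs, a single pair would need a p-value near 2.8 × 10^-13 to pass such a threshold, which is why false positive rate control and power are evaluated together.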
Main file
s12859-018-2229-8.pdf (1.32 MB)
Origin: Publication funded by an institution

Dates and versions

hal-01832976 , version 1 (09-07-2018)

Cite

Clément Chatelain, Guillermo Durand, Vincent Thuillier, Franck Augé. Performance of epistasis detection methods in semi-simulated GWAS. BMC Bioinformatics, 2018, 19, pp.231. ⟨10.1186/s12859-018-2229-8⟩. ⟨hal-01832976⟩