On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins - Sorbonne Université
Journal article, PLoS Computational Biology, 2021

Abstract

Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
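
To make the idea of a randomized null alignment concrete, here is a minimal sketch of the simplest such randomization: shuffling residues independently within each alignment column. This is not the authors' procedure. It keeps the per-column amino-acid frequencies (conservation) exactly and removes couplings between columns, but unlike the strategies described in the abstract it also destroys phylogenetic relations, so it only serves as a baseline. The function name `shuffle_columns` and the toy alignment are assumptions made for illustration.

```python
import numpy as np

def shuffle_columns(msa, seed=None):
    """Return a copy of the alignment with every column permuted independently.

    Hypothetical helper for illustration: preserves single-column residue
    counts, removes column-column correlations, does NOT preserve phylogeny.
    """
    rng = np.random.default_rng(seed)
    cols = np.array([list(seq) for seq in msa])  # shape (n_sequences, length)
    out = np.empty_like(cols)
    for j in range(cols.shape[1]):
        # Permute the residues of column j across sequences
        out[:, j] = rng.permutation(cols[:, j])
    return ["".join(row) for row in out]

# Toy aligned sequences (equal length, '-' marks a gap); made-up data
msa = ["ACD-", "ACDE", "GCDE", "GKDE"]
null_msa = shuffle_columns(msa, seed=0)
```

Running the same coupling analysis on `msa` and on `null_msa` would then indicate how large couplings can become from finite sampling alone; separating out the phylogenetic contribution requires a resampling scheme that also keeps the relations between sequences, as the abstract describes.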
Origin: publication funded by an institution

Dates and versions

hal-03236225, version 1 (26-05-2021)

Cite

Edwin Rodriguez Horta, Martin Weigt. On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Computational Biology, 2021, 17 (5), pp.e1008957. ⟨10.1371/journal.pcbi.1008957⟩. ⟨hal-03236225⟩