On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins - Sorbonne Université
Journal article, PLoS Computational Biology, Year: 2021

Abstract

Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
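To make the idea of a randomized control alignment concrete, here is a minimal sketch of the simplest possible null model: shuffling each column of a multiple sequence alignment independently. This is an illustrative assumption, not the authors' procedure. It preserves per-column residue frequencies (conservation) exactly and removes pairwise couplings, but it also destroys phylogenetic relations, which the strategies described in the abstract are specifically designed to keep. The function names `shuffle_columns` and `column_counts` and the toy alignment are hypothetical.

```python
import random
from collections import Counter


def shuffle_columns(msa, seed=0):
    """Return a new alignment in which every column is permuted independently.

    Per-column residue counts are preserved exactly; correlations between
    columns (and between sequences) are destroyed.
    """
    if not msa:
        return []
    n_cols = len(msa[0])
    if any(len(seq) != n_cols for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    rng = random.Random(seed)
    shuffled_columns = []
    for j in range(n_cols):
        column = [seq[j] for seq in msa]
        rng.shuffle(column)  # in-place permutation of this column only
        shuffled_columns.append(column)

    # Reassemble rows from the shuffled columns.
    return ["".join(col[i] for col in shuffled_columns) for i in range(len(msa))]


def column_counts(msa):
    """Residue counts for each alignment column."""
    return [Counter(col) for col in zip(*msa)]


# Toy alignment (hypothetical data): 5 sequences, 4 aligned positions.
toy_msa = ["ACDA", "ACDA", "GCEA", "GTEA", "GTEC"]
shuffled = shuffle_columns(toy_msa, seed=1)

# Conservation is unchanged by construction.
assert column_counts(shuffled) == column_counts(toy_msa)
```

Couplings inferred from such a shuffled alignment reflect finite-sample noise only. Separating the contribution of phylogeny additionally requires a resampling scheme that keeps the relations between sequences, which this sketch does not attempt.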
Main file
journal.pcbi.1008957.pdf (3.32 MB)
Origin: Publication funded by an institution

Dates and versions

hal-03236225, version 1 (26-05-2021)

Cite

Edwin Rodriguez Horta, Martin Weigt. On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Computational Biology, 2021, 17 (5), pp.e1008957. ⟨10.1371/journal.pcbi.1008957⟩. ⟨hal-03236225⟩