On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins

Edwin Rodriguez Horta; Martin Weigt

doi:10.1371/journal.pcbi.1008957

Journal article, PLoS Computational Biology, Year: 2021

Edwin Rodriguez Horta

Role: Author

University of Havana = Universidad de la Habana

Biologie Computationnelle et Quantitative = Laboratory of Computational and Quantitative Biology

Martin Weigt

Role: Author
PersonId : 1099728

Biologie Computationnelle et Quantitative = Laboratory of Computational and Quantitative Biology

Abstract

Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
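The randomization idea in the abstract can be illustrated with a deliberately simplified baseline. The sketch below permutes each alignment column independently, which keeps per-column residue frequencies (conservation) and removes couplings between columns. Unlike the strategies the abstract describes, it does not preserve phylogenetic relations between sequences, so it should be read only as a starting point and not as the authors' procedure. The function names `shuffle_columns` and `column_counts`, the integer encoding with `q = 21` states, and the toy alignment are all assumptions made for illustration.

```python
import numpy as np


def shuffle_columns(msa, rng):
    """Return a copy of an integer-encoded alignment with each column
    permuted independently (hypothetical helper, not from the paper).

    Per-column residue counts are unchanged; correlations between
    columns, and between sequences, are destroyed.
    """
    msa = np.asarray(msa)
    shuffled = np.empty_like(msa)
    for j in range(msa.shape[1]):
        # Generator.permutation returns a shuffled copy of the column.
        shuffled[:, j] = rng.permutation(msa[:, j])
    return shuffled


def column_counts(msa, q):
    """Count occurrences of each of the q states in every column."""
    msa = np.asarray(msa)
    return np.stack(
        [np.bincount(msa[:, j], minlength=q) for j in range(msa.shape[1])]
    )


rng = np.random.default_rng(0)
q = 21  # assumed alphabet: 20 amino acids plus a gap symbol

# Toy alignment: 50 sequences of length 8, states encoded as 0..q-1.
msa = rng.integers(0, q, size=(50, 8))
msa[:, 1] = msa[:, 0]  # make columns 0 and 1 perfectly coupled

null_msa = shuffle_columns(msa, rng)

# Conservation is preserved: every column keeps its residue counts.
same_counts = (column_counts(null_msa, q) == column_counts(msa, q)).all()
# The artificial coupling between columns 0 and 1 is broken.
still_coupled = (null_msa[:, 0] == null_msa[:, 1]).all()
```

Running a coupling analysis on both `msa` and `null_msa` and comparing the resulting scores mirrors the real-versus-resampled comparison in the abstract, with the caveat that a phylogeny-preserving resampling would be needed to isolate phylogeny-induced couplings.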

Domains

Life Sciences [q-bio]

Main file

journal.pcbi.1008957.pdf (3.32 MB)

Origin: Publication funded by an institution

https://hal.sorbonne-universite.fr/hal-03236225

Submitted on: Wednesday, May 26, 2021, 10:47:50

Last modified on: Tuesday, November 12, 2024, 13:38:06

Long-term archiving on: Friday, August 27, 2021, 18:47:41

Dates and versions

hal-03236225 , version 1 (26-05-2021)

Identifiers

HAL Id : hal-03236225 , version 1
DOI : 10.1371/journal.pcbi.1008957
PUBMED : 34029316

Cite

Edwin Rodriguez Horta, Martin Weigt. On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Computational Biology, 2021, 17 (5), pp.e1008957. ⟨10.1371/journal.pcbi.1008957⟩. ⟨hal-03236225⟩

Collections

CNRS LCQB IBPS SORBONNE-UNIVERSITE SU-SCIENCES

60 views

99 downloads
