On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins

Abstract : Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
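The resampling idea in the abstract can be illustrated with a deliberately simpler baseline: permuting the residues within each column of a multiple sequence alignment. This sketch is an assumption for illustration, not the authors' procedure. It keeps every column's residue frequencies (conservation) but destroys phylogenetic relations along with coevolutionary couplings, whereas the strategies described in the abstract also preserve phylogeny. The function names and the toy alignment are hypothetical.

```python
import random
from collections import Counter


def shuffle_columns(msa, seed=0):
    """Independently permute the residues within each alignment column.

    Single-column residue counts are unchanged, but any correlation
    between columns (coevolutionary or phylogenetic) is destroyed.
    """
    rng = random.Random(seed)
    # Transpose the alignment so that each inner list is one column.
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back to get one string per sequence.
    return ["".join(row) for row in zip(*columns)]


def column_counts(msa):
    """Residue counts for each alignment column."""
    return [Counter(col) for col in zip(*msa)]


# Hypothetical toy alignment: 4 aligned sequences of length 5.
toy_msa = ["ACDE-", "ACDEF", "GCDKF", "GCHKF"]
resampled = shuffle_columns(toy_msa, seed=1)

# Conservation is preserved column by column.
print(column_counts(resampled) == column_counts(toy_msa))
```

Couplings inferred from data resampled this way reflect finite-sample noise only. Separating out the phylogenetic contribution, as the abstract describes, requires a resampling scheme that additionally keeps the relations between sequences intact.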
Document type :
Journal articles

https://hal.sorbonne-universite.fr/hal-03236225
Contributor : Hal Sorbonne Université Gestionnaire
Submitted on : Wednesday, May 26, 2021 - 10:47:50 AM
Last modification on : Thursday, September 2, 2021 - 9:08:50 AM
Long-term archiving on : Friday, August 27, 2021 - 6:47:41 PM

File

journal.pcbi.1008957.pdf
Publication funded by an institution

Citation

Edwin Rodriguez Horta, Martin Weigt. On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Computational Biology, Public Library of Science, 2021, 17 (5), pp.e1008957. ⟨10.1371/journal.pcbi.1008957⟩. ⟨hal-03236225⟩
