FilterDCA: Interpretable supervised contact prediction using inter-domain coevolution

Predicting three-dimensional protein structure and assembling protein complexes from sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made for single proteins by combining unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While it reaches impressive accuracies in predicting residue-residue contacts, deep learning has a number of disadvantages. The need for large structural training sets limits its applicability to multi-protein complexes, and the deep architecture of convolutional neural networks makes them intrinsically hard to interpret. Here we introduce FilterDCA, a simpler supervised predictor for inter-domain and inter-protein contacts. It is based on the fact that contact maps of proteins show typical contact patterns, which result from secondary structure and are reflected by patterns in coevolutionary analysis. We explicitly integrate averaged contact patterns with coevolutionary scores derived by Direct Coupling Analysis, improving performance over standard coevolutionary analysis while remaining fully transparent and interpretable. The FilterDCA code is available at

The de novo prediction of tertiary and quaternary protein structures has recently seen important advances, achieved by combining unsupervised, purely sequence-based coevolutionary analyses with structure-based supervision using deep learning for contact-map prediction. While showing impressive performance, deep-learning methods require large training sets and pose severe obstacles to interpretability.
Here we construct a simple, transparent and therefore fully interpretable inter-domain contact predictor. It combines the results of coevolutionary Direct Coupling Analysis with explicitly constructed filters that reflect typical contact patterns in a training set of known protein structures, and it significantly improves the accuracy of predicted contacts. Our approach thereby sheds light on the question of how contact information is encoded in coevolutionary signals.

PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi.
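
As a rough illustration of the idea of matching a typical contact pattern against a coevolutionary score map, the sketch below correlates a small filter with a toy inter-domain score matrix. It is not the FilterDCA implementation: the function name `filter_scores`, the 5x5 filter size, the anti-diagonal pattern (loosely mimicking an antiparallel strand pairing) and the random toy scores are all assumptions made for this example.

```python
import numpy as np


def filter_scores(score_map, pattern):
    """Correlate a score map with a square, odd-sized pattern centred on each pair (i, j).

    Hypothetical helper for illustration only; borders are zero-padded.
    """
    k = pattern.shape[0]
    if pattern.shape != (k, k) or k % 2 == 0:
        raise ValueError("pattern must be square with odd side length")
    half = k // 2
    padded = np.pad(score_map, half, mode="constant", constant_values=0.0)
    out = np.zeros(score_map.shape, dtype=float)
    n_rows, n_cols = score_map.shape
    for i in range(n_rows):
        for j in range(n_cols):
            # Window of the score map centred on the residue pair (i, j).
            window = padded[i:i + k, j:j + k]
            out[i, j] = np.sum(window * pattern)
    return out


# Toy inter-domain score map (20 x 15 residues): weak noise plus one
# anti-diagonal stripe of strong scores centred on the pair (10, 7).
rng = np.random.default_rng(0)
dca = 0.1 * rng.random((20, 15))
for t in range(-2, 3):
    dca[10 + t, 7 - t] += 1.0

# Assumed filter: an averaged anti-diagonal pattern, normalised to sum to 1.
pattern = np.fliplr(np.eye(5)) / 5.0

filtered = filter_scores(dca, pattern)
best = tuple(int(x) for x in np.unravel_index(np.argmax(filtered), filtered.shape))
print("pair with highest filtered score:", best)
```

In this toy setting the filtered score peaks at the centre of the stripe, because only there does the whole pattern overlap with strongly scoring pairs. How such filter responses are learned from known structures and combined with the raw scores is specific to the method and is not reproduced here.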

Domains

Life Sciences [q-bio]

Main file

journal.pcbi.1007621.pdf (4.03 MB)

Origin	Publication funded by an institution

https://hal.sorbonne-universite.fr/hal-02996567

Submitted on: Monday, 9 November 2020, 16:43:02

Last modified on: Wednesday, 30 October 2024, 13:10:58

Long-term archiving on: Wednesday, 10 February 2021, 19:33:34

Dates and versions

hal-02996567, version 1 (09-11-2020)

Identifiers

HAL Id: hal-02996567, version 1
DOI: 10.1371/journal.pcbi.1007621

Cite

Maureen Muscat, Giancarlo Croce, Edoardo Sarti, Martin Weigt. FilterDCA: Interpretable supervised contact prediction using inter-domain coevolution. PLoS Computational Biology, 2020, 16 (10), pp.e1007621. ⟨10.1371/journal.pcbi.1007621⟩. ⟨hal-02996567⟩

Collections

CNRS LCQB IBPS SORBONNE-UNIVERSITE SU-SCIENCES ANR
