Expert-guided protein Language Models enable accurate and blazingly fast fitness prediction - Sorbonne Université
Journal article, Bioinformatics, Year: 2024

Abstract

MOTIVATION: Exhaustive experimental annotation of the effects of all known protein variants remains daunting and expensive, underscoring the need for scalable effect prediction. We introduce VespaG, a blazingly fast missense amino acid variant effect predictor that uses protein Language Model (pLM) embeddings as input to a minimal deep learning model.

RESULTS: To overcome the sparsity of experimental training data, we created a dataset of 39 million single amino acid variants from the human proteome, using the multiple sequence alignment-based effect predictor GEMME as a pseudo standard-of-truth. This setup increases interpretability compared to the baseline pLM and is easily retrainable with novel or updated pLMs. Assessed against the ProteinGym benchmark (217 multiplex assays of variant effect, or MAVEs, with 2.5 million variants), VespaG achieved a mean Spearman correlation of 0.48 ± 0.02, matching top-performing methods evaluated on the same data. VespaG has the advantage of being orders of magnitude faster: it predicts the mutational landscapes of all proteins in proteomes such as Homo sapiens or Drosophila melanogaster in under 30 minutes on a consumer laptop (12-core CPU, 16 GB RAM).
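To make the headline metric concrete, here is a minimal sketch of how a mean Spearman correlation across several assays could be computed. It is not the authors' evaluation code. The function names, the `toy_assays` dictionary, and all numbers in it are invented for illustration, and the sketch assumes each assay is scored separately before averaging. It does not reproduce how the reported ± 0.02 was derived.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(measured))


def mean_spearman(assays):
    """Average the per-assay Spearman correlations (hypothetical aggregation)."""
    return mean(spearman(pred, meas) for pred, meas in assays.values())


# Toy data (made up): predicted scores vs. measured effects per assay.
toy_assays = {
    "assay_A": ([0.1, 0.4, 0.35, 0.8], [1.0, 2.0, 3.0, 4.0]),
    "assay_B": ([0.9, 0.2, 0.5], [3.0, 1.0, 2.0]),
}

if __name__ == "__main__":
    print(round(mean_spearman(toy_assays), 3))
```

Because Spearman correlation depends only on rank order, a predictor is rewarded for ordering variants correctly within an assay, not for matching the measured values on an absolute scale.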
Main file
btae621.pdf (1.63 MB)
btae621_supplementary_data.pdf (2.21 MB)
Origin: Publication funded by an institution
Origin: Files produced by the author(s)

Dates and versions

hal-04801953, version 1 (25-11-2024)

Cite

Céline Marquet, Julius Schlensok, Marina Abakarova, Burkhard Rost, Elodie Laine. Expert-guided protein Language Models enable accurate and blazingly fast fitness prediction. Bioinformatics, 2024, ⟨10.1093/bioinformatics/btae621⟩. ⟨hal-04801953⟩