INRIA - Institut National de Recherche en Informatique et en Automatique
Journal article, Briefings in Bioinformatics, Year: 2019

Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data

Abstract

Motivation: Nanopore long-read sequencing technology offers promising alternatives to high-throughput short-read sequencing, especially in the context of RNA-sequencing. However, this technology is currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, and open reading frames, and the creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed, and options for the error correction of Nanopore RNA-sequencing long reads remain limited.

Results: In this article, we evaluate the extent to which existing long-read DNA error-correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that reports not only classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long-read error-correction tools that were originally developed for DNA are also suitable for the correction of Nanopore RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type.

Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser

Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
Main file
LR_EC_analyser_paper.pdf (845.27 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02394395, version 1 (11 December 2019)

Cite

Leandro Lima, Camille Marchet, Ségolène Caboche, Corinne da Silva, Benjamin Istace, et al. Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data. Briefings in Bioinformatics, 2019, pp. 1-18. ⟨10.1093/bib/bbz058⟩. ⟨hal-02394395⟩