WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data
Journal article, BMC Bioinformatics, Year: 2015

Abstract

Background: The sequencing depth provided by high-throughput sequencing technologies has increased the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision, so genomes are often published as drafts. Several automatic scaffolders released recently have improved the overall quality of genomes published in the last few years, yet none of them reaches the efficiency of manual scaffolding.

Results: Here, we present an innovative semi-automatic scaffolder that additionally helps with chimera resolution and generates contig maps and other outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as on two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of the resulting scaffolds was compared to that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than the other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.) and, in the case of WH8103, the reliability of the scaffolds was confirmed by whole genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy for manual improvement of the scaffolding, using outputs generated by WiseScaffolder, as well as for genome finishing, which in our hands led to the circularization of the WH8103 genome.

Conclusion: Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes.
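The contiguity statistics named in the abstract can be illustrated with a minimal sketch. It is not taken from WiseScaffolder. The function name `n50_l50` and its `genome_size` parameter are hypothetical, and the sketch assumes the common convention in which N50 is the length of the scaffold at which the cumulative length, summed from the longest scaffold down, first reaches half of the assembly size, and in which the genome-size-based variants (NG50 and LG50) use half of an estimated genome size instead.

```python
def n50_l50(lengths, genome_size=None):
    """Return (N50, L50) for a list of scaffold lengths.

    If genome_size is given, half of that value is used as the threshold,
    which yields the genome-size-based variants (NG50, LG50) under the
    convention assumed here. Returns None if the scaffolds never reach
    the threshold.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = genome_size if genome_size is not None else sum(lengths)
    half = total / 2
    running = 0
    # Walk from the longest scaffold to the shortest, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    return None  # the assembly covers less than half of genome_size


# Hypothetical scaffold lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
assembly_stats = n50_l50(example_lengths)
genome_stats = n50_l50(example_lengths, genome_size=400)
```

Under this convention, a higher N50 together with a lower scaffold count indicates a more contiguous assembly, which is the sense in which the abstract compares scaffolders.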
Main file: Farrant et al_2015_WiseScaffolder.pdf (2.2 MB)
Origin: Publication funded by an institution

Dates and versions

hal-01193001, version 1 (04-09-2015)

Cite

Gregory K. Farrant, Mark Hoebeke, Frédéric Partensky, Gwendoline Andres, Erwan Corre, et al. WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. BMC Bioinformatics, 2015, 16, pp. 281. ⟨10.1186/s12859-015-0705-y⟩. ⟨hal-01193001⟩