Jointly aligning a group of DNA reads improves accuracy of identifying large deletions
Abstract
Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to align each DNA read to a reference genome independently. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities on these alignments, and consequently on the variant calls. With current read lengths, this affects more than one third of known large deletions in the C. Venter genome. We present a method to jointly align reads to a genome, whereby the alignment ambiguity of one read can be resolved by other reads. We show that this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time on par with current tools. A software implementation is available as an open-source Python program called JRA at https://bitbucket.org/jointreadalignment/jra-src.
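The idea that one read's ambiguity can be resolved by other reads can be illustrated with a toy sketch. This is not JRA's algorithm or API. It assumes, purely for illustration, that each read has already been reduced to a list of equally scoring candidate deletions, each written as a hypothetical `(start, length)` pair, and that the joint step is a simple vote across reads. The names `independent_calls` and `joint_call` are invented for this example.

```python
from collections import Counter

def independent_calls(candidates_per_read):
    """Each read keeps its own first-listed candidate, ignoring all other reads."""
    return [cands[0] for cands in candidates_per_read]

def joint_call(candidates_per_read):
    """Pick the candidate deletion compatible with the largest number of reads."""
    # Count each candidate at most once per read.
    votes = Counter(c for cands in candidates_per_read for c in set(cands))
    best_support = max(votes.values())
    # Deterministic tie-break: smallest (start, length) among the best supported.
    return min(c for c, n in votes.items() if n == best_support)

# Hypothetical data: three reads spanning one deletion in a repeat-rich region.
# The first two reads each admit two equally good placements of the deletion.
reads = [
    [(100, 25), (130, 25)],
    [(130, 25), (160, 25)],
    [(130, 25)],
]

print(independent_calls(reads))  # the three reads do not all agree
print(joint_call(reads))         # one placement shared by all three reads
```

Aligned independently, the first read settles on a different placement than the other two. Considered together, only one candidate is consistent with every read, which is the intuition behind aligning a group of reads jointly.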
Origin | Publication funded by an institution |
---|