Repositories for Taxonomic Data: Where We Are and What is Missing

Aurélien Miralles; Teddy Bruy; Katherine Wolcott; Mark D Scherz; Dominik Begerow; Bank Beszteri; Michael Bonkowski; Janine Felden; Birgit Gemeinholzer; Frank Glaw; Frank Oliver Glöckner; Oliver Hawlitschek; Ivaylo Kostadinov; Tim W Nattkemper; Christian Printzen; Jasmin Renz; Nataliya Rybalka; Marc Stadler; Tanja Weibulat; Thomas Wilke; Susanne S Renner; Miguel Vences

doi:10.1093/sysbio/syaa026

Journal article: Systematic Biology, 2020



Aurélien Miralles

Role: Author
PersonId : 780260
ORCID : 0000-0002-2538-7710
IdRef : 117336874

Institut de Systématique, Evolution, Biodiversité

Technische Universität München = Technical University of Munich

Teddy Bruy

Role: Author

Institut de Systématique, Evolution, Biodiversité

Technische Universität München = Technical University of Munich

Katherine Wolcott

Role: Author

Technische Universität München = Technical University of Munich

Smithsonian Institution

Mark D Scherz

Role: Author

Zoologische Staatssammlung Muenchen

Universität Konstanz

Dominik Begerow

Role: Author
PersonId : 779194
ORCID : 0000-0002-8286-1597

Ruhr University Bochum = Ruhr-Universität Bochum

Bank Beszteri

Role: Author

Universität Duisburg-Essen = University of Duisburg-Essen [Essen]

Michael Bonkowski

Role: Author

Universität zu Köln = University of Cologne

Janine Felden

Role: Author

Universität Bremen = University of Bremen [Germany]

Alfred Wegener Institute for Polar and Marine Research

Birgit Gemeinholzer

Role: Author

Justus-Liebig-Universität Gießen = Justus Liebig University

Frank Glaw

Role: Author

Technische Universität Braunschweig = Technical University of Braunschweig [Braunschweig]

Zoologische Staatssammlung Muenchen

Frank Oliver Glöckner

Role: Author

Alfred Wegener Institute for Polar and Marine Research

Oliver Hawlitschek

Role: Author

Universität Hamburg

Zoologische Staatssammlung Muenchen

Ivaylo Kostadinov

Role: Author

Fachbereich Geowissenschaften [Bremen]

Tim W Nattkemper

Role: Author

Universität Bielefeld = Bielefeld University

Christian Printzen

Role: Author

Senckenberg Research Institutes and Natural History Museums

Jasmin Renz

Role: Author

Universität Hamburg

Nataliya Rybalka

Role: Author

Georg-August-University = Georg-August-Universität Göttingen

Marc Stadler

Role: Author

Helmholtz Centre for Infection Research

German Centre for Infection Research

Tanja Weibulat

Role: Author

Fachbereich Geowissenschaften [Bremen]

Thomas Wilke

Role: Author

Justus-Liebig-Universität Gießen = Justus Liebig University

Susanne S Renner

Role: Author
PersonId : 1086188

Technische Universität München = Technical University of Munich

Miguel Vences

Role: Author
PersonId : 762455
ORCID : 0000-0003-0747-0817
IdRef : 14337270X

Technische Universität Braunschweig = Technical University of Braunschweig [Braunschweig]

Abstract

Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000–20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term—ideally perpetual—data storage is of particular importance for taxonomy, energy footprint reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining to evolutionary relatedness of individuals for alpha-taxonomy. 
For this reason, an improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach—linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for taxonomic research. This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000–40,000 taxonomists globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens.
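The cyberspecimen idea in the abstract, linking every piece of data to a unique specimen identifier, can be illustrated with a minimal Python sketch. The record layout, the field names (`specimen_id`, `kind`, `uri`), the helper `assemble`, and the example identifiers and URLs are all hypothetical assumptions made for illustration; they are not a schema proposed by the authors or used by any existing repository. Each data item points to the physical voucher it was derived from, and grouping items by that identifier yields one virtual, specimen-centered record.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataItem:
    """One piece of taxonomic data (hypothetical schema)."""
    specimen_id: str  # unique identifier of the physical voucher specimen
    kind: str         # e.g. "image", "dna_sequence", "morphometrics"
    uri: str          # location of the stored data file (hypothetical)


@dataclass
class Cyberspecimen:
    """Virtual specimen: all data items linked to one specimen identifier."""
    specimen_id: str
    items: list = field(default_factory=list)

    def kinds(self) -> set:
        # Which types of data are available for this specimen
        return {item.kind for item in self.items}


def assemble(items):
    """Group data items by specimen identifier into cyberspecimens."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.specimen_id].append(item)
    return {sid: Cyberspecimen(sid, its) for sid, its in grouped.items()}


if __name__ == "__main__":
    # Hypothetical identifiers and URLs, for illustration only
    data = [
        DataItem("SPEC-0001", "image", "https://example.org/img/1.jpg"),
        DataItem("SPEC-0001", "dna_sequence", "https://example.org/seq/1.fasta"),
        DataItem("SPEC-0002", "image", "https://example.org/img/2.jpg"),
    ]
    specimens = assemble(data)
    print(sorted(specimens["SPEC-0001"].kinds()))  # ['dna_sequence', 'image']
```

Keying the data on the specimen identifier rather than on a species name keeps the underlying observations reusable when the taxonomic assignment of a specimen is later revised, in line with the abstract's point that, for alpha-taxonomy, identifications remain hypotheses.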

Keywords

Big data, cyberspecimen, new species, omics, repositories, specimen identifier, taxonomy, taxonomic data

Domains

Life Sciences [q-bio]

Main file

syaa026.pdf (1.19 MB)

Origin: Publication funded by an institution


https://hal.sorbonne-universite.fr/hal-03071704

Submitted on: Wednesday 16 December 2020, 10:25:39

Last modified on: Friday 13 December 2024, 14:12:17

Long-term archiving on: Wednesday 17 March 2021, 18:37:44

Dates and versions

hal-03071704, version 1 (16-12-2020)

Licence

Attribution - NonCommercial

Identifiers

HAL Id: hal-03071704, version 1
DOI : 10.1093/sysbio/syaa026

Cite

Aurélien Miralles, Teddy Bruy, Katherine Wolcott, Mark D Scherz, Dominik Begerow, et al. Repositories for Taxonomic Data: Where We Are and What is Missing. Systematic Biology, 2020, 69 (6), pp.1231-1253. ⟨10.1093/sysbio/syaa026⟩. ⟨hal-03071704⟩


Collections

MNHN EPHE UNIV-AG CNRS ISYEB PSL SORBONNE-UNIVERSITE SU-SCIENCES INEE-CNRS

313 views

137 downloads
