Imperfect automatic image classification successfully describes plankton distribution patterns
Abstract
Imaging systems were developed to explore the fine-scale distributions of plankton (<10 m), but they generate huge datasets that are still a challenge to handle rapidly and accurately. So far, imaged organisms have been either classified manually or pre-classified by a computer program and later verified by human operators. In this paper, we post-process a computer-generated classification, obtained with the common ZooProcess and PlanktonIdentifier toolchain developed for the ZooScan, and test whether the same ecological conclusions can be reached with this fully automatic dataset and with a reference, manually sorted, dataset. The Random Forest classifier outputs the probability that each object belongs to each class, and we discard the objects with uncertain predictions, i.e. those below a probability threshold defined based on a 1% error rate in a self-prediction of the learning set. Keeping only well-predicted objects enabled considerable improvements in average precision, 84% for biological groups, at the cost of diminishing recall (by 39% on average). Overall, it increased accuracy by 16%. For most groups, the automatically predicted distributions were comparable to the reference distributions and resulted in the same size spectra. Automatically predicted distributions also resolved ecologically relevant patterns, such as differences in abundance across a mesoscale front or fine-scale vertical shifts between day and night. This post-processing method is tested here on the classification of plankton images through Random Forest, but it is based on basic features shared by all machine learning methods and could thus be used in a broad range of applications.
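The filtering step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes one threshold per predicted class, chosen as the lowest probability at which the objects kept from the learning-set self-prediction have an error rate of at most 1%. The function names `class_thresholds` and `keep_confident` and the input layout (parallel lists of top probability, predicted class, and true class) are hypothetical.

```python
def class_thresholds(probs, predicted, truth, max_error=0.01):
    """Per-class probability cutoffs from a self-prediction of the learning set.

    probs[i] is the classifier's probability for the class it predicted for
    object i, predicted[i] is that class, truth[i] is the validated class.
    Assumption: one cutoff per predicted class (not stated in the abstract).
    """
    thresholds = {}
    for cls in set(predicted):
        # (probability, was the prediction correct?) sorted by decreasing probability
        items = sorted(
            ((p, t == cls) for p, c, t in zip(probs, predicted, truth) if c == cls),
            reverse=True,
        )
        best = float("inf")  # if no cutoff meets the target, discard everything
        errors = 0
        for n_kept, (p, ok) in enumerate(items, start=1):
            errors += not ok
            # Objects sharing the same probability are kept or dropped together,
            # so only evaluate the cutoff at the end of a run of ties.
            if n_kept < len(items) and items[n_kept][0] == p:
                continue
            if errors / n_kept <= max_error:
                best = p  # lowest cutoff so far that meets the error target
        thresholds[cls] = best
    return thresholds


def keep_confident(probs, predicted, thresholds):
    """Indices of newly predicted objects at or above their class cutoff."""
    return [
        i
        for i, (p, c) in enumerate(zip(probs, predicted))
        if p >= thresholds.get(c, float("inf"))
    ]
```

Objects whose predicted class has no cutoff meeting the error target are all discarded, which reflects the trade-off reported above: precision improves while recall drops.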
Origin: Files produced by the author(s)