Multivariate methods for the analysis of complex and big data in forensic sciences. Application to age estimation in living persons
Abstract
Researchers handle datasets of increasingly high dimensionality, with many variables to explore. Such datasets pose several problems: they are difficult to handle and exhibit unexpected features. As dimensionality increases, classical statistical analysis becomes inoperative. Variables can be redundant, and it is often necessary to reduce the dimensionality of a dataset to its lowest possible value. Principal component analysis (PCA) has proven useful for reducing dimensionality but has several shortcomings. Like other fields, the forensic sciences will face the specific issues raised by an ever-growing quantity of data to be integrated. Age estimation in living persons, a problem that remains unsolved, could benefit from the integration of various sources of data, e.g. clinical, dental and radiological data. We present here novel multivariate techniques (nonlinear dimensionality reduction techniques, NLDR), applied to a theoretical example. Results were compared with those of PCA. NLDR techniques were then applied to clinical, dental and radiological data (13 variables) used for age estimation. The correlation dimension of these data was estimated. NLDR techniques outperformed PCA. They showed that two living persons sharing similar characteristics may have rather different estimated ages. Moreover, the data showed very high informational redundancy, i.e. a correlation dimension of 2. NLDR techniques should be used alongside, or in preference to, PCA to analyze complex and big data. Data routinely used for age estimation may not be suitable for this purpose. Whether integrating other data or approaches could improve age estimation in living persons remains uncertain.
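To illustrate what a correlation dimension of 2 for 13 variables means, the following is a minimal sketch of a Grassberger-Procaccia style estimate on synthetic data. It is not the authors' implementation or data: the function names (`make_redundant_data`, `correlation_dimension`), the toy dataset (13 variables built as linear mixtures of two hidden factors), and the choice of radii from distance quantiles are all assumptions made for illustration.

```python
import bisect
import math
import random


def make_redundant_data(n_points=400, n_vars=13, seed=0):
    """Hypothetical toy data: n_vars observed variables that are all
    linear mixtures of only two hidden factors (u, v)."""
    rng = random.Random(seed)
    angles = [k * math.pi / n_vars for k in range(n_vars)]
    points = []
    for _ in range(n_points):
        u, v = rng.random(), rng.random()
        points.append([math.cos(a) * u + math.sin(a) * v for a in angles])
    return points


def correlation_dimension(points, q_lo=0.005, q_hi=0.1, n_radii=10):
    """Estimate the correlation dimension as the slope of log C(r)
    against log r, where C(r) is the fraction of point pairs closer
    than r. Radii are spread geometrically between two quantiles of
    the pairwise distances (an assumed, simple choice)."""
    dists = sorted(
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    n_pairs = len(dists)
    r_min = dists[int(q_lo * n_pairs)]
    r_max = dists[int(q_hi * n_pairs)]

    log_r, log_c = [], []
    for k in range(n_radii):
        r = r_min * (r_max / r_min) ** (k / (n_radii - 1))
        count = bisect.bisect_right(dists, r)  # pairs with distance <= r
        log_r.append(math.log(r))
        log_c.append(math.log(count / n_pairs))

    # Ordinary least-squares slope of log C(r) on log r.
    mean_x = sum(log_r) / n_radii
    mean_y = sum(log_c) / n_radii
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_c))
    sxx = sum((x - mean_x) ** 2 for x in log_r)
    return sxy / sxx


data = make_redundant_data()
print(round(correlation_dimension(data), 2))
```

On this toy dataset the estimate should land near 2 rather than 13, because the 13 variables carry only two independent degrees of freedom. Finite samples and boundary effects typically bias the estimate somewhat below the true value, so it should be read as approximate.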
Origin | Files produced by the author(s) |
---|