Efficient similarity-based alignment of temporally-situated graph nodes with Apache Spark - Sorbonne Université
Conference paper. Year: 2019

Hubert Naacke
Ke Li
Bernd Amann
Olivier Curé

Abstract

Topic evolution networks are widely used to represent the evolution of research topics in scientific document archives. These networks may contain thousands of topics and alignment edges, which are computed by comparing millions of topic pairs with a similarity function. In this work, we address the problem of computing a very large number of cosine-based topic alignments on top of Apache Spark. We present the native map-reduce implementation offered by Spark and a more efficient implementation tuned for alignment computation. Both implementations are evaluated on three real-world datasets.
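The full text is not deposited, so the authors' two Spark implementations cannot be reproduced here. As a hedged, non-authoritative illustration of the underlying computation only, the plain-Python sketch below assumes that each topic is a sparse term-weight vector, that topics are grouped by time period, and that an alignment edge links topics of consecutive periods whose cosine similarity reaches a threshold. The data layout, the `threshold` parameter and the function names `align_pairwise` and `align_inverted` are hypothetical. It contrasts an exhaustive pairwise comparison with a generic map-reduce-style join on shared terms (an inverted index); neither is claimed to be the method described in the paper, and plain Python stands in for Spark.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical data layout (not from the paper): a topic is a sparse
# term -> weight vector, and periods[i] maps topic ids to the topics of period i.
Topic = dict[str, float]
Period = dict[str, Topic]
Edge = tuple[int, str, str, float]  # (source period, source topic, target topic, similarity)


def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def align_pairwise(periods: list[Period], threshold: float) -> list[Edge]:
    """Exhaustive strategy: score every topic pair of consecutive periods."""
    edges = []
    for i in range(len(periods) - 1):
        for (sid, s), (did, d) in product(periods[i].items(), periods[i + 1].items()):
            sim = cosine(s, d)
            if sim >= threshold:
                edges.append((i, sid, did, round(sim, 3)))
    return sorted(edges)


def align_inverted(periods: list[Period], threshold: float) -> list[Edge]:
    """Map-reduce-style strategy: join topics on shared terms (inverted index)."""
    # "Map": emit (term, (period, topic id, L2-normalized weight)).
    postings = defaultdict(list)
    for p, topics in enumerate(periods):
        for tid, vec in topics.items():
            norm = math.sqrt(sum(w * w for w in vec.values()))
            if norm == 0:
                continue
            for term, w in vec.items():
                postings[term].append((p, tid, w / norm))
    # "Shuffle + reduce": per term, accumulate the partial products of topic
    # pairs from consecutive periods; summing over all terms yields the cosine.
    partial = defaultdict(float)
    for entries in postings.values():
        for (p1, t1, w1), (p2, t2, w2) in product(entries, entries):
            if p2 == p1 + 1:
                partial[(p1, t1, t2)] += w1 * w2
    # Pairs sharing no term never appear here, so threshold must be > 0.
    return sorted((p, s, d, round(sim, 3))
                  for (p, s, d), sim in partial.items() if sim >= threshold)
```

Both functions return the same edges for a strictly positive threshold. The inverted-index variant only ever touches topic pairs that share at least one term, which is why it can skip the many zero-similarity pairs that the exhaustive variant still scores.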
File not deposited

Dates and versions

hal-02444359, version 1 (17-01-2020)

Cite

Hubert Naacke, Ke Li, Bernd Amann, Olivier Curé. Efficient similarity-based alignment of temporally-situated graph nodes with Apache Spark. IEEE International Conference on Big Data, High Performance Big Graph Data Management, Analysis, and Mining, Dec 2019, Los Angeles, CA, United States. pp. 4793-4798, ⟨10.1109/BigData47090.2019.9005483⟩. ⟨hal-02444359⟩