Efficient similarity-based alignment of temporally-situated graph nodes with Apache Spark - Sorbonne Université
Conference Papers, Year: 2019

Hubert Naacke
Ke Li
Bernd Amann
Olivier Curé

Abstract

Topic evolution networks are widely used to represent the evolution of research topics in scientific document archives. These networks may contain thousands of topics and alignment edges, which are computed by comparing millions of topic pairs with a similarity function. In this work, we address the problem of computing a very large number of cosine-based topic alignments on top of Apache Spark. We present the native map-reduce implementation proposed by Spark and a more efficient implementation that is tuned for alignment computation. Both implementations are evaluated on three real-world datasets.
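
To make the computation described in the abstract concrete, the following is a minimal single-machine sketch in Python of cosine-based topic alignment. It is not the implementation from the paper and does not use Spark. It assumes that each topic is a sparse term-weight vector, that topics are grouped by time period, and that only topics from adjacent periods are compared. The data layout, the function names `cosine` and `align`, and the `threshold` parameter are hypothetical and chosen for illustration only.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector when computing the dot product
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty or all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def align(topics, threshold):
    """Return (topic_a, topic_b, similarity) edges between adjacent periods.

    topics: {period: {topic_id: {term: weight}}}  (hypothetical layout)
    threshold: minimum similarity for an alignment edge (assumed parameter)
    """
    edges = []
    periods = sorted(topics)
    # Assumption: only topics from consecutive periods are compared.
    for p, q in zip(periods, periods[1:]):
        for a, vec_a in topics[p].items():
            for b, vec_b in topics[q].items():
                sim = cosine(vec_a, vec_b)
                if sim >= threshold:
                    edges.append((a, b, sim))
    return edges


# Toy data, invented for this example.
topics = {
    2018: {"t1": {"graph": 1.0, "spark": 1.0}},
    2019: {"t2": {"graph": 1.0, "spark": 1.0}, "t3": {"topic": 1.0}},
}
edges = align(topics, threshold=0.5)
```

On the toy data, `t1` and `t2` share all their terms and are aligned, while `t1` and `t3` share none and are filtered out. The nested loops compare every topic pair across two periods, which is the quadratic cost that motivates distributing the work when millions of pairs are involved.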

Dates and versions

hal-02444359, version 1 (17-01-2020)

Identifiers

  • HAL Id: hal-02444359
  • DOI: 10.1109/BigData47090.2019.9005483

Cite

Hubert Naacke, Ke Li, Bernd Amann, Olivier Curé. Efficient similarity-based alignment of temporally-situated graph nodes with Apache Spark. IEEE International Conference on Big Data, High Performance Big Graph Data Management, Analysis, and Mining, Dec 2019, Los Angeles, CA, United States. pp.4793-4798, ⟨10.1109/BigData47090.2019.9005483⟩. ⟨hal-02444359⟩