Efficient similarity-based alignment of temporally-situated graph nodes with Apache Spark
Abstract
Topic evolution networks are widely used to represent the evolution of research topics in scientific document archives. These networks may contain thousands of topics and alignment edges, which are computed by comparing millions of topic pairs with a similarity function. In this work, we address the problem of computing a very large number of cosine-based topic alignments on top of Apache Spark. We present the native map-reduce implementation offered by Spark and a more efficient implementation that is tuned for alignment computation. Both implementations are evaluated on three real-world datasets.
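
As a rough illustration of the alignment task described above, the following minimal sketch compares every topic of one period with every topic of the next and keeps the pairs whose cosine similarity reaches a threshold. It is plain Python rather than Spark code and is not the implementation evaluated in this work. The sparse term-weight representation, the function names `cosine` and `align`, the threshold value, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def align(topics_a, topics_b, threshold):
    """Return (id_a, id_b, similarity) for each cross-period pair at or above threshold."""
    edges = []
    # Brute-force comparison of all pairs: |topics_a| * |topics_b| similarity evaluations.
    for (id_a, vec_a), (id_b, vec_b) in product(topics_a.items(), topics_b.items()):
        sim = cosine(vec_a, vec_b)
        if sim >= threshold:
            edges.append((id_a, id_b, sim))
    return edges


# Hypothetical toy topics for two consecutive periods (assumed data, not from the paper).
period_1 = {"t1": {"graph": 0.8, "spark": 0.6}, "t2": {"protein": 1.0}}
period_2 = {"t3": {"graph": 0.6, "spark": 0.8}, "t4": {"protein": 0.5, "gene": 0.5}}

edges = align(period_1, period_2, threshold=0.5)
for id_a, id_b, sim in edges:
    print(f"{id_a} -> {id_b}: {sim:.3f}")
```

The number of comparisons grows with the product of the two topic-set sizes, which is why the pairwise step dominates the cost once the network reaches thousands of topics.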