Deep dynamic neural networks for temporal language modeling in author communities
Abstract
Language models are at the heart of numerous works, notably in the text mining and information retrieval communities. These statistical models aim at extracting word distributions, and range from simple unigram models to recurrent approaches with latent variables that capture subtle dependencies in texts. However, such models are learned from word sequences alone; authors’ identities and publication dates are seldom considered. We propose a neural model, based on recurrent language modeling (e.g., LSTM), that captures language diffusion tendencies in author communities through time. By conditioning language models on author and dynamic temporal vector states, we are able to leverage the latent dependencies between text contexts. The model captures the language evolution of authors via a shared temporal prediction function in a latent space, which makes it possible to handle a variety of modeling tasks, including completion and prediction of language models through time. Experiments on two real-world corpora evaluate the performance of the approach against several temporal and non-temporal language modeling baselines.