Offline versus Online Representation Learning of Documents Using External Knowledge
Résumé
An intensive recent research work investigated the combined use of hand-curated knowledge resources and corpus-driven resources to learn effective text representations. The overall learning process could be run by online revising the learning objective or by offline refining an original learned representation. The differentiated impact of each of the learning approaches on the quality of the learned representations has not been studied so far in the literature. This article focuses on the design of comparable offline vs. online knowledge-enhanced document representation learning models and the comparison of their effectiveness using a set of standard IR and NLP downstream tasks. The results of quantitative and qualitative analyses show that (1) offline vs. online learning approaches have dissimilar result trends regarding the task as well as the dataset distribution counts with regard to domain application; (2) while considering external knowledge resources is undoubtedly beneficial, the way used to express relational constraints could affect semantic inference effectiveness. The findings of this work present opportunities for the design of future representation learning models, but also for providing insights about the evaluation of such models.