Community Recovery in the Degree-Heterogeneous Stochastic Block Model
Abstract
We consider the problem of recovering communities in a random directed graph with planted communities. To model real-world directed graphs such as the Twitter or Instagram graphs that exhibit very heterogeneous degree sequences, we introduce the Degree-Heterogeneous Stochastic Block Model (DHSBM), a generalization of the classic Stochastic Block Model (SBM), where the vertex set is partitioned into communities and each vertex u has two (unknown) associated probabilities, p u and q u , p u > q u. An arc from u to v is generated with probability p u if u and v are in the same community and with probability q u otherwise. Given a graph generated from this model, the goal is to retrieve the communities. The DHSBM allows to generate graphs with planted communities while allowing heterogeneous degree distributions, a quite important feature of real-world networks. In the case where there are two communities, we present an iterative greedy linear-time algorithm that recovers them whenever min u pu−qu / √pu ≥ C log(n)/n, for some absolute constant C. We also show that, up to a constant, this condition is necessary. Our results also extend to the standard (undirected) SBM , where p u = p and q u = q for all nodes u. Our algorithm presents the first linear-time algorithm that recovers exactly the communities at the asymptotic information-theoretic threshold, improving over previous near-linear time spectral approaches.
Origin | Files produced by the author(s) |
---|