Theoretical analysis of git bisect
Abstract
In this paper, we consider the problem of finding a regression in a version
control system (VCS), such as git.
The set of versions is modelled by a Directed Acyclic Graph (DAG) where
vertices represent versions of the software, and arcs are the changes between different versions. We
assume that somewhere in the DAG, a bug was introduced, which persists
in all of its subsequent versions. It is possible to query a vertex to
check whether the corresponding version carries the bug. Given a DAG and
a bugged vertex, the Regression Search Problem consists in finding the
first vertex containing the bug in a minimum number of queries in the
worst-case scenario. This problem is known to be NP-complete.
We study the algorithm used in git to address this problem, known as
git bisect. We prove that in a general setting, git bisect can use an
exponentially larger number of queries than an optimal algorithm. We
also consider the restriction where all vertices have indegree at most
2 (i.e. where merges are made between at most two branches at a time in the VCS),
and prove that in this case, git bisect is a
$\frac{1}{\log_2(3/2)}$-approximation algorithm, and that this bound is
tight. We also provide a better approximation algorithm for this case.
Finally, we give an alternative proof of the NP-completeness of the Regression Search Problem, via a variation with bounded indegree.
Origin | Files produced by the author(s) |
---|---|
licence |