Boosting fault localization of statements by combining topic modeling and Ochiai
Abstract
Context:
Reducing the cost of maintenance tasks by fixing bugs automatically is the cornerstone of Automated Program Repair (APR), and automated Fault Localization (FL) is an essential step toward it. Two families of FL techniques are Spectrum-based Fault Localization (SBFL) and Information Retrieval Fault Localization (IRFL). SBFL uses the coverage information and execution results of test cases; Ochiai is one of the most effective and widely used SBFL strategies. IRFL uses the information in bug reports together with the identifier names and comments in source code files. Latent Dirichlet Allocation (LDA) is a generative statistical model and one of the most popular topic modeling methods. However, as an IRFL technique, LDA has been applied at the method level of granularity, whereas most existing APR tools operate at the statement level.
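As an illustration of the SBFL side only, the standard Ochiai suspiciousness of a statement is ef / sqrt((ef + nf) * (ef + ep)), where ef and ep are the numbers of failing and passing tests that execute the statement and nf is the number of failing tests that do not. The sketch below is a minimal example under assumed inputs: the `coverage` mapping, the `failed` set, and the function names are hypothetical and are not taken from the paper's implementation.

```python
import math


def ochiai(ef: int, ep: int, total_failed: int) -> float:
    """Standard Ochiai score; total_failed equals ef + nf."""
    denom = math.sqrt(total_failed * (ef + ep))
    # A statement never executed (or no failing tests) gets score 0.
    return ef / denom if denom > 0 else 0.0


def rank_statements(coverage, failed):
    """Rank statements by Ochiai score, most suspicious first.

    coverage: hypothetical mapping of test name -> set of covered statement ids
    failed:   set of names of failing tests
    """
    total_failed = len(failed)
    statements = set().union(*coverage.values())
    scores = {}
    for s in statements:
        ef = sum(1 for t, cov in coverage.items() if t in failed and s in cov)
        ep = sum(1 for t, cov in coverage.items() if t not in failed and s in cov)
        scores[s] = ochiai(ef, ep, total_failed)
    # Ties are broken by statement id so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy spectrum: t2 fails, t1 and t3 pass.
coverage = {"t1": {1, 2}, "t2": {2, 3}, "t3": {1, 3}}
ranking = rank_statements(coverage, failed={"t2"})
```

In this toy spectrum, statement 1 is never executed by the failing test and therefore ends up last in the ranking, while statements 2 and 3 tie.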
Objective:
This paper presents an approach that combines topic modeling and Ochiai to boost FL at the statement level.
Method:
We evaluate our approach on five projects from the Defects4J benchmark and report its performance in terms of hit@k and mean reciprocal rank (MRR). We compare it against five baselines: two SBFL approaches (Ochiai and Dstar), two IRFL approaches (LDA and Blues), and one hybrid approach (SBIR). In addition, we compare the number of bugs found by our approach with the number found by the baselines.
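For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them. It assumes that each bug is summarized by the 1-based rank of its first faulty statement in the ranked list (`None` when no faulty statement is ranked), and that hit@k is reported as a count of bugs; some studies report a fraction instead. The function names and the sample ranks are illustrative and do not come from the paper.

```python
def hit_at_k(first_ranks, k):
    """Number of bugs whose first faulty statement appears within the top k."""
    return sum(1 for r in first_ranks if r is not None and r <= k)


def mean_reciprocal_rank(first_ranks):
    """Mean of 1/rank over all bugs; an unlocalized bug contributes 0."""
    if not first_ranks:
        return 0.0
    return sum(1.0 / r for r in first_ranks if r is not None) / len(first_ranks)


# Illustrative ranks for four hypothetical bugs (not experimental data).
ranks = [1, 4, None, 2]
hits = {k: hit_at_k(ranks, k) for k in (1, 3, 5)}
mrr = mean_reciprocal_rank(ranks)
```

Counting unlocalized bugs as zero keeps the MRR denominator equal to the total number of bugs, so approaches that rank nothing for a bug are not rewarded.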
Results:
Our approach significantly outperforms the baselines on all metrics, especially hit@1, hit@3, and hit@5. It also locates more bugs than Ochiai and Blues.
Conclusion:
The results indicate that integrating topic modeling with Ochiai boosts FL. This uncovers the potential of topic modeling for FL at the statement level, which is valuable for the APR community.