Ph.D., Computer Science
Christopher S. Corley, Kostadin Damevski, Nicholas A. Kraft
The standard approach to applying text retrieval models to code repositories is to train models on documents representing program elements. However, code changes lead to model obsolescence and to the need to retrain the model from the latest snapshot. To address this, we previously introduced an approach that trains a model on documents representing changesets from a repository and demonstrated its feasibility for feature location. In this paper, we expand our work by investigating: a second task (developer identification), the effects of including different changeset parts in the model, the repository characteristics that affect the accuracy of our approach, and the effects of the time invariance assumption on evaluation results. Our results demonstrate that our approach is as accurate as the standard approach for projects with most changes localized to a subset of the code, but less accurate when changes are highly distributed throughout the code. Moreover, our results demonstrate that context and messages are key to the accuracy of changeset-based models and that the time invariance assumption has a statistically significant effect on evaluation results, providing overly-optimistic results. Our findings indicate that our approach is a suitable alternative to the standard approach, providing comparable accuracy while eliminating retraining costs.view more
In this paper, we present a semi-automatic approach for mining a large-scale dataset of IDE interactions to extract usage smells, i.e., inefficient IDE usage patterns exhibited by developers in the field. The approach outlined in this paper first mines frequent IDE usage patterns, filtered via a set of thresholds and by the authors, that are subsequently supported (or disputed) using a developer survey, in order to form usage smells. In contrast with conventional mining of IDE usage data, our approach identifies time-ordered sequences of developer actions that are exhibited by many developers in the field. This pattern mining workflow is resilient to the ample noise present in IDE datasets due to the mix of actions and events that these datasets typically contain. We identify usage patterns and smells that contribute to the understanding of the usability of Visual Studio for debugging, code search, and active file navigation, and, more broadly, to the understanding of developer behavior during these software development activities. Among our findings is the discovery that developers are reluctant to use conditional breakpoints when debugging, due to perceived IDE performance problems as well as due to the lack of error checking in specifying the conditional.