Ph.D., Computer Science
In this paper, we present a semi-automatic approach for mining a large-scale dataset of IDE interactions to extract usage smells, i.e., inefficient IDE usage patterns exhibited by developers in the field. The approach outlined in this paper first mines frequent IDE usage patterns, filtered via a set of thresholds and by the authors, that are subsequently supported (or disputed) using a developer survey, in order to form usage smells. In contrast with conventional mining of IDE usage data, our approach identifies time-ordered sequences of developer actions that are exhibited by many developers in the field. This pattern mining workflow is resilient to the ample noise present in IDE datasets due to the mix of actions and events that these datasets typically contain. We identify usage patterns and smells that contribute to the understanding of the usability of Visual Studio for debugging, code search, and active file navigation, and, more broadly, to the understanding of developer behavior during these software development activities. Among our findings is the discovery that developers are reluctant to use conditional breakpoints when debugging, due to perceived IDE performance problems as well as due to the lack of error checking in specifying the conditional.
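As a rough illustration of the pattern mining step described above, the following minimal sketch counts contiguous, time-ordered action sequences and keeps those exhibited by at least a threshold number of distinct developers. The function name, the fixed sequence length, the threshold, and the sample action names are all hypothetical; the paper's actual mining algorithm and thresholds may differ.

```python
from collections import defaultdict


def frequent_action_sequences(sessions, length=3, min_developers=2):
    """Return contiguous action sequences of the given length that occur
    in the traces of at least `min_developers` distinct developers."""
    supporters = defaultdict(set)  # sequence -> set of developers showing it
    for developer, actions in sessions:
        for i in range(len(actions) - length + 1):
            supporters[tuple(actions[i:i + length])].add(developer)
    return {
        seq: len(devs)
        for seq, devs in supporters.items()
        if len(devs) >= min_developers
    }


# Hypothetical IDE traces: (developer id, time-ordered list of actions).
sessions = [
    ("dev1", ["SetBreakpoint", "StartDebug", "StepOver", "StopDebug"]),
    ("dev2", ["Search", "SetBreakpoint", "StartDebug", "StepOver"]),
    ("dev3", ["Search", "OpenFile", "Edit"]),
]

patterns = frequent_action_sequences(sessions, length=3, min_developers=2)
print(patterns)
```

Counting support per developer rather than per occurrence means that one developer repeating a sequence many times cannot make it look widespread, which matches the emphasis on patterns exhibited by many developers.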
Searching for relevant code in the local code base is a common activity during software maintenance. However, previous research indicates that 88% of manually-composed search queries retrieve no relevant results. One reason that many searches fail is existing search tools’ dependence on string matching algorithms, which cannot find semantically-related code. To solve this problem by helping developers compose better queries, researchers have proposed numerous query recommendation techniques, relying on a variety of dictionaries and algorithms. However, few of these techniques have been empirically evaluated using usage data from real-world developers. To fill this gap, we designed a multi-recommendation system that relies on the cooperation between several query recommendation techniques. We implemented and deployed this recommendation system within the Sando code search tool and conducted a longitudinal field study. Our study shows that over 34% of all queries were adopted from recommendations, and that recommended queries retrieved results 11% more often than manual queries.
Sensor-driven applications, implemented using modern mobile or gaming devices, have great potential in motivating computer science students. Recent industry trends toward including more sensors on devices such as mobile phones, which enable new applications in health monitoring, smart homes, and human safety, among others, indicate that the number of such sensor-driven applications will continue to rise. Via a study to learn the difficulties that a group of students face in designing such sensor-driven applications, we uncover a set of instructional principles for instructors to follow in using sensor-driven applications in classrooms. Our findings include that (1) exposing students to sensor data earlier helps improve self-efficacy; (2) focusing on extracting overall patterns from sensor data rather than understanding specifics of physical quantities is beneficial; and (3) good sensor data visualization is beneficial to design, but bad visualization can confuse students.
Using IDE usage data to analyze the behavior of software developers in the field, during the course of their daily work, can lend support to (or dispute) laboratory studies of developers. This paper describes a technique that leverages Hidden Markov Models (HMMs) as a means of mining high-level developer behavior from low-level IDE interaction traces of many developers in the field. HMMs use dual stochastic processes to model higher-level hidden behavior using observable input sequences of events. We propose an interactive approach of mining interpretable HMMs, based on guiding a human expert in building a high quality HMM in an iterative, one state at a time, manner. The final result is a model that is both representative of the field data and captures the field phenomena of interest. We apply our HMM construction approach to study debugging behavior, using a large IDE interaction dataset collected from nearly 200 developers at ABB, Inc. Our results highlight the different modes and constituent actions in debugging, exhibited by the developers in our dataset.
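To make the idea of hidden behavior behind observable events concrete, here is a minimal sketch of standard Viterbi decoding, which recovers the most likely sequence of hidden states for an observed event sequence. The two states, the event names, and all probabilities are invented for illustration; they are not taken from the paper, and the paper's interactive, one-state-at-a-time model construction is not reproduced here.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            prob, pred = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], pred)
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-mode model of developer behavior.
states = ["Editing", "Debugging"]
start_p = {"Editing": 0.6, "Debugging": 0.4}
trans_p = {
    "Editing": {"Editing": 0.7, "Debugging": 0.3},
    "Debugging": {"Editing": 0.2, "Debugging": 0.8},
}
emit_p = {
    "Editing": {"Edit": 0.8, "SetBreakpoint": 0.15, "StepOver": 0.05},
    "Debugging": {"Edit": 0.1, "SetBreakpoint": 0.3, "StepOver": 0.6},
}

events = ["Edit", "Edit", "SetBreakpoint", "StepOver", "StepOver"]
path = viterbi(events, states, start_p, trans_p, emit_p)
print(path)
```

With these made-up parameters, the decoded path stays in Editing for the two edit events and switches to Debugging once the breakpoint is set, showing how low-level events can be labeled with a higher-level mode.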
Our current understanding of how programmers perform feature location during software maintenance is based on controlled studies or interviews, which are inherently limited in size, scope and realism. Replicating controlled studies in the field can both explore the findings of these studies in wider contexts and study new factors that have not been previously encountered in the laboratory setting. In this paper, we report on a field study about how software developers perform feature location within source code during their daily development activities. Our study is based on two complementary field data sets: one that reflects complete IDE activity of 67 professional developers over approximately one month, and the other that reflects usage of an IR-based code search tool by nearly 600 developers. Analyzing this data, we report results on how often developers use each type of code search tool, on the types of queries and retrieval strategies used by developers, and on patterns of developer feature location behavior following code search. The results of the study suggest that there is (1) a need for helping developers to devise better code search queries; (2) a lack of adoption of niche code search tools; (3) a need for code search tools to handle both lookup and exploratory queries; and (4) a need for better integration between code search, structured navigation, and debugging tools in feature location tasks.