From Developer Networks to Verified Communities:
A Fine-Grained Approach

Supplementary material for ICSE 2015 technical research track submission.

Mitchell Joblin
Siemens AG

Wolfgang Mauerer
Siemens AG, OTH Regensburg

Sven Apel, Janet Siegmund
University of Passau

Dirk Riehle
Friedrich-Alexander-University

View Developer Survey

Abstract

Effective software engineering demands a coordinated effort. Unfortunately, a comprehensive view of developer coordination is rarely available to support software-engineering decisions, despite its significant implications for software quality, software architecture, and developer productivity. We present a fine-grained, verifiable, and fully automated approach to capture a view of developer coordination, based on commit information and source-code structure mined from a version control system. We apply methods from network analysis and machine learning to identify statistically significant developer clusters. Compared to previous work, our approach is fine-grained and identifies clusters using order statistics and a sophisticated cluster-evaluation technique based on graph conductance. To demonstrate the scalability and generality of our approach, we analyze ten open-source projects with complex and active histories, written in various programming languages. By questioning 53 open-source developers from ten different projects, we validate the authenticity of the clusters with respect to representing developer communities. Our results indicate that developers of open-source projects form statistically significant clusters and that this particular view of collaboration partially coincides with developers' perceptions of community structure.

Description

An example developer network is shown on the right. Each bounding box represents a single community of developers. The border color of each box uniquely identifies the community, and the pie chart on each node shows that developer's fractional participation in each community. The background color of a box represents the strength of the community, calculated according to conductance (cf. full paper): green signifies a strong community, yellow a weak community, and red an anti-community. Intra-community edges are shown in black, and inter-community edges are shown in red. Edge thickness represents the strength of a relationship. Node size represents a developer's importance, measured by PageRank centrality. To reduce clutter, we aggregate all edges between two communities into a single inter-community edge that connects the most important developer of each community. The weight of this edge represents the total collaboration between all developers in the two connected communities.
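
The two computations behind the figure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper. It assumes the standard weighted graph conductance, cut(S, rest) / min(vol(S), vol(rest)), whereas the exact definition used for coloring is given in the full paper. It also assumes that every developer belongs to exactly one community, although the figure shows fractional membership. The developer names, edge weights, and importance scores are hypothetical; the importance values stand in for PageRank scores computed elsewhere.

```python
from collections import defaultdict


def conductance(edges, community):
    """Standard weighted conductance of a node set (assumed definition)."""
    community = set(community)
    cut = 0.0      # total weight of edges leaving the community
    vol_in = 0.0   # sum of weighted degrees inside the community
    vol_out = 0.0  # sum of weighted degrees outside the community
    for (u, v), w in edges.items():
        u_in, v_in = u in community, v in community
        if u_in != v_in:
            cut += w
        # Each endpoint adds the edge weight to the volume of its own side.
        vol_in += w * (u_in + v_in)
        vol_out += w * ((not u_in) + (not v_in))
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 0.0


def aggregate_inter_community_edges(edges, membership, importance):
    """Collapse all edges between two communities into one edge that links
    the most important developer of each community."""
    totals = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = membership[u], membership[v]
        if cu != cv:
            totals[frozenset((cu, cv))] += w

    # Most important developer per community.
    top = {}
    for dev, com in membership.items():
        if com not in top or importance[dev] > importance[top[com]]:
            top[com] = dev

    aggregated = {}
    for pair, w in totals.items():
        a, b = tuple(pair)
        aggregated[frozenset((top[a], top[b]))] = w
    return aggregated


# Hypothetical undirected collaboration network: (developer, developer) -> weight.
edges = {
    ("alice", "bob"): 3, ("bob", "carol"): 2, ("alice", "carol"): 1,
    ("carol", "dave"): 1, ("dave", "erin"): 4, ("bob", "erin"): 1,
}
membership = {"alice": "A", "bob": "A", "carol": "A", "dave": "B", "erin": "B"}
# Hypothetical importance scores standing in for PageRank centrality.
importance = {"alice": 0.15, "bob": 0.30, "carol": 0.20, "dave": 0.10, "erin": 0.25}

phi_a = conductance(edges, {"alice", "bob", "carol"})
merged = aggregate_inter_community_edges(edges, membership, importance)
print(phi_a)   # cut weight 2 over min(volume 14, volume 10)
print(merged)  # one edge between bob and erin carrying the total weight 2
```

Under this definition, a lower conductance means that a community collaborates mostly internally. The sketch does not map values to green, yellow, or red, because the thresholds are not stated in this description.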