Software Research and the Industry

Dirk Riehle’s blog about everything computer science, applied and more


Open Source Labor Economics…

December 6th, 2008 · No Comments

…is not nearly as sexy a title for an industry talk as is “Open Source Hacker Careers” so it had to go. The result you can observe at the 2009 Open Source Meets Business conference in Nuremberg, Germany, on January 28th, 2009, when I will be giving a talk (almost) so named.

Open Source Software Developer Careers

Open source is changing how software is built and how money is made. Open source also defines a new developer career that is independent of the traditional career within companies. This talk discusses this new career and argues that it creates economic value for some while it makes life harder for others. Suggesting that such a career is worthwhile, the talk then discusses key skills that a developer should possess or train in order to be successful in open source projects.

→ No Comments · Tags: Industry · Open Source · Presentation

Collaboration and Knowledge Sharing in Software Development Teams (SofTEAM ’09)

November 15th, 2008 · 1 Comment

For your information, here is the call for papers for a workshop on Collaboration and Knowledge Sharing in Software Development Teams (SofTEAM ’09).

CALL FOR PAPERS

European Workshop on “Collaboration and Knowledge Sharing in Software Development Teams (SofTEAM’09)”

www1.in.tum.de/softeam09

[Read more →]

→ 1 Comment · Tags: Announcement · Open Source · Research · Software Design · Wikis

How Open Source Comments (by Programming Language)

November 10th, 2008 · 6 Comments

We recently looked at the commenting practice of active, working open source projects. It is quite impressive: The average comment density of open source is around 19%. (Comment density is the percentage of text that consists of comments, or, more formally: comment density = comment lines / (comment lines + source code lines); for example, two lines of text, one a comment line and one a source code line, have a 50% comment density.) A 19% comment density is much more documentation than most people expected!
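To make the definition concrete, here is a minimal sketch of the formula in Python. The function names and the rule for classifying lines (a line counts as a comment if it starts with `#`, and blank lines are ignored) are assumptions for illustration only, not the method used in the study.

```python
def comment_density(comment_lines: int, source_lines: int) -> float:
    """comment density = comment lines / (comment lines + source code lines)."""
    total = comment_lines + source_lines
    if total == 0:
        raise ValueError("no comment or source code lines to measure")
    return comment_lines / total


def density_of_text(text: str, comment_prefix: str = "#") -> float:
    """Classify each non-blank line as comment or source code, then apply the formula.

    Assumption: a line is a comment line if it starts with comment_prefix
    after stripping whitespace; blank lines are not counted at all.
    """
    comment = 0
    source = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither comment nor source code
        if stripped.startswith(comment_prefix):
            comment += 1
        else:
            source += 1
    return comment_density(comment, source)


# The two-line example from the text: one comment line, one source code line.
example = "# add two numbers\nresult = 1 + 2\n"
print(density_of_text(example))  # 0.5, i.e. a 50% comment density
```

A real measurement would need language-aware parsing (block comments, trailing comments on code lines), which this sketch deliberately leaves out.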

However, such a rough number needs discussion. Here, I look at the comment density on a programming language basis. As it turns out, the comment density of active open source projects varies by programming language. Java is leading the bunch, but that needs further discussion.

[Read more →]

→ 6 Comments · Tags: Industry · Open Source · Research

The Perils of Going from Community to Commercial Open Source

November 4th, 2008 · 1 Comment

The growth and corporate adoption of many community open source projects are hindered by the lack of commercial support. At the same time, well-working community open source is a temptation for startups to make a buck by turning the community project into commercial open source. We can currently observe the unraveling of such a story.

TWiki is a community open source project, a successful wiki engine, and the company TWIKI.NET is trying to turn it into a commercial open source project. This post discusses some of the levers and dynamics of this process. But first, to clarify terms: Community open source is open source owned by a community with no single dominant stakeholder, and commercial open source is open source owned or dominated by a single legal entity, typically a software firm that wants to earn money with it.

[Read more →]

→ 1 Comment · Tags: Industry · Open Source · Wikimedia · Wikis

Capobianco’s OSS 2008 Keynote

October 21st, 2008 · 3 Comments

Fabrizio Capobianco made the slides from his OSS 2008 keynote available. This is the same conference where we reported on the total (exponential) growth of open source. Unfortunately, I had to leave right after our talk for the Wiki Symposium, so I could neither catch him nor listen to his talk. His slides, however, provide great talking points and insights that he probably communicated to his audience.

[Read more →]

→ 3 Comments · Tags: Industry · Open Source

A License Agnostic ACM Digital Library?

October 15th, 2008 · 4 Comments

Most authors transfer their copyright to the ACM when having their papers published and archived in the ACM Digital Library. While the ACM allows authors to provide their papers on personal servers for non-commercial purposes, the goal recognizably is to make the DL not only the primary source of such material, but also the only source.

A second, less well-known option for authors is to sign the “permission release” form, granting the ACM the right to publish the work, but without losing the copyright to it. Authors keep the rights to their work while still having the paper published and archived in the DL. Then, the DL becomes one source of the paper, but not the only one. This option is typically made available only under special circumstances, for example, if you are working for the Canadian government.

The recent publication of the Proceedings of the 2006 Conference on Pattern Languages of Programming may signify an important change in this regime.

[Read more →]

→ 4 Comments · Tags: Education · Research · Wikimedia · Wikipedia

PLoP Proceedings now in ACM Digital Library

October 15th, 2008 · No Comments

Thanks to the efforts of Joe Yoder and Ralph Johnson, the proceedings of the 2006 conference on Pattern Languages of Programming have been archived in the ACM Digital Library. I expect the 2007 and future proceedings to be made available through the ACM DL as well. Whether the proceedings of past years will be archived as well is unclear.

[Read more →]

→ No Comments · Tags: Publication · Research

The Dominance of Small Code Contributions

September 23rd, 2008 · 4 Comments

What is the most common size of code contributions to open source? Maybe 30 lines of source code? 200 lines? Or just one line? What’s your guess?

[Read more →]

→ 4 Comments · Tags: Open Source · Research

The Commit Size Distribution of Open Source Software

September 23rd, 2008 · No Comments

Title: The Commit Size Distribution of Open Source Software

Authors: Oliver Arafat, Dirk Riehle

Institution: SAP Research, SAP Labs LLC

Abstract: With the growing economic importance of open source, we need to improve our understanding of how open source software development processes work. The analysis of code contributions to open source projects is an important part of such research. In this paper we analyze the size of code contributions to more than 9,000 open source projects. We review the total distribution and distinguish three categories of code contributions using a size-based heuristic: single focused commits, aggregate team contributions, and repository refactorings. We find that both the overall distribution and the individual categories follow a power law. We also suggest that distinguishing these commit categories by size will benefit future analyses.

Reference: In Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS-42). Forthcoming.

Available as a PDF file.

→ No Comments · Tags: Open Source · Publication · Research

Learning from Wikipedia: Open Collaboration within Corporations

September 6th, 2008 · 1 Comment

Wikipedia is the free online encyclopedia that has taken the Internet by storm. It is written and administered solely by volunteers. How exactly did this come about and how does it work? Can it keep working? And maybe more importantly, can you transfer its practices to the workplace to achieve similar levels of dedication and quality of work? In this presentation I describe the structure, processes and governance of Wikipedia and discuss how some of its practices can be transferred to the corporate context.

This presentation represents the next step in the evolution of two Wikimania tutorials/workshops; see Presentations/Tutorials. If the slideshow doesn’t play, please use the PDF file download below.

Reference: Dirk Riehle. “Learning from Wikipedia: Open Collaboration within Corporations.” Invited talk at Talk the Future 2008. Krems, Austria: 2008.

The slides are available as a PDF file.

→ 1 Comment · Tags: Industry · Open Source · Presentation · Research · Wikimedia · Wikipedia · Wikis