Three Commits, Ding, Ding, Ding

Jan. 31st, 2008 | 11:53 am

I dislike old boys' clubs. Not the part where you sit around and gab, but the part where people have to wait at the door to get in.

Something about them rubs me a bit raw.

Open Source has this problem in spades. The bar for commit access to the major projects is not a problem because it is high; it is a problem because it is not written down. It is often fuzzy or, worse, personality-dependent (so much for the espoused meritocracy).

Here is a story for you...

A few weeks ago I went to the Velocity Summit that O'Reilly held in San Francisco (not to be confused with the Velocity Conference, for which I still owe Jesse an abstract), and talked about my current topics: MySQL, as always, and more recently libmemcached.

Somewhere during one of the sessions we got off on a tangent about distributed source control (another favorite topic of mine) and how it makes vendor versions simpler to deal with, which is another way of saying that it encourages forks (not the kind you put in your eye).

Hallelujah!

I'm all for forks. They are a sign of health. They are not much of a motivator for old boys' clubs, though. The group I was with talked about that for a few moments. What surprised me? The number of other people in the room who were hostile to old boys' clubs. It was a common annoyance.

Despite the ongoing fear of forks, open source projects remain stuck in the proprietary model of software when it comes to contributions. Few are good at accepting them, and most become an old boys' club of some sort. For the average user it is too much effort to contribute casual fixes, which means that, for the most part, it is impossible for them to make significant changes.

The low-hanging fruit among features is left hanging until it either rots or... well, it just keeps hanging around.

When I started working on libmemcached I decided to adopt a new strategy.

In the README I put "three good patches get you commit access." The fabled "you must be at least this..." was put into words. Any three patches and it's yours.

So where are we today, some three months later? Five committers, and one person who is just one patch away. Most of the effort still comes from me, but not all of it. The best bug fixes are coming from a user (users write better fixes than straight developers; it is the nature of need). The documentation improvements have all come from users.

To me, five people is a success. It proved to me that it is better to grant access than to act as a gatekeeper.

It is a success that has exposed a few new problems.

  • Release. Just because you can commit does not mean you can release. This is a problem, since it creates a bottleneck.
  • Regression. We need better regression testing, and by better I mean more. The current eye candy for me is the "BuildBot" project. I like the concept, but I want to see something more. I want all open source projects feeding into a network where users can run slaves that do regression testing for them.
  • Coding Guidelines. I've never written down exactly what my coding style is. It is hard for others to follow what they do not know.

    The last one is solvable; for the other two I am still trying to think up solutions. I've pinged several people in large organizations to plant the seed for what I want to see happen with mass regression. It will take a few more months to see if anyone bites.

    Release might be solvable alongside regression. For years I've said internally at MySQL that we needed to just generate binaries with each push. Each push is either green or not, and green ones should be acceptable for a release (and if they are not... well, then there is a problem with the regression system).

    Any open source project should be able to do this (I'm amazed at how many pulls I see coming off my Mercurial trees, where users now just grab the tip of the tree). The problem with pulling directly from Mercurial is that there is no way to know whether the build you just pulled is good. There are no green lights, and to me this creates a barrier. Feeding regression results back to each commit would make this go away.

    Another thought on this topic.

    Three commits might be too many. It is easy to roll back changes in a source control system. Perhaps take a cue from other groups? The folks at Wikipedia have figured out a good formula for an online encyclopedia.

    Maybe, just maybe, revision control should be open access. There is a fear of trojans, but would it be possible for an open project to monitor itself? Wikipedia does a good job of rolling back malicious changes; could a similar gatekeeping system work for source code?

    How about a code-review CAPTCHA?

    I am left with the thought that open source is still more about source, and not so much about being "open".

    If Open Source really wants to move beyond the proprietary model, it needs to give some thought to how to open up the model.
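A minimal sketch of the "green push" idea from the Release and Regression discussion above, written in Python and assuming each push is recorded with its revision id and the counts from one regression run. `PushResult` and `latest_green` are hypothetical names for illustration only; they are not part of BuildBot, Mercurial, or libmemcached. Instead of grabbing the raw tip, a user (or a release script) would take the newest revision whose run came back green.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PushResult:
    """Outcome of running the regression suite against one push (hypothetical record)."""
    revision: str      # changeset id of the push
    tests_run: int
    tests_failed: int

    @property
    def green(self) -> bool:
        # A push is green only if the suite actually ran and nothing failed.
        return self.tests_run > 0 and self.tests_failed == 0


def latest_green(history: Sequence[PushResult]) -> Optional[str]:
    """Return the newest green revision, or None if no push has passed yet.

    `history` is assumed to be ordered oldest to newest, as pushes arrive.
    """
    for result in reversed(history):
        if result.green:
            return result.revision
    return None


if __name__ == "__main__":
    pushes = [
        PushResult("a1f3", tests_run=120, tests_failed=0),
        PushResult("b7c2", tests_run=121, tests_failed=0),
        PushResult("c9d0", tests_run=121, tests_failed=2),  # the tip is red
    ]
    print(latest_green(pushes))  # prints b7c2
```

Treating a push with zero tests run as red is a deliberate choice in this sketch: a green light should mean the suite actually ran, not that it was skipped.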

    Comments (11)

    Tanjent

    when in rome()

    from: tanjent
    date: Jan. 31st, 2008 08:42 pm (UTC)

    The nature of my job means that most of the code I write ends up going into foreign codebases. I've found that if I spend enough time reading through their codebase before I start writing, I automatically pick up the "accent" of their coding guidelines. Not perfectly, but enough that the external devs aren't thrown off by style changes.

    Along those lines, perhaps a good guideline is just to say "Try to match my style. If you don't know how, you need to read more of my code."

    -tanjent

    Brian "Krow" Aker

    Re: when in rome()

    from: krow
    date: Jan. 31st, 2008 08:58 pm (UTC)

    I believe you can pick up most of it just by following along, but not all of it. Things like variable assignment and naming schemes seem to be hard for most people to pick up on.

    That... and most people want to stick to their own naming schemes and resist what others do :)

    awfief

    Re: when in rome()

    from: awfief
    date: Feb. 6th, 2008 10:46 pm (UTC)

    This also puts the burden on the very people you're trying to relieve of it. I've often heard "how can we get more x to do y?" Whether x is "women" or "people of color" or "volunteers" and y is "vote" or "join our organization" or "commit code", the answer is pretty simple:

    Make them want it
    and
    Make it easy

    I think people already want to, although that's a problem too. Making it easier is where Brian's going with this. I think having templates is nice; guidelines are nice too.

    The Wikipedia-like model is interesting. Very often I find myself wanting to change other people's code, and it may be something minor like "take out the hard coding of that number and put it in a variable": minor, but it may require a few hours of grepping to make sure I'm not reusing a variable and that it's used in all the right places.

    So I could see the value in having it be truly open.

    As you say, it creates a bottleneck for review and release. Except it doesn't create the bottleneck; it *moves* the bottleneck.

    You hit the nail on the head with better regression testing. I think honestly that is where the focus needs to be. Have tons of regression tests, and a patch gets committed to the release branch if the software still "works" after that commit.

    commits to tests

    from: dmarti
    date: Feb. 1st, 2008 12:07 am (UTC)

    How about: as a new contributor, you can commit all you want to /tests, and if a future commit makes one of your tests fail, or if you submit a patch that makes one of your tests go from fail to pass, you're in?

    Brian "Krow" Aker

    Re: commits to tests

    from: krow
    date: Feb. 1st, 2008 12:30 am (UTC)

    That would be interesting. I've always thought it would be in the best interest of most companies to create test cases for open source projects that they rely on.

    Why?

    Make your application a part of the regression system. I've floated this several times to different audiences, but I have never had anyone bite on it.

    awfief

    Re: commits to tests

    from: awfief
    date: Feb. 6th, 2008 10:46 pm (UTC)

    That *is* a smart idea.....

    Formatting

    from: acdha
    date: Feb. 1st, 2008 12:49 am (UTC)

    Have you considered supplying a config file for one of the common reformatting tools? I've been using perltidy and HTML Tidy this way long enough to have considered writing some sort of pre-commit hook that would automatically reformat code before committing it.

    Brian "Krow" Aker

    Re: Formatting

    from: krow
    date: Feb. 1st, 2008 02:34 am (UTC)

    I've thought about doing this in the past... but it scares me just a bit to have a formatter run like that. I use indent via vim, but it only gets things mostly right.

    Re: Formatting

    from: acdha
    date: Feb. 1st, 2008 02:54 am (UTC)

    Yes. I have been quite impressed with what perltidy can handle, but I still wouldn't use it without a backup. It seems like it would be useful to wrap the commit process so that it runs a compile test or lint, reformats with a backup file (which your VCS ignores), and gives up if either step fails (e.g. copies the .bak file over the reformatted one and asks you to fix it).

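The commit wrapper acdha describes (check, reformat with a backup, roll back if either step fails) might look roughly like this Python sketch. The function names `run_ok` and `tidy_before_commit` are hypothetical, and the check and format commands are passed in rather than assumed, since the right tools differ per project. It re-runs the check after formatting, so a reformat that breaks the code is caught too.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def run_ok(cmd: Sequence[str], path: Path) -> bool:
    """Run `cmd` with the file path appended; True if it exits with status 0."""
    return subprocess.run([*cmd, str(path)]).returncode == 0


def tidy_before_commit(path: Path, check_cmd: Sequence[str],
                       format_cmd: Sequence[str]) -> bool:
    """Check, reformat with a .bak backup, and roll back if anything fails.

    Returns True if the file was reformatted and still passes the check,
    False if the commit should stop so the author can fix things by hand.
    """
    # Refuse to reformat a file that does not pass the check to begin with.
    if not run_ok(check_cmd, path):
        print(f"{path}: check failed; please fix before committing", file=sys.stderr)
        return False

    # Keep a backup next to the file; the VCS is assumed to ignore *.bak,
    # so the backup is simply left in place after a successful run.
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)

    # Reformat, then re-check so a bad reformat cannot slip into the commit.
    if run_ok(format_cmd, path) and run_ok(check_cmd, path):
        return True

    # Give up: copy the backup over the reformatted file and ask for a fix.
    shutil.copy2(backup, path)
    print(f"{path}: reformat failed; original restored, please fix it", file=sys.stderr)
    return False
```

For a C tree, one might pass `["gcc", "-fsyntax-only"]` as the check and `["indent"]` as the formatter (GNU indent rewrites the file in place by default), with include paths and indent options adjusted to the project. Wiring it into a pre-commit step depends on the VCS in use.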

    Dúshlán

    About Release

    from: shadymist
    date: Feb. 1st, 2008 02:54 am (UTC)

    (please bear in mind I may be misinterpreting the problems specified due to my lack of programming knowledge...)

    If you don't yet know whether you can fully trust committed versions, but you don't want to make people wait for you to post them, could you not just provide a secondary link marked "beta" with a note that it's user-updated? This would only help with the first issue, but the notes could ask volunteers to test and comment publicly, so newcomers can see the results and decide whether it's worth downloading the un-Brian-verified version.

    awfief

    Re: About Release

    from: awfief
    date: Feb. 6th, 2008 10:49 pm (UTC)

    Or, um, take your own medicine: make a commit an implicit fork (or branch, in CVS terms)?

    That is, have your system track approved people, and commits by approved people get into the base/next release, no questions asked. Those of us who aren't approved would make an implicit fork.

    So you can have "libmemcache_SHEERI_00001" or whatever. Some system to merge branches might be nice, so I can have my changes from yesterday merged with your changes from today, and do a one-command download to my system. :)
