How many contributors does Drizzle have?

Oct. 20th, 2010 | 11:05 am

Opscode posted a note this morning on their current contribution level, which got me to thinking about Drizzle's contributors.

From looking at bzr log I can find out some of the details.
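As a rough illustration of that kind of tally (a minimal sketch, not necessarily how the numbers in this post were produced), the Python below counts `committer:` lines in text shaped like the long-format output of `bzr log`. The sample log, the names in it, and the `count_commits` helper are all made up for the example.

```python
from collections import Counter


def count_commits(log_text):
    """Count commits per committer in bzr-log-style text (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        stripped = line.strip()
        # Merged revisions are indented, so strip before matching.
        # Caveat: a commit message line that itself starts with
        # "committer:" would be miscounted by this simple check.
        if stripped.startswith("committer:"):
            name = stripped[len("committer:"):].strip()
            counts[name] += 1
    return counts


# Invented sample data, laid out like long-format log output that
# includes one merged revision.
SAMPLE_LOG = """\
------------------------------------------------------------
revno: 2 [merge]
committer: Alice Example <alice@example.com>
branch nick: trunk
timestamp: Wed 2010-10-20 11:05:00 -0700
message:
  Merge Bob's fix.
    ------------------------------------------------------------
    revno: 1.1.1
    committer: Bob Example <bob@example.com>
    branch nick: fix
    timestamp: Tue 2010-10-19 09:00:00 -0700
    message:
      Fix a typo.
------------------------------------------------------------
revno: 1
committer: Alice Example <alice@example.com>
branch nick: trunk
timestamp: Mon 2010-10-18 08:00:00 -0700
message:
  Initial import.
"""

counts = count_commits(SAMPLE_LOG)
total_commits = sum(counts.values())
single_commit_contributors = [n for n, c in counts.items() if c == 1]

print("total commits:", total_commits)
print("contributors:", len(counts))
print("single-commit contributors:", single_commit_contributors)
```

On a real tree the input would come from the log command itself, and the same counter gives the total, the per-person ranking, and the list of people with exactly one commit.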

To date we have had 13,478 commits go into our tree at all levels. If we look at level-two commits (i.e. patches that are more likely to be a complete body of work), we have had 8,064 commits.

We have had 96 total contributors to date who submitted code to the project.

Students who participated in Google's Summer of Code have made 1,119 commits.

Someone asked me about my own contributions to the project; they had assumed that I had done more of the work (not even close!). To date I have done about 3,017 total commits. The top four contributors are myself, Monty Taylor, Stewart Smith, and Jay Pipes. Monty Taylor has done 3,496 commits, so I have some catching up to do!

Number five on the list is Padraig O'Sullivan, who never worked for Sun or MySQL. He now works at Akiban, but he started working with us as a Google Summer of Code student.

Number six on the list is a Sun employee who never worked for MySQL, and four of the people in the 5-10 range never worked at Sun at all. David Axmark, one of the founders of MySQL, has even contributed patches.

Our Trigger infrastructure was contributed to Drizzle by one company (Primebase). Joe Daly has worked out the scoreboard, and most of the work on our optimizer and executor was done by folks who have never worked on MySQL. There are multiple stories that can be told about individuals or companies contributing work at this point.

Only seventeen of our contributors have contributed just a single patch.

Launchpad says that we have had 37 active contributors in the last month.

This doesn't even begin to include hours put in by folks who keep our infrastructure working like Mike Shadle from Intel, or who have done work on drivers. We have our own JDBC driver written by Marcus Eriksson.

None of those numbers count commits done on the engines that we have included. We don't include that many third-party engines, though, since we have set a fairly high bar for storage engines to meet in order to be included.

Going from the world of MySQL, where we had nearly zero contributors, to where we are today has been pretty amazing and incredibly rewarding.

Comments {5}

Henrik Ingo

Interesting stats

from: Henrik Ingo
date: Oct. 24th, 2010 11:09 am (UTC)

Hi Brian

Interesting stats. In the spring I did an informal poll about each FOSS database community. The numbers you present are close to, but slightly larger than, the guesstimate I got from one of the Drizzle guys then. By these numbers, it seems Drizzle is already bigger than PostgreSQL, for which the estimate was 70 developers, almost all of whom are part time or night time.

Depending on what one wants to know, I actually don't think it is wrong to also include the engines. The engineers at Oracle and elsewhere contributing to InnoDB do contribute to Drizzle, whether they want to or not, and Drizzle without InnoDB is not a very interesting database. So if the question is the total number of people and man-hours going into Drizzle, engines could be included, and that would probably double what you present here.

I hope later in the winter to do a similar study on the MySQL/MariaDB/XtraDB family of code, although I'm not quite sure yet how to even define what to measure. (Currently I'm thinking MariaDB, as the most inclusive variant, would give a good lower-bound estimate for all the commits going into the whole codebase of compatible MySQL variants. But I'm open to better ideas.)

Re: Interesting stats

from: ext_296916
date: Oct. 24th, 2010 05:17 pm (UTC)

Regarding the storage engines: since Drizzle uses best-in-class libraries, from Google Protobuf and GMP to Boost, any improvements or fixes to those libraries are indirectly contributions to Drizzle. The same is true of storage engines. So if you are counting storage engines, we need to count libraries also. :)

IMHO, the growth of FOSS software is not easy to measure. What I see may be just the tip of an iceberg.

Brian "Krow" Aker

Re: Interesting stats

from: krow
date: Oct. 24th, 2010 08:21 pm (UTC)

We are certainly not bigger than Postgres when it comes to active developers (this is an assumption, but it is based on just watching their conference).

We don't include the engines; we count the engine authors' work only as the single commits in which their work is pushed into Drizzle.

Why not just pull the trees and count the work people are pushing to them? I don't think it makes a lot of sense to try to "downstream" the count. As another writer commented, if you do that, where do you stop? MySQL has mysys, we have Boost. The number of people who work on Boost is huge, and I would never count them.

Brian "Krow" Aker

Re: Interesting stats

from: krow
date: Oct. 24th, 2010 09:59 pm (UTC)

I was thinking about this a bit more this morning. I believe that it is disingenuous to count the engines, since they are bodies of work unto themselves. In our count we only look at work as units pushed into our tree. In our case Paul pushes just PBXT, and he does that in single commits (i.e. we count each of his commits as a body of work).

InnoDB? We take some patches from it, but we count those as single units attributed to whoever pulled and applied the patches (and this is the only Oracle work we look at, so there is nothing else to be gained for us in numbers). We won't even be taking InnoDB patches in the near future because of our own initiatives in this area.

Henrik Ingo

Re: Interesting stats

from: Henrik Ingo
date: Oct. 25th, 2010 08:15 am (UTC)

Like I said, it depends on what you want to look at / measure.

If you want to know who is active in the Drizzle community, then what you presented is what I would do too.

If you want to measure the amount of work that goes into Drizzle as a database product (man-hours, investment in dollars, whatever someone might be interested in), then including the engines would make sense. After all, I don't consider Drizzle itself a usable product without at least one engine included :-)

As an example, someone might want to compare the development effort going into Drizzle vs Postgres, for whatever reason. Then I would want to compare Drizzle with engines. If you compare a Drizzle that cannot actually store tables on disk, then you are not comparing apples to apples.

But the Boost and Protobuf arguments are good. For me it is clear that these are general-purpose libraries and that storage engines within the MySQL descendants are not. But this is a bit of an "I know it when I see it" argument; the world is not black and white.
