Brian "Krow" Aker (krow) wrote,
Brian "Krow" Aker

Drizzle, State of Testing

Testing, Testing, Testing...

I've gotten a number of questions about how we are doing testing, and even how our methodology for accepting code works :)

A lot of this comes from running open source projects for almost a couple of decades (hell, if I toss in uploading public domain to software to BBS'es for the Commodore 64 it is a bit longer!).

One of the most important rules I have learned over the years is that anything that is not automated and not required, will get skipped.

Today Drizzle runs 213 tests, the entire MySQL test suite minus tests that are for features we don't have. We don't allow for any regression, meaning that no one is allowed to disable a test in order to get their code pushed. Our test suite was also modified so that we can run all of the tests against a particular engine. Today we do this with both Innodb and PBXT. So instead of having "engine specific" tests, we can test everything. Feedback we are getting from storage engine vendors is that this is golden. Even if they never release for Drizzle, they can use it to vastly increase the testing they do today.

We also do not allow any code to be pushed that causes a compiler to toss a warning. We do this for a wide set of versions of gcc, and also for Sun Studio. We treat warnings as errors :)

We are also enormously proud of this fact. This took a lot of effort :)

Our most recent change is that we now include a regression test for performance for sysbench. For each push into our "staging" tree we run a full test at different steps of "connections". We test both read-only and read-write workloads. My only real complaint right now about this system is that we look at absolute numbers, and being a math geek, I would really like some more information on standard deviation :)

The development process we use maximizes our use of tools. For Drizzle we accept no patches via email. Your patch must come through Launchpad, where you must have an account. We do this so that we can always track down "who did what". This is done almost entirely without exception, and the exception being that I am sure someone at some time as either pointed out a bug in code to me or has made a comment on how something should be written while reading over my shoulder.

We don't require anyone to sign a "all your bases belong to us" sort of contract. Speaking as myself, I personally find them unpalatable and frequently suggests developers think about them before ever signing. Code flowing is what I find important. With Drizzle we have had a 100+ contributors, and I suspect that number would be much smaller if we had taken a different point of view on this. One of the most interesting conversations I ever had about this topic was with Rasmus, who has continued to refuse to sign the Apache Foundation's agreement. If someone was to ever put code in from their employer, we can strip the code back out in seconds. We can also hand over all of their information to the original copyright holder for them to prosecute. The ones of us who do final review of code are very nitpicky about this. One of the nice things about "source code" is that it has finger prints. Anything that doesn't follow our coding style, which is very specific, stands out.

Code which was not original tends to always show it roots. I know from talking to Theodore Tso that IBM gives its developers who take in code to its open source projects, a class in how to identify copyright violations. IBM's general handling of open source always tends to impress me.

Having a modular approach in our design also means that any "large" sort of change can be reduced down to a plugin. Our average patches are bug fixes or refactoring bits. If you find that you enjoy making code readable, fast, and standard looking, you will probably enjoy working on Drizzle.

If you want to write fun and new code, then you should look at writing plugins!

On a personal note, if I was to write a database from scratch, I would go after a completely different problem domain. I still happen to need a relational database though, which is why I work on Drizzle.

That and the people who are working on the project are both awesome and fun :)

Code flow in Drizzle is pretty simple. You write a patch, you submit a tree to Laundpad, and one of the captains reviews it and puts it into their tree. I pull from one of their trees and merge into a local tree. Before I push the code it has to pass all tests/etc on 64bit Intel Fedora, Open Solaris Sparc, Solaris Sparc, and OSX. When I get around to buying myself a new desktop Mac I am going to spin up a few more platforms in virtual boxes, so that I can test more platforms. There is a simple Gearman system I use to populate and run the code on all of these systems.

Once the code passes on all of the above it goes to our staging tree on launchpad. From there the automated system gives me feedback on regression. If we pass then the code goes to trunk. Our turn around time for code is frequently about 24 hours. Since all of the above testing is done, we drop tarballs anytime we want too.

It is pretty much clockwork for us. If a human wasn't involved I suspect we could just set a cronjob to handle it every two weeks (and who knows... if Lee gets bored with this, maybe he will do exactly that!).

What is the future?

Today we generate automated reports for cachegrind, callgrind, and valgrind. We run pahole by hand. We are told that there is a new tool for generating random queries for MySQL for crash testing, we need to look into this.

I've also got a set of tests written around drizzleslap (aka mysqlslap) that we need to toss in soon.

All of these need to go into the process. We should never regress on L2 misses or branch predictions. We should never see holes in our structures/classes.

We don't have enough tests for failure cases. Our test coverage is public:

I want to see that increased. The problem is that there has never been a test system for "this is how to fail at X". A few hacks have been written, but we need to come up with a complete methodology for this. The one open source database that has an awesome test suite for this is SQLite, and we need to spend some time learning from them all of what they do.

Over dinner recently with Josh Berkus, he mentioned the work they are doing on pgbench. I am hoping that we can get a patch in it so that it will support libdrizzle. That way we could use one of the Postgres tools as well.

These are our next big steps :)

Have any suggestions? Want to contribute? All of our tools are open source. We welcome extensions and they are general enough that almost any other database could use them as well!
  • Post a new comment


    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded