Engines, On the State of

Oct. 13th, 2008 | 09:31 am

So many engines, and so little to choose from. This is one of our two major decision points in Drizzle right now.

Let me explain.

Today we have Innodb, Maria, Falcon, and PBXT.

Simple?

Not really. Innodb is not a single engine, it is three engines. We have the default one which is shipped. It has been the wunderkind for years now but has been showing its age. Go buy a piece of hardware that has four cores and it quickly becomes apparent that it is not aging well. There is the Innodb plugin, and while it delivers on features, performance still evades it. Both are works of the Innodb team at Oracle. The development style for Innodb has never been open, but they have always delivered consistently. Right now though? This delivery seems to be slowing. Since they do not function in an open model it is very hard to work with them. This means we have to shoulder most of the work, though the Innodb team has been responsive to questions.

We have the Innodb produced by Google. It is of the standard design, but has been modified with performance patches. These are widely believed to deliver, and often do show, performance increases on hardware above four cores. The issues around this engine are more about maintenance. Google is happy to drop its patches out the door, but shows no sign of wanting to bundle these into a release. This makes perfect sense; they aren't in the business of releasing databases. The Google developers are doing a good job of getting their patches out in chunks and seem genuinely interested in getting them into trees (though they themselves do not do this work). They are not, though, a committed team; they are a group focused inward who get open source well enough to understand that publishing their patches is a good thing. There may be an answer in looking at Percona's builds; this is an unexplored option at this point. They have been doing releases with the Google Innodb code. Their development model is not open. They do have an outward facing view of the world though, since they work as consultants.

Maria continues to move along, but it is not transactional at this point. This makes it a non-starter. When they get it working, then it gets a ticket to the ballpark. It also hooks deeper into the server than any of the other engines (aka bypasses the engine interface). It relies on the mysys library that MySQL ships. This makes it more difficult for us to work with, though all problems are solvable. It is not being developed at a very quick pace.

Falcon has been released in the Alpha 6.0 MySQL tree. It is alpha, though, and has not been shown to perform well in general against Innodb. It keeps going through design changes so it is not really a contender for use at this point. On the plus side for me, it keeps to itself and the code is distributed as a complete library, which means if we did integrate it into Drizzle it would be relatively simple. It has an active development team. To this date though we have not worked with them at all.

PBXT has shown steady improvement over time. It is hard for me to gauge at this point where it is in its development cycle. We have just pulled it into Drizzle recently and we know it fails some of our tests (keep in mind, the test system is only designed to test MyISAM; we have found bugs galore in shifting to Innodb as the default engine). Right now its design lends itself more to performance around indexes. Scans are still a performance bottleneck. This might be fine in our world, since for the web you typically only read from indexes. It does require row based replication and this is at issue in the server in general (someday soon there will be a long blog post by me on the sorry state of replication). Paul, the main developer, has been very active though and this wins big kudos from me personally. I would rather work with active developers and help them fix their work, and skip working with folks who are not so active.

So this is the state of it. I have a few other random thoughts, but at the moment I am left with the question of "what to do in the future". We have had a few attempts at merges from the different Innodb trees, but so far none of these have been completed. PBXT is moving along well and we have begun to take patches from Paul to help him, and us, with testing. A couple of the Falcon folks have approached me about getting a tree working with their engine, but nothing has come of that. If the Maria team can kick out a better MyISAM I am open to replacing ours, though this is not a priority.

Paul's recent changes make it much easier for us to maintain an active PBXT tree and Innodb tree.

So what is the future?

I am not sure at this juncture. We will continue down the path of trees for PBXT and Innodb. Those are the contenders at this point and no matter the performance issues with Innodb, it is prudent to keep it around because of its stability.

Next year though? I am not sure.

Next year is coming quickly though.

Comments {6}

Arjen Lentz

OurDelta

from: arjen_lentz
date: Oct. 13th, 2008 10:08 pm (UTC)

A fairly large group of people in the community have been working on at least some aspects of this: http://ourdelta.org/

Oracle and Open Source

from: burtonator
date: Oct. 14th, 2008 01:20 am (UTC)

IMO the problems around InnoDB at this point fall on the shoulders of Oracle.

I was optimistic that this would change after they announced the InnoDB plugin at the users conference but I should have bet on pessimism.

It appears they've gone heads down on features.

Google is doing the right thing.

As of now they're NOT the owners of the InnoDB code so the best they can do is push a few patches over the fence and hope Oracle does the right thing.

Perhaps it's time to fork InnoDB... It's working well for Drizzle.

For now the stability and multi-core performance issues are more important for me on InnoDB than features.

I'm going to be buying a rack of 8 core machines soon and I don't want them DoA simply because InnoDB can't scale.

16/32 core machines are right around the corner.

Brian "Krow" Aker

Re: Oracle and Open Source

from: krow
date: Oct. 14th, 2008 06:07 am (UTC)

I agree about Google, I think they have been doing the right thing.

We are seeing forks, the question is going to be which one will be the dominant one in the end... I really do not know.

peter_zaitsev

(no subject)

from: peter_zaitsev
date: Oct. 14th, 2008 06:42 pm (UTC)

Just wanted to drop a note about the Percona model being open etc.

We release the version which we think best solves our customers' issues. We're doing development ourselves as well as pulling in good and stable patches we find. We do not actively seek contributions; however, we're open to accepting any patches which we think meet our goals and quality guidelines.

Brian "Krow" Aker

(no subject)

from: krow
date: Oct. 15th, 2008 04:15 am (UTC)

I had not seen much by you all on your open source efforts (aka in publishing trees and etc) until I made this post :)

Do you have a contributor policy?

Patrick Galbraith

FederatedX

from: capttofu
date: Oct. 27th, 2008 12:14 am (UTC)

Brian,

I will.. I promise, get FederatedX going for Drizzle. This #@#$ing book (I love writing it yes, yes I do really) eats up time, but I *need* to get FederatedX for Drizzle. We can discuss at OpenSQL camp.

Also DBD::drizzle, which I'm resuming work on as I write this, I will get working.
