Drizzle goes GA, From "What If", to "What has"

Mar. 17th, 2011 | 10:02 am

Not quite three years ago I wrote an article called "What If?".

What I wanted to do was go back and rethink decisions we had made during the years, especially decisions that we made for MySQL 5.0.

5.0 exists because of the MySQL/SAP alliance. SAP wanted to replace Oracle with MySQL, and to do that MySQL was going to need to run SAP R3. We didn't just pay lip service to SAP; there was a real effort to make this happen. Somewhere in the middle of all this there was also a very odd "we are going to adopt SAPDB as the next MySQL" plan, which of course was never going to happen. There were countless meetings over this, and attempts to somehow sprinkle even an ounce of the SAPDB code into MySQL, but that never happened.

As far as making R3 work on MySQL? That was incredibly unlikely, and it was damaging to the product in the end. We ended up with a lot of features that the database was never designed to have. We created an unrealistic set of expectations. We had a source base which had too little testing.

So part of the goal with Drizzle was to cut it back to the core and build modules that we could then create better testing for. So for that reason Stored Procedures, Views, and Triggers were out. None of them were well designed, and all of them had/have major bugs. We tossed out the monolithic kernel design and moved to microkernel design.

MySQL 5.1 made an attempt to patch the replication system that had been written a decade ago. MySQL replication works, but it works with a lot of exceptions. Anyone who has ever put it into production is aware of these. The good thing about MySQL replication is that it mostly works out of the box, and that is something that was a bit of a revolution when it was created. Today? With the notable exception of SQL Server, the rest of the major databases still have replication systems which are difficult to use, install, or deploy.

We initially looked at using 5.1's replication. We were only going to refactor it such that we were going to beef up its file format and switch to just using the row based replication that was added in 5.1. We were unsuccessful in refactoring it. About 9 months in we figured this out, and we began a rewrite.

The rewrite was the right answer. The original code had too little testing for us to ever know whether or not a change we made created bugs or not.

A big lesson learned: if you are going to refactor code, make sure you have plenty of testing up front.

Internally we have "new code" and "old code". If we want to make a change to "new code" we can typically do it very rapidly. The rate at which we can extend it is pretty amazing. The MySQL code base is not friendly to anyone who knows C++. Pretty much all of the warnings have been disabled and there are a lot of tricky bits.

We have fixed all the warnings in Drizzle. This isn't sexy work, and the only way it is justified is that cleaning up warnings fixes bugs. If you are starting a new code base, let me impress upon you the necessity of doing this from the beginning.

Today our replication is pretty spiffy, and it answers a couple of the big "What If" statements I have wanted answered:

1) We use an entirely open message format.
2) We store our replication records directly in Innodb.

The open message format comes with a penalty: it is more verbose than a native format. It takes up more space than if we just shipped the block records created in the transactional engine. But running a point in time recovery on block records is tricky and very limiting. You can't take the data from one database and push it to another. ETL? Forget it.

We used Google's Protocol Buffers for the message format. There are other libraries available but they were either license incompatible or were not widely known. At the time we hadn't made a decision to go with Boost, so using its serialization library wasn't an option. The disadvantage has been that the Google library created a dependency for installing Drizzle. Dependencies are a pain, and when we started Drizzle I had thought that the different Linux distributions had a good handle on this. I don't really believe this any longer. Avoid dependencies.
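To make the trade-off concrete, here is a minimal sketch in Python. It is not Drizzle's message format: JSON stands in for Protocol Buffers so that the example needs no dependency, and the table name, field names, and the fixed binary layout are all made up for illustration. It shows why a self-describing record costs more bytes than a packed one, and why it can still be replayed into a completely different store.

```python
import json
import struct

# Hypothetical row change; the shape is an assumption for this sketch,
# not the layout Drizzle actually writes.
change = {
    "table": "accounts",
    "op": "UPDATE",
    "key": {"id": 42},
    "after": {"balance": 1999},
}

def encode_open(record):
    """Self-describing encoding: field names travel with the data."""
    return json.dumps(record, sort_keys=True).encode("utf-8")

def encode_native(table_id, row_id, balance):
    """Fixed binary layout: compact, but meaningless without the schema."""
    return struct.pack("<HQq", table_id, row_id, balance)

def apply_to_target(message, target):
    """Replay an open-format message into any store that can read it."""
    record = json.loads(message.decode("utf-8"))
    rows = target.setdefault(record["table"], {})
    row = rows.setdefault(record["key"]["id"], {})
    row.update(record["after"])
    return target

open_msg = encode_open(change)
native_msg = encode_native(7, 42, 1999)
print(len(open_msg), "bytes open vs", len(native_msg), "bytes native")
```

The open message is several times larger, but `apply_to_target` needs nothing beyond the message itself, which is the property that makes ETL and cross-database replay possible.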

Storing the records in Innodb has always seemed to be an easy win in my mind. It solves a lot of the two phase commit problems that plague users and it gives you instant recovery. Storing the log in a separate file can possibly give you a win in that you can do some tricks with IO, but in the end it just complicates everything.

With MySQL you always need to keep in mind the question of "What would MyISAM do?"

MyISAM's design, and limitations, are scattered throughout the program. In all cases MySQL has to ask "how will this be handled if we need to store data in a storage engine that can't handle failure, handles all of its own IO, and needs to be locked at the Table level?".

We dropped MyISAM support about a year into our work, and relegated it to a support-only role for temporary tables. We didn't hide it completely before we GA'ed Drizzle, but we won't support it long term. I've heard users say "but I want its performance!". Trading reliability for performance works out for some people, but certainly not everyone. What I find is that when someone wants this, what they really want is a different sort of database altogether. Typically it is some sort of analytics problem which creates this need.

Which gets us to the storage engine interface. Within MySQL it was the first attempt to create an interface that we could plug different solutions into. I had proposed it in MySQL because I had written different engines and knew what a nightmare it was to make it work.

That engine interface has generated millions of dollars. When I wanted to make it available at MySQL the backlash was significant. Some of sales freaked out, some of marketing thought we were going to let others take over the product, and alliances wanted to know how we were going to limit it to "select partners". On top of that, half of engineering wanted to go and re-engineer it immediately.

In Drizzle we have spent a significant amount of time reworking the interface, but it is far from perfect. We redesigned it so that engines now own their own meta data and federate that data to the microkernel. We also designed the interface to require that all new engines have ACID like qualities, know how to handle their own recovery, and can handle failure gracefully. Our core engine is Innodb. We have had others propose new engines, and we have even supported other engines, but at the end of the day we know people want a transactional engine mainly because they don't want to find that their data has been trashed.

Our Innodb is a little different. We have more views into the state of the engine, and we fixed our version to compile with a C++ compiler. We cleaned up warnings and fixed the bugs that popped up from that. We have begun to refactor it so that it is more integrated with Drizzle's thread scheduler.

Innodb would have been the default engine for MySQL long ago if not for some "not invented here" mentality, mixed with a flopped buy out attempt. Heikki, the inventor of Innodb, came out quite well in all of this. Good for him.

I don't believe we will spend much more energy on the storage engine interface going forward. It is a dead business, and while there are a couple of companies that have built brand and product enough to make a go of the business, I don't expect any additional ones will show up. The storage engine business made money for MySQL, but it was a big distraction. While with Drizzle it is easier to integrate an engine, I'm not sure that a business exists for storage engine vendors with it. I'll write more about this at a later date.

Speaking of dates, Drizzle's internal format for timestamps is 64-bit. There is still some work to be done to allow the use of all 64 bits, but you won't need to recompile or change your disk format for them. Right now we need to fix some tests, and make sure a couple of functions will handle the formatting, but we store your data such that going forward, or backwards, you are in good shape. Unlike MySQL we store time as GMT, so there is no screwing around with "well I stored my data in my local time zone, but we had the machine set to...". I have personally spent over a month of time just fixing bugs in that code.
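The idea of "store in GMT, in 64 bits, and only convert for display" can be sketched in a few lines of Python. This is an illustration under my own assumptions (whole seconds since the Unix epoch, little-endian signed 64-bit); it does not describe Drizzle's actual on-disk layout, and the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def pack_timestamp(dt):
    """Normalize an aware datetime to UTC and pack it as a signed 64-bit integer."""
    if dt.tzinfo is None:
        raise ValueError("refusing to guess the time zone of a naive datetime")
    seconds = (dt.astimezone(timezone.utc) - EPOCH) // timedelta(seconds=1)
    return struct.pack("<q", seconds)

def unpack_timestamp(raw, display_tz=timezone.utc):
    """Read the stored UTC value back; the time zone only matters for display."""
    (seconds,) = struct.unpack("<q", raw)
    return (EPOCH + timedelta(seconds=seconds)).astimezone(display_tz)

# A value written from a machine set to UTC-8 reads back the same instant.
pacific = timezone(timedelta(hours=-8))
raw = pack_timestamp(datetime(2011, 3, 17, 10, 2, tzinfo=pacific))
print(unpack_timestamp(raw))           # shown in UTC
print(unpack_timestamp(raw, pacific))  # shown in the writer's zone
```

Because the stored value is always UTC, the machine's time zone setting never leaks into the data, and a 64-bit field has room for dates past the 32-bit limit in 2038.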

We have spent a lot of time fixing bugs. We get a big collective smile on our faces when we read about a new bug that has been discovered in MySQL and find that we don’t have it. A considerable amount of our time has also been spent on finding new ways to test Drizzle. I am sure plenty of bugs exist to be found.

We also support storing/comparing/displaying time with microseconds. We also have a real BOOL type, which I have been told is handy for the SQLAlchemy folks, and a native ISO UUID type. The UUID is interesting in that it stores time as well as being unique. It isn't as fast as "please give me the next number", but I believe it will be useful for a lot of applications. We have refactored all of the types, and the only one that was not size related that we dropped was SET. If you wonder why we dropped it, read the section in the MySQL manual about its limitations and bugs.
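For anyone wondering how a UUID can store time: the time-based (version 1) UUID layout embeds a 60-bit count of 100-nanosecond intervals since 1582-10-15. The sketch below uses Python's standard `uuid` module to pull that timestamp back out. I am assuming the version 1 layout for illustration; the helper name is made up, and nothing here is Drizzle code.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def uuid1_to_datetime(value):
    """Recover the creation time embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    unix_100ns = value.time - UUID_EPOCH_OFFSET
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return unix_epoch + timedelta(microseconds=unix_100ns // 10)

fresh = uuid.uuid1()
print(fresh, "was generated at", uuid1_to_datetime(fresh))
```

Generating one of these is more work than incrementing a counter, which is the speed cost mentioned above, but no coordination between servers is needed to keep the values unique.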

Why do we only allow DELETE against a single table at a time, like pretty much every other RDBMS? Beyond the conceptual issue that few can wrap their heads around how to form a multi-table DELETE, let alone feel like they know what the query will do, we hit the problem of the "multiple execution path". There were a lot of one off execution paths in MySQL. In a lot of cases I know these were dead refactoring projects that were never completed. The "multiple execution path" problem is particularly disturbing when you think about fixing a bug. If you fix a bug in DELETE you need to know that there is an execution path for a single table that is different than the path used for multiple tables. This leads to odd behaviors, and a much richer set of bugs.

SQL modes? Those are gone. If you wonder about what sort of problem they create inside of the server, I'd suggest reading about the "Legend of the Ambalappuzha Paal Paayasam".

In general in Drizzle we have tried to get rid of the gotchas that we have found. Things like declaring a column NOT NULL and discovering that somehow the database still stored a NULL are gone. Altering a table and adding a field that would violate the structure of the table? That is gone.

It is amazing that ALTER TABLE works, as the code there is Byzantine. We have made some effort to clean it up, but it is still way too tricky. I wish we could have done more there, but it is what it is. Are you using partitions? Make sure you back up your data before doing an ALTER TABLE. Wrapping partitions into the system in the way that was done at the time was simple, but it is far from robust.

I had hoped that with 5.1 we would have created a single logging API, but instead we ended up with multiple logging APIs internally in the server. With Drizzle we ripped all of them out and installed a single API. It is crazy simple to write a new logging plugin.

Which gets into the philosophy of plugins in general. Writing plugins should be low hanging fruit. Whenever possible we have tried to make it the case.

We have an entirely new INFORMATION_SCHEMA in Drizzle. It is based on table functions, which is a new concept in Drizzle. We keep a separate schema called DATA_DICTIONARY, and in it we put whatever we like. Our INFORMATION_SCHEMA is only what the SQL standard has specified. We do zero vendor modification to it. Another hats off to SQL Server: their INFORMATION_SCHEMA is the closest to complying with the standard.

Drizzle's drivers are BSD. They were written outside of Sun, and Sun signed off on contributions to them under a BSD license. They speak Drizzle and MySQL's protocol. There is a JDBC version that was written. Their adoption is becoming widespread. Licensing clarity around them is a big win for us, and for ISVs who want to integrate. MySQL's licensing mess was related to a lot of hand waving involving them. Recently I noticed that MongoDB had written up a clear licensing policy with regards to their own drivers. Awesome.

We never got to finish all, or really much of any, of what we wanted to do with the Drizzle protocol. I believe this is an area where we will see change in the near future. Inside of Drizzle we have a C++ interface that resembles JDBC that lets us execute queries. We will be doing a lot more with that interface going forward.

What about performance? With Drizzle we began doing benchmarks early on, using a few different benchmarks. The benchmark generated by sysbench has always been the one we have used as our bellwether. Unlike a lot of databases we test Drizzle with up to 1024 concurrently executing queries. Most of the benchmarks I see people run are for far fewer connections. We have chosen time and time again to favor performance gains at the high end, over gains on the low end. We are roughly double in performance from where we began. We could still do a lot better. MySQL 5.5 has a new meta data locking system which should do well in a number of situations; we could do a bit better in some of these cases. Our lack of a MyISAM would make it simple for us to move forward in this direction if we want to.

There has been a lot asked about our claim on scaling with lots of cores. Our process there is simple: eliminate locks, favor performance gains that make use of additional CPUs when we find them, and try whenever possible to remove strong ownership that requires waits for locks. MySQL relies on MyISAM, and MyISAM has significant locks, especially around the keycache. We got rid of those by freeing ourselves from MyISAM. We have had some gains with our new scheduler and we have done some work to improve how IO is handled.

I am sure we have a lot of tuning still to do. We won't be publishing benchmarks which compare us to others though. I've yet to see a comparison benchmark which wasn't completely flawed, and even when they are not, few people really understand them. They fall into the classic "how many angels can dance on the head of a pin" conversations.

Our authentication system is modular, and we need to iron out more of the authorization system.

I've seen someone say that Drizzle is designed for Google and Facebook. This is not the case at all. We built it so that the next Facebook, Google, etc would have a platform to build on. Facebook and Google have their own forks of MySQL, they aren't going to be using Drizzle. The pieces are there for the next company who needs to innovate, it is just a matter of someone making use of them. We speak the MySQL protocol, so the typical MySQL application runs just fine on Drizzle without change. We designed Drizzle to work as a piece of someone's current infrastructure, not be yet another application which has a costly integration. We have a NoSQL sort of solution via the blob streaming module, but we are first and foremost a relational database.

What will the next Google or Facebook find? A much more friendly platform than what MySQL provided to develop on and with. The big success for Drizzle has been in the people that have been involved. We are without a doubt the descendant of MySQL that has the largest contributor base, and we have long passed MySQL with regards to contributors. We are well into the hundreds when it comes to developers who have contributed code. We have had more than 921 commits in the last month across 20 people. Our numbers go up and down, but we are consistently more than double anyone else in size. If you just walked out of college, or skipped it altogether, you are going to have a much easier time adjusting Drizzle to your needs. At least we believe this :)

The codebase is C++, we make use of Boost, and while we are cautious, we tend to favor more forward thought in how we code. Readability is the key to creating code that others will use. Because in the end? We can scale silicon, but carbon? People are much harder to scale.

The people to thank for the code:
Brian Aker
Monty Taylor
Stewart Smith
Lee Bieber
Jay Pipes
Padraig O'Sullivan
Andrew Hutchings
Marko Mäkelä
Joe Daly
Olaf van der Spek
Vijay Samuel
Patrick Crews
Toru Maesaka
David Shrewsbury
Eric Day
Marisa Plumb
Joseph Daly
Barry Leslie
Asil Dimov
Mark Atwood
Tim Penhey
Jimmy Yang
Paul McCullagh
Nathan Williams
Paweł Blokus
Sunny Bains
Andy Lester
Hartmut Holzgraefe
Trond Norbye

Other people to thank?

David Douglas, who supported us initially at Sun. And when we didn't think our internal support at Sun could get any better? Bob Brewen worked with us till the end came for Sun. An extra mention should be made for Lee Bieber; he has been working with the project from nearly the beginning as well. He has handled project management, done code refactoring, made flyers, organized dinners, and did everything else in between.

Mike Shadle for getting us machines, and making sure everything runs. Adrian Otto at Rackspace should be thanked (along with a number of other people as well).

A thank you should go to Chris Dibona for the Google Summer of Code project. We have a number of students who now work on databases for a living thanks to that program. While with MySQL we constantly failed at getting students' code into the server, with Drizzle we have had a lot of success.

There is an entire channel of people who have been involved with Drizzle on Freenode in #drizzle who should be thanked as well. IRC is how we communicate.

There are a lot of other people I am forgetting to thank, sorry about that.

So what next? There is a lot more to Drizzle than what I have written above. Having worked on this for years I often forget what the differences are anymore. There are lots of new features, plenty of new enhancements, and new bugs just waiting to be found. I'm giving a talk at Web 2.0 Expo in a couple of weeks in San Francisco where I will talk about some of what we have done and are doing for virtualization.

I will be giving a keynote at the O'Reilly MySQL Conference & Expo, and there are a handful of talks there on Drizzle as well. The MySQL Ecosystem is a radically different place than what it was a year ago. I'll be commenting on it in the future online and at the conference.

About a week ago Monty Taylor and I sat down and talked about what we wanted to do with Drizzle going forward. Monty has been working on this since the beginning with me, and he has been a lot of fun to work with. One conclusion that we both came to was that we want to see where people will take Drizzle before we determine too much about its future. It is easy to get caught up in new features, and we are interested in seeing how others use it before too many decisions are made about what to do next.

Originally posted on blog.krow.net.

Wikipedia, Mornings, and Danger of an iPad

Mar. 12th, 2011 | 11:21 am

Wakeup at 7:08

The movie I was watching last night, a semi-documentary on the early punk music scene, leads me to remember that there was a character called the “Brood” in the Marvel Universe.

So I look up “Brood”.

Next I read up on the Acanti. Who doesn’t like space traveling whales that do harmony?

Which leads me to read up on Acheron Empire because of a slight reference to the Acanti.

This of course leads me to read up on the Hyborian Age, and read the entry on Robert E. Howard. I briefly read the entry on Red Sonja, which leads me to the entry on the film, and makes me wonder what was up with the copyright on Conan at the time. I also wonder what the ex-governor of California is up to, and wonder if the movement to change who can become president is moving forward at all. Which makes me wonder if he wouldn’t make for a somewhat palatable Republican candidate. I can only imagine him on stage with Palin.

I then retreat from this entire tangent.

I look to see whatever happened to Brigitte Nielsen and discover that celebrity drug rehab TV exists, and then have to look up Jaimee Foxworth because I have no idea who this child actress was. If you aren’t Gary Coleman or Drew Barrymore, I have no idea who you are.

Sometime in the last year I’ve read the article on Drew Barrymore, so I can skip that.

Jumping back I read the entry on “Kull of Atlantis”. I do not get the appeal of barbarian fiction.

The apeman article is just a jumping off point for a number of topics. I read up on the concept of “Person” and read about Humanzee. Parahumans? Check.

Did you know that there is a movement called the “Great Ape Project”?

Give rights to our fellow apes.

The DNA difference is quite small. I then read about the Soviet project to breed human hybrids. I know the Germans had one as well, but I don’t notice it linked into any of the articles. The story of “Oliver (chimpanzee)” is pretty sad. We humans suck. I read up on Karl Pinkington, and from there…

Ancient kingdom of the Picts, which requires me to understand Bede the historian.

The modern movement for different countries in the United Kingdom to gain some level of independence is fascinating (especially since in the end it disenfranchises the English (serves you right…)).

Transhumance has nothing to do with transhumanity. I did briefly see an article on that, but I have heard enough about it in life. On the other hand reading up on transhumance gives me a better picture about the legal structure of a nation.

Want to start a movement to do away with counties in Washington?

I find myself reading up on Wales which leads to articles on the End of Roman Rule in Britain. Hadrian’s Wall? Sounds like it was a taxing structure. Built in seven years and 80 miles long!

I have doubts that we could do that today. It was historically saved by a plan of purchasing land around it, using that land for sheep, and using the proceeds to buy more land.

Devolution, Padani, and Nunavut all follow. I hadn’t realized how much the structure of Canada had changed in the last couple of decades.

Go Nunavut!

BTW Canada? You suck for shipping eight families off to the great white north and not letting them come back south when they realized that you had sold them a bill of goods. That is just awful. Who would have thought Canadians did this shit?

Grise Fiord? At least let them rename it to a language that is spoken in Canada.

The Welsh at least have “Snowdonia”. Way more pleasant name.

Time spent on the above? About two hours.

Originally published on blog.krow.net

Do It Tomorrow, Simple Notepads are the Solution to Stress

Mar. 8th, 2011 | 11:58 pm

I was glancing through Boing Boing tonight and noticed an article about an iPhone Application which is yet another “this is how to organize your life”.

There was a point in my life where every time I had an idea I would open up a window, put in a little bit of code, save, and move on. Despite the small amount of effort that this took I found it stressful.

I had all of these directories scattered around that had ideas in them.

Sometimes, if I had a window already open, I would just go work out the idea if it was small enough in the current code I had opened. Most of the time this was ok, but only ok. I’m not religious about code when it comes to a patch being only one thing (I rollup patches when they are related so there isn’t a lot of difference). It is not the best practice but it isn’t the worst either.

The problem was, sometimes this would go wrong. The idea I had required more work than what I thought, or the idea was a distraction from what I was working on.

So how did I solve this?

I just keep a simple note in my email where I tack on new ideas as I have them. There is no organization, no tags, and I just simply delete a line once I have completed it (and I doubt I will ever complete all of it). I scan it from time to time to see if I can remove anything from it, but for the most part I just leave it alone.

Once I have added an idea to the note I find that all of my stress goes away. I’ve recorded the thought, it will be there later to look at.

I find that most of the stress I have is not about completing what I need to complete, but it is about losing the knowledge of what I might like to complete.

Originally published at blog.krow.net

Google Summer of Code! Have an idea for Drizzle?

Mar. 7th, 2011 | 01:02 pm

We have submitted our application for Google Summer of Code, and have a wiki page up for projects.

Many of our students have gone on to get jobs in the database industry after GSOC, and we have a high rate of “you wrote the code, it will end up in the main release”.

Are you a student and databases are not your thing? I’d go look at other projects which have been successful with GSOC and see if there is anything that interests you. Open source is an awesome way to get real world experience developing software.

Originally posted on http://blog.krow.net/.

Ignite MySQL!

Mar. 1st, 2011 | 01:36 pm

Just a reminder, the call for submissions for Ignite MySQL is now open.

Originally Posted on http://blog.krow.net/

Upgrading, Hudson to Jenkins

Feb. 9th, 2011 | 12:56 pm

Yesterday we finished upgrading Drizzle’s Hudson servers to Jenkins. We are a long time Hudson user, and we love it. It has been one of the most top notch pieces of open source software I have seen delivered in the last few years.

The authors, especially Kohsuke Kawaguchi, have done an amazing job.

So why move from Hudson to Jenkins? Because the authors have moved. The people we have trusted and respect have been required to change the name from Hudson to Jenkins. We will bet our money and time on the folks who have earned it. These are the people who have been delivering time and time again.

In the last couple of years I have sold a dozen or more companies on Hudson (and a fair number of open source projects).

Over the years I have used a dozen or so continuous integration systems. Some were home grown, and some were purchased. In the software business, there are two things that differentiate the professionals from the rest. How you make use of revision control, and how you test your software. I have often wished that we would have had a piece of software like this for MySQL, and I love that we have had it for Drizzle.

Jenkins is just as important as git, bzr, and subversion. I am really happy to see the direction the authors are taking it in.

Reposted from http://blog.krow.net/post/3203287730/jenkins-to-hudson

O'Grady's Fear of Forking, Let a thousand flowers bloom

Nov. 16th, 2010 | 11:18 am

In the article "Fear of Forking" there was a quote pulled from me about my observations from a yearly call done by the folks at O'Reilly with many of the authors of different open source projects.

"On a related note there was a recent phone call that O’Reilly put together with a number of open source leads. It was amazing to hear how many folks on the call where terrified of how Github has lowered the bar for forking. Their fear being a loss of patches. It was crazy to listen too."

Since I made that comment, there is one new observation I have made. GitHub has begun to feel like the Sourceforge of the distributed revision control world. It feels like it is littered with half started, never completed, or just never merged trees. If you can easily take changes from the main tree, the incentive to have your tree merged back into the canonical tree is low.

I feel like you can look at it in either of two ways.

If you count up all of the hours and energy going into abandoned trees then you begin to worry about "all of that wasted work". It takes a lot of effort to keep projects going, and if all new energy is focused in this direction I don't know that we can keep a sustainable amount of focus to produce the sort of software that we do today. While consulting this last year I've run into a number of shops where a developer has made changes to an open source project, and placed these into production without any vetting (and in most of these cases they had a github/launchpad/etc sort of tree, or they pulled from some random person's tree). They didn't use a released piece of software, and often the code they had used was just thrown over the wall by some devs in some other company. It is the "we hired a smart guy who tinkered with our debian distribution/kernel" problem all over again.

The other way to look at it is that Github/Launchpad are today's Burgess Shale. We are in the equivalent of a Cambrian explosion and the diversification we are seeing is similar to what we saw when Sourceforge first launched. If this is the case then we will see some stabilization in the next few years. In the database world, we are certainly in the middle of one of these periods.

If I put all of this into perspective and apply it to the MySQL Ecosystem, I fully believe that the forking we saw was enabled by the move to bzr/launchpad. Without that move it would have been a lot harder to make that shift for most of the forks and distributions (and I believe it has also slowed down the evolution of most, since almost all of the forks/distributions are heavily tied to the upstream changes that Oracle makes). Beyond Drizzle, none of the other forks have any significant contributions, and they are all stuck waiting for Oracle to fix bugs for them and/or hoping that the changes they make don't conflict with what Oracle is doing.

In a related Ecosystem, I am eager to see what happens in the Postgres world now that they have moved to Git. As I have mentioned before, with Drizzle we could have started with Postgres as the foundation, and I believe there would have been a lot of benefit in doing so. I will be curious to see if anyone decides to see what they could do with the code if they take a radical departure from the current architecture. I am still happy with our choice of MySQL, but I believe there is an opportunity to do something pretty incredible with that codebase as well.

In the cloud world we have OpenStack, Canonical, and Eucalyptus all circling around the same problem and having a history of shared tools and code. It is going to be interesting to see what happens there as well.

BBC, RSS feed, Editorial in tags...

Nov. 6th, 2010 | 10:20 am

Screen shot 2010-11-06 at 10.10.14 AM.png

Screen shot 2010-11-06 at 10.19.12 AM.png

Get it fast before they fix it:

wget http://feeds.bbci.co.uk/news/rss.xml

How many contributors does Drizzle have?

Oct. 20th, 2010 | 11:05 am

Opscode posted a note this morning on their current contribution level, which got me to thinking about Drizzle's contributors.

From looking at bzr log I can find out some of the details.

To date we have had 13,478 commits that have gone into our tree at all levels. If we look at level two commits (i.e. these are patches that are more likely to be a complete body of work) we have had 8,064 commits.

We have had 96 total contributors to date who submitted code to the project.

1119 commits by students who participated in Google's Summer of Code Project.

I had someone ask me about my own contributions to the project; they had assumed that I had done more of the work (not even close!). To date I have done about 3,017 total commits, and if we look at the top four it is myself, Monty Taylor, Stewart Smith, and then Jay Pipes. Monty Taylor has done 3,496 commits, so I have some catching up to do!
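If you want to pull similar numbers for your own tree, here is a rough sketch of the counting step in Python. It assumes you have already saved the output of `bzr log` in its default long format, where each revision has a `committer:` line; the sample text and names below are made up, and this is not the script I used. Merged revisions are indented in the log, so the pattern allows leading whitespace, which also means a commit message line that happens to start with "committer:" would be miscounted.

```python
import re
from collections import Counter

# Matches "committer: Name <email>" with optional indentation and optional email.
COMMITTER = re.compile(r"^\s*committer:\s*(.+?)\s*(?:<[^>]*>)?\s*$")

def commits_per_committer(log_text):
    """Count revisions per committer name in saved `bzr log` output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = COMMITTER.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample in the shape of the long log format.
SAMPLE_LOG = """\
------------------------------------------------------------
revno: 3
committer: Jane Doe <jane@example.com>
branch nick: trunk
timestamp: Wed 2010-10-20 11:05:00 -0700
message:
  Fix a compiler warning.
------------------------------------------------------------
revno: 2
committer: John Roe <john@example.com>
branch nick: trunk
timestamp: Tue 2010-10-19 09:30:00 -0700
message:
  Add a test case.
------------------------------------------------------------
revno: 1
committer: Jane Doe <jane@example.com>
branch nick: trunk
timestamp: Mon 2010-10-18 16:12:00 -0700
message:
  Initial import.
"""

counts = commits_per_committer(SAMPLE_LOG)
for name, total in counts.most_common():
    print(total, name)
```

The number of distinct keys in the counter gives you the contributor count, and `most_common()` gives you the ranking.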

Number five on the list is Padraig O'Sullivan, who never worked for Sun or MySQL. He now works at Akiban, but he started doing work with us as a Google Summer of Code Student.

Number six on the list is a Sun employee who never worked for MySQL, and four of the people in the 5-10 range never worked at Sun at all. David Axmark, one of the founders of MySQL, has even contributed patches.

Our Trigger infrastructure was contributed to Drizzle by one company (Primebase). Joe Daly has worked out the scoreboard, and most of the work done on our optimizer and executor was done by folks who have never worked on MySQL. There are multiple stories that can be told about individuals or companies contributing work at this point.

Only seventeen of our contributors have contributed just a single patch.

Launchpad says that we have had 37 active contributors in the last month.

This doesn't even begin to include hours put in by folks who keep our infrastructure working like Mike Shadle from Intel, or who have done work on drivers. We have our own JDBC driver written by Marcus Eriksson.

None of those numbers include commits done on engines that we have included. We don't include that many third-party engines though, since we have created a fairly high bar for storage engines to meet in order to be included.

Going from the world of MySQL where we had nearly zero contributors to where we are at today has been pretty amazing and incredibly rewarding.


Where does the Innodb Technology today come from?

Oct. 11th, 2010 | 10:51 am

Every so often I get a question on "where does Innodb technology come from right now?"

I was thinking about it this morning and decided to break it out via a chart. This is a rough number, and in it I give Percona points for XtraBackup.

Most of the work that I see, which is not niche work, is done by Oracle, hands down, no question. I don't see any sign of Heikki being very involved anymore, but his legacy seems to be alive (or at least the embers of it seem not to have gone out). Percona provides a lot of niche changes to Innodb, and the Percona XtraBackup tool (which, if you don't know about it, you should).

I, and others, look at changes that occur to Innodb for Drizzle. We adopt the ones we are comfortable with (keep in mind that I personally look at MySQL, PostgreSQL, and a couple of other open source databases as well). I am well aware of Innodb's shortcomings, and even more so of its connection to MySQL's monolithic kernel. I noticed that in 5.5 they changed the default engine to Innodb, but they didn't change the testing framework to use it for all tests.

HailDB is just offering testing right now for the embedded technology. This is giving projects that used the embedded version of Innodb a home, since Oracle seems to have shut down development on it (i.e. no new releases). HailDB is focused on getting the code in shape for open source development (and a better testing framework is being added to keep any regression issues from popping up).

MariaDB, i.e. Monty Program, is a distributor of Innodb technology that they obtain from Percona.

I saw the note this morning about SkySQL opening its doors for doing business. They state "Through our relationships with strategic partners such as Monty Program AB", which would give them access to the old optimizer team from MySQL, but that is about it. Monty Program derives its knowledge of Innodb via Percona (and relies on their backup technology).

So in the end? At this point it appears that everyone is just a value added reseller of Oracle and Percona's work (though Percona is obviously a distributor of Oracle technology as well).

Screen shot 2010-10-11 at 10.43.48 AM.png

If you are interested in knowing about, and participating in, the ecosystem around MySQL, you should either be attending or providing a talk at the O'Reilly MySQL Conference and Expo. This is the conference of the year for MySQL the technology, and I expect this year's conference will be the best place to learn about what is happening next.

Update: I was asked about bugs related to Innodb as the default engine. To get a good feel for it, force the tests in mysql-test to run as Innodb. We found very few bugs in Innodb itself (Innodb is relatively free of bugs), but the kernel was just not designed for it, and it fits in more as an afterthought than as a design decision.

If you are up for another challenge, turn on Heap to just use its range index and not its hash index type.

Update2: I pulled a comment about replication that I had in the original post. Replication deserves more than a midstream one-liner.
