Drizzle, libuuid, Sometimes "other" is better...

Oct. 29th, 2008 | 03:05 pm

One of the stated goals of the Drizzle project is to "reuse many eyeballs". I dislike "Not Invented Here"; it breaks one of my primary rules, which is that "all engineers should be lazy".

By lazy I don't mean "don't do your work". Being an engineer means that you build stuff. If you aren't building stuff, then you are not an engineer.

Being lazy means that you reuse other people's work as much as possible. Skip re-inventing the wheel.

Some time ago MySQL introduced a uuid() function into the server. It gives you an effectively unlimited supply of unique keys, at the cost of a larger footprint in your indexes. That is a trade-off, but I find people are willing to make it.

What was the problem?

We wrote our own UUID function instead of just inheriting the one that most systems provide. What does this lead to?

Code that only a few eyeballs ever looked at (and we have an active debate on whether its startup is thread safe or not).

We decided to look at this recently as an exercise in "was there a better choice". For this we picked the libuuid code that comes with Linux and OSX distributions.

The end result?

The libuuid code was faster.

Not by a lot, but the performance did show up, especially on multi-core hardware.

Is this a surprise? Not entirely. I would hope that code which is used by many people would turn out better than code that only a few have looked at.

I'm attaching the end results. The first run used an increasing number of threads while dividing the load among the clients as threads increased. The second run shows increasing work as threads increase. All work was done using our default engine (which is InnoDB). I used one of my spare 8-core systems. When I get the chance I will look at reproducing it on something larger.
[Image: Picture 5.png]

[Image: Picture 4.png]


Comments {6}

Clean IP?

from: bkarwin
date: Oct. 29th, 2008 10:47 pm (UTC)

One reason to reinvent the wheel is that you don't have license to use the wheel. Or at least not under the terms you want to use it. (fwiw, libuuid is LGPL code)

Can you guarantee that libuuid is clean IP? If you can't guarantee this, what is the implication for the IP status of the Drizzle project? What is the Drizzle project's policy toward IP and CLA requirements?


Brian "Krow" Aker

Re: Clean IP?

from: krow
date: Oct. 29th, 2008 11:47 pm (UTC)

libuuid is used on every shipping Linux distribution. Apple ships it with OSX.

We don't require sign-over of IP; we track where it comes from. We have no CLA requirements. If you have issues with it, you just won't use us.

I find the CLA to be little protection. Look around at the market and I think you will find that who is suing whom is about patents.

My position on open source licensing is BSD everything. Get your code out there and into people's hands.



from: burtonator
date: Oct. 30th, 2008 01:40 am (UTC)

I don't grok why people use UUIDs.... for their primary keys?

Using truncated hashcodes is so much better as you can route them, they're deterministic, etc.



Brian "Krow" Aker

Re: hm......

from: krow
date: Oct. 30th, 2008 01:47 am (UTC)

There is less lock contention with UUIDs, but otherwise the only advantage is that they can be generated in the client ahead of time. There is also a slight advantage in that you don't have to worry about conflicts between multiple hosts if you aggregate your replication.


Re: hm......

from: anonymous
date: Oct. 30th, 2008 01:51 am (UTC)

Also works better if you're on a distributed/cloud DB and just using Drizzle as a storage node :)

You can compute the hash on the client and then route the query to the correct Drizzle node.

You can truncate the hashcode if you can accept more collisions or just store the whole thing.....
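For what it's worth, here is a rough sketch of that idea in Python. The node names, the choice of SHA-1, and the 8-byte truncation are all assumptions made up for illustration; nothing here is a Drizzle API.

```python
import hashlib

# Hypothetical node names; a real deployment would have its own list.
NODES = ["drizzle-0", "drizzle-1", "drizzle-2", "drizzle-3"]


def truncated_hash(key: str, nbytes: int = 8) -> bytes:
    """Return the first nbytes of the SHA-1 digest of key.

    Fewer bytes means a smaller index entry but more collisions.
    """
    return hashlib.sha1(key.encode("utf-8")).digest()[:nbytes]


def route(key: str, nodes=NODES) -> str:
    """Pick a node deterministically from the truncated hash of key."""
    h = int.from_bytes(truncated_hash(key), "big")
    return nodes[h % len(nodes)]
```

Because the hash is deterministic, any client can compute `route("user:42")` on its own and land on the same node every time. Plain modulo routing reshuffles most keys when the node list changes, so treat this as a starting point only.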

BTW. This is a feature I've wanted for a long time. The ability to store data as binary but use an escaped encoding when writing queries and printing results.

Base64+filesafe would be ideal for this.

SELECTing binary data on the console isn't always pretty. :)
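To make the encoding idea concrete, here is a small sketch using the URL- and filename-safe Base64 alphabet from Python's standard library. The function names are made up for illustration, and stripping the padding is my own choice, not a requirement.

```python
import base64


def encode_key(raw: bytes) -> str:
    """Encode binary data with the URL/filename-safe Base64 alphabet.

    Trailing '=' padding is stripped to keep the printed form short.
    """
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_key(text: str) -> bytes:
    """Restore the padding and decode back to the original bytes."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)
```

A 16-byte value prints as 22 characters this way, with no `+` or `/` to trip up a shell or a URL.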


Brian "Krow" Aker

Re: hm......

from: krow
date: Oct. 30th, 2008 02:00 am (UTC)

At some point in the very near future we will have a built-in UUID type. It will be stored as 16 bytes, but will display properly.
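As an illustration of the store-compact, display-readable split (this uses Python's standard uuid module, not Drizzle's type, and the helper names are hypothetical):

```python
import uuid


def to_storage(u: uuid.UUID) -> bytes:
    """Return the 16-byte binary form that would go into the column."""
    return u.bytes


def to_display(raw: bytes) -> str:
    """Turn the stored 16 bytes back into the usual 36-character text form."""
    return str(uuid.UUID(bytes=raw))


u = uuid.uuid4()          # random UUID generated on the client
stored = to_storage(u)    # 16 bytes
shown = to_display(stored)  # hyphenated hex string, 36 characters
```

The text form is more than twice the size of the binary form, which is where the index footprint saving comes from.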
