?

Log in

No account? Create an account

Myths, GUID vs Autoincrement

« previous entry | next entry »
Mar. 10th, 2007 | 05:56 pm

I am a fan of using GUIDs as primary keys. They are database
independent, object independent and I can either create them in my
application or in the database (MySQL since 5.0 has had a UUID()
function but since it does not replicate I don't use it normally...
this has been fixed in 5.1 though).

What I have heard for years was that Innodb didn't handle them very
well. I have had my doubts, but never taken the time to see if this
was true or not.

This week I finally got around to putting this to a test.

/mysqlslap --concurrency=1,50,100,150,200,250,300 --iterations=10 --
number-int-cols=2 --number-char-cols=3 --auto-generate-sql --auto-
generate-sql-add-autoincrement --auto-generate-sql-load-type=update --
auto-generate-sql-execute-number=10000 --auto-generate-sql-unique-
write-number=10 --auto-generate-sql-write-number=100000 --csv=/tmp/
innodb-auto.csv --engine=blackhole,innodb

So 10K of rows, with increasing numbers of threads making
modifications. Unlike a lot of tests I do this is a growth test...
aka as a site would grow this is what you would see. Normally when I
look at how SMP machines handle problems I am more interested in
seeing how it would split the problem with more threads and more CPU.
In this case I wanted to look at just how the database would behave
as a site grew.

The test I am looking at is primary key focused, and its all updates.
So a read is occurring and a write follows up.

As you can tell from the graph, the overhead, as seen via blackhole,
is nearly identical in cases where UUID() was used and Autoincrement
was used. The only issue I ran into was that Innodb started getting
deadlocks when I cranked up the connections to 300 on my quad PPC
running Ubuntu. I am going to play with this a bit further and see if
Innodb would have similar issues on a dual processor. The binlog was
not enabled during any of these tests so the row locks that we see
for updates when running with statement based replication won't be
seen (but I really want to see how that effects things!).

Just to confirm results I am going run the same test but just do key
lookups and see if any difference pops up. I am doubting it will but
you never know until you try :)
Update Auto vs GUID Innodb.jpg

Link | Leave a comment |

Comments {32}

Brian "Krow" Aker

(no subject)

from: krow
date: Mar. 13th, 2007 07:50 pm (UTC)
Link

I'm working on a benchmark for secondary at the moment :)

The point really is about scale out though, and removing the need for you to have to worry about object conflict.

Reply | Parent | Thread

peter_zaitsev

(no subject)

from: peter_zaitsev
date: Mar. 13th, 2007 08:38 pm (UTC)
Link

Well it is different point. You've started with performance thing here.

If you want to scale out and performance goal is secondary (ie you do not care if you would need 10 servers of 15 for the same stuff) UUID is good way to go.

Normally it is not that hard to generate sequences in the application which allow you to scale out while fitting in 64bit and maintaining uniqueness and locality for efficient btree handling.

I remember Monty even wanted to implement that on MySQL server with OID.

Reply | Parent | Thread

Brian "Krow" Aker

(no subject)

from: krow
date: Mar. 13th, 2007 08:43 pm (UTC)
Link

This was opened up because of a customer who was told "primary key lookup with GUID will be too slow". Turns out its not at all.

Memory usage is different, and secondary lookup when the primary key is a GUID may also be an issue (which I am going to test).

Generating sequences introduces a bottleneck (even if you create servers that do this via range). Personally I don't like to rely on this.

Monty has been bugging me to push my UUID field type, I'll do it sometime in the next few months.

Reply | Parent | Thread

peter_zaitsev

(no subject)

from: peter_zaitsev
date: Mar. 13th, 2007 09:02 pm (UTC)
Link

I see. That is important explanation. UUID point lookup by primary key will not be slower in most cases - if you lookup by random values there is little difference and it mainly happens in very niche case of BTREE would fit in memory for int but would not for UUID.

Reply | Parent | Thread

Brian "Krow" Aker

(no subject)

from: krow
date: Mar. 13th, 2007 09:03 pm (UTC)
Link

This customer was being told that UUID lookup would be much slower then autoincrement... I didn't know the answer. I thought it would be the same, and it turned out to be the same.

On an interesting note, this is a good way to find out if an engine has a crappy autoincrement design. If GUID insert is noticeably faster... then someone has messed up the autoincrement design.

Reply | Parent | Thread

peter_zaitsev

(no subject)

from: peter_zaitsev
date: Mar. 13th, 2007 09:15 pm (UTC)
Link

Right. Agree on storage engine testing.
Regarding lookups you need to make sure about key distribution for lookups to do benchmarks.

I've posted some of my comments on this topic on my blog:
http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

Reply | Parent | Thread