Log in

No account? Create an account

Myths, GUID vs Autoincrement

« previous entry | next entry »
Mar. 10th, 2007 | 05:56 pm

I am a fan of using GUIDs as primary keys. They are database
independent, object independent and I can either create them in my
application or in the database (MySQL since 5.0 has had a UUID()
function but since it does not replicate I don't use it normally...
this has been fixed in 5.1 though).

What I have heard for years was that Innodb didn't handle them very
well. I have had my doubts, but never taken the time to see if this
was true or not.

This week I finally got around to putting this to a test.

/mysqlslap --concurrency=1,50,100,150,200,250,300 --iterations=10 --
number-int-cols=2 --number-char-cols=3 --auto-generate-sql --auto-
generate-sql-add-autoincrement --auto-generate-sql-load-type=update --
auto-generate-sql-execute-number=10000 --auto-generate-sql-unique-
write-number=10 --auto-generate-sql-write-number=100000 --csv=/tmp/
innodb-auto.csv --engine=blackhole,innodb

So 10K of rows, with increasing numbers of threads making
modifications. Unlike a lot of tests I do this is a growth test...
aka as a site would grow this is what you would see. Normally when I
look at how SMP machines handle problems I am more interested in
seeing how it would split the problem with more threads and more CPU.
In this case I wanted to look at just how the database would behave
as a site grew.

The test I am looking at is primary key focused, and its all updates.
So a read is occurring and a write follows up.

As you can tell from the graph, the overhead, as seen via blackhole,
is nearly identical in cases where UUID() was used and Autoincrement
was used. The only issue I ran into was that Innodb started getting
deadlocks when I cranked up the connections to 300 on my quad PPC
running Ubuntu. I am going to play with this a bit further and see if
Innodb would have similar issues on a dual processor. The binlog was
not enabled during any of these tests so the row locks that we see
for updates when running with statement based replication won't be
seen (but I really want to see how that effects things!).

Just to confirm results I am going run the same test but just do key
lookups and see if any difference pops up. I am doubting it will but
you never know until you try :)
Update Auto vs GUID Innodb.jpg

Link | Leave a comment |

Comments {32}

(no subject)

from: erikwett
date: Mar. 11th, 2007 02:39 pm (UTC)

I have had some serious performance problems with GUID's in MS SQL Server environment.

The scenario was a log table with GUID as the primary key and 3-4 secondary indexes. Since SQL Server by default clusters a table on the primary key and the secondary indexes contain the primary key values the result was that indexes took very much space.

I reorganized the table with a numeric auto-generated key instead and it shrunk by about 50%, which improved throughput a lot. Since the table was a log table with no need for global id's this was not a problem.

I believe InnoDb use the same structure with tables clustered on the primary key, which would make this relevant. Are there any secondary indexes in your table?


Reply | Thread

Brian "Krow" Aker

(no subject)

from: krow
date: Mar. 11th, 2007 04:53 pm (UTC)

No, this test was just about primary key lookup performance. From your description I would say that yes the two look similar in design.

It was a log table? I take it that you couldn't create a composite key from multiple attributes?

But no, this was not about secondary indexes, just about primary. The customer that got me thinking about this just uses primary indexes, not secondary.

Reply | Parent | Thread

(no subject)

from: jamesd
date: Mar. 12th, 2007 05:18 am (UTC)

InnoDB shares the issue. Designing for cache efficiency matters a lot if you're after the best performance.

Reply | Parent | Thread

Абу Антось

(no subject)

from: syarzhuk
date: Mar. 25th, 2007 01:24 am (UTC)

Why not cluster on datetime column?

Reply | Parent | Thread