Brian "Krow" Aker (krow) wrote,
Brian "Krow" Aker
krow

Many CPU's, Moore's Law, Scaling

One of my current interests is looking for bottlenecks in MySQL with
regards to the use of Multiple CPU usage.

I noticed Tim's comments yesterday on this topic:
http://radar.oreilly.com/archives/2007/02/a_fast_waitfree_1.html

What I see right now is that for the first time in years commodity
scaling with multiple CPU's is becoming a reality. What does this mean?

For years we have been scaling out, adding more hosts. Every hosts
increases the total cost of ownership. You can do this with two
processor hosts, but this is not all that interesting. Scaling for
two processors is quite easy, its when you get more then two that the
problem becomes interesting (Slashdot has had four processor machines
for its databases since 2001, but they were 32bit so the memory at
that time was really too limiting to really make use of the machines
(they were IO bound)).

Many core/cpu systems mean that for the first time in years we can
scale horizontally. Yes, we can increase the density via a single
computer instead of many computers. This is sexy from the stand point
of hosts that can handle a lot of memory (though for the Google's of
the world it would be interesting to know how density/power really
works out).

My recent keynote at the Vancouver PHP Conference talked about this.
Partition, route, batch... do whatever you can to break your problem
down so that it can be scaled. (http://krow.net/talks/ScalingVancouver2007.pdf).

What this means is that Moore's law may become irrelevant. A faster
CPU is only so-so interesting, a many core CPU is far more
interesting. If you think about the evolution of the processor, this
is a big event for performance.

I'm attaching a graph for current scaling. I've got one G5 running
Ubuntu at the moment which has 4 CPU's of processing power. The graph
shows a heavy INSERT load with no storage engine that locks/does IO
attached to MySQL. The load being generated is quite artificial, it
would mean a higher concurrency INSERT then what is almost ever seen
(the exception is MySQL Cluster, but that is an entirely different
ball game).

With MySQL there are two real killers for making use multiple CPU.

First killer is that all our DDL is not online. We are working to
resolve some of this in 5.1, but we have a ways to go.

The second killer is the storage engine you pick. Much of the locking
that goes on is all engines specific. If the engine scales well on
multiple CPU then you have a win. If it does not... then you probably
will want to pick another engine if you want to really make use of
multiple CPU environments.

The right storage engine should also be able to handle modifications
we are making to the kernel of the server to do online DDL.

The good news is that all of the storage engine authors are very
aware of this :)
Basic Thread Performance.jpg

Subscribe
  • Post a new comment

    Error

    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 4 comments