Log in

No account? Create an account

Many CPU's, Moore's Law, Scaling

« previous entry | next entry »
Feb. 21st, 2007 | 06:14 pm

One of my current interests is looking for bottlenecks in MySQL with
regards to the use of Multiple CPU usage.

I noticed Tim's comments yesterday on this topic:

What I see right now is that for the first time in years commodity
scaling with multiple CPU's is becoming a reality. What does this mean?

For years we have been scaling out, adding more hosts. Every hosts
increases the total cost of ownership. You can do this with two
processor hosts, but this is not all that interesting. Scaling for
two processors is quite easy, its when you get more then two that the
problem becomes interesting (Slashdot has had four processor machines
for its databases since 2001, but they were 32bit so the memory at
that time was really too limiting to really make use of the machines
(they were IO bound)).

Many core/cpu systems mean that for the first time in years we can
scale horizontally. Yes, we can increase the density via a single
computer instead of many computers. This is sexy from the stand point
of hosts that can handle a lot of memory (though for the Google's of
the world it would be interesting to know how density/power really
works out).

My recent keynote at the Vancouver PHP Conference talked about this.
Partition, route, batch... do whatever you can to break your problem
down so that it can be scaled. (http://krow.net/talks/ScalingVancouver2007.pdf).

What this means is that Moore's law may become irrelevant. A faster
CPU is only so-so interesting, a many core CPU is far more
interesting. If you think about the evolution of the processor, this
is a big event for performance.

I'm attaching a graph for current scaling. I've got one G5 running
Ubuntu at the moment which has 4 CPU's of processing power. The graph
shows a heavy INSERT load with no storage engine that locks/does IO
attached to MySQL. The load being generated is quite artificial, it
would mean a higher concurrency INSERT then what is almost ever seen
(the exception is MySQL Cluster, but that is an entirely different
ball game).

With MySQL there are two real killers for making use multiple CPU.

First killer is that all our DDL is not online. We are working to
resolve some of this in 5.1, but we have a ways to go.

The second killer is the storage engine you pick. Much of the locking
that goes on is all engines specific. If the engine scales well on
multiple CPU then you have a win. If it does not... then you probably
will want to pick another engine if you want to really make use of
multiple CPU environments.

The right storage engine should also be able to handle modifications
we are making to the kernel of the server to do online DDL.

The good news is that all of the storage engine authors are very
aware of this :)
Basic Thread Performance.jpg

Link | Leave a comment |

Comments {4}

(no subject)

from: jamesd
date: Feb. 22nd, 2007 12:47 pm (UTC)

Looking at MyISAM, how you make an engine that does a full table lock to update something fast with multiple CPUs is an interesting problem. I'd say inherent design limitation but there are already some workarounds (delayed inserts, concurrent inserts). :)

Looking at InnoDB, it now has a fair amount of granularity but it's still exposed, with the adaptive hash index updating looking like the most significant current CPU-limited liability. Worse, its background processes (delete purging, insert buffer merging, possibly dirty page flushing) are written for a single drive so they can fall behind on powerful disk systems and we've had powerful disk systems for a long time already and this is easy to fix but hasn't been done (yet).

Looking at others... it's still too soon to say much about their in-service concurrency limitations.

One certainty: MySQL as a whole is going to get better at doing fine-grained locking instead of coarse-grained locking.

Reply | Thread

Brian "Krow" Aker

(no subject)

from: krow
date: Feb. 22nd, 2007 06:39 pm (UTC)

MyISAM uses a row level lock with reader access on the tip of file, so for inserts it works just fine for multiple CPU. One possibility to solve updates and deletes is to scavenge the records by using a bit to say what the state of the record is.

Innodb is being worked on, Falcon was designed to work in environments with multiple CPU and Cluster is more scale out. Archive is row level insert with versioned reading so is scales beautifully with multiple CPU.

Reply | Parent | Thread

Online DDL?

from: nikhilim
date: Feb. 27th, 2007 07:46 pm (UTC)

Could you please elaborate on what you mean by 'online DDL'?

Reply | Thread

Brian "Krow" Aker

Re: Online DDL?

from: krow
date: Feb. 28th, 2007 05:30 am (UTC)


Alter table today block others from fully using the table while they are being accessed... some actions can be performed but the big ones like ADD INDEX or ADD COLUMN all block.

Reply | Parent | Thread