Parallel Processing, MySQL, Map and Reduce

Jun. 11th, 2007 | 09:18 am

Catching up over the weekend, I noticed Tim's comment on parallel
processing, and followed that over to Nat's comments about threads.

What does this mean for MySQL? Over a year ago we launched a
multi-concurrency tool, mysqlslap, so that we could start to take apart
the server under high loads. At the time we had five different tools
internally, all written in different languages, none supplied with
the main binary, and all hard to use. With mysqlslap we have been
able to find bugs like the InnoDB autoincrement lock (which Heikki is
actively working on), and the lock around temp tables (which Monty is
working on). It's also been good for us to take apart new storage
engines like Falcon and find concurrency issues with them before they
are released.

Monty also rewrote our thread-to-connection architecture in 5.1 so
that we can write future patches for higher-end concurrency.
This will come in handy for the sites trying to break the two
thousand concurrent connections limit.

All of this is the tip of the iceberg. On the MySQL forums we have people
taking apart the federated engine and trying to parallelize it to
break up query loads. There are a couple of projects trying to make a
"proxy" of sorts to handle scale out (two big hints... proxy needs to
reside locally, and if Cisco can barely make stateful packet
inspection work, parsing SQL queries on your own is a dead end).
There are also a number of projects popping up to do sharding.

On the state of the technology, it is interesting to see new
languages being mentioned. I am still using pthreads with C at this
point. I've not seen anything that really gives me the portability
and performance that I am after other than this combination. Long ago
I had hopes that Perl, or maybe PHP, might evolve this way. No matter how
good Perl's threading becomes, CPAN is filled with modules which are
not thread safe. I really like the concept, though, of a weakly typed
language providing an easy framework to make use of multiple threads.
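
For readers who have not written against pthreads in C, here is a minimal sketch of what that combination looks like for a simple fork/join job: split an array into slices, sum each slice on its own thread, then combine the partial results. None of this comes from MySQL or mysqlslap; the names `parallel_sum`, `sum_slice`, and `struct slice`, and the fixed `NUM_THREADS` of 4, are assumptions made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* assumed worker count, chosen only for this sketch */

/* One unit of work: a half-open range [begin, end) of the input array. */
struct slice {
    const long *data;
    size_t begin;
    size_t end;
    long partial;       /* written by the worker, read after pthread_join */
};

/* Thread entry point: sums its own slice and touches no shared state. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long total = 0;
    for (size_t i = s->begin; i < s->end; i++)
        total += s->data[i];
    s->partial = total;
    return NULL;
}

/* Sums data[0..n) using NUM_THREADS workers.
 * Returns 0 and stores the sum in *out on success, -1 on thread errors. */
int parallel_sum(const long *data, size_t n, long *out)
{
    assert(out != NULL && (data != NULL || n == 0));

    pthread_t tid[NUM_THREADS];
    struct slice work[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    int started = 0;
    int rc = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        work[t].data = data;
        work[t].begin = (size_t)t * chunk;
        /* The last worker also takes the remainder of the array. */
        work[t].end = (t == NUM_THREADS - 1) ? n : (size_t)(t + 1) * chunk;
        work[t].partial = 0;
        if (pthread_create(&tid[t], NULL, sum_slice, &work[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join every thread that was started, even on the error path. */
    long total = 0;
    for (int t = 0; t < started; t++) {
        if (pthread_join(tid[t], NULL) != 0)
            rc = -1;
        else
            total += work[t].partial;
    }

    if (rc == 0)
        *out = total;
    return rc;
}
```

Each worker writes only to its own `partial` field, so the sketch needs no mutex; the join is the only synchronization point. On most systems it builds with something like `cc -pthread`.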

I have no opinion on Hadoop at this point. A year or so ago I wrote
"Serial", which is my own solution. I've been thinking about abandoning
my solution for Brad's Gearman. I've only got one user using mine, and
when I can make use of other people's work, I do :)

It is interesting to note that no one is discussing 64-bit. I
believe that it is not enough just to notice that we now have
multicore chips. You need to look one step further and
realize that we have a 64-bit memory space. Uptimes being what they
are, you can now consider putting a lot of data into memory, or at
least mapping it that way, and acting on the data in a flat memory
space. I don't believe enough developers have really put this to
their advantage yet.
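
As a rough illustration of "mapping it that way", the sketch below maps a whole file into the address space with POSIX `mmap` and then scans it as if it were one flat array, leaving the paging to the OS. It is only a sketch under stated assumptions: the function name `count_byte_mapped` is hypothetical, the task (counting one byte value) is chosen purely to keep the example short, and it assumes a POSIX system where the file fits in the process's address space, which is exactly where 64-bit helps.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts how many bytes in the file at `path` equal `needle`.
 * The file is mapped read-only and scanned as a flat array.
 * Returns the count, or -1 if the file cannot be opened or mapped. */
long count_byte_mapped(const char *path, unsigned char needle)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    /* From here on the file is just memory: no read() calls, no buffers. */
    const unsigned char *bytes = base;
    long count = 0;
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] == needle)
            count++;
    }

    munmap(base, len);
    return count;
}
```

Calling `count_byte_mapped("data.log", '\n')` on a hypothetical `data.log` would count its lines. On a 32-bit build the same call fails once the file outgrows the address space; on 64-bit the limit is practically the size of the disk.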


Comments {4}

64 bit address space

from: epaulson
date: Jun. 11th, 2007 07:47 pm (UTC)

Do you really think that people are OK with just using a 64 bit number as their handle to data?

As a programmer, I'm just so sick of having to twiddle every byte of memory in C whenever I want to do something simple like munging a bunch of strings. I just don't want to know where in the 32 bit address space anything is. I sure as hell don't want to have to build all the auxiliary structures to help me keep track of where in a 64 bit address space everything is.

If you're going to treat the 64 bit address space as something shared, especially across many processors, you're either relying on the virtual memory subsystem of the OS to be very clever, or you've hacked your own system to let you pagefault across the network to other machines. Either way, I think if you've already built a higher-level abstraction to data like the Perl runtime or JVM, you'd be better off managing the shared memory yourself, and not relying on the OS to do it for you.

Auto-inc bug

from: peter_zaitsev
date: Jun. 12th, 2007 06:19 pm (UTC)

As far as I remember, the Auto-inc bug was known long before mysqlslap but was not getting traction for a very long time.
Regarding the tmp table bug, if it is the same one, I remember the bug being posted by Arjen based on someone's SysBench benchmarks.

Do not get me wrong, I'm really happy you're doing these benchmarks with mysqlslap and have no issues with the tool, but putting it as if nothing could ever be discovered before gives the wrong impression.

Brian "Krow" Aker

Re: Auto-inc bug

from: krow
date: Jun. 12th, 2007 11:52 pm (UTC)

We know these issues exist, but going from knowing to a "this is how you repeat it, graph it, show it..." has always been the issue.

Talking about problems doesn't get us much. What the tool is providing is a way for us to repeat the problem in a simple way, so that any developer can then work on it (and in the case of what we are seeing from customers, they are reducing difficult problems down to a couple of lines... which is quite nice).

not so sure about that....

from: awfief
date: Jul. 6th, 2007 07:21 pm (UTC)

(two big hints... proxy needs to reside locally, and if Cisco can barely make stateful packet inspection work, parsing SQL queries on your own is a dead end)

Rippletech seems to have done it quite well: www.rippletech.com