Brian "Krow" Aker (krow) wrote,

Parallel Processing, MySQL, Map and Reduce

Catching up over the weekend, I noticed Tim's post
on parallel processing, and followed that over to Nat's
comments about threads.

What does this mean for MySQL? Over a year ago we launched a
multi-concurrency tool, mysqlslap, so that we could start to take apart
the server under high loads. At the time we had five different tools
internally, all written in different languages, none shipped with
the main binary, and all hard to use. With mysqlslap we have been
able to find bugs like the InnoDB autoincrement lock (which Heikki is
actively working on), and the lock around temp tables (which Monty is
working on). It's also been good for us to take apart new storage
engines like Falcon and find concurrency issues with them before they
are released.

Monty also rewrote our thread-to-connection architecture in 5.1 so
that future patches can enable higher-end concurrency.
This will come in handy for the sites trying to break the two
thousand concurrent connections limit.

All of this is the tip of the iceberg. On the MySQL forums we have
people taking apart the federated engine and trying to parallelize it
to break up query loads. There are a couple of projects trying to make
a "proxy" of sorts to handle scale out (two big hints... the proxy
needs to reside locally, and if Cisco can barely make stateful packet
inspection work, parsing SQL queries on your own is a dead end).
There are also a number of projects popping up to do sharding.

On the state of the technology, it is interesting to see new
languages being mentioned. I am still using pthreads with C at this
point. I've not seen anything that really gives me the portability
and performance that I am after other than this combination. Long ago
I had hopes that Perl, or maybe PHP, might evolve this way. No matter
how good Perl's threading becomes, CPAN is filled with modules which
are not thread safe. I really like the concept, though, of a weakly
typed language providing an easy framework to make use of multiple
threads.
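
For readers who have not used the pthreads-with-C combination, here is a rough sketch of the map-and-reduce shape it gives you. This is an illustration only, not code from MySQL or mysqlslap; the names parallel_sum, sum_slice, struct slice, and MAX_WORKERS are invented for this example. Each thread sums its own slice of an array (the "map"), and the caller adds up the partial results after joining (the "reduce").

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64   /* hypothetical cap, chosen for this sketch */

/* One slice of the input plus a slot for that slice's partial result. */
struct slice {
    const long *data;
    size_t      len;
    long        partial;
};

/* "Map" step: each thread touches only its own slice, so no lock is needed. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long total = 0;
    for (size_t i = 0; i < s->len; i++)
        total += s->data[i];
    s->partial = total;
    return NULL;
}

/*
 * Split data across nworkers threads, then "reduce" the partial sums.
 * Returns 0 on success and stores the sum in *out; returns -1 if a
 * thread could not be created (threads already started are still joined).
 */
static int parallel_sum(const long *data, size_t len, int nworkers, long *out)
{
    pthread_t    tids[MAX_WORKERS];
    struct slice slices[MAX_WORKERS];
    size_t chunk, offset = 0;
    int started = 0, rc = 0;
    long total = 0;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    chunk = len / (size_t)nworkers;

    for (int i = 0; i < nworkers; i++) {
        slices[i].data = data + offset;
        /* The last worker also takes whatever the division left over. */
        slices[i].len = (i == nworkers - 1) ? len - offset : chunk;
        slices[i].partial = 0;
        offset += slices[i].len;
        if (pthread_create(&tids[i], NULL, sum_slice, &slices[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* "Reduce" step: wait for each worker and fold in its partial sum. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += slices[i].partial;
    }

    if (rc == 0)
        *out = total;
    return rc;
}
```

Calling parallel_sum(data, n, 4, &total) splits the array four ways. Because each worker writes only to its own struct slice, the only synchronization is the pthread_join at the end, which is about as simple as shared-memory threading gets.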

I have no opinion on Hadoop at this point. A year or so ago I wrote "Serial"
which is my own solution. I've been thinking about abandoning my
solution for Brad's Gearman.
I've only got one user using mine, and when I can make use of other
people's work, I do :)

It is interesting to note that no one is discussing 64-bit. I
believe that it is not enough just to notice that we now have
multicore chips; you need to look one step further and
realize that we have a 64-bit memory space. Uptimes being what they
are, you can now consider putting a lot of data into memory, or at
least mapping it that way, and acting on the data in a flat memory
space. I don't believe enough developers have really put this to
their advantage yet.
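
To make "mapping it that way" concrete, here is a minimal sketch using the POSIX mmap call. It is an assumption-laden illustration rather than anything from the MySQL source: the function name count_byte_mapped and the byte-counting task are made up. The point is only that once the file is mapped, the code walks it like an ordinary array, and on a 64-bit system the file can be far larger than a 32-bit address space would allow.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count how many times `needle` occurs in the file at `path` by mapping
 * the file read-only and scanning it as flat memory.
 * Returns the count, or -1 on any error.
 */
static long count_byte_mapped(const char *path, unsigned char needle)
{
    struct stat st;
    long count = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    const unsigned char *base =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    /* From here on the file is just an array of bytes. */
    for (size_t i = 0; i < (size_t)st.st_size; i++)
        if (base[i] == needle)
            count++;

    munmap((void *)base, (size_t)st.st_size);
    return count;
}
```

The kernel pages the data in on demand, so nothing is read until the loop touches it, and there is no explicit read buffer to size or manage.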