We finally found the regression problem in Drizzle that we have been looking for over the last couple of months.
In the process of doing this we have walked every line of code. The other night I sat single-stepping through an entire sysbench run, looking for anything out of place. Nothing came up at all.
Eric was the person who finally asked the question, "Could it be tcmalloc?" No one had suspected it because it is typically a good solution. We will be looking into why it turned out to be at fault, and we will probably now push more aggressively to remove the MEMROOT system we inherited, since we suspect it, and it doesn't play well with C++.
We have not been able to push any patches in the last couple of months that really fixed other performance issues that we know exist.
Why?
Because we feared that doing so would make the original problem harder to find. We have all spent time looking through our ancestors to see if there was something we missed.
1) Could it be C++?
2) Could reducing the number of locks have created a traffic jam around a single lock?
3) Was UTF-8 at fault?
In the end it was none of these :)
So what is next for us?
We have patches coming soon to optimize the UTF-8 system, to minimize LOCK_open, to optimize and simplify the THR lock system, and to partition caches internally.