## Drizzle, Regression, STL Bitset

**May. 8th, 2009 | 05:15 pm**

So how is the regression issue coming along?

Glad you asked!

Earlier in the week Jay found that the bitset patch caused a significant performance regression. This particular patch moved us from the built-in bitmap system that MySQL had to the bitset found in the Standard Template Library.

Since then we have been able to find issues with it, but we have reached no conclusion on how to solve them with a new design, container, etc.

So what are we doing?

Earlier this week I refactored the interface to the Table objects to create, well, an interface!

It is not perfect, but it encapsulates a large chunk of the code. Today Monty put the original MY_BITMAP back in, but did so behind this interface. Jay has run the numbers and declared that the regression can no longer be found.

What will happen from here?

We are going to work on the interface some more. Basically, we want to be able to change the back end of the bitmap without changing a lot of code.

Right now we have a couple of ideas on how to solve the problem (I favor a bool in the Field object, Monty wants to look at vector, Mats has a bitvector). We will test each of these and find a solution that gives us, in the end, better code with no regression.
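To make the idea of a swappable back end concrete, here is a minimal sketch. It is not the Drizzle code: `ColumnSet`, `VectorColumnSet`, `WordColumnSet` and `count_set` are all names invented for illustration, and the two back ends only stand in for the kinds of containers being discussed (a `std::vector<bool>` and a hand-rolled word array in the spirit of a C-style bitmap).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface: callers ask which columns are marked,
// and never see how the marks are stored.
class ColumnSet {
public:
  virtual ~ColumnSet() = default;
  virtual void set(std::size_t column) = 0;
  virtual void clear(std::size_t column) = 0;
  virtual bool test(std::size_t column) const = 0;
  virtual std::size_t size() const = 0;
};

// Back end 1 (assumption): storage in a std::vector<bool>.
class VectorColumnSet : public ColumnSet {
public:
  explicit VectorColumnSet(std::size_t columns) : bits_(columns, false) {}
  void set(std::size_t column) override {
    assert(column < bits_.size());
    bits_[column] = true;
  }
  void clear(std::size_t column) override {
    assert(column < bits_.size());
    bits_[column] = false;
  }
  bool test(std::size_t column) const override {
    assert(column < bits_.size());
    return bits_[column];
  }
  std::size_t size() const override { return bits_.size(); }

private:
  std::vector<bool> bits_;
};

// Back end 2 (assumption): bits packed by hand into 64-bit words.
class WordColumnSet : public ColumnSet {
public:
  explicit WordColumnSet(std::size_t columns)
      : columns_(columns), words_((columns + 63) / 64, 0) {}
  void set(std::size_t column) override {
    assert(column < columns_);
    words_[column / 64] |= bit(column);
  }
  void clear(std::size_t column) override {
    assert(column < columns_);
    words_[column / 64] &= ~bit(column);
  }
  bool test(std::size_t column) const override {
    assert(column < columns_);
    return (words_[column / 64] & bit(column)) != 0;
  }
  std::size_t size() const override { return columns_; }

private:
  // Mask selecting this column's bit inside its word.
  static std::uint64_t bit(std::size_t column) {
    return std::uint64_t(1) << (column % 64);
  }
  std::size_t columns_;
  std::vector<std::uint64_t> words_;
};

// Calling code depends only on the interface, so either back end works.
std::size_t count_set(const ColumnSet& columns) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < columns.size(); ++i) {
    if (columns.test(i)) ++n;
  }
  return n;
}
```

Code like `count_set` does not change when the storage does, which is the property we are after. Note that virtual dispatch has its own cost, so in a hot path a compile-time switch (a typedef or a template parameter) may be the better way to get the same separation.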

To solve this issue we had to look at a number of things: our methods, the outcome of a tree rollback, and whether performance in this case mattered. The problem wasn't simple, and all of the solutions had drawbacks. The main thing we were not going to do was push code that caused a regression which we would then "find a solution for in the future". That was not acceptable. Rolling back the tree? We could have done this, and I favored it if we had no other solution, but we determined that we could patch up the tree without causing that sort of disruption.

And?

Encapsulating the interface gives us room to find a new solution.

So what came out of all of this?

We are moving to a staging tree.

As of now we have an lp:drizzle/staging tree. This tree should not be pulled from. We will send code here before it is sent to trunk. If code fails the performance regression testing then we will pull it back.

So what does this mean? No more pushes to main that bypass staging. We have pointed the automatic regression testing at this tree. I am going to suggest that we point Hudson and Buildbot at it as well. If a tree can pass here then it will be moved to the main tree.

And what are the thoughts on regression for the future?

Jay asked me today, "What do we mean by regression?" To me, we can calculate regression pretty easily. We look at the standard deviation of all previous runs and compare the current tree against it. If we find that we are within norms then the new code is fine (and I suspect we will refine this formula in the future). This was my suggestion.
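As a rough sketch of what "within norms" could mean, the code below computes the mean and sample standard deviation of earlier benchmark results and flags a new result that falls too far below them. Everything specific here is an assumption for illustration: the names `summarize` and `is_regression`, the choice of a higher-is-better metric such as throughput, and the default cutoff of two standard deviations. It is not the formula our regression testing actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats {
  double mean;
  double stddev;  // sample standard deviation (n - 1 in the divisor)
};

// Summarize the results of earlier runs. Needs at least two runs,
// otherwise the sample standard deviation is undefined.
Stats summarize(const std::vector<double>& runs) {
  assert(runs.size() >= 2);
  double sum = 0.0;
  for (double r : runs) sum += r;
  const double mean = sum / static_cast<double>(runs.size());

  double squared = 0.0;
  for (double r : runs) squared += (r - mean) * (r - mean);
  const double variance = squared / static_cast<double>(runs.size() - 1);
  return Stats{mean, std::sqrt(variance)};
}

// Assumption: higher numbers are better (e.g. transactions per second).
// A run counts as a regression when it is more than k standard
// deviations below the mean of the previous runs.
bool is_regression(const std::vector<double>& previous, double current,
                   double k = 2.0) {
  const Stats s = summarize(previous);
  return current < s.mean - k * s.stddev;
}
```

A check like this only says that a result is unusual compared with history. Whether an unusual result is acceptable is still a question for people, and both the cutoff `k` and the set of runs that count as "previous" would need tuning.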

But what if regression happens and there is an argument for letting it happen?

Then we talk about it on the mailing list. Right now most of us have seen the numbers showing that 5.4 is faster than Drizzle at 16 concurrent connections. We have been looking into this, but we may find that some of the decisions that let us scale out to more connections/processors contributed to it. That is ok. Our target is not the 16-connection sites, it is the sites that need massive numbers of connections/threads/processors. If we find a change that hurts us at 1 to N connections, and N is a small number, that may be ok.

What will we do when we are confronted by this? We will talk about it on IRC and send the information to the mailing list.

More eyeballs is a good thing.
