Log in

No account? Create an account

Giuseppe's Article on Partitioning

« previous entry | next entry »
Dec. 18th, 2007 | 05:10 pm

Just finished reading this:

His comments on partitioning affecting Archive are dead on. If you can keep it from scanning then it works better (though if all engines have to scan it normally is a big factor of improvement over the other engines).
Archive's compression cannot do much with a single column table like he is using, for that it has to do too much effort (and while there is optimizations for this, I would rather just write a column oriented store some day to solve these sorts of problems).

It does make me wondering how those tests would come out with 6.0. He has a multi-processor box, and in 6.0 Archive uses multiple threads for IO to achieve a bit better performance. I don't have any good benchmarks right now, so I am not really sure.

What is in the back of my head when I think of Partitioning and Archive?

I wonder if I should not just pull it natively into the engine since I could do more with it there. The partition engine is just a simple wrapper around the Engine interface. Most of the hard logic is in the optimizer work that Sergey Petrunia did to allow pruning, so it is not a ton of work to support it natively.

Pushing down the pruning information would allow me to do more with parallel IO (and would allow me to make more operations online).

No time at the moment to work on this, but it is something to think about.

Link | Leave a comment |

Comments {4}

Tables have more than one column

from: datacharmer
date: Dec. 19th, 2007 06:34 am (UTC)


Thanks for your comment.

The one-column tables are only used to explain how partitioning works. The measurement are done on a more complex table. The article has a link to a MySQL Forge page, containing the instructions to build the sample, including the data structure.


create table flightstats (
AirlineID int not null,
UniqueCarrier char(3) not null,
Carrier char(3) not null,
FlightDate date not null,
FlightNum char(5) not null,
TailNum char(8) not null,
ArrDelay double not null,
ArrTime datetime not null,
DepDelay double not null,
DepTime datetime not null,
Origin char(3) not null,
Dest char(3) not null,
Distance int not null,
Cancelled char(1) default 'n',
primary key (FlightDate, AirlineID, Carrier, UniqueCarrier, FlightNum, Origin, DepTime, Dest)

I did some limited testing with 6.0 as well, but only with 2 GB tables (17 million records) and I did not notice significant differences. I will keep your remarks in mind for a new batch of tests using 6.0.


Reply | Thread

Brian "Krow" Aker

Re: Tables have more than one column

from: krow
date: Dec. 19th, 2007 06:58 am (UTC)

How well did the tables compress?

Reply | Parent | Thread

Re: Tables have more than one column

from: datacharmer
date: Dec. 19th, 2007 07:04 am (UTC)

There is one table noting engines, records, sizes, and loading times in the article.
The archive table is 1.8 GB compared to 8 GB of uncompressed MyISAM

Reply | Parent | Thread

Brian "Krow" Aker

Re: Tables have more than one column

from: krow
date: Dec. 19th, 2007 07:24 am (UTC)

One comment on your article. The numbers are hard to understand in your explanation (what do they mean? should they be large or small?).

Reply | Parent | Thread