?

Log in

No account? Create an account

XLDB, Vertica, Covering Indexes, Greeenplum

« previous entry | next entry »
Oct. 26th, 2007 | 09:49 am

Yesterday was spent at SLAC's "Extremely Large Database Workshop". It was a great collection of vendors, scientists, and people from Google, Ebay, Yahoo, etc.

Vertica's Michaell Stonebraker was present. Ever since I read the C-Store that he wrote I have been wanting to ask him about his thoughts on the differences between covering indexes and column store.

In a column store the values are stored separately, and queries are resolved by searching a particular column. This gives you less data to read when you need particular set of columns, and allows your to compress the Indexes very nicely assuming that the data is not entirely random.

MySQL has covering indexes. When you do a read on an index, and only use the index, MySQL resolves the query by using data in the Index. This is true for all storage engines that keep a complete value in the index. MyISAM is a bit more optimized for this then most engines since it keeps a separate data and index file. This gives it a vector approach to its indexes similar to Vertica.

The covering index has the disadvantage in that it will take more memory then the column oriented store since it will store the location of row. The column store on the other hand, or at least Vertica's implementation, uses its position to know where other values are being stored. So in Vertica a query like:

   SELECT A, B, C FROM FOO WHERE A=3;


Would be resolved by lookup into the A column to find rows matching "3". If these were rows 50 and 100, it would then lookup the value of B and C based on positions of 50 and 100. Nice trick, no need to store the value of the row position (which is in MyISAM 8 bytes I believe).

Since in column oriented stores the values are similar you should get better compression then a row storage format like Archive uses. Much slower insert speed then what Archive has though. You trade off read performance for write performance.

Really interesting stuff. I am curious to see what niches Vertica and the column oriented storage engines in MySQL fill.

Wondering why I mentioned Greenplum in the title? Luke Lonergan, Greenplum's Founder and CTO, made a great comment during the Vendor Panel discussion. He was the first to bring up open source on the panel by mentioning that his product is based off of the open source database PostgreSQL. Greenplum has been extending PostgreSQL with closed source enhancements. He announced that within 12 years Greenplum would open source everything they do with PostgreSQL. I do not know if it was the spirit of the moment or just the long term plan for Greenplum to become an open source company, but it was good to hear.

Link | Leave a comment |

Comments {5}

12 years, or 1-2?

from: epaulson
date: Oct. 26th, 2007 08:14 pm (UTC)
Link

Did you mean 1 to 2 years, or 12 years, for Greenplum's open source plan?

Reply | Thread

Re: 12 years, or 1-2?

from: jbecla
date: Oct. 27th, 2007 01:33 am (UTC)
Link

12, vendor representatives were responding to a question 'where will your company be in 12 years'

Reply | Parent | Thread

Brian "Krow" Aker

Re: 12 years, or 1-2?

from: krow
date: Oct. 27th, 2007 01:46 am (UTC)
Link

It was 12.

Reply | Parent | Thread

Re: 12 years, or 1-2?

from: snarky111
date: Nov. 25th, 2007 01:38 pm (UTC)
Link

I suggest that we can consider this a very conservative statement. I'd be surprised if they weren't:

- column store engine enabled

and

- open source

within 12 months :-)

-- Snarrrrrky

Reply | Parent | Thread

Brian "Krow" Aker

Re: 12 years, or 1-2?

from: krow
date: Nov. 25th, 2007 04:37 pm (UTC)
Link

Maybe, the engines are pretty niche.

Reply | Parent | Thread