Brian "Krow" Aker (krow) wrote,
Brian "Krow" Aker

XLDB, Vertica, Covering Indexes, Greeenplum

Yesterday was spent at SLAC's "Extremely Large Database Workshop". It was a great collection of vendors, scientists, and people from Google, Ebay, Yahoo, etc.

Vertica's Michaell Stonebraker was present. Ever since I read the C-Store that he wrote I have been wanting to ask him about his thoughts on the differences between covering indexes and column store.

In a column store the values are stored separately, and queries are resolved by searching a particular column. This gives you less data to read when you need particular set of columns, and allows your to compress the Indexes very nicely assuming that the data is not entirely random.

MySQL has covering indexes. When you do a read on an index, and only use the index, MySQL resolves the query by using data in the Index. This is true for all storage engines that keep a complete value in the index. MyISAM is a bit more optimized for this then most engines since it keeps a separate data and index file. This gives it a vector approach to its indexes similar to Vertica.

The covering index has the disadvantage in that it will take more memory then the column oriented store since it will store the location of row. The column store on the other hand, or at least Vertica's implementation, uses its position to know where other values are being stored. So in Vertica a query like:


Would be resolved by lookup into the A column to find rows matching "3". If these were rows 50 and 100, it would then lookup the value of B and C based on positions of 50 and 100. Nice trick, no need to store the value of the row position (which is in MyISAM 8 bytes I believe).

Since in column oriented stores the values are similar you should get better compression then a row storage format like Archive uses. Much slower insert speed then what Archive has though. You trade off read performance for write performance.

Really interesting stuff. I am curious to see what niches Vertica and the column oriented storage engines in MySQL fill.

Wondering why I mentioned Greenplum in the title? Luke Lonergan, Greenplum's Founder and CTO, made a great comment during the Vendor Panel discussion. He was the first to bring up open source on the panel by mentioning that his product is based off of the open source database PostgreSQL. Greenplum has been extending PostgreSQL with closed source enhancements. He announced that within 12 years Greenplum would open source everything they do with PostgreSQL. I do not know if it was the spirit of the moment or just the long term plan for Greenplum to become an open source company, but it was good to hear.
  • Post a new comment


    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded