Drizzle, InfiniDB, Column Oriented Storage

Nov. 3rd, 2009 | 09:44 am

I have been asked a number of times "do you think there is a need for a column oriented database in the open source world?"

The answer has been yes!

Both users and vendors have asked. The problem has been that most of the vendors were interested in creating closed source solutions around Drizzle/MySQL, or did their work in a way that required serious modifications to the backend (in other words, made poor use of the storage engine interface).

For these reasons I have not been all that thrilled to work with what has been out there. I would also often find that the commitment to open source was either lukewarm or "we will do it, once we have some traction...".

My response to that? "Tell me more when you open source it. I'll see if it will work."

For this reason I was very happy to see Calpont release InfiniDB last week.

So as of this weekend?

We have a project to use their engine with Drizzle. InfiniDB makes use of the storage engine interface I worked on for MySQL, which is a subset of the interface we have built for Drizzle. We have had several engines ported already, but this will be the first column oriented engine ported to Drizzle.

Building in engines beyond the basic transactional ones is fun, because we get to see how the design stretches to fit additional needs. The core of Drizzle stays the same, but the micro-kernel nature of our design allows others to expand the reach of where Drizzle can be used. Padraig started working on the engine on Friday and had it loading by the end of the weekend.

It should be fun to see what additional enhancements we can do out of the box with the InfiniDB engine :)


Comments {11}

Поисковик-затейник

(no subject)

from: itman
date: Nov. 3rd, 2009 06:07 pm (UTC)

That is really cool. Thank you for the information!


Dossy

(no subject)

from: dossy
date: Nov. 3rd, 2009 06:10 pm (UTC)

Thanks for the InfiniDB release tip! This is fantastic news - at MySQL Camp II, I tried to get a then-current release of C-Store working as a storage engine under MySQL with Chad Miller, but discovered it wasn't quite suited for such integration at the time.


Brian "Krow" Aker

(no subject)

from: krow
date: Nov. 4th, 2009 12:10 am (UTC)

If you are interested in hacking on this, go for it. It is likely to stay a community-supported piece for a while.


Dossy

(no subject)

from: dossy
date: Nov. 4th, 2009 02:43 am (UTC)

I would love to, but at the moment I have more work than I can handle - can't really justify spending time hacking on f/OSS stuff at the moment.


Column store

from: anonymous
date: Nov. 3rd, 2009 08:22 pm (UTC)

Something I have learned by speaking to people like Daniel Abadi is that a column store isn’t just a storage solution. For it to be truly effective, awareness has to exist in the query processor. For example, compression is an important benefit of the columnar layout. But to get good compression with good performance you use lightweight compression. Having the optimizer aware of this, so that data does not have to be decompressed before query resolution, has a significant benefit to scalability and performance.
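To make the point about working on compressed data concrete, here is a minimal sketch in Python. It assumes run-length encoding as the lightweight scheme, which is only one possible choice, and the helper names (`rle_encode`, `count_equal`, `sum_column`) are made up for illustration. It is not how InfiniDB or any other engine actually does it.

```python
from typing import List, Tuple

# A run is (value, run_length). Run-length encoding is an assumed example
# of a lightweight compression scheme, not any specific engine's format.
Run = Tuple[int, int]


def rle_encode(values: List[int]) -> List[Run]:
    """Compress a column into consecutive (value, run_length) pairs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def count_equal(runs: List[Run], target: int) -> int:
    """COUNT(*) WHERE col = target, evaluated once per run, not per row."""
    return sum(length for value, length in runs if value == target)


def sum_column(runs: List[Run]) -> int:
    """SUM(col) computed directly on the runs without expanding them."""
    return sum(value * length for value, length in runs)


column = [3, 3, 3, 7, 7, 3, 9, 9, 9, 9]
runs = rle_encode(column)

# Both answers come from 4 runs instead of 10 decompressed values.
matches = count_equal(runs, 3)
total = sum_column(runs)
```

The predicate and the aggregate touch each run once, so the work scales with the number of runs rather than the number of rows. A query processor that only ever sees decompressed rows cannot take that shortcut.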

So I am just wondering how effective a column store can actually be using the storage engine interface? Or are you making changes up the stack also?


Brian "Krow" Aker

Re: Column store

from: krow
date: Nov. 3rd, 2009 08:26 pm (UTC)

You need to have support in the upper layer to make some of this truly effective. There is a concept of providing a virtual table for "results" that allows joins, etc. to be processed at a lower layer. Whenever possible, you use this interface.


Re: Column store

from: jtommaney
date: Nov. 4th, 2009 10:08 pm (UTC)

Yes, InfiniDB uses a virtual table concept when handing off results to the upper layer. Column storage allows for deferring I/O for columns until needed to satisfy the next operation, whether it is a column filter, a hash join, a complex expression, or for projection. This just-in-time I/O would not be possible with the standard table-oriented storage engine interface that deals with rows.
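As a rough illustration of deferring column I/O, the Python sketch below filters on one column first and only then reads the columns needed for projection. The `ColumnTable` class, the `select` function, and the sample data are all hypothetical; this shows the general idea and is not InfiniDB's implementation.

```python
from typing import Callable, Dict, List, Tuple


class ColumnTable:
    """Toy column store: each column is stored, and read, separately."""

    def __init__(self, columns: Dict[str, List[object]]):
        self._storage = columns            # stands in for per-column files
        self.columns_read: List[str] = []  # records which columns were touched

    def read_column(self, name: str) -> List[object]:
        # In a real engine this is where the disk I/O would happen.
        self.columns_read.append(name)
        return self._storage[name]


def select(table: ColumnTable,
           filter_column: str,
           predicate: Callable[[object], bool],
           project: List[str]) -> List[Tuple[object, ...]]:
    """Filter on one column, then fetch projected columns only if needed."""
    positions = [i for i, v in enumerate(table.read_column(filter_column))
                 if predicate(v)]
    if not positions:
        return []  # nothing matched, so no projected column is ever read
    projected = {name: table.read_column(name) for name in project}
    return [tuple(projected[name][i] for name in project) for i in positions]


table = ColumnTable({
    "id": [1, 2, 3, 4],
    "region": ["eu", "us", "eu", "ap"],
    "amount": [10, 20, 30, 40],
    "notes": ["a", "b", "c", "d"],
})

rows = select(table, "region", lambda r: r == "eu", ["id", "amount"])
```

Here the `notes` column is never read, and a filter that matches nothing would skip `id` and `amount` as well. A row-at-a-time interface has to produce whole rows, so it cannot avoid that work in the same way.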

This virtual table concept also is needed to allow joins and aggregation to be processed within the underlying engine.


How does it compare to Infobright Community Edition

from: anonymous
date: Nov. 4th, 2009 02:34 pm (UTC)

How does InfiniDB compare to Infobright? Infobright (community and enterprise edition) has been available for a while. It provides column-based storage with a MySQL interface. The community edition differs from the enterprise edition of Infobright in that it does not support DML statements (update, delete, insert).


Brian "Krow" Aker

Re: How does it compare to Infobright Community Edition

from: krow
date: Nov. 4th, 2009 05:09 pm (UTC)

I don't really know. Infobright has been out there for a while and I generally hear positive things about it.


Re: How does it compare to Infobright Community Edition

from: anonymous
date: Nov. 4th, 2009 06:41 pm (UTC)

Hi -

While at MySQL, I was Infobright's #1 fan and wrote the first article about it on the MySQL dev zone. IB's a very good product.

Where we and IB are the same: column-oriented design, MySQL front end, and no indexing necessary.

What IB does better than us: They have very good storage compression and when they can satisfy a query from the KnowledgeGrid they always have fast response times.

Where we are different than IB: Our open source edition offers a high-speed bulk loader, is ACID transaction compliant, does crash recovery, is multi-threaded (we use more than 1 CPU/core for a query), supports DML, provides hash joins, and supports MVCC. We also plan on having a paid MPP scale-out option that allows you to do parallel processing across multiple nodes.

Hope this helps.

--Robin


Re: How does it compare to Infobright Community Edition

from: jtommaney
date: Nov. 4th, 2009 09:48 pm (UTC)

At a (very) high level, Infobright offers excellent compression and a materialization/meta-data layer that is able to answer some queries without running any database operations. InfiniDB Community Edition offers parallel processing of column scan, hash join, and aggregation operations. Both offer column storage that can significantly reduce I/O costs for many data warehouse queries.
