Log in

No account? Create an account

Drizzle, Views And Triggers

« previous entry | next entry »
Jul. 15th, 2009 | 01:15 pm

I've been trading twitter responses back and forth with jswanhart of Kickfire recently on the state and planning of Views in Drizzle.

In Drizzle right now we do not have views. There are plans to add views which never "materialize", but that is still a couple of milestones off.

One of the problems when talking about views is that the word "materialize" has been over used.

To "materialize" a view, means that you take the view definition, turn it into a temporary table, and then join it against a query. In Drizzle we consider this a "no no".

Why? Because any query that does this is more then likely going to turn out to be to slow to be of any real use in our domain space. The same problem exists with subqueries though for the time being we are leaving them in. Subqueries at this point are only semi-useful. Many of the common methods of using them result in a materialized table, which just makes them too slow. Our plans involve eliminating these sorts of queries and only allowing ones which can be "merged".

This will be limiting, but experience shows that the current state of use is that most users don't want to be able to write a query that will under perform (unless I am wrong, Google has disabled the subquery system in their version of MySQL just to make sure no developer gets anywhere near this part of the system).

Frankly, I would like to take this one step further at some point, and eliminate cartesian joins unless they are explicitly requested. How often do you really want one? Pretty much never. Typically they are done by accident and you either wait out the mistake or start hitting "CTRL-C" hoping to kill the query before it eats up the machine.

So "materializing" is out, but what is not out is "materialized views". This is an SQL feature whereby a table is created and is updated based on entries from one or more tables, either syncronously or asycronously. This is something I do expect us to support at some point.

One of the strengths in the current design is that the replication system lives as a service to the micro-kernel.

What does this mean?

It means that we internally have triggers on any sort of DDL/DML event that occurs in the server. As soon as someone wants to write up some parser goodness we can have basic triggers in whatever language you want.

We also gain materialized views via the same method. The "tires" on this project have been kicked around but no one has started it yet.

Having everything built on the same entry point to the micro-kernel really simplifies the design of a lot of different components. We have someone doing the design on direct to memcached replication for instance (we will have UDF as well, but this will simplify cache coherency and not require you to modify any SQL to make it work).

Materialized Views are pretty cool, materializing a view though pretty much make you want to cry.

BTW It was just pointed out to me that Justin had some of his own thoughts about this.

Link | Leave a comment |

Comments {26}

Brian "Krow" Aker

(no subject)

from: krow
date: Jul. 16th, 2009 03:11 pm (UTC)

The FROM subquery clause raises my concern as well... I've not really made up my mind on it at this point. We have left subqueries alone for the moment... I am not sure when we can get around to tackling them.

On a different note, as fas as column=const goes with indexes... requiring users to create indexes is a complete hack, long term we need to figure out how to do performance and not require this (and I don't mean just go the direction of using column stores and sacrifice write performance).

Reply | Parent | Thread

Justin Swanhart

(no subject)

from: swanhart
date: Jul. 17th, 2009 11:50 pm (UTC)

Are there plans to add hash joins to Drizzle?

The inline view (subquery in the from clause) is very convenient, and powerful, but suffers from the problem that the subquery often must be materialized with no indexes. Even if indexes could be added automatically, cardinality of the joined columns can be bad anyway.

To really properly support inline views which must be temporarily materialized I think you need a better join algorithm when indexes are not available. Of course, hash join only works for efficiently for equijoins and antijoins, and sometimes inline views are outer joined.

Or perhaps build bitmaps on the joined columns (ideally compressed bitmaps ala fastbit) as bitmap lookups have shown to be as efficient as btree index lookups and handle low cardinality data far better.

Reply | Parent | Thread

Brian "Krow" Aker

(no subject)

from: krow
date: Jul. 18th, 2009 03:18 am (UTC)

I want to finish some refactoring first on the JOIN class before I get into do a hash join. Right now there are just a couple of things that I want to do there... that being said I'd like to see someone else step and spend sometime working on it.

Reply | Parent | Thread