
Drizzle, Optimizer, Execution Flow

Sep. 21st, 2009 | 09:29 am

The present in my RSS feed this morning was an article by Peter Zaitsev on :

One of the changes made recently to Drizzle is a redesign of our executioner. This was done by Padraig. MySQL inherited a design where the parser uses one Global Lex Structure to fill in the members of the query for later use. This structure has members in it for every form of query that can be executed. The structure is also assigned an ENUM that is used later to determine whether the query is a SELECT, INSERT, etc., via a switch/case dispatcher.

This of course creates a number of limitations in the design, especially for Drizzle, since we focus on a micro-kernel design over a monolithic one. What Padraig completed recently was work that took apart the switch/case dispatcher and replaced it with an object executioner. Each of the query types (SELECT, INSERT, etc.) is now assigned to a Statement object, which is created in the parser at the top level. We are now focusing on removing the Global Lex Structure and placing its members into the Statement execution objects.
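To make the contrast concrete, here is a minimal C++ sketch of the two shapes described above. Every name in it (Lex, SqlCommand, dispatch_legacy, SelectStatement, InsertStatement, run) is made up for illustration and is not taken from the Drizzle or MySQL source; the "execution" just returns a string so the control flow is easy to follow.

```cpp
#include <cassert>
#include <string>
#include <utility>

// All names here are hypothetical; this is not Drizzle's or MySQL's real code.

// Old shape: one structure with members for every kind of query,
// tagged with an enum and routed through a switch/case dispatcher.
enum class SqlCommand { Select, Insert };

struct Lex {
  SqlCommand command = SqlCommand::Select;
  std::string select_table;  // only meaningful for SELECT
  std::string insert_table;  // only meaningful for INSERT
  std::string insert_value;  // only meaningful for INSERT
};

std::string dispatch_legacy(const Lex& lex) {
  switch (lex.command) {
    case SqlCommand::Select:
      return "select from " + lex.select_table;
    case SqlCommand::Insert:
      return "insert " + lex.insert_value + " into " + lex.insert_table;
  }
  return "unknown command";
}

// New shape: each query type is its own object that carries only
// the members it needs and knows how to execute itself.
class Statement {
 public:
  virtual ~Statement() = default;
  virtual std::string execute() = 0;
};

class SelectStatement : public Statement {
 public:
  explicit SelectStatement(std::string table) : table_(std::move(table)) {
    assert(!table_.empty());
  }
  std::string execute() override { return "select from " + table_; }

 private:
  std::string table_;
};

class InsertStatement : public Statement {
 public:
  InsertStatement(std::string table, std::string value)
      : table_(std::move(table)), value_(std::move(value)) {
    assert(!table_.empty());
  }
  std::string execute() override {
    return "insert " + value_ + " into " + table_;
  }

 private:
  std::string table_;
  std::string value_;
};

// The caller never inspects an enum; it only uses the common interface.
std::string run(Statement& stmt) { return stmt.execute(); }
```

In the first shape every new query type touches both Lex and the switch; in the second it is one more class behind the same interface.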

This gives us a smaller footprint when parsing, and it makes the entire system pluggable in that we can now truly break down the parser in ways that we couldn't before. Before, to extend the parser you had to grow the global lex structure for any extension to the SQL syntax/execution functionality. The global lex structure has a very large footprint, and you have to weigh the cost of any new feature against growing the size of the global structure. Setting the default values for this alone is fairly expensive (memset() is not your friend).

It also meant that someone who wanted to write a new execution path had to modify the switch/case executioner and fiddle with all of the code along the path.

Now in Drizzle? Anyone who wants to write a new execution path can focus on just a single class. This class is generated in the parser and encapsulates all of the logic for a given statement. BTW these Statement objects are not limited to the SQL parser, so any of the new methods for Connection can generate one as well (without going too deep here, what I mean is that the work Eric Day has been doing to allow us to speak native HTTP, etc. will now be able to use direct REST parsers... so SQL is just one method of execution among many).
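As a rough illustration of "SQL is just one method of execution among many", the sketch below has two toy front ends that both produce the same Statement object. The names (from_sql, from_rest, SelectStatement, run), the "/tables/" path layout, and the one-pattern "parsers" are all assumptions made up for this example; they do not reflect Drizzle's real parser or the protocol work mentioned above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical names throughout; a sketch, not Drizzle's actual API.

class Statement {
 public:
  virtual ~Statement() = default;
  virtual std::string execute() = 0;
};

// One execution path lives in one class.
class SelectStatement : public Statement {
 public:
  explicit SelectStatement(std::string table) : table_(std::move(table)) {
    assert(!table_.empty());
  }
  std::string execute() override { return "scan " + table_; }

 private:
  std::string table_;
};

// Toy SQL front end: recognizes only "SELECT * FROM <table>".
std::unique_ptr<Statement> from_sql(const std::string& sql) {
  const std::string prefix = "SELECT * FROM ";
  if (sql.size() <= prefix.size() ||
      sql.compare(0, prefix.size(), prefix) != 0) {
    return nullptr;
  }
  return std::make_unique<SelectStatement>(sql.substr(prefix.size()));
}

// Toy REST-style front end: recognizes only "GET /tables/<table>".
std::unique_ptr<Statement> from_rest(const std::string& method,
                                     const std::string& path) {
  const std::string prefix = "/tables/";
  if (method != "GET" || path.size() <= prefix.size() ||
      path.compare(0, prefix.size(), prefix) != 0) {
    return nullptr;
  }
  return std::make_unique<SelectStatement>(path.substr(prefix.size()));
}

// The executor does not care which front end built the statement.
std::string run(std::unique_ptr<Statement> stmt) {
  if (!stmt) return "unrecognized request";
  return stmt->execute();
}
```

Both run(from_sql("SELECT * FROM t1")) and run(from_rest("GET", "/tables/t1")) end up in the same SelectStatement::execute(), which is the point: the front end is interchangeable, the execution object is not tied to SQL.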

This move to Statement gives us a design which is less complex than what we did in the past, and it allows a developer who wants to extend the database to focus on just what they want to work on. Hell, maybe we should put together a talk for the East Coast No-SQL conference just to make a point about flexibility around SQL vs No-SQL designs.

So why did I open this article with a reference to Peter's article?

Just as the switch/case executioner existed for all queries, a similar one existed for the optimizer. About two months ago we started straightening it out so that we could do the same to it as we have been doing to the main switch/case dispatcher. From the top down we are able to use similar designs at each connecting point in the database.

The gain is? Just as our Statement Executioner cracks the door open to making the main execution path at the core of Drizzle pluggable, by reuse of design the same will be true of our join optimizer (!!!). Once we are complete, our Join Executioner will be extendable by just adding in new plugins. Our plans are moving along, and the pluggable parser is on our horizon. The goal is a set of classes that make the Join Executioner easy to extend as well.
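What "extendable by just adding in new plugins" could look like, applied to joins, is sketched below. This is purely speculative: the post does not describe the actual interface, so JoinStrategy, JoinPlanner, JoinInput, and both strategy classes are invented names, and the cost formulas are toy placeholders rather than real cost models.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; not Drizzle's join optimizer.

struct JoinInput {
  std::size_t left_rows;
  std::size_t right_rows;
};

class JoinStrategy {
 public:
  virtual ~JoinStrategy() = default;
  virtual std::string name() const = 0;
  virtual double estimated_cost(const JoinInput& in) const = 0;
};

class NestedLoopJoin : public JoinStrategy {
 public:
  std::string name() const override { return "nested-loop"; }
  // Toy cost: every left row is compared against every right row.
  double estimated_cost(const JoinInput& in) const override {
    return static_cast<double>(in.left_rows) *
           static_cast<double>(in.right_rows);
  }
};

class HashJoin : public JoinStrategy {
 public:
  std::string name() const override { return "hash"; }
  // Toy cost: one pass to build a table plus one pass to probe it.
  double estimated_cost(const JoinInput& in) const override {
    return static_cast<double>(in.left_rows) +
           static_cast<double>(in.right_rows);
  }
};

// The planner knows only the interface; new strategies are registered,
// not wired into a switch statement.
class JoinPlanner {
 public:
  void add(std::unique_ptr<JoinStrategy> strategy) {
    assert(strategy != nullptr);
    strategies_.push_back(std::move(strategy));
  }

  // Returns the cheapest registered strategy, or nullptr if none exist.
  const JoinStrategy* choose(const JoinInput& in) const {
    const JoinStrategy* best = nullptr;
    double best_cost = std::numeric_limits<double>::infinity();
    for (const auto& s : strategies_) {
      const double cost = s->estimated_cost(in);
      if (cost < best_cost) {
        best_cost = cost;
        best = s.get();
      }
    }
    return best;
  }

 private:
  std::vector<std::unique_ptr<JoinStrategy>> strategies_;
};
```

Registering HashJoin changes what the planner picks for large inputs without any edit to JoinPlanner itself, which is the same property the Statement objects give the main dispatcher.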

When I read articles like Peter's, and Peter is one of the experts in tuning MySQL, it makes me pretty excited to see validation of the concepts we hold dear. I'm not sure that we will pull off exactly what Peter wants long term, only time will tell, but I do see that the concepts that we are pushing are shared by others as well.


Comments {3}

Parse / Compile / Optimize / Execute

from: anonymous
date: Sep. 21st, 2009 09:01 pm (UTC)

Mightn't it make sense to use the classical formula of parse (take apart the language), compile (aka semantic analysis -- name resolution and the like), optimize (figure out how to actually do it), and execute (do it!)?

Separating the steps makes the code simpler, easier to extend, more modular, easier to maintain, and promotes world peace.

Parsers parse! Compilers compile! And the execution engine executes!

-- Jim Starkey (who else?)


Brian "Krow" Aker

Re: Parse / Compile / Optimize / Execute

from: krow
date: Sep. 21st, 2009 10:53 pm (UTC)

You mean classical bytecode engines? No, I don't buy into that design. I would rather use objects with standard interfaces.

I'm fine with parsers building objects and having those objects executed. Bytecode executioners? I think we can skip them at this point. You still multi-stage the entire thing, aka parse, rewrite, resolve,... join optimize, execute engine, but you don't need to break it down into bytecode to do that.


Re: Parse / Compile / Optimize / Execute

from: anonymous
date: Sep. 24th, 2009 01:26 am (UTC)

Bytecode may not be required, but what about preparing an execution plan (just like Oracle does) at the end of the parse phase, which can then be easily utilized for optimizations?
The execution engine can be something which takes an optimized execution plan.
