June 28th, 2007

glasses

Queue Engine, and why this won' likely happen...

After mentioning Blaine Cook's use of MySQL as a queue engine for Twitter I've been pinged about why couldn't one be written for MySQL.

Useful? Hell yes.

Possible? Not Really.

Lets look at the simple case.

CREATE TABLE queue (id serial, message text);

INSERT INTO queue VALUES (message) "This is the first message";
INSERT INTO queue VALUES (message) "This is the second message";
INSERT INTO queue VALUES (message) "This is the third message";

In an application that would use this, the sort of query you would see would be:

SELECT message FROM queue LIMIT 1;

For a queue engine to work, it would always need to respond with the first record in, being the first record out. So you would get:

"This is the first message"

as a response.

For a second call to the select you would get:

"This is the second message"

At no point would the row appear twice. Sounds great? Here is the problem where SQL would come in. Say you had never done any select so all three rows were intact. So you then do a:

SELECT message FROM queue WHERE message LIKE "%third%";

This would cause MySQL to scan the rows. The engine would have no idea what rows were returned and would empty its queue. Even push down conditions could not fully resolve for the engine what rows were actually sent.

Ick!

That part of the behavior really bothers me, and is why I don't think a queue engine really makes sense (though I believe I could be really wrong about this). The behavior of the engine would be radically different from any engine we have seen so far.

I and Monty have talked about adding syntax for years that would allow for a DELETE FROM queue WHERE <> RETURN RESULT (or something similar) that would allow this behavior, but having an engine specialize in it would be a bit different. Most engines would have a problem optimizing for the sort fragmentizing affect this would have.

It might be possible to add a callback from SELECT that would tell the storage engine that the row had been selected for use... but I worry about calling that for each row (though this is a matter adding just one additional call in the handler API).

So what do you think?
glasses

More on queue... Possible Solution...

My blog is blocked to those who have a livejournal account. I set it
up this way years ago because I hate dealing with comment bots that
generate SPAM. This means I get a lot of back channel comments on my
posts (aka email, IM, IRC) that you never actually see on the
site... I don't normally post them but this one from email was pretty
interesting:


The challenging design issues about queuing are about locking and
indexing, not about the API. The API to look at queues and message
should be SQL (so you can use SQL to access the message properties,
the message history, and the payload or do auditing and tracking),
but the API to modify the queue must be something else to preserve
the integrity of the queue.

Oracle Streams AQ does not support data manipulation language (DML)
operations on a queue table or an associated index-organized table
(IOT), if any. The only supported means of modifying queue tables is
through the supplied APIS (which include PL/SQL stored procedures,
Visual Basic using Oracle Objects for OLE, Java Message Service (JMS)
using the oracle.jms Java package and Internet access using HTTP(S).

You can read about Oracle Streams AQ here:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14257/
aq_intro.htm#sthref14

So queuing mechanisms definitely can be written and nicely integrated
with a SQL database. Really ... it's possible ;-)


Another interesting comment came from Jan in IRC. Jan was thinking
we could hack up some SQL and try limiting responses based only
supporting certain index components of the API. Would it work?
Well... only sort of. The "sort of" is the problem I have been
getting all afternoon in comments from people. SQL, at least the core
92 sql that MySQL supports, doesn't have the right semantics.

But!

MySQL supports more then SQL.

http://dev.mysql.com/doc/refman/5.1/en/handler.html

The HANDLER commands don't come up often, there isn't that many COBOL
applications out there that need them (and this is only a partial
joke). But semantics that they provide should work.

I am only partially sure this will work... basically disallow normal SELECT statements, and only allow HANDLER issued commands.

Go and read the Oracle document. Certain aspects of what they are
describing remind me of what Paul is doing with the blob streaming
protocol (which once implemented would make for a good spot to
implement pub/sub and some other out of band communication techniques).