Brian "Krow" Aker (krow) wrote,

Queue Engine, and why this likely won't happen...

After mentioning Blaine Cook's use of MySQL as a queue engine for Twitter, I've been pinged about why a queue engine couldn't be written for MySQL.

Useful? Hell yes.

Possible? Not really.

Let's look at the simple case.

CREATE TABLE queue (id serial, message text);

INSERT INTO queue (message) VALUES ("This is the first message");
INSERT INTO queue (message) VALUES ("This is the second message");
INSERT INTO queue (message) VALUES ("This is the third message");

In an application that used this, the kind of query you would see is:

SELECT message FROM queue LIMIT 1;

For a queue engine to work, it would always need to return rows first in, first out: the first record in is the first record out. So you would get:

"This is the first message"

as a response.

For a second call to the select you would get:

"This is the second message"
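To make those intended semantics concrete, here is a toy model in Python (not MySQL code). The hypothetical `ToyQueueEngine` class and its `select_limit_1()` method are invented for illustration: the method stands in for `SELECT message FROM queue LIMIT 1` and removes the row it returns. A real engine would sit behind the storage engine handler API instead.

```python
from collections import deque


class ToyQueueEngine:
    """Hypothetical engine: every row it returns is consumed (FIFO, destructive read)."""

    def __init__(self):
        self._rows = deque()
        self._next_id = 1

    def insert(self, message):
        # Mirrors INSERT INTO queue (message) VALUES (...); ids grow like AUTO_INCREMENT.
        self._rows.append((self._next_id, message))
        self._next_id += 1

    def select_limit_1(self):
        # Mirrors SELECT message FROM queue LIMIT 1: the oldest row comes out and is gone.
        if not self._rows:
            return None
        _row_id, message = self._rows.popleft()
        return message

    def __len__(self):
        return len(self._rows)


q = ToyQueueEngine()
for text in ("This is the first message",
             "This is the second message",
             "This is the third message"):
    q.insert(text)

print(q.select_limit_1())  # This is the first message
print(q.select_limit_1())  # This is the second message
print(len(q))              # 1
```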

At no point would a row appear twice. Sounds great? Here is where SQL causes a problem. Say you had never run any SELECT, so all three rows were still intact. You then do a:

SELECT message FROM queue WHERE message LIKE "%third%";

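This would cause MySQL to scan the rows, with the engine handing each one up so the server can evaluate the WHERE clause. The engine would have no idea which rows were actually returned to the client, so every row it handed up would be consumed and its queue would be emptied. Even push-down conditions could not fully tell the engine which rows were actually sent.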
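The same kind of toy model shows the problem. It assumes, as a simplification, that the engine hands rows up one at a time during a scan and that the server applies the WHERE clause afterwards. `ConsumingEngine` and `server_select` are hypothetical names, and the lambda stands in for `message LIKE "%third%"`.

```python
from collections import deque


class ConsumingEngine:
    """Hypothetical queue engine: a row is removed the moment it is handed to the server."""

    def __init__(self, messages):
        self._rows = deque(enumerate(messages, start=1))

    def scan(self):
        # Full table scan: each row is popped before the server filters it,
        # because the engine cannot know whether the server will keep it.
        while self._rows:
            yield self._rows.popleft()

    def remaining(self):
        return [message for _row_id, message in self._rows]


def server_select(engine, where):
    # The WHERE clause is evaluated by the server, above the engine.
    return [message for _row_id, message in engine.scan() if where(message)]


engine = ConsumingEngine([
    "This is the first message",
    "This is the second message",
    "This is the third message",
])

# Stands in for: WHERE message LIKE "%third%"
result = server_select(engine, lambda m: "third" in m)
print(result)              # ['This is the third message']
print(engine.remaining())  # [] because the unmatched rows were consumed too
```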


That part of the behavior really bothers me, and is why I don't think a queue engine really makes sense (though I could be very wrong about this). The behavior of the engine would be radically different from any engine we have seen so far.

Monty and I have talked for years about adding syntax, something like DELETE FROM queue WHERE <> RETURN RESULT (or something similar), that would allow this behavior, but having an engine specialize in it would be a bit different. Most engines would have trouble optimizing for the kind of fragmentation this would cause.
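One way to picture the DELETE ... RETURN RESULT idea is as a single engine-level operation that evaluates the condition itself and removes only the rows it returns. This is a hypothetical sketch: `DeleteReturningEngine` and `delete_returning` are invented names, not MySQL syntax or API. Deleting a matching row from the middle of the queue leaves a hole, which is the fragmentation a conventional engine would have to cope with.

```python
from collections import deque


class DeleteReturningEngine:
    """Hypothetical engine exposing one 'delete matching rows and return them' operation."""

    def __init__(self, messages):
        self._rows = deque(enumerate(messages, start=1))

    def delete_returning(self, where, limit=None):
        # The condition is evaluated inside the engine, so only rows that are
        # actually returned get removed; everything else stays queued in order.
        taken, kept = [], deque()
        for row in self._rows:
            if where(row[1]) and (limit is None or len(taken) < limit):
                taken.append(row)
            else:
                kept.append(row)
        self._rows = kept
        return [message for _row_id, message in taken]

    def remaining(self):
        return [message for _row_id, message in self._rows]


engine = DeleteReturningEngine([
    "This is the first message",
    "This is the second message",
    "This is the third message",
])

print(engine.delete_returning(lambda m: "third" in m))   # ['This is the third message']
print(engine.delete_returning(lambda m: True, limit=1))  # ['This is the first message']
print(engine.remaining())                                # ['This is the second message']
```

Because the engine sees the predicate, rows that don't match stay queued in their original order instead of being lost during the scan.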

It might be possible to add a callback from SELECT that would tell the storage engine that a row had been selected for use... but I worry about calling that for each row (though this is only a matter of adding one additional call to the handler API).
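The callback idea can be sketched the same way: the scan becomes non-destructive, and the server makes one extra call for every row it actually sends. `CallbackEngine.row_selected()` is a hypothetical method, not part of MySQL's real handler API, and the `callback_calls` counter exists only to make the per-row cost visible.

```python
class CallbackEngine:
    """Hypothetical engine: rows are removed only when the server confirms they were sent."""

    def __init__(self, messages):
        # Python dicts keep insertion order, which preserves FIFO order here.
        self._rows = dict(enumerate(messages, start=1))
        self.callback_calls = 0

    def scan(self):
        # Non-destructive scan over a snapshot, so rows can be removed mid-iteration.
        yield from list(self._rows.items())

    def row_selected(self, row_id):
        # Hypothetical extra handler call, made once per row actually sent to the client.
        self.callback_calls += 1
        del self._rows[row_id]

    def remaining(self):
        return list(self._rows.values())


def server_select(engine, where, limit=None):
    sent = []
    for row_id, message in engine.scan():
        if limit is not None and len(sent) >= limit:
            break
        if where(message):
            engine.row_selected(row_id)  # tell the engine this row left the queue
            sent.append(message)
    return sent


engine = CallbackEngine([
    "This is the first message",
    "This is the second message",
    "This is the third message",
])

print(server_select(engine, lambda m: "third" in m))   # ['This is the third message']
print(engine.remaining())  # ['This is the first message', 'This is the second message']
print(server_select(engine, lambda m: True, limit=1))  # ['This is the first message']
print(engine.callback_calls)                           # 2
```

Only the rows the server confirms are removed, at the price of one extra call per returned row.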

So what do you think?