?

Log in

No account? Create an account

Queue Engine, and why this won' likely happen...

« previous entry | next entry »
Jun. 28th, 2007 | 11:40 am

After mentioning Blaine Cook's use of MySQL as a queue engine for Twitter I've been pinged about why couldn't one be written for MySQL.

Useful? Hell yes.

Possible? Not Really.

Lets look at the simple case.

CREATE TABLE queue (id serial, message text);

INSERT INTO queue VALUES (message) "This is the first message";
INSERT INTO queue VALUES (message) "This is the second message";
INSERT INTO queue VALUES (message) "This is the third message";

In an application that would use this, the sort of query you would see would be:

SELECT message FROM queue LIMIT 1;

For a queue engine to work, it would always need to respond with the first record in, being the first record out. So you would get:

"This is the first message"

as a response.

For a second call to the select you would get:

"This is the second message"

At no point would the row appear twice. Sounds great? Here is the problem where SQL would come in. Say you had never done any select so all three rows were intact. So you then do a:

SELECT message FROM queue WHERE message LIKE "%third%";

This would cause MySQL to scan the rows. The engine would have no idea what rows were returned and would empty its queue. Even push down conditions could not fully resolve for the engine what rows were actually sent.

Ick!

That part of the behavior really bothers me, and is why I don't think a queue engine really makes sense (though I believe I could be really wrong about this). The behavior of the engine would be radically different from any engine we have seen so far.

I and Monty have talked about adding syntax for years that would allow for a DELETE FROM queue WHERE <> RETURN RESULT (or something similar) that would allow this behavior, but having an engine specialize in it would be a bit different. Most engines would have a problem optimizing for the sort fragmentizing affect this would have.

It might be possible to add a callback from SELECT that would tell the storage engine that the row had been selected for use... but I worry about calling that for each row (though this is a matter adding just one additional call in the handler API).

So what do you think?

Link | Leave a comment |

Comments {4}

Mark Atwood

(no subject)

from: fallenpegasus
date: Jun. 28th, 2007 06:58 pm (UTC)
Link

My answer for troublesome queries is "So, don't do that then", similarly to my answer to troublesome and expensive queries in the S3 engine.

Separate SELECT from DELETE, and then implement

SELECT id,message FROM queue LIMIT 1;

and implement

DELETE FROM queue WHERE id="foo";

and then just tell the user that doing any other sort of SELECT is a Bad Idea.

Reply | Thread

Dossy

(no subject)

from: dossy
date: Jun. 28th, 2007 08:48 pm (UTC)
Link

I wouldn't have thought to use SELECT/DELETE on a queue table in MySQL. I would have (naturally) have used the HANDLER approach:

HANDLER queue READ FIRST LIMIT 1;

Optionally, you might use a "queue_read_ptr" table where you store the "next id serial to read" ... and the "reading" operation starts out by incrementing the id--the assumption is you get to "read" the previous ID. Then, you select that ID out of the queue table and delete it when done.

Yeah, actually, that's the approach I'd take ...

Reply | Thread

too low level......

from: burtonator
date: Jun. 28th, 2007 09:40 pm (UTC)
Link

you're thinking too low level.

The queue API should sit on TOP of MySQL and limit what you can do to it...

The only operations would be to push, pop, and peek basically...but these would be mapped in MySQL in different ways.

Kevin

Reply | Thread

Brian "Krow" Aker

Re: too low level......

from: krow
date: Jun. 29th, 2007 02:33 pm (UTC)
Link

I think that a queue is different then a database in the terms that we think of it, and as implemented in the engine. I spent yesterday arguing this in person with a number of people and we came to the conclusion that there is no high performance way to do it with the engines we have today (the Twitter guys should document their own issues with it).

I think you either need a different engine, with a different set of mechanisms, or you just need a different server. Twitter hacked up memcached for their solution.

Reply | Parent | Thread