Queue Engine, and why this won' likely happen...
« previous entry | next entry »
Jun. 28th, 2007 | 11:40 am
After mentioning Blaine Cook's use of MySQL as a queue engine for Twitter I've been pinged about why couldn't one be written for MySQL.
Useful? Hell yes.
Possible? Not Really.
Lets look at the simple case.
CREATE TABLE queue (id serial, message text);
INSERT INTO queue VALUES (message) "This is the first message";
INSERT INTO queue VALUES (message) "This is the second message";
INSERT INTO queue VALUES (message) "This is the third message";
In an application that would use this, the sort of query you would see would be:
SELECT message FROM queue LIMIT 1;
For a queue engine to work, it would always need to respond with the first record in, being the first record out. So you would get:
"This is the first message"
as a response.
For a second call to the select you would get:
"This is the second message"
At no point would the row appear twice. Sounds great? Here is the problem where SQL would come in. Say you had never done any select so all three rows were intact. So you then do a:
SELECT message FROM queue WHERE message LIKE "%third%";
This would cause MySQL to scan the rows. The engine would have no idea what rows were returned and would empty its queue. Even push down conditions could not fully resolve for the engine what rows were actually sent.
Ick!
That part of the behavior really bothers me, and is why I don't think a queue engine really makes sense (though I believe I could be really wrong about this). The behavior of the engine would be radically different from any engine we have seen so far.
I and Monty have talked about adding syntax for years that would allow for a DELETE FROM queue WHERE <> RETURN RESULT (or something similar) that would allow this behavior, but having an engine specialize in it would be a bit different. Most engines would have a problem optimizing for the sort fragmentizing affect this would have.
It might be possible to add a callback from SELECT that would tell the storage engine that the row had been selected for use... but I worry about calling that for each row (though this is a matter adding just one additional call in the handler API).
So what do you think?
Useful? Hell yes.
Possible? Not Really.
Lets look at the simple case.
CREATE TABLE queue (id serial, message text);
INSERT INTO queue VALUES (message) "This is the first message";
INSERT INTO queue VALUES (message) "This is the second message";
INSERT INTO queue VALUES (message) "This is the third message";
In an application that would use this, the sort of query you would see would be:
SELECT message FROM queue LIMIT 1;
For a queue engine to work, it would always need to respond with the first record in, being the first record out. So you would get:
"This is the first message"
as a response.
For a second call to the select you would get:
"This is the second message"
At no point would the row appear twice. Sounds great? Here is the problem where SQL would come in. Say you had never done any select so all three rows were intact. So you then do a:
SELECT message FROM queue WHERE message LIKE "%third%";
This would cause MySQL to scan the rows. The engine would have no idea what rows were returned and would empty its queue. Even push down conditions could not fully resolve for the engine what rows were actually sent.
Ick!
That part of the behavior really bothers me, and is why I don't think a queue engine really makes sense (though I believe I could be really wrong about this). The behavior of the engine would be radically different from any engine we have seen so far.
I and Monty have talked about adding syntax for years that would allow for a DELETE FROM queue WHERE <> RETURN RESULT (or something similar) that would allow this behavior, but having an engine specialize in it would be a bit different. Most engines would have a problem optimizing for the sort fragmentizing affect this would have.
It might be possible to add a callback from SELECT that would tell the storage engine that the row had been selected for use... but I worry about calling that for each row (though this is a matter adding just one additional call in the handler API).
So what do you think?
(no subject)
from:
fallenpegasus
date: Jun. 28th, 2007 06:58 pm (UTC)
Link
Separate SELECT from DELETE, and then implement
SELECT id,message FROM queue LIMIT 1;
and implement
DELETE FROM queue WHERE id="foo";
and then just tell the user that doing any other sort of SELECT is a Bad Idea.
Reply | Thread
(no subject)
from:
dossy
date: Jun. 28th, 2007 08:48 pm (UTC)
Link
HANDLER queue READ FIRST LIMIT 1;
Optionally, you might use a "queue_read_ptr" table where you store the "next id serial to read" ... and the "reading" operation starts out by incrementing the id--the assumption is you get to "read" the previous ID. Then, you select that ID out of the queue table and delete it when done.
Yeah, actually, that's the approach I'd take ...
Reply | Thread
too low level......
from:
burtonator
date: Jun. 28th, 2007 09:40 pm (UTC)
Link
The queue API should sit on TOP of MySQL and limit what you can do to it...
The only operations would be to push, pop, and peek basically...but these would be mapped in MySQL in different ways.
Kevin
Reply | Thread
Re: too low level......
from:
krow
date: Jun. 29th, 2007 02:33 pm (UTC)
Link
I think you either need a different engine, with a different set of mechanisms, or you just need a different server. Twitter hacked up memcached for their solution.
Reply | Parent | Thread