?

Log in

No account? Create an account

Prepared Statements, Musings

« previous entry | next entry »
Jun. 9th, 2008 | 03:45 pm

In the back of my mind for the last couple of weeks have been some musings on prepared statements.

So I walk into this customers office and they don't speak english.

I only speak a few words of their language.

It goes like this:

"Blah, blah, blah, Prepared Statements, blah, blah..."

I say "Turn them off, they are what is crashing your application."

The end customer is confused because the translator had not finished before I gave the answer.

So I explain the problem... the customer has a lot of connections to the database... a lot... and the database is running out of memory and crashing. Was this the customer's problem? Yes it was.

This is a practiced conversation for me. I could do a song and dance on "well you should be building your application like..." but the customer wants to build their application their way. They want a lot of SQL to exist that they are going to generate on the fly. The shortest answer is "turn them off".

So why are prepared statements a problem?

Because users do not clean up/close unused prepared statements. Multiply the number of prepared statements times the number of open connections and you begin to see the problem.

What do you do about this?

Turn them off. The Java driver and a few other drivers turn them off automatically.

Why?

Because unless you really happen to know what you are doing with them, you are going to crash the database (somewhere in the back of my mind is "this is why we moved to garbage collected languages").

I'm not going to defend the design.

Are they faster?

Well "yes" if you have blobs, otherwise... still "yes" for most cases.

Each connection thread in MySQL keeps its own copy. There is no global pool of prepared statements on the server.

And if there was a global pool?

We would need a bunch of read/write locks. Locks are a real problem right now across the industry. I do not believe that any application being built today which has to share data/state has this solved.

Locks harm scaleability. Sure... you could keep a cache of them local to the connection that reads from a central cache, but this would just be pushing out the problem with caching (which is the curren trend in scaling work arounds). The problem is that we see MySQL set ups with thousands of connections, so even this solution is going to chew up a lot of memory.

So what am I thinking?

Ditch the current design completely. Push the parser out to the client, and let it send out a stripped down bytecode language which can be de-serialized back to something we can execute on.

Just skip the entire "lets put this in the server" approach for prepared statements. No cache needed, and you push processing off to the client where it belongs. Will it require a new extension to the protocol? Yes.

The client library would also get a bit bigger, but that does not concern me as much. You can pull sql_yacc.yy and a few functions into a library and get away from the problem of supporting two parsers.

To someone writing against the database there would be no change, dynamic SQL would continue to work as is. The difference is all in what the client is doing internally, and what it is shipping over the wire.

Heck, its even modular at this point :)

Just thinking about a solution...

Link | Leave a comment | Share

Comments {32}

deja vu

from: jimw
date: Jun. 9th, 2008 11:07 pm (UTC)
Link

that idea actually came up in the sorrento meeting, by antony if i remember correctly.

you'd want some sort of cross-language parser generator, so that we wouldn't have to implement parsers in c, java, c#, python, perl, etc.

Reply | Thread

Brian "Krow" Aker

Re: deja vu

from: krow
date: Jun. 9th, 2008 11:21 pm (UTC)
Link

The idea has been floating around for a bit... it has just crossed my mind in the last couple of weeks that perhaps there is no solution to fixing the current PS design. Just toss it.

Reply | Parent | Thread

Bytecodes

from: kazuho
date: Jun. 10th, 2008 12:15 am (UTC)
Link

> Push the parser out to the client, and let it send out a stripped down bytecode language

Yeah! I definitely agree that this is the way to go. It would no only improve server performance, but also opens the possibility of hand-coding access to storage engines (for better performance than what SQL optimizers can achieve).

Reply | Thread

Throwing the baby out with the bathwater

from: bkarwin
date: Jun. 10th, 2008 01:20 am (UTC)
Link

This hearkens back to the days when 4GL was a buzzword, and SQL was parsed at compile-time and converted into low-level API for a binary query protocol. Dynamic SQL, sent to the server and parsed and optimized at runtime, was at that time the new-fangled method of running SQL.

Now static SQL is a distant memory, and everyone uses dynamic SQL. Why? Because now the server determines the SQL language support uniformly for all clients. You have greater assurance that the SQL language is identical for all clients connected to a given server, regardless of the client. Think of tech support nightmare when the SQL language support varies by the client programming language, client version, and operating system!

Also, you don't have to upgrade your clients when a language bug is fixed. Upgrade the server, and voila! All clients benefit. Deployment of SQL fixes and enhancements is so much easier.

There have to be alternative solutions to the problem you mention, of leaking PS data structures. Set a server cap on the memory allocation to these data structures. Implement an LRU algorithm to toss out unused PS. Make this behavior configurable on the server, enabled by default.

Your reasoning sounds like, "people frequently fill up and fragment their hard disk because they never empty their trash can (a true statement), so we should make sure no one can use a hard disk." WTF?!?

Reply | Thread

Brian "Krow" Aker

Re: Throwing the baby out with the bathwater

from: krow
date: Jun. 10th, 2008 01:26 am (UTC)
Link

You are missing the point. You would still have dynamic SQL in your application, the change would be that instead of sending raw SQL to the server, you could optionally send it pre-parsed into bytecode.

AKA no change to how applications are written today. This is about what is happening under the hood.

An LRU with a global cache still has locks, and MySQL doesn't do the second trick that many prepared statement designs does, which is to store the execution plan. So really the current implementation just gives you a way to send objects without escaping them (and a slightly parsed tree).

BTW I updated the post to clarify the point that dynamic SQL is still available after doing this. I hated embedded SQL.



Edited at 2008-06-10 01:41 am (UTC)

Reply | Parent | Thread

Re: Throwing the baby out with the bathwater

from: bkarwin
date: Jun. 10th, 2008 03:07 pm (UTC)
Link

I do understand the difference between compile-time parsing, runtime parsing in the client, and runtime parsing in the server. I also understand the issue with scalability for a resource that requires locking.

I just can't understand the logical progression from "resource constraint on server" to "move parser to client". Won't that just change the location of the resource constraint problem, instead of solving the problem? Perhaps you have made some mental steps connecting the two but they are still in your head. :-)

I do understand that by moving the PS resource management to the client, it's less likely to crash the server. But if you ever want the server to cache optimization plans, you'd be back at square one, because I can't believe you're suggesting to move optimization to the client as well.

What about setting the resource cap per thread, and use a priority queue to implement an LRU at the thread level? When a PS is used, it moves to the beginning of the priority queue and assign it the current timestamp. When the PS pool is full, knock off the entry at the end, under the assumption it is the least recently used.

If the PS at the end has a timestamp so recent that you don't feel it's "stale", then either error out the creation of the new PS or else sleep for enough milliseconds.

Maintain the priority queue as a separate data structure from the thread's PS pool, so you don't have to lock the PS pool to change its "order." You could also implement alterations to the priority queue asynchronously, so they don't block continuation of work. You just have to trust that the state of the priority queue stays up to date enough.

Reply | Parent | Thread | Expand

(Deleted comment)

That will be more problamatic

from: venudb
date: Jun. 10th, 2008 05:09 am (UTC)
Link

Thats exactly what we used to do in our 'last' company and you believe or not that ended up with more problems; and finally we ended by moving everything to server..

But having a parset @client end; gives more controlled options and few calls will be almost NOOP and can be returned directly from client and in case of prepared execution, it saves lot of network bandwidth as well.

Reply | Thread

Brian "Krow" Aker

Re: That will be more problamatic

from: krow
date: Jun. 10th, 2008 07:20 am (UTC)
Link

What sort of problems did you hit?

Reply | Parent | Thread

(Deleted comment)

Brian "Krow" Aker

Re: I am shocked.

from: krow
date: Jun. 10th, 2008 07:19 am (UTC)
Link

I agree on principle that people should not be developing for any particular database.

Other databases created a central cache... this has not been a real problems for others because:

1) They typically run fewer connections.
2) No one has the "many cores" problem solved.

The query cache is global, so small results are global.

Reply | Parent | Thread

awfief

Re: I am shocked.

from: awfief
date: Jun. 11th, 2008 01:40 am (UTC)
Link

Sure, on principle, don't develop to a database....also, all schemas should be normalized, and transactions should be used for perfect data integrity.

In reality, performance is often the #1 motivation, and risk assessment concludes that theory can take a backseat to better performance.

In the same way, prepared statements are often used as "best practice" for security against SQL injections. But they're not so great for performance.

Reply | Parent | Thread | Expand

(Deleted comment)

Philomath

Curious

from: tcepsa
date: Jun. 10th, 2008 03:35 pm (UTC)
Link

Wouldn't putting that implementation in the client lend itself to a greater possibility of SQL Injection-type attacks? Or are there other details around that which would prevent end-users from executing arbitrary SQL? Am I still living in a stone age by using prepared statements to protect against that?

Reply | Thread

Brian "Krow" Aker

Re: Curious

from: krow
date: Jun. 10th, 2008 06:17 pm (UTC)
Link

The way multi-SQL statements are sent is why SQL injection attacks are possible. That should have been done differently in the client in the first place.

Would this solve it? Not really, but solving it could be done by just altering the protocol.

Reply | Parent | Thread

Konstantin Osipov

can't toss them out

from: kostja_osipov
date: Jun. 25th, 2008 08:37 pm (UTC)
Link

Running out of memory is a yet another issue we will get fixed over time, just like we got Bug#27430 and others fixed, albeit only recently.

Would be nice to toss them out completely, but we need to keep the technology, since the a stored procedure or a function is in a way just a collection of prepared statements.

So it seems we're stuck with having some prepared statements, at least internally, for some time.

And BTW, they do get better overall, one gotcha or another gets fixed once a few months.

Reply | Thread

Brian "Krow" Aker

Re: can't toss them out

from: krow
date: Jun. 25th, 2008 09:07 pm (UTC)
Link

Antony and Eric's design for SP does not have this limitation (aka the need for prep stmt). There are a few ways to skin this cat.

Currently I am fond of shipping back the byte code and letting the clients send it with each query. That way any proxy/whatever can inspect and make of the information. I and Monty went over this on the phone a few weeks ago and he could not find immediate issue with it.

Just need a little bit of time to code up an example and see how well it works.

Reply | Parent | Thread