Brian "Krow" Aker (krow) wrote,
Brian "Krow" Aker

Why does MySQL insert a NULL when a column is declared NOT NULL?, Other Thoughts...

Here is one, but not all of the culprits in the codebase:

    Count warnings for all inserts.
    For single line insert, generate an error if try to set a NOT NULL field
    to NULL.
  session->count_cuted_fields= ((values_list.elements == 1 &&
                                 !ignore) ?
                                CHECK_FIELD_ERROR_FOR_NULL :

With MySQL you have to keep your mind firmly in the state of "if I was MyISAM, what would I do?".

MyISAM is the default engine for MySQL. It handles everything from temporary tables on disk to the authentication tables. It lays out its rows, for the most part, in the order written. Updates are written in place when possible, and any sort of traffic beyond INSERT and DELETE will fragment it. Its indexes are written out to separate files.

So why does this matter? It means that if a delete occurs that spans multiple rows or if an update happens the entire statement needs to either fail quickly, on the first row, or write out the entire query. When you can't fix what you just wrote, you end up with a problem that if it turns out the query was bad, you need to "make the best of it". Most of the key pair systems, ie the no sql solutions, have a similar issue though they often get around it just by not supporting multiple update, or they just muddle through like MyISAM does.

If you have bad data, if you are truncating some data, or you just didn't get the INSERT query right?

No problem, just insert NULL.

The first change to Drizzle was to make Innodb the default engine, and make it so that we looked at the world through the eyes of a transactional system. Then it took a year to clean up all of the bugs (you can turn on Innodb, and most do, but please realize that it is not very well tested in the MySQL codebase).

Other transactional systems solve this by keeping some sort of log around of the changes, which despite some of the advertising done by the no-sql solutions, is not typically the performance problem that exists for most databases. Users can make it an issue by designing their applications so that it is a problem, but by default it is not a major issue (as a slight digression I need to get one, any, of the companies I know who have done the write data/hardware usage for the current batch of no-sql solutions to publish their numbers because they show the obvious, the current solutions are atrocious).

When I was at Open Source Bridges one of the common themes I heard was "next year we plan to write a new storage engine" by a number of the vendors peddling new databases. This was followed up often by "we attain our performance by not calling sync". I've wondered how many developers have had to stand in front of their bosses and explain "the reason the site will be down for the next day is that we are rebuilding the database because we chose to go with a solution that never actually saved the data".

If you are using MyISAM for data you care about, then you are trading write performance for data corruption and downtime issues.

One thing I learned some time ago, if you ask anyone who builds databases about how they make sure the data they are storing is not either corrupted or lost, and they say anything other then "we call sync", then most likely your data has not been saved, and most likely you will lose some data. The song and dance a developer can do around this is pretty funny. I can't find the link on the OSB site to the MongoDB talk, which is a shame, since there were some good examples found in that talk.

The code fragment points out a bigger culprit in the overall architecture for MySQL. When you keep MyISAM in the codebase, and make it center stage, you have to keep the internal design on a course of "best effort". Where "best effort" is really a very low bar. You can never fix data integrity issues, and you can really never hope to be 100% that what you stored is what the user wanted to insert in the first place. Most of this is gone in Drizzle, and we are cleaning up the last bits for our beta, but every time I find one of these gems I ask myself "why didn't we just ditch MyISAM?".

The answer was of course, because we didn't own Innodb, but that really was a lame answer. Oracle will be paying for that for years, since even if they did switch to Innodb, they have the baggage of supporting MyISAM, and years worth of effort in front of them to move past those design decisions.

I hear they have a lot of engineers though, so I bet somehow this will all get resolved someday :)
  • Post a new comment


    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded