MySQL Archive's Compression Method

Mar. 20th, 2007 | 05:31 pm

In 5.1 I modified the compression methodology after seeing some schemas that were only getting compression in the 50% range (which is low compared to what I had seen in most common cases). The main problem I identified was that the compression of rows with NULLs and long varchars was sub-optimal.

In 5.0 and 4.1 the methodology for compression was pretty simple: take a row, add it to the compression buffer, and write the buffer out when it was full. In 5.1 this has changed:

1) Rows are first packed. A varchar is copied into the compression buffer as just its length and contents, so the compressor never sees the unused space. The extra space is never copied.

2) NULL removal. A set of bytes representing NULLs is stored before any row data is copied in. If an attribute (we call them fields internally) is NULL, no length or data is stored for it. Effectively, NULLs no longer count in storage.

3) Compression. The compression buffer is compressed using zlib. In versions before 5.1 gzio was used.
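
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not the storage engine's actual on-disk format: the function names (`pack_row`, `unpack_row`, `fixed_width_row`), the 2-byte little-endian length prefix, the bit order of the NULL bitmap, and the 255-byte field width are all assumptions made for the example.

```python
import struct
import zlib


def pack_row(values):
    """Pack one row: NULL bitmap first, then length + contents of non-NULL fields.

    Hypothetical layout for illustration only.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    body = bytearray()
    for i, value in enumerate(values):
        if value is None:
            # Mark the field as NULL; no length and no data are stored for it.
            bitmap[i // 8] |= 1 << (i % 8)
            continue
        data = value.encode("utf-8")
        # Assumed 2-byte little-endian length prefix, followed by the contents.
        body += struct.pack("<H", len(data)) + data
    return bytes(bitmap) + bytes(body)


def unpack_row(buf, nfields):
    """Reverse of pack_row; returns (values, bytes_consumed)."""
    pos = (nfields + 7) // 8
    bitmap = buf[:pos]
    values = []
    for i in range(nfields):
        if bitmap[i // 8] & (1 << (i % 8)):
            values.append(None)
            continue
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        values.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return values, pos


def fixed_width_row(values, width=255):
    """Unpacked layout for comparison: every field padded to its full width."""
    return b"".join((v or "").encode("utf-8").ljust(width, b"\0") for v in values)


rows = [("alice", None, "short note"), ("bob", None, None)]

# Steps 1 and 2: pack each row, dropping padding and NULL fields.
packed = b"".join(pack_row(row) for row in rows)

# Step 3: compress the buffer with zlib.
compressed = zlib.compress(packed)
```

In this sketch the second row packs down to 6 bytes (1 bitmap byte, 2 length bytes, 3 bytes of data), where the padded layout would spend 765 bytes on the same row before compression even starts.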

In 5.1 azio is used. It is a bit different than GZIO:
  • Mallocs are avoided.
  • Compression header is very different in AZIO.
  • Knowledge of rows internally (which is why the archive_reader program can do nifty tricks like creating online backups of archive tables).
  • Room for comments, the frm, and tracked statistics on the file.

For 5.2 I am planning on adding an information schema to expose this data, and more, to users.

    Comments {2}

    (no subject)

    from: anonymouscowa
    date: May. 30th, 2007 06:38 am (UTC)

    Unfortunately, that means that archive tables created by 4.1 and 5.0 are not readable by 5.1 and cause mysqld to coredump with an assertion failure:

    Assertion failed: (buffer[0] == az_magic[0] && buffer[1] == az_magic[1]), function read_header, file azio.c, line 350.

    This should probably get documented somewhere (or fixed). Preferably by allowing old-format archive tables to be read and upgraded.


    Brian "Krow" Aker

    (no subject)

    from: krow
    date: May. 30th, 2007 07:28 am (UTC)

    I need to take a look at what you are pointing out. The files should be upgradable (aka readable). The assert should only be in a debug build. Did you try a production build?
