Brian "Krow" Aker (krow) wrote,

MySQL Archive's Compression Method

In 5.1 I modified the compression methodology after seeing some schemas that were only getting compression in the 50% range (which is low compared to what I had seen in most common cases). The main problem I identified was that the compression of rows with NULLs and long varchars was sub-optimal.

In 5.0 and 4.1 the methodology for compression was pretty simple: take a row, add it to the compression buffer, and then write the buffer out when it is full. In 5.1 this has changed:

1) Rows are first packed. This leads to a varchar being copied into the compression buffer with only the contents and length ever being seen for compression. The extra space is never copied.

2) NULL removal. A set of bytes representing NULLs is stored before any row is copied in. If the attribute (we call them fields internally) was NULL, no length or data is stored. Effectively, NULLs no longer count in storage.

3) Compression. The compression buffer is compressed using zlib. In versions before 5.1, gzio was used.
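
The three steps above can be sketched roughly as follows. This is only an illustration, not the Archive engine's actual code or on-disk format: the function names (`pack_row`, `unpack_row`, `compress_rows`), the one-bit-per-field NULL bitmap, and the 2-byte little-endian length prefix are all assumptions made for the example, and Python's `zlib` module stands in for the engine's use of zlib.

```python
import struct
import zlib


def pack_row(fields):
    """Pack a row of varchar-like fields (bytes or None) into a compact record.

    Hypothetical layout: NULL bitmap first, then for each non-NULL field
    a 2-byte length followed by the contents (so values must be < 64 KiB).
    """
    null_bytes = bytearray((len(fields) + 7) // 8)
    body = bytearray()
    for i, value in enumerate(fields):
        if value is None:
            # Mark the field as NULL; no length or data is stored for it.
            null_bytes[i // 8] |= 1 << (i % 8)
            continue
        body += struct.pack("<H", len(value))  # length
        body += value                          # contents only, no padding
    return bytes(null_bytes) + bytes(body)


def unpack_row(data, field_count):
    """Reverse pack_row for a row with a known number of fields."""
    null_len = (field_count + 7) // 8
    null_bytes, pos = data[:null_len], null_len
    fields = []
    for i in range(field_count):
        if null_bytes[i // 8] & (1 << (i % 8)):
            fields.append(None)
            continue
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        fields.append(data[pos:pos + length])
        pos += length
    return fields


def compress_rows(rows):
    """Pack every row into one buffer, then compress the buffer with zlib."""
    return zlib.compress(b"".join(pack_row(row) for row in rows))
```

For example, `pack_row([b"hello", None, b""])` yields a 10-byte record under this layout: one NULL byte, a 2-byte length plus 5 bytes of content for the first field, nothing at all for the NULL field, and a 2-byte zero length for the empty string.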

In 5.1 azio is used. It is a bit different than gzio:
  • Mallocs are avoided.
  • The compression header is very different in azio.
  • It has knowledge of rows internally (which is why the archive_reader program can do nifty tricks like creating online backups of archive tables).
  • It has room for comments and the frm, and it tracks statistics on the file.

For 5.2 I am planning on adding an information schema to expose this data, and more, to users.