Mercurial: Clever, or Just a Weird Way to Do Backups?

Apr. 22nd, 2008 | 10:55 am

All of my sites run a piece of software called Everything. Mine is a fork of the original Everydevel release. It shares some common traits with Slash, but it has a more rapid development cycle, which I like.

One of the things it shares with Slash is that the "site" is the database. In other words, everything that makes a site a site lives in the database. This means a backup of the site can be dropped into an Everything container and be up and running in seconds (I use this with virtual machines so that I can move things around on a whim).

I hate backups. I don't want to take snapshots with LVM, since I know I just need the content, and with dumps I typically end up with a huge set of files just sitting around:

  • I cannot search them.
  • I cannot diff them between dates.
  • I cannot partially restore them.
  • They take up a lot of space.

So it dawned on me at dinner last night: what if I just used Mercurial?

So I add a crontab entry like so:

0 3 * * * mysqldump -f --single-transaction -T /var/backup/sitename sitename; (cd /var/backup/sitename; hg commit -m "auto"; hg push)
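One wrinkle worth noting: `hg commit` only records files that are already tracked, so tab files for tables created after the initial `hg add` would be silently skipped. A variant of the entry above (a sketch, not necessarily the author's exact setup) that handles new and dropped tables:

```shell
# Sketch only: same nightly dump, but 'hg addremove' records tab files
# for tables created or dropped since the last run, which a bare
# 'hg commit' would miss. Assumes /var/backup/sitename is already an
# hg repository with a default push target configured.
0 3 * * * mysqldump -f --single-transaction -T /var/backup/sitename sitename; (cd /var/backup/sitename && hg addremove -q && hg commit -m "auto"; hg push -q)
```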


I push the backups to a centralized server.

Instant daily backups. I can partially restore thanks to the tab format, and I can pull diffs that I can apply directly to reinsert data into the database.
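To make the diff-and-reinsert idea concrete, here is a tiny self-contained sketch with fabricated data (the `node` table name, the rows, and the /tmp paths are all made up for illustration): a plain diff between two tab-format dumps yields exactly the changed rows, which could then be fed to mysqlimport or LOAD DATA INFILE.

```shell
# Fabricate two tab-format "dumps" of a hypothetical node table.
mkdir -p /tmp/tabdiff
printf '1\talice\n2\tbob\n' > /tmp/tabdiff/node.old
printf '1\talice\n2\trobert\n3\tcarol\n' > /tmp/tabdiff/node.new

# A plain diff is already row-oriented; '|| true' because diff exits
# non-zero when the files differ.
diff /tmp/tabdiff/node.old /tmp/tabdiff/node.new > /tmp/tabdiff/node.diff || true

# Keep only the added/changed rows ('> ' lines), stripping the marker.
grep '^>' /tmp/tabdiff/node.diff | cut -c3- > /tmp/tabdiff/node.restore
cat /tmp/tabdiff/node.restore
```

In practice the diff would come from `hg diff` against a given table's tab file rather than two loose files, but the mechanics are the same.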

Perfect? Nope.

Mercurial stores deltas, so the space overhead does not seem to be too bad (and I am only shipping what changed over the wire). This will not work for my Archive tables; they are just too damn big (4 billion rows and growing). Those tables are just logs, and I have a different long-term plan for them.

But the data that makes up the site? Works well.

Since I use InnoDB for my sites, I can use the --single-transaction trick to take an online backup. If you are using any other storage engine, the above crontab will lock up your site while the dump is being made.
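For what it's worth, a sketch of a non-InnoDB variant (the flag choice here is my own, not from the post): --lock-all-tables takes a global read lock for the duration of the dump, which gives a consistent cross-table snapshot at the cost of making the site read-only while it runs.

```shell
# Sketch: MyISAM-friendly variant of the same nightly entry. The site
# is read-only while mysqldump holds the global read lock.
0 3 * * * mysqldump -f --lock-all-tables -T /var/backup/sitename sitename; (cd /var/backup/sitename; hg commit -m "auto"; hg push)
```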

This will probably work with most revision control systems (if you pick one of the nifty ones that can prune history, you could even automate tossing deltas after a week or so). Mercurial has the strong benefit of being HTTP-centric and incredibly easy to set up as a server (and it has never eaten my data like some other open source systems have).

I welcome feedback :)

    Comments {16}

    Tanjent (tanjent) | Apr. 22nd, 2008 06:35 pm (UTC)

    I think I've mentioned to you before that I use SVN pointing at a NAS drive for backup - it's worked fine for me so far and saved my ass recently when the USB hard drive I was using to migrate my stuff to a new machine died. SVN's never eaten any of my data, near as I can tell, though I don't do any weird things with it.

    Anyhow, I found myself wishing I was fluent in Mercurial while I was out in the UK last week - the studio had locked down their Perforce depot (no non-critical checkins for two weeks - ack!), I had to make a ton of changes, and I really needed a bit more flexibility in how I could merge stuff between the locked depot and my local depot.

    I've been pondering moving my backup system to Mercurial for similar reasons - it would be nice to commit things locally during the day and then just push to home when I get back to the hotel room.

    Brian "Krow" Aker (krow) | Apr. 22nd, 2008 07:15 pm (UTC)

    Why not svk?

    Tanjent (tanjent) | Apr. 22nd, 2008 07:48 pm (UTC)

    *laugh*

    Thanks, I'd never heard of that.

    Lover of Ideas (omnifarious) | Apr. 22nd, 2008 08:17 pm (UTC)

    I've taken to using Mercurial this way for some of my data. I've been thinking of writing a permission-preserving extension for it so I could keep my .ssh directory (sans private key file) this way.

    Brian "Krow" Aker (krow) | Apr. 22nd, 2008 08:36 pm (UTC)

    What are the odds that a method will be added that lets you go into a server and collapse older revisions?

    I was trying to think of a way of "trimming" so that the repository never had to get super big.

    You know... we could just have a "backup mysql" direct to mercurial script :)

    Lover of Ideas (omnifarious) | Hmmm... | Apr. 22nd, 2008 11:06 pm (UTC)

    That's an interesting feature. So far, all of the discussion has been about truncating history at some point in the 'past'. Thinking about it more, being able to collapse portions of the history rather than just truncate it is a lot harder and trickier.

    I have some ideas on how it might be accomplished, even. It's possible, but not easy, and it would be easy for the implementer to accidentally corrupt your repository if they weren't very careful.

    Brian "Krow" Aker (krow) | Re: Hmmm... | Apr. 23rd, 2008 12:56 am (UTC)

    Easy to keep backups right now :)

    So truncating... how far off is that?

    Lover of Ideas (omnifarious) | Re: Hmmm... | Apr. 23rd, 2008 06:03 pm (UTC)

    Well, there was someone who expressed interest in doing it as a Summer of Code project. Otherwise, I would imagine it would take 2-3 weeks of concentrated effort from one of the core developers. My guess is that if it's not an SoC project, it's still many months off.

    awfief | Apr. 22nd, 2008 06:36 pm (UTC)

    If it works for you, then that's great!

    One company I worked for used their version control system to track schema changes (using mysqldump --no-data).

    If this technique is used for sensitive data then of course the repository should be secured, as should access to it.

    I usually recommend a mysqldump --skip-extended-insert on each table individually as an alternative backup methodology to be able to easily diff backups (table by table). This of course means 2 backups -- one that's consistent, for disaster recovery, and another that isn't, for diff-ability.

    Mostly I really like this solution; it shows you have thought about what you really *want* in a backup, and have adjusted your backup plan accordingly. Most folks think a consistent backup is enough, and while it's great for making a new slave, it stinks for incrementals and diffs and whatnot.

    I think that in general, using a source control program for backups will work, if what one wants is what the source control program provides. :)

    awfief | Apr. 22nd, 2008 06:42 pm (UTC)

    Another thought I just had: this would make data auditing much easier, at least for the "when did this change?" aspect. (That's why a company I worked for used it for schema changes.)

    Brian "Krow" Aker (krow) | Apr. 22nd, 2008 06:59 pm (UTC)

    I use the tab format for the above. It keeps each individual file smaller (and it makes selective restores on a per-table basis much easier).

    Tab formats also diff really well :)

    sykosoft | Apr. 23rd, 2008 04:18 am (UTC)

    From watching Planet MySQL for the last few weeks, I've seen about four different methods of MySQL backups. None of them has impressed me very much.

    Last year during LinuxWorld, I was walking the expo floor and found a company called r1soft. They back up MySQL with a point-in-time snapshot (no LVM used or needed). For an out-of-the-box solution, we use it and are very happy with it.

    (No, I have no affiliation with r1soft, just like the ease with which backups are made)

    Michael

    Brian "Krow" Aker (krow) | Apr. 23rd, 2008 05:15 am (UTC)

    What do you like about the backup?

    sykosoft | Apr. 23rd, 2008 05:35 am (UTC)

    One of the things that really made me happy: on an existing (non-LVM) system with mixed table types (InnoDB, MyISAM), we could just install the r1soft agent and back up MySQL in real time, with no table locks or downtime. It's incremental, and I can selectively restore certain tables. I've had the (unfortunate) opportunity to test the restores, and they're very painless, even to an alternate server/location.

    Michael

    Brian "Krow" Aker (krow) | Apr. 23rd, 2008 06:13 am (UTC)

    What all did you have to install to make it work?

    joelpietersen | Feb. 25th, 2010 03:26 pm (UTC)

    Marvelous article. This is a highly interesting read; I look forward to finding more in the near future.
