The Death of Read Replication

Apr. 6th, 2008 | 03:12 pm

A number of months ago, possibly a year ago, I wrote a letter to MySQL's internal discuss list with the title "The Death of Read Replication". Ever since then I've been getting pinged internally to publish my thoughts on this externally.

Here goes :)

Read replication is going to be in use for many years into the future. There are plenty of reasons to use it, and plenty of setups where it will make sense.

All of the scripting, management daemons, and ease-of-use tooling will not solve its problems though, and I am finding that users have either moved away from it or, more often, have reduced their need for it.

A few reasons:
  • Latency is painful to manage.
  • Lots of servers means more overhead (more disks, and more people to manage them).
  • In web usage, the rule of thumb is to keep your query count under 7, so you try to make the most of the seven queries you get. You get more bang for your buck with cached objects.

    For query count, you don't care about how many asynchronous queries you have to make; you care about how many synchronous queries you have to make.

    In 2000 a number of us tried to build caches, and all of us failed except Brad. A few years later Brad came up with memcached. In a world where we all thought the network traffic would be too expensive, he proved everyone wrong. LiveJournal used it. Then CNET used it. Etc...

    Today you cannot name a big Web 2.0 website that does not use it, and its usage has spread among the non-Web 2.0 sites as well.

    For MySQL users the concept is simple. Build objects from data stored in MySQL and push those objects into memcached. Have all webpages be built from objects stored in memcached. Memcached is not the only game in town, but it is certainly the predominant one.

    In the past object stores meant "all your data belongs to us". Today we use them as caches.

    The model for object caches is simple; it is called read-through caching. Everything you can turn into a read-through cache object is an object you can scale.
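
    Here is a minimal read-through sketch of that pattern in Python, assuming the python-memcached client and a memcached running locally; the key scheme and the build_user_page_object() helper are illustrative, not anyone's production code:

        import json
        import memcache

        mc = memcache.Client(['127.0.0.1:11211'])

        def get_user_page(user_id, ttl=300):
            # Read-through: serve the object from memcached, building it
            # from MySQL only on a cache miss.
            key = 'user_page:%d' % user_id
            obj = mc.get(key)
            if obj is None:
                obj = build_user_page_object(user_id)  # the expensive MySQL work
                mc.set(key, obj, time=ttl)             # later readers hit the cache
            return obj

        def build_user_page_object(user_id):
            # Placeholder for the real queries; in practice this joins whatever
            # tables the page needs and returns one denormalized object.
            return json.dumps({'user_id': user_id})

    Every cache hit is then one synchronous round trip instead of several SQL queries against a replica.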

    For sites backed by MySQL, updates are handled by pushing data into MySQL and then having asynchronous systems build objects that are pushed into memcached (you can study these systems by looking at Gearman, Hadoop, and Google's Map/Reduce). Jamie took my ghetto-style runtask and built it into something serious for Slashdot sometime in 2002.
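
    A toy version of that write path, with a plain Python queue and a worker thread standing in for a real job server such as Gearman; the key names and the two stub functions are made up for illustration:

        import queue
        import threading
        import memcache

        mc = memcache.Client(['127.0.0.1:11211'])
        jobs = queue.Queue()  # stand-in for a Gearman-style job system

        def write_to_mysql(account_id, fields):
            pass  # placeholder for the real INSERT/UPDATE

        def build_account_object(account_id):
            return {'account_id': account_id}  # placeholder for the SELECTs

        def update_account(account_id, fields):
            write_to_mysql(account_id, fields)  # 1. synchronous write to the database
            jobs.put(account_id)                # 2. hand the rebuild to the async tier

        def rebuild_worker():
            while True:
                account_id = jobs.get()
                obj = build_account_object(account_id)  # re-query MySQL, denormalize
                mc.set('account:%d' % account_id, obj)  # 3. push fresh object to cache
                jobs.task_done()

        threading.Thread(target=rebuild_worker, daemon=True).start()

    The web tier only ever pays for the synchronous write; the cache rebuild happens off the request path.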

    Asynchronous job systems with cached object stores are the state of the art for building large scale websites.

    MySQL" scaling users" come in three breeds for task systems.

    Those who are deploying an open source solution, those who wrote their own, and those who are still trying to make it work with cron jobs.

    What does this mean? Fewer database servers. Bringing down your database load means pushing it off to another tier.

    Is it worth the hassle to scale another tier?

    It is if you can:
  • Lower the number of machines required
  • Lower the complexity of the setup
  • Use less electricity!

    Getting rid of spinning disks is a good way to tackle number three, and the number of companies building solid-state platforms is growing.

    Let me leave you with a thought. Recently I was at the SLAC large database forums, and one of the scientists pointed out, perhaps rightly, that all of the database vendors may find themselves made irrelevant by Hadoop. That is a hard thought to grasp, but I have been kicking it around in my head to think about what this means.

    In one of the diagrams I drew for myself I asked "why do I need to go through MySQL at all... unless I just want it as a backup or for ad-hoc reporting?".

    An observation Josh Berkus made internally was that databases are here because in the generic sense, they work well enough. Despite all of the fancy features we layer around them, they do a good job over a broad range of problems. This was an excellent observation.

    Web 2.0 scaling is a classic problem. You could even say it is "Windows vs. UNIX". UNIX means small parts that each do one task, composed to build up systems. Windows means monolithic solutions. There are advantages and disadvantages to both (though personally I am only interested in heterogeneous solutions... I do not like having to keep all my eggs in one basket).

    Web 2.0 is about picking the best of all solutions and gluing them together.

    Comments {16}

    dormando

    from: dormando
    date: Apr. 6th, 2008 11:10 pm (UTC)

    HA master:master replication with buttloads of cache, and sharding :)

    Years ago people would shrug that off when I told them... too hard, inconsistent, data integrity, etc.

    Now most people are on the bandwagon and it feels good. Next step is getting job systems to be more universal.


    Brian "Krow" Aker

    from: krow
    date: Apr. 6th, 2008 11:32 pm (UTC)

    Gearman client library is almost done :)

    Worker and C server are next. I'll make an announcement on the Gearman mailing list sometime in the next day or so.


    from: mike503
    date: Apr. 7th, 2008 01:16 am (UTC)

    "All of the scripting, management daemons, and ease of use scenarios will not solve its problems though"

    Exactly. I've been trying to get someone to make it as easy as possible to implement a reliable self-healing replication mechanism for MySQL.

    Luckily, so far I haven't grown large enough, and I am pushing everyone I know toward a simple object get/set methodology that will make read-through caching simpler and reduce issues with data being updated while the cache is out of date.

    Although now I am also looking into CouchDB as a possibility. With built-in replication and a cocky "databases are okay, but they're trying to build redundancy and reliability into something that wasn't designed for it" attitude, I am very intrigued. Especially with its simple REST-based usage model using the HTTP and JSON standards (well, not sure if JSON is formally a "standard", but good enough) and its being architected from the start to be shared-nothing and distributable... this could help solve my sleeplessness at night worrying about scaling...


    Brian "Krow" Aker

    from: krow
    date: Apr. 7th, 2008 01:34 am (UTC)

    CouchDB is one of the coolest things I have seen conceptually come up in the last year. I think that when Damien hit on the JSON stuff he found the niche that needed to be filled.


    from: mike503
    date: Apr. 7th, 2008 02:10 am (UTC)

    Yeah, right now I am packaging nginx and CouchDB .debs to install on my servers and give them a go. Already moved my storage to an iSCSI solution (although my only two filesystem choices were GFS and OCFS2... I went with OCFS2; GFS seems to be a nightmare to configure...)

    I'm hoping getting into all of these helps me move to a more shared-nothing architecture :)


    OCFS2

    from: sykosoft
    date: Apr. 7th, 2008 05:31 am (UTC)

    We use OCFS2 in production on a $100M+ revenue e-commerce/social network site, and it works rather well. The database backend is MySQL 5 master-master, with multiple MySQL instances per database server, and memcached on all of the web servers. There are SOME caveats with OCFS2 (seriously, watch out for the 32000 directory limit).

    Michael


    Re: OCFS2

    from: mike503
    date: Apr. 7th, 2008 07:24 am (UTC)

    Michael: Thanks for the input. Would you mind if I talked to you directly? I can't figure out your email from here... mine is mike503 AT gmail.com. I'd like to pick your brain since you've got it running and sanity check myself :)

    Thanks!


    Владимир

    from: volodymir_k
    date: Apr. 7th, 2008 10:48 am (UTC)

    > you don't care about how many asynchronous queries you have to make

    Asynchronous in the sense that the client does not have to wait for a response before sending another request. The Ethernet card, however, does not send packets asynchronously; one network segment is shared by all connected hosts. It would be interesting to know the typical ratio of time-to-Ethernet-send to overall round trip. E.g., when there are 1e+4 network requests, does it matter that the carrier is busy?

    > The model for object caches is simple

    And I am waiting for a smart-batching idea to arrive. I mean, if we have to query two levels of objects by userId (e.g. "accounts" - "transactions"), with memcached we have to send two requests. The first is for the accountIds, the second is for the transactions. We cannot reference objects until *the client* gets their IDs. I wonder if it will become a common case to send a "program" to memcached: "get the refs value by object ID, then get the referenced objects".
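
    Concretely, it is two dependent round trips; in a python-memcached sketch (key names invented), the second call can at least be batched with get_multi, but it cannot start until the first returns:

        import memcache

        mc = memcache.Client(['127.0.0.1:11211'])
        user_id = 42  # example

        # Round trip 1: fetch the list of account IDs for the user.
        account_ids = mc.get('accounts:%d' % user_id) or []

        # Round trip 2: batch-fetch all transaction objects in one request,
        # but only once the client has seen the IDs from round trip 1.
        keys = ['transactions:%d' % aid for aid in account_ids]
        transactions = mc.get_multi(keys)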


    NDB

    from: bg194
    date: Apr. 18th, 2008 10:28 pm (UTC)

    NDB, let's talk about it. There is some nice asynchronous replicated object store stuff going on in there.

    (With pluggable storage could we back memcached with NDB?)


    Brian "Krow" Aker

    Re: NDB

    from: krow
    date: Apr. 18th, 2008 11:01 pm (UTC)

    Yep! The pluggable engine interface is being actively worked on.


    David Phillips

    Re: NDB

    from: davidphillips
    date: May. 3rd, 2008 12:44 am (UTC)

    When is NDB going to support adding servers without doing a full backup/restore?


    Brian "Krow" Aker

    Re: NDB

    from: krow
    date: May. 3rd, 2008 01:52 am (UTC)

    No idea.... I would ask on internals@mysql.com


    Re: NDB

    from: bg194
    date: May. 13th, 2008 03:27 pm (UTC)

    I assume they need ketama-style key hashing for that, but that makes table scans even harder, so it would have to move around junks and compact them or something. Pretty painful.
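
    For the curious, ketama-style hashing is just consistent hashing: each server is hashed to many points on a ring, a key belongs to the first server point clockwise from the key's hash, and adding a node only moves the keys between it and its neighbor. A bare-bones Python sketch (the replica count and MD5 are arbitrary choices):

        import bisect
        import hashlib

        class Ring:
            def __init__(self, nodes, replicas=100):
                self.points = []  # sorted (hash, node) pairs forming the ring
                for node in nodes:
                    for i in range(replicas):
                        h = self._hash('%s:%d' % (node, i))
                        bisect.insort(self.points, (h, node))

            def _hash(self, s):
                return int(hashlib.md5(s.encode()).hexdigest(), 16)

            def node_for(self, key):
                # First point clockwise from the key's hash, wrapping at the end.
                i = bisect.bisect(self.points, (self._hash(key), ''))
                return self.points[i % len(self.points)][1]

        ring = Ring(['node1', 'node2', 'node3'])
        print(ring.node_for('account:42'))  # adding node4 remaps only ~1/4 of keys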


    Re: NDB

    from: bg194
    date: May. 13th, 2008 03:32 pm (UTC)

    s/junks/chunks/

    Or junks. Junks is good.


    Putting the database where it belongs

    from: natishalom
    date: Apr. 26th, 2008 06:53 am (UTC)

    I don't see how Hadoop or even memcached can be a replacement for a database. I actually tend to agree with your closing phrase: "Web 2.0 is about picking the best of all solutions and gluing them together." We need to figure out the *right* combination of in-memory solutions and existing databases, and also address a reasonable migration path to this new model; after all, most systems today run on databases.

    I recently wrote a detailed post on Scaling Out MySQL where I compare the pure database-clustering approach with a combination of an In-Memory Data Grid as the front end and a database as the backend data store. In general I think that decoupling the application from the underlying database, and keeping a pure in-memory data store as the front end to manage our data, is the most scalable and efficient solution. There are plenty of references, many of them serving mission-critical applications in the financial world, where this model has proved successful.

    I was wondering if you ever considered something along those lines?


    Brian "Krow" Aker

    Re: Putting the database where it belongs

    from: krow
    date: May. 13th, 2008 10:57 pm (UTC)

    I've pretty much considered everything brought to me, or that has been dumped into my lap, in recent times.

    When you go to the store you find dozens of flavors of tomato sauce. Architecture is pretty much the same way... there are several solutions.
