Brian "Krow" Aker (krow) wrote,
Brian "Krow" Aker

The Death of Read Replication

A number of months ago, possibly a year ago, I wrote an internal letter to the MySQL internal discuss list with the title of "The Death of Read Replication". Ever since then I've been getting pinged internally to publish my thoughts on this externally.

Here goes :)

Read replication is going to be in use for many years into the future. There are plenty of reasons to use it, and plenty of setups where it will make sense.

All of the scripting, management daemons, and ease of use scenarios will not solve its problems though, and I am finding that users have either moved away from it, or more often, have reduced their need for it.

A few reasons:
  • Latency is painful to manage.
  • Lots of servers means more head count (both disks and in numbers of people to manage it)
  • In web usage, the rule of thumb is to keep your query number under 7, for this reason you try make more out of the seven queries you get. You get more bang for your buck with cached objects.

    For query count, you don't care about how many asynchronous queries you have to make, you care about how many synchronous queries you have to make.

    In 2000 a number of us tried to build caches, and all of us failed except Brad. Brad a few years later came up with memcached. In a world where we all thought the network traffic would be too
    expensive he proved everyone wrong. Livejournal used it. Then CNET used it. Etc...

    Today you cannot name a big Web 2.0 website that does not use it, and among the non-web 2.0 sites its usage has spread.

    For MySQL users, the concept is simple. Build objects from data stored in MySQL and push those objects into memcached. Have all webpages build from objects stored in memcached. Memcached is not the only game in town, but it certainly is the most predominant.

    In the past object stores meant "all your data belongs to us". Today we use them as caches.

    The model for object caches is simple, it is called read through caching. Everything you can turn into a read through cache object, is an object you can scale.

    For sites backed by MySQL, updates to data are handled by pushing data into MySQL, and then having asynchronous systems build objects that are pushed into memcached (you can study these systems by look at Gearman, Hadoop, and Map/Reduce systems via Google). Jamie took my ghetto style runtask and built it out to being something serious for Slashdot somewhere in 2002.

    Asynchronous job systems with cached object stores are the state of the art for building large scale websites.

    MySQL" scaling users" come in three breeds for task systems.

    Those who are deploying an open source solution, those who wrote their own, and those who are still trying to make it work with cron jobs.

    What does this mean? Less database servers. Bringing down your load means you push off the load to another tier.

    Is it worth the hassle to scale another tier?

    It is if you can:
  • Lower the number of machines required
  • Lower the complexity of the setup
  • Use less electricity!

    Getting rid of spinning disks is a good way to get rid of number three. The number of companies who are building solid state platforms is also growing.

    Let me leave you with a thought. Recently I was at the SLAC large database forums, and one of the scientists pointed out, perhaps rightly, that all of the database vendors may find themselves made irrelevant by Hadoop. That is a hard thought to grasp, but I have been kicking it around in my head to think about what this means.

    In one of the diagrams I drew for myself I asked "why do I need to go through MySQL at all... unless I just want it as a backup or for ad-hoc reporting?".

    An observation Josh Berkus made internally was that databases are here because in the generic sense, they work well enough. Despite all of the fancy features we layer around them, they do a good job over a broad range of problems. This was an excellent observation.

    Web 2.0 scaling is a classic problem. You can even say it is "Windows vs UNIX". UNIX means small parts that do one task being used to build up systems. Windows means monolithic solutions. There are advantages and disadvantages of both (though personally I am only interested in heterogenous solutions... I do not like having to keep all my eggs in one basket).

    Web 2.0 is about picking the best of all solutions and gluing them together.
  • Subscribe
    • Post a new comment


      Comments allowed for friends only

      Anonymous comments are disabled in this journal

      default userpic

      Your reply will be screened

      Your IP address will be recorded