April 6th, 2008


PostgreSQL to Scale to 1 Biilllliooonnnn Users, Dr Evil would be proud

For reference:

Here are some observations by me on the state of database usage in Web 2.0:
  • All major web 2.0 sites now use object caching (of one type or another)
  • Sharding and now Proxy style solutions are becoming commodity. They are everywhere.

    What does this mean?

    Replication is dead except for replicating for "application" needs.

    Good News :)

    For MySQL it encourages multiple engines. For Postgres I suspect their flexible index design will be useful. The "I replicated over here for a backup, or to run reports..." is still happening a lot. Multi-master replication is one scenario to achieve high availability (DRBD on the low end... you will go broke trying to deploy it with too many nodes). The problem with multi-master is the users, or the developers.

    We could blame the users for not understanding it, and deploying it incorrectly, or we could blame the developers for not making it dirt simple to setup.

    Hey! We can blame marketing guys for over hyping it!

    No matter who is to blame, not everyone can keep it running. Plenty of people do though. This blog entry is being hosted on a site that has had it working for a long time.

    Bad News :(

    The above mentioned technologies now work for any database. So you can pick your database and scale it. Picking an open source database is now just picking for reliability, since that is the one thing that open source databases have in common right now... that and a plethora of drivers for almost any situation.

    What does this mean for someone trying to promote an open source database today? It means that there are only two large differentiators:

  • Online features (aka making schema changes, modifying tables...)

  • Scaling on multi-core/multi-way machines.

    Both of the above are done horribly today by open source database (and not all of the commercial competitors do well either). Online features are a ways off I suspect in the open source world. With proxy designs you can build around online features, but at the end of the day... they do not exist, you are building around problems.

    And backup? I know someone out there is thinking "backup".

    Backup is irrelevant for those of you who care about this discussion. LVM/ZFS snapshots are the rule of the land. With Apple moving to ZFS this will be built into the OS (which makes Apple start to look like a viable platform for servers).

    BTW I am at the MySQL User's Conference the week after this week. Most likely I will be putting together a BOF one night on this topic (and we have a Hackathon planned for Memcached another night). I will also be talking on the future of databases at Web 2.0 Expo in a couple of weeks.
  • glasses

    The Death of Read Replication

    A number of months ago, possibly a year ago, I wrote an internal letter to the MySQL internal discuss list with the title of "The Death of Read Replication". Ever since then I've been getting pinged internally to publish my thoughts on this externally.

    Here goes :)

    Read replication is going to be in use for many years into the future. There are plenty of reasons to use it, and plenty of setups where it will make sense.

    All of the scripting, management daemons, and ease of use scenarios will not solve its problems though, and I am finding that users have either moved away from it, or more often, have reduced their need for it.

    A few reasons:
  • Latency is painful to manage.
  • Lots of servers means more head count (both disks and in numbers of people to manage it)
  • In web usage, the rule of thumb is to keep your query number under 7, for this reason you try make more out of the seven queries you get. You get more bang for your buck with cached objects.

    For query count, you don't care about how many asynchronous queries you have to make, you care about how many synchronous queries you have to make.

    In 2000 a number of us tried to build caches, and all of us failed except Brad. Brad a few years later came up with memcached. In a world where we all thought the network traffic would be too
    expensive he proved everyone wrong. Livejournal used it. Then CNET used it. Etc...

    Today you cannot name a big Web 2.0 website that does not use it, and among the non-web 2.0 sites its usage has spread.

    For MySQL users, the concept is simple. Build objects from data stored in MySQL and push those objects into memcached. Have all webpages build from objects stored in memcached. Memcached is not the only game in town, but it certainly is the most predominant.

    In the past object stores meant "all your data belongs to us". Today we use them as caches.

    The model for object caches is simple, it is called read through caching. Everything you can turn into a read through cache object, is an object you can scale.

    For sites backed by MySQL, updates to data are handled by pushing data into MySQL, and then having asynchronous systems build objects that are pushed into memcached (you can study these systems by look at Gearman, Hadoop, and Map/Reduce systems via Google). Jamie took my ghetto style runtask and built it out to being something serious for Slashdot somewhere in 2002.

    Asynchronous job systems with cached object stores are the state of the art for building large scale websites.

    MySQL" scaling users" come in three breeds for task systems.

    Those who are deploying an open source solution, those who wrote their own, and those who are still trying to make it work with cron jobs.

    What does this mean? Less database servers. Bringing down your load means you push off the load to another tier.

    Is it worth the hassle to scale another tier?

    It is if you can:
  • Lower the number of machines required
  • Lower the complexity of the setup
  • Use less electricity!

    Getting rid of spinning disks is a good way to get rid of number three. The number of companies who are building solid state platforms is also growing.

    Let me leave you with a thought. Recently I was at the SLAC large database forums, and one of the scientists pointed out, perhaps rightly, that all of the database vendors may find themselves made irrelevant by Hadoop. That is a hard thought to grasp, but I have been kicking it around in my head to think about what this means.

    In one of the diagrams I drew for myself I asked "why do I need to go through MySQL at all... unless I just want it as a backup or for ad-hoc reporting?".

    An observation Josh Berkus made internally was that databases are here because in the generic sense, they work well enough. Despite all of the fancy features we layer around them, they do a good job over a broad range of problems. This was an excellent observation.

    Web 2.0 scaling is a classic problem. You can even say it is "Windows vs UNIX". UNIX means small parts that do one task being used to build up systems. Windows means monolithic solutions. There are advantages and disadvantages of both (though personally I am only interested in heterogenous solutions... I do not like having to keep all my eggs in one basket).

    Web 2.0 is about picking the best of all solutions and gluing them together.