Brian "Krow" Aker (krow) wrote,
Brian "Krow" Aker

PostgreSQL to Scale to 1 Biilllliooonnnn Users, Dr Evil would be proud

For reference:

Here are some observations by me on the state of database usage in Web 2.0:
  • All major web 2.0 sites now use object caching (of one type or another)
  • Sharding and now Proxy style solutions are becoming commodity. They are everywhere.

    What does this mean?

    Replication is dead except for replicating for "application" needs.

    Good News :)

    For MySQL it encourages multiple engines. For Postgres I suspect their flexible index design will be useful. The "I replicated over here for a backup, or to run reports..." is still happening a lot. Multi-master replication is one scenario to achieve high availability (DRBD on the low end... you will go broke trying to deploy it with too many nodes). The problem with multi-master is the users, or the developers.

    We could blame the users for not understanding it, and deploying it incorrectly, or we could blame the developers for not making it dirt simple to setup.

    Hey! We can blame marketing guys for over hyping it!

    No matter who is to blame, not everyone can keep it running. Plenty of people do though. This blog entry is being hosted on a site that has had it working for a long time.

    Bad News :(

    The above mentioned technologies now work for any database. So you can pick your database and scale it. Picking an open source database is now just picking for reliability, since that is the one thing that open source databases have in common right now... that and a plethora of drivers for almost any situation.

    What does this mean for someone trying to promote an open source database today? It means that there are only two large differentiators:

  • Online features (aka making schema changes, modifying tables...)

  • Scaling on multi-core/multi-way machines.

    Both of the above are done horribly today by open source database (and not all of the commercial competitors do well either). Online features are a ways off I suspect in the open source world. With proxy designs you can build around online features, but at the end of the day... they do not exist, you are building around problems.

    And backup? I know someone out there is thinking "backup".

    Backup is irrelevant for those of you who care about this discussion. LVM/ZFS snapshots are the rule of the land. With Apple moving to ZFS this will be built into the OS (which makes Apple start to look like a viable platform for servers).

    BTW I am at the MySQL User's Conference the week after this week. Most likely I will be putting together a BOF one night on this topic (and we have a Hackathon planned for Memcached another night). I will also be talking on the future of databases at Web 2.0 Expo in a couple of weeks.
  • Subscribe
    • Post a new comment


      Comments allowed for friends only

      Anonymous comments are disabled in this journal

      default userpic

      Your reply will be screened

      Your IP address will be recorded