PostgreSQL to Scale to 1 Biilllliooonnnn Users, Dr Evil would be proud

Apr. 6th, 2008 | 02:44 pm

For reference:

Here are some of my observations on the state of database usage in Web 2.0:
  • All major Web 2.0 sites now use object caching (of one type or another).
  • Sharding, and now proxy-style solutions, are becoming commodities. They are everywhere.
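    For anyone who has not seen the two ideas side by side, here is a minimal sketch in Python. It is only an illustration, not how any particular site does it: the class names `ShardedStore` and `CachedStore` are made up for this example, plain dicts stand in for the database servers and for memcached, and the hash-modulo routing is just the simplest scheme that shows the idea.

```python
import hashlib


class ShardedStore:
    """Toy stand-in for N independent database servers (plain dicts here)."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]
        self.reads = 0  # counts reads that actually reach a "database"

    def shard_for(self, key):
        # Stable hash, so the same key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        self.reads += 1
        return self.shards[self.shard_for(key)].get(key)


class CachedStore:
    """Cache-aside wrapper: check the object cache first, then the shard."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # hypothetical stand-in for memcached or similar

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache on a miss
        return value

    def put(self, key, value):
        self.store.put(key, value)
        self.cache.pop(key, None)  # invalidate so the next read refetches


store = ShardedStore(4)
cached = CachedStore(store)
cached.put("user:42", {"name": "dr_evil"})
cached.get("user:42")  # miss: goes to the owning shard
cached.get("user:42")  # hit: served from the cache
print(store.reads)     # 1
```

    The application never asks "which server?" itself; the routing and the cache sit between it and the databases, which is roughly the job a proxy takes over.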

    What does this mean?

    Replication is dead except for replicating for "application" needs.

    Good News :)

    For MySQL it encourages multiple engines. For Postgres I suspect their flexible index design will be useful. The "I replicated over here for a backup, or to run reports..." case still happens a lot. Multi-master replication is one way to achieve high availability (DRBD on the low end... you will go broke trying to deploy it with too many nodes). The problem with multi-master is the users, or the developers.

    We could blame the users for not understanding it and deploying it incorrectly, or we could blame the developers for not making it dirt simple to set up.

    Hey! We can blame the marketing guys for overhyping it!

    No matter who is to blame, not everyone can keep it running. Plenty of people do though. This blog entry is being hosted on a site that has had it working for a long time.

    Bad News :(

    The above mentioned technologies now work for any database. So you can pick your database and scale it. Picking an open source database is now just picking for reliability, since that is the one thing that open source databases have in common right now... that and a plethora of drivers for almost any situation.

    What does this mean for someone trying to promote an open source database today? It means that there are only two large differentiators:

  • Online features (aka making schema changes, modifying tables...)

  • Scaling on multi-core/multi-way machines.

    Both of the above are done horribly today by open source databases (and not all of the commercial competitors do well either). Online features are a ways off, I suspect, in the open source world. With proxy designs you can build around the lack of online features, but at the end of the day... they do not exist; you are building around problems.

    And backup? I know someone out there is thinking "backup".

    Backup is irrelevant for those of you who care about this discussion. LVM/ZFS snapshots are the rule of the land. With Apple moving to ZFS this will be built into the OS (which makes Apple start to look like a viable platform for servers).

    BTW I will be at the MySQL User's Conference next week. Most likely I will be putting together a BOF one night on this topic (and we have a Hackathon planned for Memcached another night). I will also be talking on the future of databases at Web 2.0 Expo in a couple of weeks.

    Comments {6}


    (no subject)

    from: dormando
    date: Apr. 6th, 2008 11:08 pm (UTC)

    Sit in #memcached for a few days :)

    While not as prevalent, a lot of small companies (1-20 machines) are using memcached now. You can't really say "everyone" like you can for bigger web 2.0 companies, but the number is absolutely growing!

    Sure it seems silly (APC/XCache are faster if they're using PHP, and offer the same local cache features), but in a sense it's great for when they actually need to scale. Some smaller folks cache search results and the results of template operations, and delay the need to scale for a long time.



    (no subject)

    from: awfief
    date: Apr. 6th, 2008 11:16 pm (UTC)

    *nod* I think there's a bell-curve center, though, of companies that have been around for a few years (say, 4-7 years) that haven't learned the sexiness of memcached, or are newer and don't know about it.

    Newer companies starting now *better* know about it!

    So yeah, and hopefully this will also be the death of ORMs. But I dream big. :)
