PostgreSQL to Scale to 1 Biilllliooonnnn Users, Dr Evil would be proud

Apr. 6th, 2008 | 02:44 pm

For reference:

Here are some observations by me on the state of database usage in Web 2.0:
  • All major web 2.0 sites now use object caching (of one type or another)
  • Sharding, and now proxy-style solutions, are becoming commodities. They are everywhere.
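    For readers who have not met sharding before, here is a minimal sketch of the routing step, assuming a fixed list of shards and a string user key. The shard names and the `shard_for` helper are hypothetical, made up for illustration; real proxies and sharding layers also deal with resharding, which this does not attempt.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection settings.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Route a key to a shard by hashing it, so the same key always maps to the same database."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo the shard count.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "->", shard_for(uid))
```

    Plain modulo hashing like this remaps most keys when the shard count changes, which is one reason a proxy or lookup layer tends to sit in front of the databases.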

    What does this mean?

    Replication is dead except for replicating for "application" needs.

    Good News :)

    For MySQL it encourages multiple engines. For Postgres I suspect their flexible index design will be useful. The "I replicated over here for a backup, or to run reports..." use case is still happening a lot. Multi-master replication is one way to achieve high availability (DRBD on the low end... you will go broke trying to deploy it with too many nodes). The problem with multi-master is the users, or the developers.

    We could blame the users for not understanding it and deploying it incorrectly, or we could blame the developers for not making it dirt simple to set up.

    Hey! We can blame marketing guys for overhyping it!

    No matter who is to blame, not everyone can keep it running. Plenty of people do though. This blog entry is being hosted on a site that has had it working for a long time.

    Bad News :(

    The above mentioned technologies now work for any database. So you can pick your database and scale it. Picking an open source database is now just picking for reliability, since that is the one thing that open source databases have in common right now... that and a plethora of drivers for almost any situation.

    What does this mean for someone trying to promote an open source database today? It means that there are only two large differentiators:

  • Online features (aka making schema changes, modifying tables...)

  • Scaling on multi-core/multi-way machines.

    Both of the above are done horribly today by open source databases (and not all of the commercial competitors do well either). Online features are a ways off, I suspect, in the open source world. With proxy designs you can build around the lack of online features, but at the end of the day... they do not exist, and you are building around problems.

    And backup? I know someone out there is thinking "backup".

    Backup is irrelevant for those of you who care about this discussion. LVM/ZFS snapshots are the rule of the land. With Apple moving to ZFS this will be built into the OS (which makes Apple start to look like a viable platform for servers).

    BTW I am at the MySQL User's Conference the week after this week. Most likely I will be putting together a BOF one night on this topic (and we have a Hackathon planned for Memcached another night). I will also be talking on the future of databases at Web 2.0 Expo in a couple of weeks.

    Comments {6}


    (no subject)

    from: awfief
    date: Apr. 6th, 2008 10:31 pm (UTC)

    I will agree that the major web 2.0 companies are using it. But what about the little people? I have seen very few companies using memcached (then again, they come to me because they need my level of help, they don't come to MySQL until they really are huge)......

    If judging the market by the top companies means "x is dead", we have problems.

    How many of these companies are using MySQL 5.0? 5.1? How about 4.0 or 4.1? I bet none are using 4.1 or lower. In the "lower 95%" or 90% of customers, the ones I normally see, only about half have moved to MySQL 5.0, the rest are still on 4.1, and a tiny minority on 4.0.

    So really, while what you say is true -- that the top companies are using object caching -- replication is far, far from dead. It's still a staple amongst many. (Although I did have a client say to me "well, the schema generator spit out an AUTO_INCREMENT INT, so let's keep it that way, not using UNSIGNED, because I know it works.")

    ugh. :)


    Brian "Krow" Aker

    (no subject)

    from: krow
    date: Apr. 6th, 2008 10:55 pm (UTC)

    When someone just needs a database or two, they typically are not having scaling problems. When they jump that number though... things get interesting :)



    (no subject)

    from: dormando
    date: Apr. 6th, 2008 11:08 pm (UTC)

    Sit in #memcached for a few days :)

    While not as prevalent, a lot of small companies (1-20 machines) are using memcached now. You can't really say "everyone" like you can for bigger web 2.0 companies, but the number is absolutely growing!

    Sure it seems silly (APC/Xcache is faster if they're using PHP, and offers the same local cache features), but in a sense it's great for when they actually need to scale. Some smaller folks cache search results, results of template operations, and delay the need to scale for a long time.
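    The "cache search results" pattern can be sketched roughly like this. It is only an illustration: the in-process `TTLCache` class stands in for memcached, and `cached_search` and `run_query` are hypothetical names, not part of any real client library.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for an object cache; not a real memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + ttl)


def cached_search(cache, query, run_query, ttl=60):
    """Cache-aside lookup: try the cache first, fall back to the database on a miss."""
    key = "search:" + query
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = run_query(query)  # the expensive call, e.g. a database search
    # Note: a result of None is never treated as a hit, so it is recomputed each time.
    cache.set(key, result, ttl)
    return result
```

    Repeated searches within the TTL never reach the database, which is how a small site can put off scaling work for a while.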



    (no subject)

    from: awfief
    date: Apr. 6th, 2008 11:16 pm (UTC)

    *nod* I think there's a bell-curve center, though, of companies that have been around for a few years (say, 4-7 years) that haven't learned the sexiness of memcached, or are newer and don't know about it.

    Newer companies starting now *better* know about it!

    So yeah, and hopefully this will also be the death of ORMs. But I dream big. :)


    Yazz D. Atlas


    from: aaton
    date: Apr. 7th, 2008 06:38 am (UTC)

    you wouldn't be talking directly to me now would you :-)


    Brian "Krow" Aker

    Re: backups

    from: krow
    date: Apr. 7th, 2008 06:57 am (UTC)

    Nope :)

    I do know you and your systems fall into this category. It is best practice at this point.
