Brian "Krow" Aker (krow) wrote,

Andreessen, Scale-out, EC2: It's the virtual servers...

I see that Marc Andreessen has been updating his blog:

Some commenters have proposed that Amazon's EC2 service would be a way to easily scale a Facebook app (or a non-Facebook web app). I think EC2 is a great service and have no desire to say anything negative about it. So I will just say two things: it isn't as easy as that, and EC2 is not free either. Bonus points to commenters who want to go into more detail on these topics than I have here!

It's good to see that he updated his blog, but he missed addressing this quote:

The implication is, in my view, quite clear -- the Facebook Platform is primarily for use by either big companies, or venture-backed startups with the funding and capability to handle the slightly insane scale requirements. Individual developers are going to have a very hard time taking advantage of it in useful ways.

The point people are making is not about EC2. The point is that people who are doing scale out via the concept of "buy more hardware" are missing the boat. Why would any VC want to see their money being spent on creating a physical plant for a service solution? The point is the service, not the hardware being bought. You make money on the service, you spend money on the hardware.

Determining how you are going to scale as an afterthought will have you working on the weekends. Using concepts like virtual machines will ease your deployment (notice on Fedora's registry page that virtual machines are number 2; look under the tag "model"). You should be able to split up your architecture and deploy it as virtual machines. If you have to buy hardware, it should be set up so that it uses virtual machines.

This will make your eventual migration to computing farms much easier :)

Building a scale out architecture today means that you should be working out these problems:

  • Caching
  • Partitioning
  • Replicating
  • Batching Processes
  • Study Performance
  • Routing

Caching: Pick a technology to cache whatever pages or data objects you can. Don't assume your architecture will keep pace if it builds every page from scratch on each request. It's a waste of hardware to make this happen.

Partitioning: Split your data up. Make it so that you can split your architecture into components.
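One simple way to split data up is hash partitioning: each key deterministically maps to one of N shards. A sketch, where the shard names are hypothetical stand-ins for real database servers:

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # placeholder server names

def shard_for(key):
    # md5 keeps the mapping stable across processes and restarts,
    # unlike Python's built-in hash(), which is salted per process.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Note the trade-off: modulo hashing is trivial to implement, but adding a shard remaps most keys, which is why larger deployments move to consistent hashing or a directory service.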

Replication: Pick a replication strategy. Your data and your software will need to be copied; make it easy.
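One common strategy worth sketching: writes go to a single master, reads fan out across replicas. The connection objects here are just placeholder strings, and the statement classification is deliberately naive:

```python
import itertools

class ReplicatedPool:
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # round-robin the read slaves

    def connection_for(self, sql):
        # Route by statement type: reads rotate across replicas,
        # anything that mutates goes to the master.
        verb = sql.lstrip().split()[0].upper()
        if verb in ("SELECT", "SHOW"):
            return next(self._replicas)
        return self.master
```

Picking the strategy early matters because it dictates the rest: whether your application can tolerate replica lag on reads, and how you copy software and schema changes to every box.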

Batching Processes: Anything that can be taken offline should be moved offline. Build queues and design ways of distributing work.
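The queue pattern above looks like this in miniature. Here queue.Queue stands in for a real distributed queue, and the "resize_avatar" task is illustrative; the point is that the request handler only enqueues and returns, while a separate worker drains the queue later:

```python
import queue

jobs = queue.Queue()  # placeholder for a real distributed work queue

def handle_request(user_id):
    # Fast path: record the work and return immediately.
    jobs.put(("resize_avatar", user_id))
    return "accepted"

def run_worker():
    # Slow path, run out-of-band: drain the queue and do the heavy work.
    done = []
    while not jobs.empty():
        task, user_id = jobs.get()
        done.append((task, user_id))  # a real worker would do the slow work here
        jobs.task_done()
    return done
```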

Study Performance: Be analytical; use real numbers. From day one, determine how to benchmark what you are doing and how your changes affect your systems. Almost everyone blows this piece, and it's the piece of the puzzle that will keep you up at night and have you spending your weekends at work. This should not be magic; if you are treating it as magic, you are wasting time and money.
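"Real numbers" can start as small as this: time a code path over many iterations and report percentiles rather than a single anecdotal run. A minimal sketch:

```python
import time

def benchmark(fn, iterations=1000):
    # Collect one timing sample per call, then report the distribution.
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median_s": samples[len(samples) // 2],
        "p95_s": samples[int(len(samples) * 0.95)],
        "worst_s": samples[-1],
    }
```

Run it before and after every change; the tail numbers (p95, worst) are usually where the weekend-killing surprises hide, not the median.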

Routing: Be smart about how you point your incoming traffic. Round-robin DNS? If that is your answer, you might as well take up prayer. Point users to the resources they need, and balance those resources so you don't waste CPU cycles.
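The step up from round-robin is load-aware routing: send each request to whichever backend currently has the least work, instead of blindly rotating. A sketch, with made-up backend names and connection counts:

```python
def pick_backend(loads):
    """loads: dict mapping backend name -> current connection count."""
    # Least-loaded wins; round-robin would ignore these numbers entirely.
    return min(loads, key=loads.get)

# Illustrative snapshot of current load:
loads = {"web1": 12, "web2": 3, "web3": 9}
```

In practice this sits behind a load balancer that also knows which backend holds a user's data, so requests land where the cache and the shard already are.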

As I mentioned previously, there are some pages up on MySQL customer scaling here:

Here is also a slide deck I have been using when speaking at conferences on scale-out:

And Mark Atwood has a good rant on this topic here: