Brian "Krow" Aker (krow) wrote,
Brian "Krow" Aker
krow

Assumptions, Drizzle

What is the future of Drizzle? What sort of assumptions are you making?

  • Hardware

    On the hardware front I get a lot of distance saying "the future is 64bit, multi-core, and runs on SSD". This is a pretty shallow answer, and is pretty obvious to most everyone. It suits a sound bite but it is not really that revolutionary of a thought. To me the real question is "how do we use them".

    64bit means you have to change the way you code. Memory is now flat for the foreseeable future. Never focus on how to map around 32bit issues and always assume you have a large, flat, memory space available. Spend zero time thinking about 32bit.

    If you are thinking "multi-core" then think about it massively. Right now adoption is at the 16 core point, which means that if you are developing software today, you need to be thinking about multiples of 16. I keep asking myself "how will this work with 256 cores". Yesterday someone came to me with a solution to a feature we have removed in drizzle. "Look we removed all the locks!". Problem was? The developer had used a compare and swap, CAS, operation to solve the problem. Here is the thing, CAS does not scale with this number of cores/chips that will be in machines. The good thing is the engineer got this, and has a new design :) We won't adopt short term solutions that just kneecap us in the near future.

    SSD is here, but it is not here in the sizes needed. What I expect us to do is make use of SSD as a secondary cache, and not look at it as the primary at rest storage. I see a lot of databases sitting in the 20gig to 100gig range. The Library of Congress is 26 terabytes. I expect more scale up so systems will be growing faster in size. SSD is the new hard drive, and fixed disks are tape.

    The piece that I have commented least on is the nature of our micro-kernel. We can push pieces of our design out to other nodes. I do not assume Drizzle will live on a single machine. Network speed keeps going up, and we need to be able to tier the database out across multiple computers.

    One final thought about Hardware, we need 128bit ints. IPV6, UUID, etc, all of these types mean that we need single instruction operator for 16byte types.

  • Community Development

    Today 2/3 of our development comes from outside of the developers Sun pays to work on Drizzle. Even if we add more developers, I expect our total percentage to decrease and not increase. I believe we will see forks and that we have to find ways to help people maintain their forks. One very central piece of what we have to do is move code to the Edge, aka plugins. Thinking about the Edge, has to be a share value.

    I see forks as a positive development, they show potential ways we can evolve. Not all evolutionary paths are successful, but it makes us stronger to see where they go. I expect long term for groups to make distributions around Drizzle, I don't know that we will ever do that.

    Code drives decisions, and those who provide developers drive those decisions.

    While I started out focusing Drizzle on web technologies, we are seeing groups showing up to reuse our kernel in data warehousing and handsets (which is something I never predicted). By keeping the core small we invite groups to use us as a piece to build around.

    Drizzle is not all about my vision, it is about where the collective vision takes us.

  • Directions in Database Technology

    Map/Reduce will kill every traditional data warehousing vendor in the market. Those who adapt to it as a design/deployment pattern will survive, the rest won't. Database systems that have no concept of being multiple node are pretty much dead. If there is no scale out story, then there is not future going forward.

    The way we store data will continue to evolve and diversify. Compression has gotten cheap and processor time has become massive. Column stores will continue to evolve, but they are not a "solves everything" sort of solution. One of the gambles we continue to make is to allow for storage via multiple methods (we refer to this as engines). We will be adding a column store in the near the future, it is an import piece for us to have. Multiple engines cost us in code complexity, but we continue to see value in it. We though will raise the bar on engine design in order to force the complexity of this down to the engine (which will give us online capabilities).

    Stored procedures are the dodos for database technology. The languages vendors have designed are limited. On the same token though, putting processing near the data is key to performance for many applications. We need a new model badly, and this model will be a pushdown from two different directions. One direction is obvious, map/reduce, the other direction is the asynchronous queues we see in most web shops. There is little talk about this right now in the blogosphere, but there is a movement toward queueing systems. Queueing systems are a very popular topic in the hallway tracks of conferences.

    Databases need to learn how to live in the cloud. We cannot have databases be silos of authentication, processing, and expect only to provide data. We must make our data dictionaries available in the cloud, we need to take our authentication from the cloud, etc...

    We need to live in the cloud.
  • Tags: database, drizzle, mysql
    Subscribe
    • Post a new comment

      Error

      Comments allowed for friends only

      Anonymous comments are disabled in this journal

      default userpic

      Your reply will be screened

      Your IP address will be recorded 

    • 8 comments