Assumptions, Drizzle

Oct. 22nd, 2008 | 11:00 am

What is the future of Drizzle? What sort of assumptions are you making?

  • Hardware

    On the hardware front I get a lot of distance out of saying "the future is 64bit, multi-core, and runs on SSD". That is a pretty shallow answer, though, and obvious to almost everyone. It suits a sound bite, but it is not really that revolutionary a thought. To me the real question is "how do we use them?"

    64bit means you have to change the way you code. Memory is now flat for the foreseeable future. Never focus on how to map around 32bit issues; always assume you have a large, flat memory space available. Spend zero time thinking about 32bit.

    If you are thinking "multi-core", then think about it massively. Right now adoption is at the 16 core point, which means that if you are developing software today, you need to be thinking in multiples of 16. I keep asking myself "how will this work with 256 cores?" Yesterday someone came to me with a solution to a feature we have removed in Drizzle. "Look, we removed all the locks!" The problem? The developer had used a compare and swap (CAS) operation to solve it. Here is the thing: CAS does not scale with the number of cores/chips that will be in machines. The good thing is the engineer got this, and has a new design :) We won't adopt short-term solutions that just kneecap us later.
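To make the CAS scaling point concrete, here is a minimal sketch (hypothetical names, standard C++ `std::atomic`): a single CAS-retry counter that every core fights over, next to a sharded counter where each thread writes to its own slot and the contention disappears.

```cpp
#include <atomic>
#include <cassert>

// A single CAS-updated counter: every core contends on one location.
std::atomic<long> global_counter{0};

void cas_increment() {
    long cur = global_counter.load(std::memory_order_relaxed);
    // The classic CAS retry loop: on failure, cur is reloaded and we
    // try again. With 256 cores hammering one location, most attempts
    // fail, and throughput collapses instead of scaling.
    while (!global_counter.compare_exchange_weak(cur, cur + 1,
                                                 std::memory_order_relaxed)) {
    }
}

// A sharded counter: each thread increments its own slot and readers
// sum the slots. Writes never contend, so it scales with core count.
struct ShardedCounter {
    static const int kShards = 16;   // think in multiples of 16
    std::atomic<long> shard[kShards];
    ShardedCounter() { for (auto& s : shard) s.store(0); }
    void add(int thread_id) {
        shard[thread_id % kShards].fetch_add(1, std::memory_order_relaxed);
    }
    long read() const {
        long sum = 0;
        for (int i = 0; i < kShards; ++i) sum += shard[i].load();
        return sum;
    }
};
```

Under a handful of cores the two look identical; the difference only shows up as core counts climb, which is exactly the trap with benchmarking CAS-based designs on today's hardware.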

    SSD is here, but it is not here in the sizes needed. What I expect us to do is make use of SSD as a secondary cache, and not look at it as the primary at-rest storage. I see a lot of databases sitting in the 20gig to 100gig range. The Library of Congress is 26 terabytes. I expect more scale-up, so systems will be growing faster in size. SSD is the new hard drive, and fixed disks are tape.
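The "SSD as a secondary cache" idea can be sketched as a read path that falls through the tiers and warms the ones above it on a miss. Everything here is hypothetical, with `std::map` standing in for each storage layer:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical tiered store: reads fall through RAM -> SSD -> disk,
// and each miss promotes the value into the faster tiers. SSD acts
// as a large second-level cache, not the at-rest copy.
struct TieredStore {
    std::map<std::string, std::string> ram;   // small, fastest tier
    std::map<std::string, std::string> ssd;   // larger secondary cache
    std::map<std::string, std::string> disk;  // the at-rest copy

    bool get(const std::string& key, std::string* out) {
        auto r = ram.find(key);
        if (r != ram.end()) { *out = r->second; return true; }
        auto s = ssd.find(key);
        if (s != ssd.end()) {
            ram[key] = s->second;             // promote into RAM
            *out = s->second;
            return true;
        }
        auto d = disk.find(key);
        if (d != disk.end()) {
            ssd[key] = d->second;             // warm the SSD cache
            ram[key] = d->second;
            *out = d->second;
            return true;
        }
        return false;                         // true miss
    }
};
```

Eviction and write-back are the hard parts a real implementation would need; the sketch only shows the read-path shape.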

    The piece that I have commented least on is the nature of our micro-kernel. We can push pieces of our design out to other nodes. I do not assume Drizzle will live on a single machine. Network speed keeps going up, and we need to be able to tier the database out across multiple computers.

    One final thought about hardware: we need 128bit ints. IPv6 addresses, UUIDs, and similar types all mean that we need single-instruction operations for 16byte values.
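As a sketch of what 128bit ints buy us: GCC and Clang already expose an `__int128` extension, so packing a 16-byte UUID into one value makes equality a couple of 64-bit compares instead of a byte-by-byte loop. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// GCC/Clang expose a 128-bit integer as the __int128 extension.
using uint128 = unsigned __int128;

// Pack 16 raw bytes (e.g. a UUID or IPv6 address) into one 128-bit
// integer. Note: memcpy preserves machine byte order, so this gives
// a fast equality check, not a lexicographic ordering.
uint128 pack16(const uint8_t bytes[16]) {
    uint128 v;
    std::memcpy(&v, bytes, sizeof(v));
    return v;
}

// Comparing two UUIDs is now a plain integer comparison.
bool uuid_equal(const uint8_t a[16], const uint8_t b[16]) {
    return pack16(a) == pack16(b);
}
```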

  • Community Development

    Today 2/3 of our development comes from outside of the developers Sun pays to work on Drizzle. Even if we add more developers, I expect our total percentage to decrease, not increase. I believe we will see forks, and that we have to find ways to help people maintain their forks. One very central piece of what we have to do is move code to the Edge, aka plugins. Thinking about the Edge has to be a shared value.

    I see forks as a positive development; they show potential ways we can evolve. Not all evolutionary paths are successful, but it makes us stronger to see where they go. Long term I expect groups to make distributions around Drizzle; I don't know that we will ever do that ourselves.

    Code drives decisions, and those who provide developers drive those decisions.

    While I started out focusing Drizzle on web technologies, we are seeing groups showing up to reuse our kernel in data warehousing and handsets (which is something I never predicted). By keeping the core small we invite groups to use us as a piece to build around.

    Drizzle is not all about my vision, it is about where the collective vision takes us.

  • Directions in Database Technology

    Map/Reduce will kill every traditional data warehousing vendor in the market. Those who adapt to it as a design/deployment pattern will survive; the rest won't. Database systems that have no concept of being multiple node are pretty much dead. If there is no scale-out story, then there is no future going forward.

    The way we store data will continue to evolve and diversify. Compression has gotten cheap as processor time has become abundant. Column stores will continue to evolve, but they are not a "solves everything" sort of solution. One of the gambles we continue to make is to allow for storage via multiple methods (we refer to these as engines). We will be adding a column store in the near future; it is an important piece for us to have. Multiple engines cost us in code complexity, but we continue to see value in them. We will, though, raise the bar on engine design in order to force this complexity down into the engine (which will give us online capabilities).

    Stored procedures are the dodos of database technology. The languages vendors have designed are limited. By the same token, though, putting processing near the data is key to performance for many applications. We badly need a new model, and this model will be a pushdown from two different directions. One direction is obvious: map/reduce. The other is the asynchronous queues we see in most web shops. There is little talk about this right now in the blogosphere, but there is a movement toward queueing systems; they are a very popular topic in the hallway tracks of conferences.
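The map/reduce half of that pushdown can be sketched as a toy word count (all names hypothetical): the map step runs against the rows of one shard, ideally on the node that stores them, and only the small per-shard tallies travel to the reducer.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy word-count sketch of processing pushed near the data: map runs
// per shard, reduce merges the compact per-shard summaries.
using Counts = std::map<std::string, int>;

// Map: tally words in one shard's rows, producing a local summary.
Counts map_shard(const std::vector<std::string>& rows) {
    Counts local;
    for (const auto& row : rows) {
        std::istringstream in(row);
        std::string word;
        while (in >> word) ++local[word];
    }
    return local;
}

// Reduce: merge the per-shard summaries into the final answer.
Counts reduce(const std::vector<Counts>& partials) {
    Counts total;
    for (const auto& p : partials)
        for (const auto& kv : p) total[kv.first] += kv.second;
    return total;
}
```

The point of the shape: the data never moves, only the summaries do, which is the same property the asynchronous-queue direction is after.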

    Databases need to learn how to live in the cloud. We cannot have databases be silos of authentication and processing that expect only to provide data. We must make our data dictionaries available in the cloud, we need to take our authentication from the cloud, etc...

    We need to live in the cloud.

    Comments {8}

    awfief

    from: awfief
    date: Oct. 22nd, 2008 06:08 pm (UTC)

    the good thing about things like planning for multiple cores, lots of memory, and many different physical machines...

    is that it will all work on one behemoth machine if you want it to. Not planning this way right now makes no sense -- obviously we can't predict the future. The most difficult area will likely be disk, since historically that's what's actually changed. RAM hasn't changed (AFAIK), though amounts and speeds have. Disk has changed -- different OSes use different filesystem types, which is essentially different disks, due to usage (and is relevant for issues like fsync). SSDs are one way disks may or may not change in the future... but in the time of records, who envisioned tapes or CDs or MP3 files?

    We can try to figure that stuff out... but historically, RAM really hasn't changed, CPUs have changed a bit (layout-wise w/multiple cores), and disks have changed the most. This boils down to "make use of everything we know about today, and give the most flexibility to the disk stuff".

    perhaps via drizzle plugins? *shrug*


    Tanjent

    from: tanjent
    date: Oct. 22nd, 2008 06:12 pm (UTC)

    Don't know about Sun's chips, but all modern Intel/AMD/PPC chips have 128-bit vector registers and a good complement of integer ops (though I don't think any can treat the entire register as one enormous 128-bit int). Windows under x64 actually bypasses the old x87 FPU entirely and does all float math using vector registers - it lets the compiler extract a nice bit of extra parallelism in certain cases.


    Brian "Krow" Aker

    from: krow
    date: Oct. 22nd, 2008 06:18 pm (UTC)

    I want a single instruction comparison for 128bit ints :)
    (Really... a 16byte comparison operation).

    I'm not trusting compilers to really optimize for me around this just yet.


    Tanjent

    from: tanjent
    date: Oct. 22nd, 2008 06:32 pm (UTC)

    Altivec has vcmp(eq/gt)(s/u)(b/h/w) - vector compare equal/greater signed/unsigned byte/half/word, and docs say it sets the comparison flags appropriately. Don't remember SSE's off the top of my head but they have an equivalent.
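The SSE equivalent Tanjent alludes to can be sketched like this; `_mm_cmpeq_epi8` and `_mm_movemask_epi8` are the real Intel intrinsic names, with a `memcmp` fallback so the sketch stays portable off x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64)
#endif

// A 16-byte equality test as one vector compare: the kind of
// single-instruction 16-byte operation the post asks for.
bool equal16(const uint8_t* a, const uint8_t* b) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    __m128i eq = _mm_cmpeq_epi8(va, vb);     // 0xFF per equal byte lane
    return _mm_movemask_epi8(eq) == 0xFFFF;  // all 16 lanes matched
#else
    return std::memcmp(a, b, 16) == 0;       // portable fallback
#endif
}
```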


    from: axehind
    date: Oct. 22nd, 2008 06:25 pm (UTC)

    Totally agree about moving code to the edge. Plugins/modules are the way to go if the interface is easy enough to use. This will spawn lots of sub-projects, which is a good thing I think!

    Not sure I agree with you in regards to SSD. I think using it as a secondary cache could be beneficial right now, but I don't see that phase lasting long. I think the jump to all-SSD will happen pretty fast.


    mutantgarage

    from: mutantgarage
    date: Oct. 25th, 2008 04:55 pm (UTC)

    NAND flash SSDs are good for reads, but write performance sucks. They also wear out with a lot of writes, so stick a big RAM writeback cache in front of the SSD. This can be in hardware or implemented in your application.


    Eric

    Cloud Application Framework

    from: oddity80
    date: Oct. 22nd, 2008 06:37 pm (UTC)

    Very well put (on all topics). In terms of pushing application logic out to where the data lives, we need a dynamic application framework that integrates well into existing web shops while leveraging the power of the cloud. Google and Amazon had the right idea, but they've missed the mark (keep it open, people want more control). There are some solutions that work well for specific purposes (Hadoop), but we still need something more general-purpose. I have a feeling a flexible, open source, language-independent application framework is right around the corner... or is it already here but just not well known? ;)


    Agreed on 128 bits.

    from: burtonator
    date: Oct. 22nd, 2008 07:03 pm (UTC)

    I agree on having 128 bits.... I've needed them before.......

    Kevin
