MySQL Conference, Percona, the Ecosystem...
Mar. 8th, 2012 | 12:22 am
This is great news. They run a great conference. They always get rave reviews.
This isn't the first one they've done. They've done San Francisco, New York, London, more.
What's different about Santa Clara?
And for the rest of the story...
Is it supposed to be a fish?
Jan. 5th, 2012 | 07:45 am
Spoon full of Sugar, Oracle and the Open Core Model
Sep. 29th, 2011 | 01:25 pm
From the 451 Group:
“MySQL flirted with the open core licensing model in early 2008 with plans to introduce new features into Enterprise Edition that would not be available under an open source license.”
MySQL didn’t flirt with it; it was going to do it.
Why? Because we were asking the question, “how do we pull in customers to make more money?”
MySQL was going to put the new backup API, which never materialized, into an Enterprise branch.
It was a lousy idea for the following reasons:
1) There was no internal API in the server for this, so the engineering was going to be messy and expensive.
2) We didn’t own the technology that was needed to even do this (Oracle owned Hot Backup).
3) Percona has an awesome tool for doing this, that is Open Source (http://www.percona.com/software/percon
4) Backup is a core feature everyone needs, and some of those “everyones” are the folks who manufacture tools that you want to have work with your product.
5) When we were going to announce it, we hadn’t even written it/completed it. It was vaporware.
It would have been a horrible move, and would have caused chaos for no particular reason. It was dead on arrival when it was to be announced as a strategy, since it didn’t even exist.
Let’s look at Oracle’s move. Both the authentication module and the Thread Pool come into the MySQL server as plugins. If the engineering of the MySQL server continues in the current direction (which is somewhat flattering to Drizzle I might add), then they are on a good path (if I can find my blog entry where I talked about this as a good strategy, I’ll link back to it here).
Much of the hubbub around Open Source, Community, etc., in regard to this is a bit inflated, I feel. They haven’t touched the core product, and they are creating APIs. Are they possibly hurting themselves in regard to ubiquity?
Doubtful.
Would I pick those two pieces? No, but they aren’t the last two I would pick either. If Sun had continued as a company?
Something similar to this would have been done as well.
From an engineering and usage standpoint?
The first person who sniffs at the authentication mechanism who knows anything about security is going to freak.
The Thread Pool can only be used by a very limited number of users (and there are some restrictions on what can be done in the server while it is in use). MySQL’s IO was never designed for the Thread Pool, and there is a lot of engineering work that would need to be done to make it work.
Still? People will use both, and I am betting some customers will want them badly enough to pay.
If they are really badly needed? Well then someone will write an open source version of both.
I have no great love of Oracle, but this is really not a big deal at all. The original GPL’ing of the Public Domain/LGPL clients was a much bigger deal.
memcached_exist()
Sep. 22nd, 2011 | 11:11 pm
New in version 0.53 (which yes, I really should renumber into 1.X at some point in the near future) is memcached_exist().
Ever wanted to find out if a key existed but didn’t want to have to fetch the object?
Well now you can do this. It works by seeing if an add can be done on the key (the add though is dated in the past, so any write afterward will expire it).
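The add trick described above can be sketched in a few lines. This is only an illustration of the idea, not libmemcached's code: ToyCache is a hypothetical in-memory stand-in for a server, and exists() is my own name for the probe.

```python
import time


class ToyCache:
    """Hypothetical in-memory stand-in for a memcached server."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at or None)

    def _live(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return False
        _, expires_at = entry
        if expires_at is not None and expires_at <= now:
            del self._items[key]  # lazily drop entries that have expired
            return False
        return True

    def set(self, key, value, expires_at=None):
        self._items[key] = (value, expires_at)

    def add(self, key, value, expires_at=None):
        """Store only if the key is absent; return True when the value was stored."""
        if self._live(key, time.time()):
            return False
        self._items[key] = (value, expires_at)
        return True


def exists(cache, key):
    """Probe for a key with an add whose expiration is already in the past."""
    stored = cache.add(key, b"", expires_at=time.time() - 1)
    # A refused add means the key is there. An accepted add means it was
    # missing, and the placeholder it left behind is already expired.
    return not stored
```

The point of dating the add in the past is that a probe for a missing key never leaves a visible placeholder behind, and no object is ever fetched.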
You can currently grab the code via bzr on Launchpad.
Have fun!
Dear Lazyweb, how secure is Tomato?
Jul. 21st, 2011 | 10:19 pm
So today I noticed on one of my internal servers the following:
Jul 17 23:53:13 localhost sshd[31847]: Invalid user sales from 123.196.113.11
Jul 17 23:53:13 localhost sshd[31848]: input_userauth_request: invalid user sales
And I also see….
Jul 17 23:47:11 localhost sshd[31690]: reverse mapping checking getaddrinfo for 42.ac.84ae.static.theplanet.com [174.132.172.66] failed - POSSIBLE BREAK-IN ATTEMPT!
Also?
Jul 20 14:56:01 localhost ¿<28>fail2ban.actions: WARNING [ssh-iptables] Ban 121.88.250.208
Huh? Nothing is port forwarded, and the only thing that could be connecting to the box is a Linksys running 1.28 Tomato.
So I am wondering, is Tomato secure right now?
Syncing, Google vs OSX, iPhoto Ate my Father's Wedding Pictures
Jul. 7th, 2011 | 03:34 pm
I don't really love OSX, as much as I happen to be a UNIX bigot.
Minus the long grey beard.
Why do I like OSX?
- Terminal always works.
- WiFi always works.
- I love Toasters
Toasters are awesome. You put bread in, you push the bar down and you get toast.
That is until the toaster starts to burn bread because either the dial has been turned all the way up, or the toaster has become so old that the springs are worn out.
iPhoto? iPhoto burns a lot of bread, I mean, it eats a lot of photos.
It is really irritating to lose photos, especially in the manner that happens with iPhoto. With iPhoto you can see the icon it made of the photo, but the original? It is long gone. I haven't done an exhaustive search of all of the metadata, but you certainly can't export or even view the photos.
A lot of my photos I upload to flickr when I want to store/share them longterm, but I haven't always done that.
And when I went to show someone a photo from my father's wedding? I discovered that it had once again eaten all of the photos from the wedding.
So what to do about it? I'm going to go with Picasa. I recently acquired a new NAS (I upgraded from my NV+ ReadyNAS to an Ultra6 ReadyNAS). I have been writing scripts that extract all of the pictures from all my computers. They open up tarballs of old home directories, pull the images out of them, and store the images on the NAS. If I could figure out how to deal with spam I would extract all of the images from my email as well.
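For the curious, the tarball step looks roughly like the sketch below. It is not my actual script; the function name, the list of suffixes, and the way paths are flattened into file names are all choices made for the example.

```python
import shutil
import tarfile
from pathlib import Path

# Suffixes treated as images; an assumption, extend to taste.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def extract_images(tarball, dest_dir):
    """Copy every image found in a tarball into dest_dir and return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    with tarfile.open(tarball) as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks, devices
            if Path(member.name).suffix.lower() not in IMAGE_SUFFIXES:
                continue
            source = archive.extractfile(member)
            if source is None:
                continue
            # Flatten the archived path into one file name so nothing is
            # written outside dest_dir and same-named photos do not collide.
            flat_name = member.name.strip("/").replace("/", "_")
            target = dest / flat_name
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            saved.append(target)
    return saved
```

Point it at an old home directory tarball and a directory on the NAS mount and it leaves everything that is not an image alone.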
Picasa has been running for a couple of days. It has found ~15K headshots. I thought that iPhoto's face recognition was pretty gimmicky. The Google one though? It is sharp. It is finding friends' photos that I didn't know that I had (so much for anonymity during the Fremont Solstice Parade!). My only real complaint with it so far is that I wish I could share the facial recognition information with friends so that we could collectively parse photos.
Downside? Picasa image display is not that awesome. It's slow, and for some reason someone thought it would be brilliant to include all of the headshots in one window. Which means I have to do a bunch of scrolling to approve photos that it is finding.
Like all programs, Picasa needs a kill file.
Another downside to Picasa? It is a walled garden. I like flickr, I am going to continue to use flickr. It's annoying that I can't sync between the two (maybe Google will buy it?).
At the very least you would think that Gmail would be able to extract photos from email, or make it easy to share photos between my computers.
iPhoto did an ok job at editing photos; Picasa is really lacking when it comes to this. I've been meaning to make more use of Lightroom, I guess this will give me a reason.
Next on my list of problems to solve?
Contacts.
Mysql State of the Ecosystem 2011
Apr. 19th, 2011 | 12:57 pm
A number of years ago I coined the term "the mysql ecosystem". I did it at the time to express a view that MySQL had moved beyond being just what MySQL AB defined "MySQL" as being.
...click to read more.
A reminder...
Apr. 18th, 2011 | 08:54 am
The recent DOS attacks against LJ mean that my attempts to keep both in sync are failing, so be sure to update to the new link if you are curious as to what I am up to.
And the recent mysql conference? It was fun, and the ecosystem is doing really well.
Drizzle goes GA, From "What If", to "What has"
Mar. 17th, 2011 | 10:02 am
What I wanted to do was go back and rethink decisions we had made during the years, especially decisions that we made for MySQL 5.0.
5.0 exists because of the MySQL/SAP alliance. SAP wanted to replace Oracle with MySQL, and to do that MySQL was going to need to run SAP R3. We didn't just pay lip service to SAP; there was an effort to make this happen. Somewhere in the middle of all this there was also a very odd "we are going to adopt SAPDB as the next MySQL" plan, which of course was never going to happen. There were countless meetings over this, and attempts to somehow sprinkle even an ounce of the SAPDB code into MySQL, but that never happened.
As far as making R3 work on MySQL? That was incredibly unlikely, and it was damaging to the product in the end. We ended up with a lot of features that the database was never designed to have. We created an unrealistic set of expectations. We had a source base which had too little testing.
So part of the goal with Drizzle was to cut it back to the core and build modules that we could then create better testing for. So for that reason Stored Procedures, Views, and Triggers were out. None of them were well designed, and all of them had/have major bugs. We tossed out the monolithic kernel design and moved to a microkernel design.
MySQL 5.1 made an attempt to patch the replication system that had been written a decade ago. MySQL replication works, but it works with a lot of exceptions. Anyone who has ever put it into production is aware of these. The good thing about MySQL replication is that it mostly works out of the box, and that is something that was a bit of a revolution when it was created. Today? With the notable exception of SQL Server, the rest of the major databases still have replication systems which are difficult to use, install, or deploy.
We initially looked at using 5.1's replication. We were only going to refactor it such that we were going to beef up its file format and switch to just using the row based replication that was added in 5.1. We were unsuccessful in refactoring it. About 9 months in we figured this out, and we began a rewrite.
The rewrite was the right answer. The original code had too little testing for us to ever know whether or not a change we made created bugs or not.
A big lesson learned: if you are going to refactor code, make sure you have plenty of testing up front.
Internally we have "new code" and "old code". If we want to make a change to "new code" we can typically do it very rapidly. The rate at which we can extend it is pretty amazing. The MySQL code base is not friendly to anyone who knows C++. Pretty much all of the warnings have been disabled and there are a lot of tricky bits.
We have fixed all the warnings in Drizzle. This isn't sexy work, and the only way it is justified is that cleaning up warnings fixes bugs. If you are starting a new code base, let me impress upon you the necessity of doing this from the beginning.
Today our replication is pretty spiffy, and it answers a couple of the big "What If" statements I have wanted answered:
1) We use an entirely open message format.
2) We store our replication records directly in Innodb.
The open message format comes with a penalty: it is more verbose than a native format. It takes up more space than if we just shipped the block records created in the transactional engine. Running a point in time recovery on block records is tricky and very limiting. You can't take the data from one database and push it to another. ETL? Forget it.
We used Google's Protocol Buffers for the message format. There are other libraries available, but they were either license incompatible or not widely known. At the time we hadn't made a decision to go with Boost, so using its serialization library wasn't an option. The disadvantage has been that the Google library created a dependency for installing Drizzle. Dependencies are a pain. When we started Drizzle I had thought that the different Linux distributions had a good handle on this; I don't really believe this any longer. Avoid dependencies.
Storing the records in Innodb has always seemed to be an easy win in my mind. It solves a lot of the two phase commit problems that plague users and it gives you instant recovery. Storing the log in a separate file can possibly give you a win in that you can do some tricks with IO, but in the end it just complicates everything.
With MySQL you always need to keep in mind the question of "What would MyISAM do?"
MyISAM's design, and limitations, are scattered throughout the program. In all cases MySQL has to ask "how will this be handled if we need to store data in a storage engine that can't handle failure, handles all of its own IO, and needs to be locked at the Table level?".
We dropped MyISAM support about a year into our work, and relegated it to a support-only role for temporary tables. We didn't hide it completely before we GA'ed Drizzle, but we won't support it long term. I've heard users say "but I want its performance!". Trading reliability for performance works out for some people, but certainly not everyone. What I find is that when someone wants this, what they really want is a different sort of database altogether. Typically it is some sort of analytics problem which creates this need.
Which gets us to the storage engine interface. Within MySQL it was the first attempt to create an interface that we could plug different solutions into. I had proposed it in MySQL because I had written different engines and knew what a nightmare it was to make it work.
That engine interface has generated millions of dollars. When I wanted to make it available at MySQL the backlash was significant. Some of sales freaked out, some of marketing thought we were going to let others take over the product, and alliances wanted to know how we were going to limit it to "select partners". On top of that, half of engineering wanted to go and re-engineer it immediately.
In Drizzle we have spent a significant amount of time reworking the interface, but it is far from perfect. We redesigned it so that engines now own their own metadata and federate that data to the microkernel. We also designed the interface to require that all new engines have ACID-like qualities, know how to handle their own recovery, and can handle failure gracefully. Our core engine is Innodb. We have had others propose new engines, and we have even supported other engines, but at the end of the day we know people want a transactional engine mainly because they don't want to find that their data has been trashed.
Our Innodb is a little different. We have more views into the state of the engine, and we fixed our version to compile with a C++ compiler. We cleaned up warnings and fixed the bugs that popped up from that. We have begun to refactor it so that it is more integrated with Drizzle's thread scheduler.
Innodb would have been the default engine for MySQL long ago if not for some "not invented here" mentality, mixed with a flopped buyout attempt. Heikki, the inventor of Innodb, came out quite well in all of this. Good for him.
I don't believe we will spend much more energy on the storage engine interface going forward. It is a dead business, and while there are a couple of companies that have built brand and product enough to make a go of the business, I don't expect any additional ones will show up. The storage engine business made money for MySQL, but it was a big distraction. While with Drizzle it is easier to integrate an engine, I'm not sure that a business exists for storage engine vendors with it. I'll write more about this at a later date.
Speaking of dates, Drizzle's internal format for timestamps is 64-bit. There is still some work to be done to allow the use of all 64 bits, but you won't need to recompile or change your disk format for them. Right now we need to fix some tests, and make sure a couple of functions will handle the formatting, but we store your data such that going forward, or backwards, you are in good shape. Unlike MySQL we store time as GMT, so there is no screwing around with "well I stored my data in my local time zone, but we had the machine set to...". I have personally spent over a month of time just fixing bugs in that code.
We have spent a lot of time fixing bugs. We get a big collective smile on our faces when we read about a new bug that has been discovered in MySQL and find that we don’t have it. A considerable amount of our time has also been spent on finding new ways to test Drizzle. I am sure plenty of bugs exist to be found.
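To make the 64-bit, stored-as-GMT idea concrete, here is a minimal sketch. It is not Drizzle's on-disk format; the function names, the little-endian layout, and the whole-second resolution are assumptions made for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_timestamp(moment):
    """Pack an aware datetime as a signed 64-bit count of whole seconds since the epoch (UTC)."""
    if moment.tzinfo is None:
        # A naive datetime would force a guess about the zone, which is
        # exactly the ambiguity that storing GMT is meant to avoid.
        raise ValueError("naive datetime: time zone unknown")
    seconds = int((moment - EPOCH).total_seconds())
    return struct.pack("<q", seconds)  # 8 bytes, little-endian, signed


def decode_timestamp(raw):
    """Unpack the 8 stored bytes back into a UTC datetime."""
    (seconds,) = struct.unpack("<q", raw)
    return EPOCH + timedelta(seconds=seconds)
```

Whatever zone the value arrives in, it is normalized before it is stored, and a date past January 2038 fits without any change to the stored width.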
We also support storing/comparing/displaying time with microseconds. We also have a real BOOL type, which I have been told is handy for the SQLAlchemy folks, and a native ISO UUID type. The UUID is interesting in that it stores time as well as being unique. It isn't as fast as "please give me the next number", but I believe it will be useful for a lot of applications. We have refactored all of the types, and the only one that was not size related that we dropped was SET. If you wonder why we dropped it, read the section in the MySQL manual about its limitations and bugs.
Why do we only allow DELETE against a single table at a time, like pretty much every other RDBMS? Beyond the conceptual issue that few can wrap their heads around how to form a multi-table DELETE, let alone feel like they know what the query will do, we hit the problem of the "multiple execution path". There were a lot of one-off execution paths in MySQL. In a lot of cases I know these were dead refactoring projects that were never completed. The "multiple execution path" problem is particularly disturbing when you think about fixing a bug. If you fix a bug in DELETE you need to know that there is an execution path for a single table that is different than the path used for multiple tables. This leads to odd behaviors, and a much richer set of bugs.
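As an illustration of a UUID that carries time, the sketch below pulls the timestamp back out of a time-based (version 1) UUID using Python's standard uuid module. That Drizzle's type uses this layout is an assumption on my part for the example, and uuid_time is a name I made up.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals since 1582-10-15.
# This constant is the number of such intervals between that date
# and the Unix epoch.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def uuid_time(value):
    """Return the UTC time embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    unix_100ns = value.time - GREGORIAN_TO_UNIX_100NS
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return unix_epoch + timedelta(microseconds=unix_100ns // 10)
```

Calling uuid_time(uuid.uuid1()) gives back roughly the current time, which is what makes the identifier useful as both a key and a rough creation stamp.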
SQL modes? Those are gone. If you wonder about what sort of problem they create inside of the server, I'd suggest reading about the "Legend of the Ambalappuzha Paal Paayasam".
In general in Drizzle we have tried to get rid of the gotchas that we have found. Things like declaring a column NOT NULL and discovering that somehow the database still stored a NULL are gone. Altering a table and adding a field that would violate the structure of the table? That is gone.
It is amazing that ALTER TABLE works, as the code there is Byzantine. We have made some effort to clean it up, but it is still way too tricky. I wish we could have done more there, but it is what it is. Are you using partitions? Make sure you back up your data before doing an ALTER TABLE. Wrapping partitions into the system in the way that was done at the time was simple, but it is far from robust.
I had hoped that with 5.1 we would have created a single logging API, but instead we ended up with multiple logging APIs internally in the server. With Drizzle we ripped all of them out and installed a single API. It is crazy simple to write a new logging plugin.
Which gets into the philosophy of plugins in general. Writing plugins should be low hanging fruit. Whenever possible we have tried to make it the case.
We have an entirely new INFORMATION_SCHEMA in Drizzle. It is based on table functions, which is a new concept in Drizzle. We keep a separate schema called DATA_DICTIONARY; in it we put whatever we like. Our INFORMATION_SCHEMA is only what the SQL standard has specified. We do zero vendor modification to it. Another hats off to SQL Server, their INFORMATION_SCHEMA is the closest to complying with the standard.
Drizzle's drivers are BSD. They were written outside of Sun, and Sun signed off on contributions to them under a BSD license. They speak Drizzle and MySQL's protocol. There is a JDBC version that was written. Their adoption is becoming widespread. Licensing clarity around them is a big win for us, and for ISVs who want to integrate. MySQL's licensing mess was related to a lot of hand waving involving them. Recently I noticed that MongoDB had written up a clear licensing policy with regards to their own drivers. Awesome.
We never got to finish all, or really much of any, of what we wanted to do with the Drizzle protocol. I believe this is an area where we will see change in the near future. Inside of Drizzle we have a C++ interface that resembles JDBC that lets us execute queries. We will be doing a lot more with that interface going forward.
What about performance? With Drizzle we began doing benchmarks early on, using a few different benchmarks. The benchmark generated by sysbench has always been the one we have used as our bellwether. Unlike a lot of databases we test Drizzle with up to 1024 concurrently executing queries. Most of the benchmarks I see people run are for far fewer connections. We have chosen time and time again to favor performance gains at the high end over gains on the low end. We are roughly double in performance from where we began. We could still do a lot better. MySQL 5.5 has a new metadata locking system which should do well in a number of situations; we could do a bit better in some of these cases. Our lack of a MyISAM would make it simple for us to move forward in this direction if we want to.
There has been a lot asked about our claim on scaling with lots of cores. Our process there is simple: eliminate locks, favor the performance gains we find that take advantage of additional CPUs, and try whenever possible to remove strong ownership that requires waits for locks. MySQL relies on MyISAM, and MyISAM has significant locks, especially around the keycache; we got rid of those by freeing ourselves from MyISAM. We have had some gains with our new scheduler and we have done some work to improve how IO is handled.
I am sure we have a lot of tuning still to do. We won't be publishing benchmarks which compare us to others though. I've yet to see a comparison benchmark which wasn't completely flawed, and even when they are not, few people really understand them. They fall into the classic "how many angels can dance on the head of a pin" conversations.
Our authentication system is modular, and we need to iron out more of the authorization system.
I've seen someone say that Drizzle is designed for Google and Facebook. This is not the case at all. We built it so that the next Facebook, Google, etc would have a platform to build on. Facebook and Google have their own forks of MySQL, they aren't going to be using Drizzle. The pieces are there for the next company who needs to innovate, it is just a matter of someone making use of them. We speak the MySQL protocol, so the typical MySQL application runs just fine on Drizzle without change. We designed Drizzle to work as a piece of someone's current infrastructure, not be yet another application which has a costly integration. We have a NoSQL sort of solution via the blob streaming module, but we are first and foremost a relational database.
What will the next Google or Facebook find? A much more friendly platform than what MySQL provided to develop on and with. The big success for Drizzle has been in the people that have been involved. We are without a doubt the descendant of MySQL that has the largest contributor base, and we have long passed MySQL with regards to contributors. We are well into the hundreds when it comes to developers who have contributed code. We have had more than 921 commits in the last month across 20 people. Our numbers go up and down, but we are consistently more than double anyone else in size. If you just walked out of college, or skipped it altogether, you are going to have a much easier time adjusting Drizzle to your needs. At least we believe this :)
The codebase is C++, we make use of Boost, and while we are cautious, we tend to favor more forward thought in how we code. Readability is the key to creating code that others will use. Because in the end? We can scale silicon, but carbon? People are much harder to scale.
The people to thank for the code:
Brian Aker
Monty Taylor
Stewart Smith
Lee Bieber
Jay Pipes
Padraig O'Sullivan
Andrew Hutchings
Marko Mäkelä
Joe Daly
Olaf van der Spek
Vijay Samuel
Patrick Crews
Toru Maesaka
David Shrewsbury
Eric Day
Zimin
Marisa Plumb
Joseph Daly
Barry Leslie
Asil Dimov
Mark Atwood
Tim Penhey
Jimmy Yang
Paul McCullagh
Nathan Williams
Paweł Blokus
Sunny Bains
Andy Lester
Hartmut Holzgraefe
Trond Norbye
Other people to thank?
David Douglas, who supported us initially at Sun. And when we didn't think our internal support at Sun could get any better? Bob Brewen worked with us till the end came for Sun. An extra mention should be made for Lee Bieber, who has been working with the project from nearly the beginning as well. He has handled project management, done code refactoring, made flyers, organized dinners, and did everything else in between.
Mike Shadle for getting us machines, and making sure everything runs. Adrian Otto at Rackspace should be thanked (along with a number of other people as well).
A thank you should go to Chris DiBona for the Google Summer of Code project. We have a number of students who now work on databases for a living thanks to that program. While with MySQL we constantly failed at getting students' code into the server, with Drizzle we have had a lot of success.
There is an entire channel of people who have been involved with Drizzle on Freenode in #drizzle who should be thanked as well. IRC is how we communicate.
There are a lot of other people I am forgetting to thank, sorry about that.
So what next? There is a lot more to Drizzle than what I have written above. Having worked on this for years I often forget what the differences are anymore. There are lots of new features, plenty of new enhancements, and new bugs just waiting to be found. I'm giving a talk at Web 2.0 Expo in a couple of weeks in San Francisco where I will talk about some of what we have done and are doing for virtualization.
I will be giving a keynote at the O'Reilly MySQL Conference & Expo, and there are a handful of talks there on Drizzle as well. The MySQL Ecosystem is a radically different place than what it was a year ago; I'll be commenting on it in the future online and at the conference.
About a week ago Monty Taylor and I sat down and talked about what we wanted to do with Drizzle going forward. Monty has been working on this since the beginning with me, and he has been a lot of fun to work with. One conclusion that we both came to was that we want to see where people will take Drizzle before we determine too much about its future. It is easy to get caught up in new features, and we are interested in seeing how others use it before too many decisions are made about what to do next.
Originally posted on blog.krow.net.