Drizzle, Optimizer, Execution Flow
Sep. 21st, 2009 | 09:29 am
The present in my RSS feed this morning was an article by Peter Zaitsev:
http://www.mysqlperformanceblog.com/2009/09/20/guidance-for-mysql-optimizer-developers/
One of the changes made recently to Drizzle is a redesign of our executioner. This was done by Padraig. MySQL inherited a design where the parser uses one Global Lex Structure to fill in the members of the query for later use. This structure has members in it for every form of query that can be executed. The structure is also assigned an enum that is used later, via a switch/case dispatcher, to determine whether the query is a SELECT, INSERT, etc.
This of course creates a number of limitations in the design, especially for Drizzle, since we favor a micro-kernel over a monolithic design. What Padraig completed recently was work that took apart the switch/case dispatcher and replaced it with an object executioner. Each of the query types (SELECT, INSERT, etc.) is now assigned to a Statement object, which is created in the parser at the top level. We are now focusing on removing the Global Lex Structure and placing its members into the Statement execution objects.
This gives us a smaller footprint when parsing, and it makes the entire system pluggable in that we can now truly break down the parser in ways that we couldn't before. Before, extending the parser meant growing the global lex structure for any extension to the SQL syntax/execution functionality. The global lex structure has a very large footprint, and you have to weigh the cost of any new feature against growing the size of the global structure. Setting the default values for this alone is fairly expensive (memset() is not your friend).
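To make the footprint argument concrete, here is a rough C++ sketch. Every struct, field name, and buffer size in it is invented for illustration and is not taken from the MySQL or Drizzle source. The point is only that one structure carrying members for every statement type has to be cleared in full for each query, while per-statement state clears just what that statement uses.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical per-statement fields; names and sizes are made up.
struct SelectFields { char where_clause[256]; char order_by[128]; };
struct InsertFields { char values[512]; };
struct AlterFields  { char column_defs[1024]; };

// Monolithic style: one structure holds members for every statement type.
struct GlobalLex {
    int command;          // which statement type this query turned out to be
    SelectFields select;
    InsertFields insert;
    AlterFields alter;
};

// Resetting for the next query touches every byte, used or not.
inline void reset(GlobalLex& lex) {
    std::memset(&lex, 0, sizeof(lex));
}

// Per-statement style: a SELECT carries only what a SELECT needs.
struct SelectState {
    SelectFields select;
};

inline void reset(SelectState& state) {
    std::memset(&state, 0, sizeof(state));
}

// Bytes cleared per query under each design.
constexpr std::size_t kGlobalResetBytes = sizeof(GlobalLex);
constexpr std::size_t kSelectResetBytes = sizeof(SelectState);
```

With these made-up sizes the SELECT-only state is a fraction of the monolithic structure, and every new statement type added to GlobalLex makes the gap wider for all queries, not just the new one.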
It also meant that someone who wanted to write a new execution path had to modify the switch/case executioner and fiddle with all of the code along the path.
Now in Drizzle? Anyone who wants to write a new execution path can focus on just a single class. This class is generated in the parser and encapsulates all of the logic for a given statement. BTW these Statement objects are not limited to the SQL parser, so any of the new methods for Connection can generate one as well (without going too deep here, what I mean is that the work Eric Day has been doing, which will allow us to speak native HTTP/etc, will now be able to use direct REST parsers... so SQL is just one method of execution among many).
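A minimal C++ sketch of the two styles, for readers who have not seen the code. The class and function names here (SqlCommand, Statement, parse, and so on) are hypothetical stand-ins and are not copied from the Drizzle tree; the "parser" is a toy prefix check.

```cpp
#include <memory>
#include <string>

// Old style: one enum value per query type, dispatched by a central switch.
// Adding a query type means editing this enum and every switch that uses it.
enum class SqlCommand { Select, Insert };

std::string dispatch_with_switch(SqlCommand command) {
    switch (command) {
        case SqlCommand::Select: return "select";
        case SqlCommand::Insert: return "insert";
    }
    return "unknown";
}

// New style: each statement type is its own class with its own logic.
class Statement {
public:
    virtual ~Statement() = default;
    virtual std::string execute() = 0;
};

class SelectStatement : public Statement {
public:
    std::string execute() override { return "select"; }
};

class InsertStatement : public Statement {
public:
    std::string execute() override { return "insert"; }
};

// A new execution path is one more class; no central switch to touch.
class ShowStatusStatement : public Statement {
public:
    std::string execute() override { return "show status"; }
};

// Toy stand-in for the parser: it decides which Statement to build.
// Any other front end (for example a REST handler) could build one too.
std::unique_ptr<Statement> parse(const std::string& text) {
    if (text.rfind("SELECT", 0) == 0) return std::make_unique<SelectStatement>();
    if (text.rfind("INSERT", 0) == 0) return std::make_unique<InsertStatement>();
    if (text.rfind("SHOW STATUS", 0) == 0) return std::make_unique<ShowStatusStatement>();
    return nullptr;  // not recognized
}
```

The caller just does parse(text)->execute() and never needs to know which statement type it is holding.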
This move to Statement gives us a design which is less complex than what we had in the past, and it allows a developer who wants to extend the database to focus on just what they want to work on. Hell, maybe we should put together a talk for the East Coast No-SQL conference just to make a point about flexibility around SQL vs No-SQL designs.
So why did I open this article with a reference to Peter's article?
Just as the switch/case executioner existed for all queries, a similar one existed for the optimizer. About two months ago we started straightening it out so that we could do the same to it as we have been doing to the main switch/case dispatcher. From the top down we are able to use similar designs at each connecting point in the database.
The gain is? Just as our Statement Executioner cracks the door open to making the main execution path at the core of Drizzle pluggable, by reuse of design the same will be true of our join optimizer (!!!). Once we are complete, our Join Executioner will be extendable just by adding new plugins. Our plans are moving along, and the pluggable parser is on our horizon. The goal is a set of classes that can be extended, so that the Join Executioner can be easily extended as well.
When I read articles like Peter's, and Peter is one of the experts in tuning MySQL, it makes me pretty excited to see validation of the concepts we hold dear. I'm not sure that we will pull off exactly what Peter wants long term, only time will tell, but I do see that the concepts we are pushing are shared by others as well.
Short Videos from OSCON
Aug. 11th, 2009 | 04:09 pm
Assumptions, Drizzle
Oct. 22nd, 2008 | 11:00 am
What is the future of Drizzle? What sort of assumptions are you making?
Hardware
On the hardware front I get a lot of mileage out of saying "the future is 64-bit, multi-core, and runs on SSD". This is a pretty shallow answer, and is pretty obvious to most everyone. It suits a sound bite but it is not really that revolutionary a thought. To me the real question is "how do we use them?".
64-bit means you have to change the way you code. Memory is now flat for the foreseeable future. Never focus on how to map around 32-bit issues, and always assume you have a large, flat memory space available. Spend zero time thinking about 32-bit.
If you are thinking "multi-core" then think about it massively. Right now adoption is at the 16 core point, which means that if you are developing software today, you need to be thinking about multiples of 16. I keep asking myself "how will this work with 256 cores?". Yesterday someone came to me with a solution to a feature we have removed in Drizzle. "Look, we removed all the locks!" Problem was? The developer had used a compare-and-swap (CAS) operation to solve the problem. Here is the thing: CAS does not scale to the number of cores/chips that will be in machines. The good thing is the engineer got this, and has a new design :) We won't adopt short term solutions that just kneecap us in the near future.
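For anyone who has not run into the CAS problem, a small C++ sketch follows. The first function is the classic CAS retry loop on a single shared counter. The second class is one possible way around it, per-thread slots that are only summed on read. To be clear, that alternative is my own illustration and an assumption; it is not the engineer's new design, and the names and the 64 byte cache line size are made up for the example. Thread creation is left out to keep it short.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// CAS-style increment: read, compute, and retry if another thread got in first.
// With many cores hammering one address, a growing share of attempts fail
// and retry, and the cache line bounces between cores on every attempt.
inline void cas_increment(std::atomic<std::uint64_t>& counter) {
    std::uint64_t seen = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(seen, seen + 1,
                                          std::memory_order_relaxed)) {
        // On failure, `seen` has been refreshed with the current value.
    }
}

// Illustrative alternative: one slot per thread, combined only when read.
class ShardedCounter {
public:
    explicit ShardedCounter(std::size_t shards) : slots_(shards) {}

    // Each thread must only ever pass its own shard index.
    void increment(std::size_t shard) { slots_[shard].value += 1; }

    // Call after the writer threads have been joined.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const Slot& slot : slots_) sum += slot.value;
        return sum;
    }

private:
    // Padded to 64 bytes (an assumed cache line size) so that the values
    // of two different slots never land on the same cache line.
    struct Slot {
        std::uint64_t value = 0;
        char padding[56] = {};
    };
    std::vector<Slot> slots_;
};
```

The sharded version trades an exact, always-current value for writes that never touch another core's cache line, which is the kind of trade you have to think about at 256 cores.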
SSD is here, but it is not here in the sizes needed. What I expect us to do is make use of SSD as a secondary cache, and not look at it as the primary at-rest storage. I see a lot of databases sitting in the 20 GB to 100 GB range. The Library of Congress is 26 terabytes. I expect more scale up, so systems will be growing faster in size. SSD is the new hard drive, and fixed disks are tape.
The piece that I have commented least on is the nature of our micro-kernel. We can push pieces of our design out to other nodes. I do not assume Drizzle will live on a single machine. Network speed keeps going up, and we need to be able to tier the database out across multiple computers.
One final thought about hardware: we need 128-bit ints. IPv6, UUID, etc., all of these types mean that we need single-instruction operators for 16-byte types.
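Without a native 128-bit type, a 16-byte value has to be handled as two 64-bit halves, and every comparison or addition becomes several instructions. A rough C++ sketch of what that looks like (the U128 type and add function are hypothetical names for illustration; GCC and Clang do offer unsigned __int128 as an extension on 64-bit targets, but it is not standard C++):

```cpp
#include <cstdint>

// A 16-byte value (an IPv6 address or a UUID, say) as two 64-bit halves.
struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// Equality needs two compares instead of one.
inline bool operator==(const U128& a, const U128& b) {
    return a.hi == b.hi && a.lo == b.lo;
}

// Ordering compares the high half first, then the low half on a tie.
inline bool operator<(const U128& a, const U128& b) {
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}

// Addition has to propagate the carry out of the low half by hand.
inline U128 add(const U128& a, const U128& b) {
    const std::uint64_t lo = a.lo + b.lo;           // wraps on overflow
    const std::uint64_t carry = lo < a.lo ? 1 : 0;  // wrapped means carry
    return U128{a.hi + b.hi + carry, lo};
}
```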
Community Development
Today 2/3 of our development comes from outside of the developers Sun pays to work on Drizzle. Even if we add more developers, I expect our percentage of the total to decrease and not increase. I believe we will see forks and that we have to find ways to help people maintain their forks. One very central piece of what we have to do is move code to the Edge, aka plugins. Thinking about the Edge has to be a shared value.
I see forks as a positive development; they show potential ways we can evolve. Not all evolutionary paths are successful, but it makes us stronger to see where they go. I expect groups to make distributions around Drizzle long term; I don't know that we will ever do that ourselves.
Code drives decisions, and those who provide developers drive those decisions.
While I started out focusing Drizzle on web technologies, we are seeing groups showing up to reuse our kernel in data warehousing and handsets (which is something I never predicted). By keeping the core small we invite groups to use us as a piece to build around.
Drizzle is not all about my vision, it is about where the collective vision takes us.
Directions in Database Technology
Map/Reduce will kill every traditional data warehousing vendor in the market. Those who adapt to it as a design/deployment pattern will survive; the rest won't. Database systems that have no concept of being multi-node are pretty much dead. If there is no scale out story, then there is no future going forward.
The way we store data will continue to evolve and diversify. Compression has gotten cheap and processor time has become massive. Column stores will continue to evolve, but they are not a "solves everything" sort of solution. One of the gambles we continue to make is to allow for storage via multiple methods (we refer to this as engines). We will be adding a column store in the near future; it is an important piece for us to have. Multiple engines cost us in code complexity, but we continue to see value in them. We will, though, raise the bar on engine design in order to force this complexity down to the engine (which will give us online capabilities).
Stored procedures are the dodos of database technology. The languages vendors have designed are limited. By the same token, though, putting processing near the data is key to performance for many applications. We need a new model badly, and this model will be a pushdown from two different directions. One direction is obvious, map/reduce; the other is the asynchronous queues we see in most web shops. There is little talk about this right now in the blogosphere, but there is a movement toward queueing systems. Queueing systems are a very popular topic in the hallway tracks of conferences.
Databases need to learn how to live in the cloud. We cannot have databases be silos of authentication and processing that expect only to provide data. We must make our data dictionaries available in the cloud, we need to take our authentication from the cloud, etc...
We need to live in the cloud.
Engines, On the State of
Oct. 13th, 2008 | 09:31 am
So many engines, and so little to choose from. This is one of our two major decision points in Drizzle right now.
Let me explain.
Today we have Innodb, Maria, Falcon, and PBXT.
Simple?
Not really. Innodb is not a single engine, it is three engines. We have the default one which is shipped. It has been the wunderkind for years now but has been showing its age. Go buy a piece of hardware that has four cores and it quickly becomes apparent that it is not aging well. There is the Innodb plugin, and while it delivers on features, performance still eludes it. Both are works of the Innodb team at Oracle. The development style for Innodb has never been open, but they have always consistently delivered. Right now though? This delivery seems to be slowing. Since they do not function in an open model it is very hard to work with them. This means we have to shoulder most of the work, though the Innodb team has been responsive to questions.
Then we have the Innodb produced by Google. It is of the standard design, but has been modified with performance patches. These are widely believed to deliver, and often do show, performance increases on hardware above four cores. The issues around this engine are more about maintenance. Google is happy to drop its patches out the door, but shows no sign of wanting to bundle these into a release. This makes perfect sense; they aren't in the business of releasing databases. The Google developers are doing a good job of getting their patches out in chunks and seem genuinely interested in getting them into trees (though they themselves do not do this work). They are not, though, a committed team; they are a group focused inward who get open source enough to understand that publishing their patches is a good thing. There may be an answer in looking at Percona's builds; this is an unexplored option at this point. Percona has been doing releases with the Google Innodb code. Their development model is not open. They do have an outward facing view of the world, though, since they work as consultants.
Maria continues to move along, but it is not transactional at this point. This makes it a non-starter. When they get that working, it gets a ticket to the ballpark. It also hooks deeper into the server than any of the other engines (aka bypasses the engine interface). It relies on the mysys library that MySQL ships. This makes it more difficult for us to work with, though all problems are solvable. It is not being developed at a very quick pace.
Falcon has been released in the Alpha 6.0 MySQL tree. It is alpha, though, and has not been shown to perform well in general against Innodb. It keeps going through design changes, so it is not really a contender for use at this point. On the plus side for me, it keeps to itself and the code is distributed as a complete library, which means that if we did integrate it into Drizzle it would be relatively simple. It has an active development team. To date, though, we have not worked with them at all.
PBXT has shown steady improvement over time. It is hard for me to gauge at this point where it is in its development cycle. We have just pulled it into Drizzle recently and we know it fails some of our tests (keep in mind, the test system is only designed to test MyISAM; we have found bugs galore in shifting to Innodb as the default engine). Right now its design lends itself more to performance around indexes. Scans are still a performance bottleneck. This might be fine in our world, since for the web you typically only read from indexes. It does require row based replication, and this is at issue in the server in general (someday soon there will be a long blog post by me on the sorry state of replication). Paul, the main developer, has been very active though, and this wins big kudos from me personally. I would rather work with active developers and help them fix their work, and skip working with folks who are not so active.
So this is the state of it. I have a few other random thoughts, but at the moment I am left with the question of "what to do in the future". We have had a few attempts at merges from the different Innodb trees, but so far none of these have been completed. PBXT is moving along well and we have begun to take patches from Paul to help him, and us, with testing. A couple of the Falcon folks have approached me about getting a tree working with their engine, but nothing has come of that. If the Maria team can kick out a better MyISAM I am open to replacing ours, though this is not a priority.
Paul's recent changes make it much easier for us to maintain an active PBXT tree and Innodb tree.
So what is the future?
I am not sure at this juncture. We will continue down the path of trees for PBXT and Innodb. Those are the contenders at this point and no matter the performance issues with Innodb, it is prudent to keep it around because of its stability.
Next year though? I am not sure.
Next year is coming quickly though.
Drizzle talk from MySQL Developer's Conference
Sep. 22nd, 2008 | 07:47 pm
Drizzle talk for MySQL Developer's Meeting.
Just goes to show...
Jun. 11th, 2008 | 04:06 pm
CrippleWare, World of Open Source
May. 5th, 2008 | 12:03 pm
Ever since I did my original post on Crippleware I have been getting a lot of feedback from individuals about how open source intersects with closed source extensions.
Open source that is not crippleware, but that allows for third party extensions, provides the following:
Open and documented APIs with stable interfaces.
The ability to compile or load the software without "secret sauces".
The consumer right to always have access to the data they have entered.
The first two really deal with the issue of whether or not the vendor has created a "level playing field". Third party vendors who write modules expect an even handedness when dealing with APIs.
This means that there are no special tools required that cannot be obtained by a third party. No special, aka undocumented, interfaces that modify the API or service interfaces that the modules need to make use of.
No third party vendor has a right to create a closed source module. The GPL by its nature imposes a cost: the third party must open source its module.
Quid pro quo works in favor of open source. If you write an extension to an open source project you play by the rules of the project's license. This means the author of a closed source module should expect, or at least assume, that they will have to pay for the right to link, with exceptions to this being granted by the project. Authors of open source modules should expect that they are free to distribute their work, but should realize that the vendor of the original project may distribute the module as well.
In the world of the BSD license anyone is free to extend and distribute. There is nothing inherent in the concept of open source that rules out proprietary extensions; it is the open source license chosen for the project that defines what is acceptable for the licensing of third party modules. Behavior that turns open source software into crippleware comes from the author and their intent.
What do closed source and open source modules have in common? A right to an open API that has some level of definition and stability.
If the vendor changes the interface without notice, deliberately targeting third party modules, this is a behavior of crippleware.
Any project should be communicating API changes; this is a part of being a good steward of an organic open source model.
An example of deliberate intent would be a vendor that provides substandard behavior via the open service API and chooses to hold back a more competitive interface for itself. This is a behavior of crippleware.
Interfaces must be open and accessible in order to avoid being crippleware.
A consumer should expect to always be able to extract their data from an open source project in a common manner. Whether this is by printing or exporting, data portability is at the essence of the freedom open source is meant to provide.
Telling the user to "write it themselves" is unacceptable behavior. Users have a right to data portability, and I personally think this goes beyond the question of open source.
In an open source world you will not win a ribbon from anyone in the community for merely having an open source license. If you cripple the very nature of open, you should not be surprised when the community is not impressed.
At the end of the day it is about having appropriate table manners.
Open Source that is not crippleware but allows for third party extensions allows for the following:
The first two really deal with the issue of whether or not the vendor has created a "level playing field". Third party vendors who write modules expect an even handedness when dealing with APIs.
This means that there are no special tools required that cannot be obtained by a third party. No special, aka undocumented, interfaces that modify the API or service interfaces that the modules need to make use of.
No third party vendor has a right to create a closed source module; the GPL by its nature imposes a cost, which is that the third party must open source its module.
The Case for the Relational Database
Jul. 18th, 2007 | 04:25 pm
My framework of choice at the moment is called "everything". Everything has very few users in the world, but it holds to a certain number of design characteristics that I like:
It deals with entities as objects
Objects are inheritable
Easy to hack
Revision control on objects
If I were a Java developer, which I have not been in more than a decade, I would use Hibernate. Hibernate has many of the design characteristics that I list. Similar frameworks exist for PHP and Python.
Why do I like the approach of using objects? Objects are entities that I can serialize and store in caches as a single discrete item. Object caches are common in web architectures now (Memcached and other similar creatures). Object store works well, and it is constantly improving.
Many web infrastructures can be run out of object store systems, but not all.
Many, and I would go as far as to say most, web applications require search. Some search-like capabilities can be provided by Lucene or other full text search engines. Searches from these systems are fuzzy. Google, which uses a full text search, ranks and filters based on the loose structure of objects. It does this well.
Systems like Google and Lucene can frequently be "good enough" for simple problems.
However, full text approaches cannot tell you with assurance the exact number of objects that have a specific attribute.
Needing this assurance means you need an accurate search. Frequently this requirement reflects the need for a more analytic approach to finding results, or a need to find ranges of results.
A search for an exact result in complex objects requires a relational database. This is an accurate search, not a fuzzy search.
The requirement for a relational database is grounded in the need for accurate results.
Knowing that you need an accurate result leaves you with the question of how you will make use of a relational database.
Fundamentally there are two approaches to object store and search:
Serialize all aspects of the object into a relational database.
Store objects serialized into a container and pluck out components for search.
I go with the first approach. Relational databases have a long history of handling durable store. Adding a new quick search means adding an index (in object stores you lack indices, so you can conceptually consider a non-indexed relational query and an object store search as being equivalent). Once you have an index, you have a speedy vector to search on.
The relational nature of storing objects in a database means that you can look at partial stores of objects. You can even de-normalize tables for further refinement.
For instance, if I have a "user", I can look at a subset of the attributes of "user" to find, for instance, males who like climbing, and who live in the city of Seattle. By keeping all elements de-serialized in a relational database, I never have to worry about whether an attribute is in the database or not.
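To make the "user" example concrete, here is a minimal sketch in C++ rather than SQL. The User struct, the sample attributes and the function names are all made up for illustration; the point is only that when every attribute is stored on its own, any combination can be searched, and an index on one attribute narrows the rows that have to be examined.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical row type: every attribute is stored as its own column.
struct User {
  int id;
  std::string gender;
  std::string city;
  std::string hobby;
};

// Full scan: examine every row, like a query with no usable index.
std::vector<int> scan_users(const std::vector<User> &users,
                            const std::string &gender,
                            const std::string &city,
                            const std::string &hobby) {
  std::vector<int> ids;
  for (const User &u : users)
    if (u.gender == gender && u.city == city && u.hobby == hobby)
      ids.push_back(u.id);
  return ids;
}

// "Adding an index": map each city to the positions of its rows.
std::multimap<std::string, std::size_t>
build_city_index(const std::vector<User> &users) {
  std::multimap<std::string, std::size_t> index;
  for (std::size_t i = 0; i < users.size(); ++i)
    index.emplace(users[i].city, i);
  return index;
}

// Indexed lookup: only rows for the requested city are examined.
std::vector<int> indexed_users(
    const std::vector<User> &users,
    const std::multimap<std::string, std::size_t> &city_index,
    const std::string &gender,
    const std::string &city,
    const std::string &hobby) {
  std::vector<int> ids;
  auto range = city_index.equal_range(city);
  for (auto it = range.first; it != range.second; ++it) {
    const User &u = users[it->second];
    if (u.gender == gender && u.hobby == hobby)
      ids.push_back(u.id);
  }
  return ids;
}
```

Both functions return the same answer; the indexed one simply touches fewer rows, which is roughly the difference between a non-indexed and an indexed relational query.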
If I use an object store, then I am required to map out my attribute storage ahead of time. There is no flexibility in this approach, since I need to know going into the design which attributes will need to be searchable.
There are advantages to leading with object stores.
Object stores have none of the overhead of having to rebuild objects from many tables (or even from attributes in a single table). This is a big win, and it is required if your web site/application sees heavy usage (everyone wants to be Google, right?). Object stores excel at speedy retrieval of single objects (or non-grouped objects).
The approach I lead with makes good use of the nature of object stores without harming the flexibility you gain by using a relational database. In this approach you cache objects in object stores, but maintain durability and accurate search in a relational database. Having durability be maintained in your database means that you can boost the object store's performance design by not having to be concerned about this issue in your object storage.
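A rough sketch of that split, under loud assumptions: the Database class below is only an in-memory stand-in for the durable relational store (a real system would issue SQL), ObjectStore stands in for something like Memcached, and every name here is hypothetical. Writes go to the database first; reads try the cache and fall back to the database.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Stand-in for the durable relational store (hypothetical).
class Database {
public:
  void save(int id, const std::string &serialized) { rows_[id] = serialized; }

  // Returns true and fills `out` if the row exists.
  bool load(int id, std::string &out) const {
    std::map<int, std::string>::const_iterator it = rows_.find(id);
    if (it == rows_.end()) return false;
    out = it->second;
    return true;
  }

private:
  std::map<int, std::string> rows_;
};

// Stand-in for a volatile object cache sitting in front of the database.
class ObjectStore {
public:
  explicit ObjectStore(Database &db) : db_(db), hits_(0), misses_(0) {}

  // Serve from the cache when possible; otherwise rebuild from the database.
  bool get(int id, std::string &out) {
    std::unordered_map<int, std::string>::iterator hit = cache_.find(id);
    if (hit != cache_.end()) {
      ++hits_;
      out = hit->second;
      return true;
    }
    ++misses_;
    if (!db_.load(id, out)) return false;
    cache_[id] = out;
    return true;
  }

  // Durability lives in the database, so write there first.
  void put(int id, const std::string &serialized) {
    db_.save(id, serialized);
    cache_[id] = serialized;
  }

  // Losing the cache is cheap: objects are regenerated on demand.
  void flush() { cache_.clear(); }

  int hits() const { return hits_; }
  int misses() const { return misses_; }

private:
  Database &db_;
  std::unordered_map<int, std::string> cache_;
  int hits_;
  int misses_;
};
```

Because the cache never has to be durable, flush() (or a crashed cache server) costs only a few extra database reads.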
The reverse of this approach would mean that you could decide not to make your database durable. Choosing not to have your database be durable can be expensive: rebuilding your database is considerably more expensive than generating new objects from it.
The big win for me is flexibility. Having access to all attributes in the relational store at my fingertips is why I stick with relational databases. That flexibility is something I require because it saves me time. Also, as tools go, SQL has been around for a while. I am not in love with SQL, but it is very flexible, and it is well understood. Tools for SQL are plentiful and easy to build.
So when I am asked "is there a case for getting rid of relational databases?", I answer no.
Web architecture cannot be done in a way that leaves you with a flexible environment without using a relational database.
Bridge Building... You have to pay for it somehow...
May. 22nd, 2007 | 12:26 am
"That is an awfully nice bridge you got there, be a pity if something happened to it..."
So Business as Usual means protection money.
"This IP bridge enables Open Source developers to develop software free from concerns about patents."
"Our IP bridge makes lawsuits unnecessary."
Both of those statements just give me a cold feeling. It's like listening to some gangster video, where the local hood has decided to make a business of asking for protection money from the new immigrants who just opened a Kwiki Mart. Bill Hilf talks about lawsuits being unnecessary; of course they are. The local hood never wants their business practices scrutinized; they operate in a cloak of uncertainty.
When did "Business as Usual" in engineering become haggling over nickel and dime changes that came from the synergy of human invention?
This sounds a lot more like lawyering to me.
Lawyering has never been "Business as Usual" to me, but then I am an engineer.
Building a Storage Engine: Reading Data
May. 3rd, 2007 | 01:00 pm
Now that you can load an engine, we are going to look at reading data. For this we will need to implement three methods. We will also need the following schema:
CREATE TABLE `services` (
  `a` varchar(125) NOT NULL DEFAULT '',
  `b` text
) ENGINE=SKELETON DEFAULT CHARSET=latin1
Storage engines provide "handler" objects that are used to read/write/update tables. They inherit from the handler class defined in sql/handler.h.
The file ha_skeleton.cc holds the implementation of the handler for the Skeleton engine. Once a handler object is created it is cached and can be used for different tables that a particular storage engine controls. MySQL uses the open() and close() methods of the handler object to tell the handler what table it should currently work with.
open() and close() are not called for each usage of a table, or each usage of a transaction.
For reading data we are going to skip worrying about open() and close() and just concentrate on the scan itself. For this we will implement just the rnd_init(), rnd_next(), and rnd_end() methods. These are used to handle table scans.
For rnd_init() we are going to open up /etc/services and read from it:
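int ha_skeleton::rnd_init(bool scan)
{
  DBUG_ENTER("ha_skeleton::rnd_init");

  /* Open the file that backs the table; fail the scan if we cannot. */
  if (!(file_stream= my_fopen("/etc/services", O_RDONLY, MYF(0))))
  {
    DBUG_RETURN(-1);
  }

  /* Row counter, incremented once per line read in rnd_next(). */
  line_number= 0;

  DBUG_RETURN(0);
}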
The my_fopen() comes from mysys, which is the MySQL portable runtime library (think APR, but for MySQL). For a complete description of it, you can look in the mysys/ directory of any MySQL distribution (there is an ongoing effort to add Doxygen to the server's source code so hopefully these will get documented in detail in the future). In the example above we use my_fopen() to open the /etc/services file. We also set line_number to equal zero. We will increment it with each read.
The variables file_stream and line_number are private variables that are declared in ha_skeleton.h.
Now we read our data:
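int ha_skeleton::rnd_next(byte *buf)
{
  char buffer[2048];
  Field **field= table->field;
  my_bitmap_map *org_bitmap= dbug_tmp_use_all_columns(table, table->read_set);
  DBUG_ENTER("ha_skeleton::rnd_next");

  /* Read one line; a NULL return means there are no more rows. */
  if (!(fgets(buffer, sizeof(buffer), file_stream)))
  {
    /* Restore the column map on this exit path as well. */
    dbug_tmp_restore_column_map(table->read_set, org_bitmap);
    DBUG_RETURN(HA_ERR_END_OF_FILE);
  }
  line_number++;

  /* First column: the line number. */
  (*field)->store(line_number);
  (*field)->set_notnull();

  /* Second column: the raw line, including its trailing newline. */
  field++;
  (*field)->store(buffer, strlen(buffer), system_charset_info);
  (*field)->set_notnull();

  dbug_tmp_restore_column_map(table->read_set, org_bitmap);
  DBUG_RETURN(0);
}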
By doing an fgets() we will grab one line of the file at a time. We then take that line and store it in a Field object.
Field objects represent the columns in the database. What we do in the example is walk through the array of Field objects. The array of Field objects is stored in the order of creation. In our case we are just looking at the first and second objects. We store data via the store() method on the Field object. We also make sure the object is set to not null (and yes... this is odd and will be fixed... it would make a better interface if calling store() just set the not-null flag directly).
DBUG_ENTER() and DBUG_RETURN() are used in the creation of debug trace files, and are only enabled in debug builds. The my_bitmap_map can be used to determine whether particular fields need to be accessed. The main purpose for this is to allow engines to not return costly attributes, like blobs, if they are not needed.
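The idea of skipping costly attributes can be sketched outside of MySQL. Nothing below is the server's API: RowSource, fetch_row() and the use of std::bitset as a stand-in for the read set are assumptions made purely for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Arbitrary limit for this sketch; column_count must not exceed it.
const std::size_t kMaxColumns = 64;

// Hypothetical row source. expensive_reads counts how often the "costly"
// column (think of a blob) was actually materialized.
struct RowSource {
  int expensive_reads;

  RowSource() : expensive_reads(0) {}

  std::string read_column(std::size_t column) {
    if (column == 1) ++expensive_reads;  // column 1 plays the blob
    return "value" + std::to_string(column);
  }
};

// Fill only the columns whose bit is set in read_set; the rest stay empty.
std::vector<std::string> fetch_row(RowSource &source,
                                   std::size_t column_count,
                                   const std::bitset<kMaxColumns> &read_set) {
  std::vector<std::string> row(column_count);
  for (std::size_t i = 0; i < column_count; ++i)
    if (read_set.test(i))
      row[i] = source.read_column(i);
  return row;
}
```

A query that only asks for the first column never pays for the second one, which is the saving an engine can get from consulting the bitmap.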
The rnd_next() method will be called until we have exhausted the lines in the file. At that point we will return HA_ERR_END_OF_FILE to signal that we have no more rows to return.
Then we will close the file in rnd_end():
int ha_skeleton::rnd_end()
{
  DBUG_ENTER("ha_skeleton::rnd_end");
  /* Close the file opened in rnd_init(). */
  my_fclose(file_stream, MYF(0));
  DBUG_RETURN(0);
}
While it is required to implement rnd_init() and rnd_next(), it is not required to implement rnd_end(). In our case we need to close the file we opened, so we implement it.
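To see how the three calls fit together, here is a standalone sketch of the calling pattern. It is not the MySQL handler interface: LineScanner, table_scan() and the ScanResult codes are hypothetical stand-ins, and it reads from any std::istream instead of /etc/services so that it can run anywhere.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical return codes; the real handler returns 0 and
// HA_ERR_END_OF_FILE.
enum ScanResult { SCAN_OK = 0, SCAN_END_OF_FILE = 1 };

class LineScanner {
public:
  explicit LineScanner(std::istream &input) : input_(input), line_number_(0) {}

  // Reset the row counter at the start of a scan.
  int rnd_init() {
    line_number_ = 0;
    return SCAN_OK;
  }

  // Produce one row (line number, line text) per call.
  // Unlike fgets(), std::getline() drops the trailing newline.
  int rnd_next(std::pair<long, std::string> &row) {
    std::string line;
    if (!std::getline(input_, line)) return SCAN_END_OF_FILE;
    row.first = ++line_number_;
    row.second = line;
    return SCAN_OK;
  }

  // Nothing to release here; the caller owns the stream.
  int rnd_end() { return SCAN_OK; }

private:
  std::istream &input_;
  long line_number_;
};

// Plays the role of the server: init once, call next until the engine
// reports end of file, then end.
std::vector<std::pair<long, std::string> > table_scan(LineScanner &scanner) {
  std::vector<std::pair<long, std::string> > rows;
  if (scanner.rnd_init() != SCAN_OK) return rows;
  std::pair<long, std::string> row;
  while (scanner.rnd_next(row) == SCAN_OK)
    rows.push_back(row);
  scanner.rnd_end();
  return rows;
}
```

The engine never loops on its own; it only answers "give me the next row" until it has nothing left.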
Finally, to be safe, we are going to throw an error during a create() method call if the table was not declared with exactly two fields:
int ha_skeleton::create(const char *name, TABLE *table_arg,
                        HA_CREATE_INFO *create_info)
{
  DBUG_ENTER("ha_skeleton::create");

  /* rnd_next() fills exactly two columns, so reject any other layout. */
  if (table_arg->s->fields != 2)
    DBUG_RETURN(1);

  DBUG_RETURN(0);
}
Notice that we are using table_arg. It contains the definition of the table as we created it. The variable "name" holds the name of the table (or an encoded version, depending on the type of engine that was declared; more on this later). "create_info" will contain the arguments from the "CREATE TABLE (...)" that are MySQL extensions.
At this point you should know how to open a file and return its contents via a scan on the table.
You can find the example source code to play with here under "Chapter 2":
http://hg.tangent.org/writing_engines_for_mysql
The previous entries in this series:
Getting The Skeleton to Compile
Slides from PHP Vancouver
Feb. 16th, 2007 | 04:00 pm
From the keynote presentation this past Tuesday:
http://krow.net/talks/ScalingVancouver2007.pdf
I have an audio recording of the talk as well, but it is in poor shape. If I can clean it up I will post it too.
Open Tables, why should I pay attention?
Feb. 8th, 2007 | 01:43 am
In discussion the other day a co-worker was looking at a benchmark that I had created for a problem I was studying on threads with write performance.
He asked "What is open tables set at?"
I responded that I had it set to a thousand, which is normally the value I keep on the machine I do development on. I normally look at 300 connections as an average load, but every so often I crank it up to around 1000 depending on the test (which is a pretty high simulation, and not common among the average large websites).
It crossed my mind at this point that I had no idea how the database behaved with the default value in MySQL, which is 64, when running with this many threads. Anyone who has a large number of threads touching tables knows to crank up the table cache.
As the graph points out, depending on which engine you are dealing with, it can make quite a bit of difference :)
The my.cnf setting for this looks like this:
set-variable = table_open_cache=1000
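Why a small cache hurts this much can be shown with a toy model. This is a sketch that assumes the table cache behaves roughly like an LRU cache of open table handles; the TableCache class and count_opens() are invented for illustration and are not how the server implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Toy LRU cache of open table handles (hypothetical model).
class TableCache {
public:
  explicit TableCache(std::size_t capacity) : capacity_(capacity), opens_(0) {}

  // "Use" a handle; returns true on a cache hit, false if it had to be opened.
  bool use(const std::string &handle) {
    auto found = index_.find(handle);
    if (found != index_.end()) {
      // Move the entry to the front: it is now the most recently used.
      lru_.splice(lru_.begin(), lru_, found->second);
      return true;
    }
    ++opens_;
    if (capacity_ == 0) return false;
    if (lru_.size() == capacity_) {
      // Evict the least recently used handle to make room.
      index_.erase(lru_.back());
      lru_.pop_back();
    }
    lru_.push_front(handle);
    index_[handle] = lru_.begin();
    return false;
  }

  std::size_t opens() const { return opens_; }

private:
  std::size_t capacity_;
  std::size_t opens_;
  std::list<std::string> lru_;
  std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};

// Touch `handles` distinct handles round-robin for `rounds` passes and
// report how many opens the cache needed.
std::size_t count_opens(std::size_t capacity, std::size_t handles,
                        std::size_t rounds) {
  TableCache cache(capacity);
  for (std::size_t r = 0; r < rounds; ++r)
    for (std::size_t h = 0; h < handles; ++h)
      cache.use("t" + std::to_string(h));
  return cache.opens();
}
```

In this model, once the working set is larger than the cache, a round-robin access pattern misses on every single use, while a cache that fits the working set opens each handle exactly once.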

Friday, Finding a bug...
Jan. 12th, 2007 | 05:44 pm
You know, sometimes there is nothing like finding the bug you have been hunting for, for a couple of days, on a Friday at 5:40 PM.
The optimize() call in Archive didn't have the right table buffer after someone updated the call to field->offset(). Tricky little thing... heisenbug (and no this is not in any release of 5.1 as far as I can tell, recent change).
Ok, technically this is a schroedinbug.
Dentist Visit, Thank you Sybase
Jan. 10th, 2007 | 01:01 pm
Went to the dentist today. No cavities, which has been the case for two decades. I do need to have one part of my gums repaired. The dentist was trying to take a picture of the gum so that I could see the required work.
And what appears on the screen?
A Windows warning complaining about Sybase crashing via the ODBC driver.
The dentist laughed when he saw my face and asked me if I knew anything about computers :)

Dear Lazy Web, NAS devices
Nov. 15th, 2006 | 08:05 am
location: Alohahaus, Seattle
Dear Lazy Web,
In my current list of plans is to try out Zmanda's backup solution. The technology all looks good but I want to try it out for myself.
My problem?
Disk space. I am rather low on it at the moment, and to really test things out I want to back up several databases and see how it really works. My Linux boxes at this point in time are all 1U machines, so adding disks is a bit problematic. I keep reading, though, about some of the ~$100 NAS devices that have come out. I am thinking that I want to buy one of these and put a disk in it. 500 GB minimum should work.
Any recommendations?
From Engadget I found these:
http://www.engadget.com/tag/nas
I want something that will do NFS. I am assuming that NFS is the easiest protocol for me to deal with. I can put the box on a 1 gig ethernet port (my HP switch has one 1 gig port on it), though I suspect the bottleneck will be more in the fact that the data will go through multiple points.
Anyone tried this? I bought a CoolMax NAS a while ago, but it died within a month.
Cheers,
-Brian
memcache_engine, American Pie
Aug. 28th, 2006 | 01:09 am
location: Alohahaus, Seattle
Wrapped up a few final touches this evening on the memcache engine:
http://tangent.org/index.pl?lnode_id=506
What works:
SELECT, UPDATE, DELETE, INSERT
INSERT into foo SELECT ...
What doesn't work?
Probably ORDER BY operations.
REPLACE (I think)
IN ()
NULL
multiple memcache servers (this would be cake though to add)
table namespace, right now it treats the entire server as one big namespace
There is probably a lot more I haven't thought of that is not working. Right now the version I put up only allows you to have a primary key and one attribute. Doing multiple attributes means coming up with a way to store the MySQL row format, aka UNIREG, in something which would be easy for anyone to take apart. Mark suggested XML, but XML would be damn slow to parse.
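One cheap-to-parse possibility, sketched here purely as an illustration and not as what the engine does or will do, is to pack the attributes into a single value with a length prefix on each one. The pack_row() and unpack_row() names and the "<length>:<bytes>" layout are assumptions, and NULL is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pack attributes as "<length>:<bytes>" repeated, so values may contain
// any character, including ':'.
std::string pack_row(const std::vector<std::string> &attributes) {
  std::string packed;
  for (const std::string &attr : attributes) {
    packed += std::to_string(attr.size());
    packed += ':';
    packed += attr;
  }
  return packed;
}

// Reverse of pack_row(); returns false on malformed input.
bool unpack_row(const std::string &packed,
                std::vector<std::string> &attributes) {
  attributes.clear();
  std::size_t pos = 0;
  while (pos < packed.size()) {
    std::size_t colon = packed.find(':', pos);
    if (colon == std::string::npos || colon == pos) return false;
    // Parse the decimal length that precedes the colon.
    std::size_t length = 0;
    for (std::size_t i = pos; i < colon; ++i) {
      if (packed[i] < '0' || packed[i] > '9') return false;
      length = length * 10 + static_cast<std::size_t>(packed[i] - '0');
    }
    // The declared length must fit in what is left of the buffer.
    if (length > packed.size() - (colon + 1)) return false;
    attributes.push_back(packed.substr(colon + 1, length));
    pos = colon + 1 + length;
  }
  return true;
}
```

Any client that can read a number and a colon can take such a value apart, without needing an XML parser.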
If you really want to use this, skip using the loadable interface and compile it directly in. I put some instructions in the README on how to do this. You will want to use the latest tree.
All of this was done tonight while watching the original "American Pie" and one Farscape episode that is quickly fading from memory.
Want the odd thought? This means that there is now a COBOL interface to memcached. Never thought that a COBOL interface would exist did you?
Pretty sick stuff.
MySQL's Yachting Future
Aug. 17th, 2006 | 01:44 pm
location: Alohahaus, Seattle
Slate posted an article today, "The CEO Bought a Yacht? Then it's time to sell."
http://www.slate.com/id/2147788/?nav=tap3
A good quote from the article:
"If you look at the recent record of CEOs who have become yachtsmen, it's clear that when they buy a boat, it's the shareholders who usually get soaked"
Computerworld is commenting on how this is not true for Oracle's Larry Ellison, and I feel the need to share that this is also not true of MySQL founder David Axmark.
Let me present the evidence. Any guesses on whose boat is whose in the photos?


Evolving Open Source Business Models
Aug. 15th, 2006 | 08:21 pm
Let us recap the successful open source business models thus far:
1) Sell stuff around Open Source. O'Reilly is the obvious winner in this arena. Their books and conferences go hand in hand with the open source community.
2) Support. IBM Global Services does an amazing job at this. Do you have something built on an Open Source stack that you need supported? They support most anything. Look at HP's announcements as of late and you can see that they are quickly trying to move into this area.
3) Update Services. Ask those who buy Red Hat Network what the value is in the model and they will tell you that it is in updates. There is minimal monitoring built into Red Hat Network, but the real value is in the updates.
4) Dual License. Give away the software, and for those who cannot use the software under an open source license, sell them a commercial license. This is the MySQL model.
5) Services.
The fifth model that is emerging is the one that has me excited right now. I saw this post from Redmonk:
http://www.redmonk.com/sogrady/archives/002066.html
While the author is all excited about having his background image delivered by Flickr, I am all excited about the potential integration points this means for Flickr.
As an example, let's say Flickr goes and creates an iPhoto for Linux, or just simply starts extending the one in GNOME today. If the iPhoto knockoff is a good tool it will be picked up by Ubuntu, Fedora, SUSE, etc... and this translates into millions of eyeballs looking at a tool that Flickr helps to create. To make money from this, Flickr places itself as the default hosting site in the application. What does this mean for Flickr?
It acquires more users.
Is this sort of business model unheard of in the open source world? Of course not, just look at who the default search is in Firefox. It is Google. While no one knows how much Google is paying for that pleasure, it is rumored to be in the tens of millions.
Tim O'Reilly recently commented on the Web 2.0 stack and how applications fit and are enabled in an Internet-based world. I believe that the greatest potential in our industry today is in the hybrid approach, where applications are further enabled by the network.
For developers this model means we need to rethink applications. Not by asking "what can I extract of value from my application to create an artificial need", but by asking "what can I add in value" that will enable new applications which are enhanced by internet connectivity.
At some point in my life I picked up, and have carried with me, the analogy "software is not like bread". Bread has a finite number of times it can be sliced. The loaf of bread which represents software is infinite. Slicing the loaf will never create value since people know that, unlike bread, software does not come from a limited supply.
The constraint on the supply of bread creates price.
Since there is relatively no cost in copying software its value is often seen as zero.
With services there is a loaf, and there are limits. The limits create value that people understand and are willing to pay for.
Hello World is the most beautiful phrase some days....
Aug. 12th, 2006 | 02:05 pm
mysql> CREATE TABLE `a` (
    -> `a` text
    -> ) ENGINE=TABLE_FUNCTIONS DEFAULT CHARSET=latin1 CONNECTION='/usr/lib/libhello.so';
Query OK, 0 rows affected (0.04 sec)
mysql> select * from a;
+--------------+
| a            |
+--------------+
| Hello World! |
| Hello World! |
+--------------+
2 rows in set (0.02 sec)
mysql>
Genesis: Application Clustering
Jul. 25th, 2006 | 11:59 pm
location: Portland, Oregon
In the previous article I discussed using Read Replication Clustering to scale out reads for a website. What I will now do is describe a refined approach to the problem of scaling by creating "Application Clusters with Replication".
A common approach to website design is that a web designer creates a website and decides that search is a feature that they want to implement. If they use the MyISAM engine this means that they can add fulltext indexes to their tables and then make use of them in queries. I will ignore the case where the developer decides that an unanchored LIKE clause is an appropriate solution, since this developer will quickly hit a wall on performance and will need to learn what a fulltext index is.
So the developer adds a fulltext index and is good to go? Sounds like an easy solution?
If the site the developer has written begins to see significant traffic then one of three things will occur. In the first scenario, one day they will have a crash and discover that the recovery time for a MyISAM table does not fit into a plan of "I want a 24/7 website". In the second, they decide that they need transactions and discover that none of the current engines support both fulltext and transactions. In the third, search becomes very popular, and with this popularity they start to hit concurrency issues on the MyISAM table.
The solution for these problems is replication, though with a twist. While read replication is fine for the third scenario, the first two require knowing how to mix multiple storage engines with replication.
A transaction is only logged to the binlog, aka where replication information is stored, if the transaction is committed. If the transaction is not committed it is never stored. This means that you can safely replicate from a transactional engine to a non-transactional engine. So if you need transactions or faster recovery, pick an engine with these features and use it on your master. You can then safely replicate into a non-transactional engine like MyISAM. On the slave side you can then add fulltext indexes to the MyISAM table. There is no requirement that a table on the slave must be the same engine type that was used on the master. Multiple slaves can be used to provide scale out for fulltext databases.
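As a rough sketch of the setup just described, the statements might look something like the following. The articles table, its columns, the index name, and the choice of InnoDB on the master are all assumptions made up for illustration; any transactional engine would play the same role.

```sql
-- On the master: a transactional engine (InnoDB is assumed here).
-- This CREATE TABLE replicates to the slave with its ENGINE clause.
CREATE TABLE articles (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  body  TEXT NOT NULL
) ENGINE=InnoDB;

-- On the slave only: convert the replicated table to MyISAM,
-- then add the fulltext index that the master does not carry.
ALTER TABLE articles ENGINE=MyISAM;
ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);

-- Search queries are sent to the slave. The column list in MATCH()
-- has to be the same as the column list of the fulltext index.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('replication');
```

The ALTER statements are run by hand on each slave, so the master keeps its transactional table. If the table is ever dropped and re-created on the master, that CREATE TABLE replicates with its own ENGINE clause and the slave-side change has to be repeated.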
What about an example for logging with statistical analysis? Logging is a high I/O use case scenario. In this use case you will want to limit additional writes and all reads against the master database.
In the solution I am going to suggest, the goal will be to limit I/O on the master, and push reads, and possibly writes, to slave databases.
On your master database you can choose an engine like either Archive or Blackhole.
The Archive engine is best used in cases where you want the master database to retain a copy of the data. The Archive engine writes less data to disk than other engines by using compression and row concatenation, instead of a block structure. One story I heard at OSCON was where a user got four times the performance for storing Apache logs by using Archive instead of MyISAM. Neither the MyISAM table nor the Archive table had indexes.
The Blackhole engine will work for logging because all INSERTs will be replicated to the slave. While a table created with the Blackhole engine stores no data, the binary log for replication will store any data that needs to be replicated. The Blackhole engine supports transactions, so rollbacks will work to prevent data from being replicated if transactional logging is required. The Blackhole engine was designed for this sort of use in replication at the request of a German customer.
The use case scenario mentioned will keep I/O to a minimum on the master so that it can log data as quickly as needed. The slave can then be used for analytics. What happens if you want to store ongoing results you get from doing analytics? You don't want to put these results on the master since it would mean more write I/O on the master. The answer is to store the results on the slave. A table which is on the slave but not on the master will not be touched by replication in any case except a "DROP SCHEMA".
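To make the logging layout a little more concrete, here is a minimal sketch. The table names (access_log, hits_per_day) and their columns are hypothetical, and MyISAM on the slave is just one possible choice; swap ENGINE=BLACKHOLE for ENGINE=ARCHIVE on the master if it should keep its own copy of the rows.

```sql
-- On the master: the table accepts INSERTs and stores nothing,
-- but the statements still go to the binary log.
CREATE TABLE access_log (
  logged_at DATETIME     NOT NULL,
  host      VARCHAR(64)  NOT NULL,
  url       VARCHAR(255) NOT NULL,
  status    SMALLINT     NOT NULL
) ENGINE=BLACKHOLE;

-- On the slave only, before any rows arrive: switch to an engine
-- that actually keeps the data.
ALTER TABLE access_log ENGINE=MyISAM;

-- On the slave only: a results table that does not exist on the master,
-- so replication never touches it.
CREATE TABLE hits_per_day (
  log_day DATE         NOT NULL,
  url     VARCHAR(255) NOT NULL,
  hits    INT          NOT NULL,
  PRIMARY KEY (log_day, url)
) ENGINE=MyISAM;

-- On the slave only: refresh the stored analytics from the replicated log.
REPLACE INTO hits_per_day (log_day, url, hits)
SELECT DATE(logged_at), url, COUNT(*)
FROM access_log
GROUP BY DATE(logged_at), url;
```

All of the read I/O for the report and all of the write I/O for its results land on the slave; the master only ever sees the original INSERTs.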
Both of the situations described above are more than just simple "read replication". They apply the concept of using replication to spread data out to servers where specific applications can have private copies of the data to work against.
Previous Articles:
Genesis: The Search for Scaling
Genesis: Read Replication Cluster