Brian "Krow" Aker (krow) wrote,

Innodb Embedded Engine

A couple of notes:

1) I have already shared these thoughts with the Innodb team (and received some encouraging thoughts).
2) I wrote this about a week ago. Drizzle/Memcached/Gearman keep me busy so I only got to spend a couple of weekend days to look over the Innobase Embedded Engine. I wish I had more time to go in depth.
3) Innodb has a forum for the engine here where you can get more answers:
4) You can download the technology from here:
5) I'd love to see someone take the embedded engine and port it directly to the Drizzle Engine Interface. I think that the interface they have done would make a much better starting point for integration than what we have today.


The Innodb Embedded Engine is the same technology used for the Innodb Plugin. That is awesome... think about it. You are getting a threaded storage engine for embedded use that understands schema. Schema-less certainly has its place but there are a lot of cases where schema matters. Better? We are talking about using a real OLTP engine and being able to skip all of the SQL and go straight to the interface. A number of storage engine vendors have talked about doing this over the years, but this is the first time one of them has delivered.

All of the settings that you can do from the normal MySQL/Drizzle configuration are available. You are exposed to the direct API so you can get at the bits if all you really need is a simple interface (joins are not supported).


How are the examples?

Without a bit of work they won't compile. There are problems in the include declarations for paths.

Simple stuff like this:
/usr/local/include/embedded_innodb-1.0/api0api.h:26:23: error: ib0config.h: No such file or directory

The paths were incorrectly set up in the header file. My suggestion to the authors would be to look at libxml2 to see how to properly set up header files.

Also? Don't include the config file in an installed header. You break everyone else who is using autotools. If you see errors like this you know you will have issues:

/usr/local/include/embedded_innodb-1.0/ib0config.h:348:1: warning: "PACKAGE_NAME" redefined

The naming conventions are pretty poor. For instance the include file names make no sense:
api0api.h db0err.h ib0config.h

The naming scheme is reflective of Innodb's history but for end-developer usage they could have picked something a little bit cleaner (unless you are a Solid DB author, since they use exactly the same naming conventions).

I noticed when writing code that a table name must be given in the form "something/something". You are required to build names like this:
snprintf(table_name, sizeof(table_name), "%s/%s", dbname, name);

From what I gather from the assert the "/" is required for the embedded plugin to know the name of a schema to create.
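A minimal sketch of wrapping that snprintf() call so a truncated name is caught before it reaches the library. The helper build_table_name() is hypothetical, not part of the embedded engine, and rejecting a "/" inside the schema name is an assumption made here to keep the split unambiguous:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a library call): writes "schema/table" into out.
 * Returns 0 on success, -1 if the result would not fit or the schema name
 * itself contains a '/'. */
int build_table_name(char *out, size_t out_size,
                     const char *dbname, const char *name)
{
    int written;

    /* Passing NULL is a programming error, so catch it in debug builds. */
    assert(out != NULL && dbname != NULL && name != NULL);

    /* Assumption: a '/' in the schema name would make the split ambiguous. */
    if (out_size == 0 || strchr(dbname, '/') != NULL)
        return -1;

    written = snprintf(out, out_size, "%s/%s", dbname, name);

    /* snprintf reports the length it wanted to write; a value at or past
     * out_size means the name was truncated. */
    if (written < 0 || (size_t)written >= out_size)
        return -1;

    return 0;
}
```

Calling build_table_name(buf, sizeof(buf), "test", "Foo") would leave "test/Foo" in buf, matching the name that shows up in the log output further down.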

The error system lacks some sort of error to string function. Having this would make debugging much simpler.

I may be wrong about this though. The examples are a bit confusing and the call ib_database_create() makes me think that there is more to explain than what the current manual has in it. The documentation is far above the quality of what is frequently found for a first release like this... kudos to Oracle for doing this. The only issue I found was a discrepancy in the ib_cursor_moveto() function and a few other random mistakes.
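Coming back to the missing error-to-string function: here is a rough sketch of the kind of call that would help. Everything in it is hypothetical. The EX_ codes and ex_strerror() are made up for illustration; the real codes live in db0err.h and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes, for illustration only. */
enum {
    EX_OK = 0,
    EX_TABLE_NOT_FOUND,
    EX_DUPLICATE_KEY,
    EX_LOCK_WAIT_TIMEOUT
};

/* One message per code, indexed by the code itself. */
static const char *const ex_messages[] = {
    [EX_OK]                = "success",
    [EX_TABLE_NOT_FOUND]   = "table not found",
    [EX_DUPLICATE_KEY]     = "duplicate key",
    [EX_LOCK_WAIT_TIMEOUT] = "lock wait timeout"
};

/* Maps a code to a static, human-readable string. Never returns NULL. */
const char *ex_strerror(int err)
{
    size_t count = sizeof(ex_messages) / sizeof(ex_messages[0]);

    if (err < 0 || (size_t)err >= count)
        return "unknown error";

    /* A hole in the table means a code was added without a message. */
    assert(ex_messages[err] != NULL);
    return ex_messages[err];
}
```

With something like this, a failed call can be logged as text by the application instead of as a bare integer.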

Unlike SQLite, the Embedded Innodb requires multiple files to run. Depending on your use case this will be annoying. The "single file" feature of SQLite really makes it useful as a replacement for writing your own file formats.

The library makes use of its own types, like ib_bool_t, instead of just using standard C99 types. This type of programming is currently a pet peeve of mine. I am getting tired of dealing with "yet another set of types". It makes it a pain to integrate with other code and just increases complexity for the end developer for no good reason.

And writing to stdout on startup? That is a no-no. I don't want my libraries writing to stdout on me. Libraries should be quiet, not noisy.

Errors like this make the point that it is not only noisy, but that it still isn't really ready for prime time yet:

090425 14:38:31 InnoDB: Error: table test/Foo does not exist in the InnoDB internal
InnoDB: data dictionary though the client is trying to drop it.
InnoDB: Have you copied the .frm file of the table ?


090425 15:36:52 InnoDB: Error: Client is freeing a thd

That remind you of anything? :)

The API is based on what has been needed so far to make Innodb work with MySQL, so the MySQL-specific errors above are not surprising. The lack of API calls to list all tables in a schema is another example of this. With MySQL you had the FRM files, so there was no need to be able to get at the list of tables Innodb owned, so no API call exists for this (which is a common need in a standalone database).

One other big item: this library is using a lot of global variables. Take a look at ib_cfg_set_int(). Notice the lack of context for setting variables. This means that the library really was not meant to be used in any sort of multi-tenancy use case. This really limits its usage pattern. A better design would be to create a "context" and pass that to the startup of the library. I've wanted Innodb to clean up its usage of global variables for years, and I had hoped that with the creation of a library this would have been solved.
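To make the "context" idea concrete, here is a small sketch of what a context-based configuration could look like. None of this is the library's API: ex_ctx_t, the ex_ctx_* functions, the setting names, and the default values are all hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context: every setting lives in the instance, not in a global. */
typedef struct ex_ctx {
    long buffer_pool_size;
    long log_file_size;
} ex_ctx_t;

/* Allocates a context with made-up defaults. Returns NULL on allocation failure. */
ex_ctx_t *ex_ctx_create(void)
{
    ex_ctx_t *ctx = calloc(1, sizeof(*ctx));

    if (ctx != NULL) {
        ctx->buffer_pool_size = 8L * 1024 * 1024;
        ctx->log_file_size = 16L * 1024 * 1024;
    }
    return ctx;
}

/* Sets an integer option on one context only.
 * Returns 0 on success, -1 for an unknown setting name. */
int ex_ctx_set_int(ex_ctx_t *ctx, const char *name, long value)
{
    assert(ctx != NULL && name != NULL);

    if (strcmp(name, "buffer_pool_size") == 0)
        ctx->buffer_pool_size = value;
    else if (strcmp(name, "log_file_size") == 0)
        ctx->log_file_size = value;
    else
        return -1;

    return 0;
}

/* Releases a context created by ex_ctx_create(). */
void ex_ctx_destroy(ex_ctx_t *ctx)
{
    free(ctx);
}
```

Two contexts created this way can carry different settings in the same process, which is exactly what a global-variable design rules out.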

Final Thoughts

I can see a lot of use for this library. Concurrency with SQLite is a big issue for writes, and libraries like Berkeley DB lack schema, and I personally like the concept of an embedded engine knowing more about the data than the length of the byte array being stored in it.

Still? The library could use a lot of polish. It still shows the warts of being an internal project that has been pushed out into the public. I really hope that before a final release occurs the authors will clean up the interface and consider the end developer.

As far as Drizzle/MySQL goes? I can certainly see basing a future storage engine interface around the concepts found in this library. It is not perfect but it is certainly better than what we have today. It is also obvious after looking at the interface that there is more that can be done in regard to performance if the interfaces were better aligned.

There is a lot of potential in this project.