Master.info, Re-factoring Code

Oct. 7th, 2008 | 09:55 pm

I was just able to do this for the first time:

[root@piggy var]# /tmp/drizzle/drizzled/serialize/master_list_reader ./master.info
HOSTNAME piggy.tangent.org
USERNAME
PASSWORD
PORT 4427
CONNECT RETRY 60
LOG NAME
LOG POSITION 16777216

So what is the big deal?

In the reworking of replication I ran across the bird's nest of code that exists for the master.info file. Pretty much all of it dates back to around 2000 (despite the recent work on row-based replication, most of the replication code has been the same for the last eight years).

One of the things no one has ever tackled was getting rid of the master.info file. Even at this point in Drizzle we still have it, though we have moved it to being a protocol buffer file and now can list multiple masters in the file (so yes, that means we will most likely have multi-master support sometime shortly).
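To make the "multiple masters in one file" idea concrete, here is a minimal sketch in Python. It is not the Drizzle protocol buffer definition: the class names, field names, and the `format_master` helper are all hypothetical, chosen only to mirror the labels printed by master_list_reader above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MasterRecord:
    """One master entry. Field names are assumptions, not Drizzle's schema."""
    hostname: str = ""
    username: str = ""
    password: str = ""
    port: int = 0
    connect_retry: int = 0
    log_name: str = ""
    log_position: int = 0


@dataclass
class MasterList:
    """A container holding any number of masters, like a repeated message field."""
    masters: List[MasterRecord] = field(default_factory=list)


def format_master(m: MasterRecord) -> str:
    """Render one record as 'LABEL value' lines; empty values print the label only."""
    rows = [
        ("HOSTNAME", m.hostname),
        ("USERNAME", m.username),
        ("PASSWORD", m.password),
        ("PORT", m.port),
        ("CONNECT RETRY", m.connect_retry),
        ("LOG NAME", m.log_name),
        ("LOG POSITION", m.log_position),
    ]
    return "\n".join(f"{label} {value}".rstrip() for label, value in rows)


if __name__ == "__main__":
    masters = MasterList()
    masters.masters.append(
        MasterRecord(
            hostname="piggy.tangent.org",
            port=4427,
            connect_retry=60,
            log_position=16777216,
        )
    )
    # Listing every master is just a loop over the container.
    for record in masters.masters:
        print(format_master(record))
```

The point of the container is that going from one master to several only means appending another record; nothing about the per-master fields has to change.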

In the next few months this file will go away and we will move the data into a table, but for now it was worth spending the two days to clean up the code in order to get rid of the "master database server has died and forgotten to write out the correct data" problems that have existed for so long. I suspect we will evolve the interface at least two more times before we are done. Sure, I could shoot for the "end design" and try to be done with it, but I have found that attacking our problems in bite-sized chunks tends to buy us more distance than just tearing it out and hoping to find the one "right way" for the solution.

I am starting to wonder what the rule of thumb should be for refactoring. The code base we are working from last evolved over a decade ago. This was when Unireg became MySQL. I am starting to think we should be spending anywhere between 70 and 80 percent of our time going forward on just refactoring work. This does not leave a lot of room for features, but I believe that features are a lot less important than what people make them out to be (and in our case we are just working on the micro-kernel, so others can continue to innovate on the edge).

The older the code base gets the more important it becomes to do this sort of work, though I am sure someone a decade from now is going to find themselves just as annoyed as I am most days :)


Comments {7}

from: awfief
date: Oct. 8th, 2008 11:11 am (UTC)

Great work! One of the things that's really important from the cloud perspective (and, in general, the DBA perspective) is that there isn't a lot of stuff that *needs* to deal with OS-level things like files. Having replication information stored in a table makes a lot of sense; it also means that when you have a consistent backup of the database you immediately know where replication left off.

I think if you refactor the code, it will be much easier for others to write the features they want/need. So while the Drizzle core team may not be writing them, they'll be written -- the first plugins are already being written, outside the paid Drizzle core team, inside the Drizzle community.
