Why Non-Distributed Systems Suck

Feb. 1st, 2008 | 12:56 pm

I have a patch that I just emailed to an upstream committer.

The main repository is in Subversion.

Now I want to work on a second patch.

What to do?

Toss my current patch out, revert, and work on another patch.

If this were a distributed system?

I would just commit locally. I would then work on a new patch. When the patch was committed to the main system I would just pick it up with a pull. I would never notice the issue because the remote patch would come in and be merged locally.

And what about the process where I work on a patch in pieces and commit along the way?

Well you can forget about that with a non-distributed system.

The solution to my problem? It looks like Fedora has SVK packages. With those I should be able to get around the limitations that come from the remote server being Subversion.

What do I want?

The remote server to be Mercurial (or any other modern system...).

If you go back a decade I adored CVS. Compared to the options at the time it was a winner.

Today? Not so much.


Comments {10}

Ted Tso


from: tytso
date: Feb. 4th, 2008 01:19 am (UTC)

Well, the problem is that by definition a distributed SCM allows you to do all operations without being connected to "a server". This is true for BitKeeper, Mercurial, git, and bzr. That means the client needs to have the full functionality of the server --- and a universal client needs to have the union of all distributed SCMs. Worse yet, different DSCMs store things differently: some use SCCS-style weaves, some use RCS-style deltas, and some store individual file versions. This influences what kind of merges each of the DSCMs can do --- and each DSCM can have a different level of "smartness" in its merges. So a universal client would either have to have the union of all DSCM storage systems and the union of all DSCM merge algorithms --- or effectively you are trying to force all of the DSCMs to use one storage system and one merge algorithm.

Put another way, a distributed SCM necessarily implies a very fat client, so a universal client means either that you force all of the DSCMs to be the same, with the same client-side repository format and the same merge algorithms, etc., or that a "universal client" basically means taking all of the DSCMs and unifying them under some least-common-denominator UI front-end.

I would also note that forcing a common DSCM would of necessity stifle innovation in SCMs. For example, one of the things you could do with an integrated editing environment is record the fact that the change that was made was the rename of a class method function. That is, if Eclipse has a low-level editing command which says "in this class, rename this method function to a new name", and which changes both the definition and all uses of said method function, then when you subsequently do a merge, and in another branch new code was introduced which used the method function under its old name, the merge could automatically rename the new uses of the renamed method function. This is similar to the ability of bk to track directory renames, so that new files which are added to the directory can automatically get moved to the new directory after a merge. But in order to do this, you need to actually change the data schema of the DSCM to record the user intention of "I renamed this method function", just as in order to support directory renames, the DSCM needs to record the user intention "I renamed this directory".
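To make the idea of recording intent concrete, here is a toy sketch in Python. The `RenameIntent` record and the `merge_with_intent` helper are hypothetical names invented for illustration, not any DSCM's actual schema, and the rename is replayed with a crude whole-word text substitution where a real tool would need to understand the language.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameIntent:
    """Hypothetical record of 'I renamed this method', stored with a change."""
    old_name: str
    new_name: str


def apply_intent(lines, intent):
    """Rewrite whole-word uses of the old method name in the given lines."""
    pattern = re.compile(r"\b" + re.escape(intent.old_name) + r"\b")
    return [pattern.sub(intent.new_name, line) for line in lines]


def merge_with_intent(ours, theirs_added, intent):
    """Toy merge: append the lines added on the other branch, replaying our
    recorded rename onto them so that new callers use the new name."""
    return ours + apply_intent(theirs_added, intent)


# Our branch renamed get_rows to fetch_rows and recorded that intent.
ours = ["def fetch_rows(self): ...", "rows = db.fetch_rows()"]
intent = RenameIntent(old_name="get_rows", new_name="fetch_rows")

# Meanwhile the other branch added a new caller that still uses the old name.
theirs_added = ["extra = db.get_rows()"]

merged = merge_with_intent(ours, theirs_added, intent)
```

A client that only sees the text of the two branches has no `RenameIntent` to replay, so the new caller would keep the stale name after the merge.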

The problem, though, is that if you have a universal client which doesn't know about this new kind of user intention accounting, it could never support this new DSCM. So trying to create a universal client by definition means that you are freezing DSCM development at current levels. It would be like trying to dictate to MySQL, Postgres, Oracle, et al. that the database format had to be unified and no new back-end storage could be allowed in order to allow for a universal distributed client. I'm not deeply involved in database development, but I suspect if I walked into a database conference and asked various database developers to stop innovating, I'd be told, with varying levels of politeness, to f*ck off. :-)

What specific goals are you trying to achieve by having a universal client?


Brian "Krow" Aker


from: krow
date: Feb. 6th, 2008 03:46 am (UTC)

A merge can be done locally (and is).

It is a matter of mapping features... there is a superset of features (and attributes). If a given server lacks the ability to store a certain amount of data, it muddles through. This is about sync really. I don't believe that source code is all that different from any other data.

I suspect developers can innovate on the server by making the best server :)

What do I care about? The original rant was about me being annoyed about one project not finishing their conversion to git, away from Subversion.

I contribute to a number of projects, and I don't want to juggle three (currently five) different clients. I want a client that allows me to work in the way I do. I'll push what I can to projects as I need to.


Ted Tso


from: tytso
date: Feb. 6th, 2008 05:15 am (UTC)

But *how* the merge is done is dependent on the data storage model of the DSCM, and Bzr, Hg, and git all have different data storage models. So if the merge is done locally, then the "universal client" by definition needs to understand every DSCM's data storage model, and the merge algorithm used by each DSCM. In particular, BitKeeper, Bzr, and git use *different* merge algorithms, all different from the simple, stupid 3-way merge.
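For reference, the "simple, stupid 3-way merge" rule can be sketched in a few lines of Python. This is a minimal illustration that works on whole files (a dict of path to content) rather than on line hunks, and the function name and data layout are assumptions made for the example, not how any of the tools above store or merge data.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots against their common ancestor.

    Each snapshot is a dict mapping path -> content; a missing path means
    the file does not exist in that snapshot.
    Returns (merged, conflicts).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (including both deleting the file)
        elif o == b:
            result = t  # only their side changed it
        elif t == b:
            result = o  # only our side changed it
        else:
            conflicts.append(path)  # both changed it, in different ways
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.c": "v1", "b.c": "v1", "c.c": "v1"}
ours = {"a.c": "v2", "b.c": "v1", "c.c": "ours"}
theirs = {"a.c": "v1", "b.c": "v1", "c.c": "theirs", "d.c": "new"}

merged, conflicts = three_way_merge(base, ours, theirs)
```

Everything beyond this rule (rename tracking, smarter choice of ancestor, history-aware merges) depends on what the tool has recorded, which is where the storage model comes in.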

So how would you unify the different merge algorithms in your "universal client"? People are already innovating in the client-side merge algorithms, and people like Larry McVoy will tell you that this is one of the most important parts of BitKeeper. You could have a least-common-denominator "universal client", but it wouldn't be able to handle directory renames, for example. And if you try to do a superset of features, and different DSCMs store the information differently, it effectively means that the universal client needs to implement the union of all DSCM clients.

If the issue is that you're tired of needing to learn 3 or more different DSCMs, then it's really more about UI integration than anything else, and maybe the answer is a front-end shim in front of git, bzr, and hg, which would work as long as the developer only needs to use the least-common-denominator features. That wouldn't be hard to do, and it's far easier than trying to design the theoretical universal client.
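A minimal sketch of such a shim, in Python, might look like the following. The names `BACKENDS`, `COMMON_VERBS`, `detect_backend`, and `build_command` are made up for this example; the only assumptions about the tools are that each keeps a `.git`, `.hg`, or `.bzr` directory at the top of a working tree and that all three accept the subcommands `status`, `diff`, `log`, and `commit`.

```python
import os

# Marker directory at the top of a working tree -> command-line tool.
BACKENDS = {".git": "git", ".hg": "hg", ".bzr": "bzr"}

# Least-common-denominator verbs that exist under the same name in all three.
# Their exact behaviour still differs (git commit only records staged changes,
# for example), which this sketch makes no attempt to hide.
COMMON_VERBS = {"status", "diff", "log", "commit"}


def detect_backend(path):
    """Walk upward from path until a known repository marker is found."""
    path = os.path.abspath(path)
    while True:
        for marker, tool in BACKENDS.items():
            if os.path.isdir(os.path.join(path, marker)):
                return tool
        parent = os.path.dirname(path)
        if parent == path:
            raise LookupError("no git, hg, or bzr repository found")
        path = parent


def build_command(tool, verb, *args):
    """Return the argv list for a common verb, refusing anything else."""
    if verb not in COMMON_VERBS:
        raise ValueError(f"{verb!r} is not a common-denominator command")
    return [tool, verb, *args]
```

Passing `build_command(detect_backend("."), "status")` to `subprocess.run` would then run the right tool for whichever kind of working tree you happen to be in. Anything outside `COMMON_VERBS` is rejected, which is exactly the least-common-denominator limitation described above.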
