
Why Non-Distributed Systems Suck

Feb. 1st, 2008 | 12:56 pm

I have a patch that I just emailed to an upstream committer.

The main repository is in Subversion.

Now I want to work on a second patch.

What to do?

Toss my current patch out, revert, and work on another patch.

If this were a distributed system?

I would just commit locally. I would then work on a new patch. When the first patch was committed to the main repository I would just pick it up with a pull. I would never notice the issue, because the remote patch would come in and be merged locally.

And what about the process where I work on a patch in pieces and commit along the way?

Well, you can forget about that with a non-distributed system.
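
Roughly, here is what that workflow looks like in a distributed system (a sketch in Mercurial, since that is what I would want; the URL and commit messages are placeholders):

    # clone the upstream repository once
    hg clone http://hg.example.org/project
    cd project

    # finish the first patch, commit it locally, mail it upstream
    hg commit -m "first patch"

    # start the second patch right away; nothing needs to be reverted
    hg commit -m "second patch, work in progress"

    # later, when the first patch lands upstream, just pull and merge;
    # the remote copy of the first patch merges cleanly with the local one
    hg pull
    hg merge
    hg commit -m "merge upstream"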

The solution to my problem? It looks like Fedora has SVK packages. With those I should be able to get around the limitations of having Subversion as the remote server.
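
For reference, the SVK workflow looks roughly like this (a sketch only; the URL and depot paths are placeholders, and I have not run this exact sequence yet):

    # mirror the remote Subversion repository into a local SVK depot
    svk mirror http://svn.example.org/project/trunk //mirror/project
    svk sync //mirror/project

    # branch locally so commits can happen without touching the server
    svk cp -m "create local branch" //mirror/project //local/project
    svk co //local/project project
    cd project

    # commit locally, in pieces, as often as needed
    svk ci -m "second patch, first piece"

    # when ready, push the accumulated local commits back to Subversion
    svk push //local/project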

What do I want?

The remote server to be Mercurial (or any other modern system...).

A decade ago I adored CVS. Compared to the options at the time, it was a winner.

Today? Not so much.


Comments (10)

dormando
Feb. 1st, 2008 09:29 pm (UTC)

git-svn ;)


Brian "Krow" Aker

(no subject)

from: krow
date: Feb. 2nd, 2008 05:42 am (UTC)
Link

Shame that "said" repository is not just git in the first place :)

I've been meaning to ask why you have the hots for git. Played with Mercurial?


dormando
Feb. 2nd, 2008 08:56 am (UTC)

Real shame. I hear people are still frustratingly working on it :)

I've used Mercurial a bit, but almost entirely with memcached.

I got into git a few years ago when SVN shit the bed at Gaia. At the time a few friends were using it, and a few projects I followed were as well (such as the Linux kernel).

Eh, it has absolutely killer performance and blew through all of the crazy multi-thousand-file merges we were doing at Gaia, and the bulk of the projects I follow are still git. In the years since, the interface has become a lot more solid, and it's still really fast.

I see the two as interchangeable. I'll be quicker with git until I learn all of the "behavioral aliases" for Mercurial.


Ted Tso (tytso)
Feb. 3rd, 2008 12:28 am (UTC)

No, the suggestion wasn't to convert the repository to git, but to use the "git svn" command, which ships as part of the git release but is completely different from the normal way of using git. It uses the git client to talk to the svn repository.

What's really nice about git-svn is that it converts SVN repositories on the fly. You can periodically use "git svn fetch" to get updates from the svn repository. In the meantime, you can create branches in the git repository and work on patches on top of patches. When the first patch is accepted, you can rebase and then submit subsequent ones.

Even better, if you have write access to svn, you can do a whole series of patches while on the plane or otherwise off-line, by committing them into git, and then use the "git svn dcommit" command to commit each change into the svn repository. So this actually allows you to use all of the power of a distributed SCM, without forcing everyone using the SVN repository to switch to git. This is unique to git, and not something you can do with Mercurial.

See "An introduction to git-svn for Subversion/SVK users and deserters" for more details.
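
In concrete terms, the round trip looks roughly like this (a sketch; the URL and branch name are placeholders):

    # build a git repository that mirrors the svn one
    git svn clone http://svn.example.org/project/trunk project
    cd project

    # work on patches on top of patches in a local topic branch
    git checkout -b patch-series
    git commit -a -m "patch 1"
    git commit -a -m "patch 2"

    # periodically pick up new svn revisions and rebase the series onto them
    git svn fetch
    git svn rebase

    # with svn write access, replay each local commit as an svn revision
    git svn dcommit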


Brian "Krow" Aker

(no subject)

from: krow
date: Feb. 3rd, 2008 12:46 am (UTC)
Link

I continue to think that clients and servers are different beasts in SCM, and that we are moving toward universal clients. Mercurial can pull from a number of systems today, and so can bzr.

The open question in my mind is what superset of features any one server might have.


Ted Tso (tytso)
Feb. 3rd, 2008 01:10 am (UTC)

Mercurial can convert repositories from svn to hg, yes, but it doesn't have an automated way for someone to make commits in hg and then, later on, push those commits back to svn and rewrite the hg branch so it looks as if the commits were pulled down from svn originally.

So it's not enough just to pull from a large number of systems. You also want to be able to make changes while off-line and disconnected from the network, and then be able to push those changes back to SVN.

So what git-svn provides is a kind of universal client, if you want to think about it that way, except that it extends svn by giving it off-line capabilities while also working with git repositories. Read-only universal clients are totally uninteresting to me; what I want is to be able to commit changes off-line and then push them back to the server when I am connected to the network again.
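
To make the contrast concrete (a sketch; the URL is a placeholder, and Mercurial's convert extension has to be enabled first):

    # Mercurial: one-way conversion; svn history comes in, nothing goes back
    hg convert http://svn.example.org/project/trunk project-hg

    # git-svn: the same pull, plus the return path
    git svn fetch      # bring down new svn revisions
    git svn dcommit    # replay local commits back into svn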


Brian "Krow" Aker

(no subject)

from: krow
date: Feb. 3rd, 2008 09:10 pm (UTC)
Link

I am assuming that all of the projects (at least git, Mercurial, and bzr) will eventually have universal clients. They all seem to be working toward this.

I really want my client to be separate from my server. I believe that if we stop joining client and server at the hip, we will be better off.

Has anyone recently organized a meeting between the three major distributed systems to come up with a universal language that could be spoken over HTTP?


Ted Tso (tytso)
Feb. 4th, 2008 01:19 am (UTC)

Well, the problem is that by definition a distributed SCM allows you to do all operations without being connected to "a server". This is true for BitKeeper, Mercurial, git, and bzr. That means the client needs to have the full functionality of the server --- and a universal client needs to have the union of all the distributed SCMs' functionality. Worse yet, different DSCMs store things differently: some use SCCS-style weaves, some use RCS-style deltas, and some store individual file versions. This influences what kind of merges each of the DSCMs can do --- and each DSCM can have a different level of "smartness" in its merges. So a universal client would either have to have the union of all the DSCMs' storage systems and the union of all the DSCM merge algorithms --- or, effectively, you are trying to force all of the DSCMs to use one storage system and one merge algorithm.

Put another way, a distributed SCM necessarily implies a very fat client. So a universal client means either that you force all of the DSCMs to be the same, with the same client-side repository format, the same merge algorithms, and so on, or that a "universal client" basically means taking all of the DSCMs and unifying them under some least-common-denominator UI front-end.

I would also note that forcing a common DSCM would of necessity stifle innovation in SCMs. For example, one of the things you could do with an integrated editing environment is record the fact that the change that was made was the rename of a class method. That is, if Eclipse has a low-level editing command of the form "in this class, rename this method to that name", which changes both the definition and all uses of said method, then when you subsequently do a merge, and in another branch new code was introduced which used the method, the merge could automatically rename the new uses of the renamed method. This is similar to the ability of bk to track directory renames, so that new files which are added to the directory can automatically get moved to the new directory after a merge. But in order to do this, you need to actually change the data schema of the DSCM to record the user intention of "I renamed this method", just as, in order to support directory renames, the DSCM needs to record the user intention "I renamed this directory".

The problem, though, is that if you have a universal client which doesn't know about this new kind of user-intention accounting, it could never support this new DSCM. So trying to create a universal client by definition means that you are freezing DSCM development at current levels. It would be like trying to dictate to MySQL, Postgres, Oracle, et al. that the database format had to be unified and no new back-end storage could be allowed, in order to allow for a universal distributed client. I'm not deeply involved in database development, but I suspect if I walked into a database conference and asked various database developers to stop innovating, I'd be told, with varying levels of politeness, to f*ck off. :-)

What specific goals are you trying to achieve by having a universal client?


Brian "Krow" Aker

(no subject)

from: krow
date: Feb. 6th, 2008 03:46 am (UTC)
Link

A merge can be done locally (and is).

It is a matter of mapping features... there is a superset of features (and attributes). If a given server lacks the ability to store a certain piece of data, it muddles through. This is really about sync. I don't believe that source code is all that different from any other data.

I suspect developers can innovate on the server by making the best server :)

What do I care about? The original rant was about me being annoyed that one project has not finished its conversion from Subversion to git.

I contribute to a number of projects, and I don't want to juggle three (currently five) different clients. I want a client that allows me to work the way I do. I'll push what I can to projects as I need to.


Ted Tso (tytso)
Feb. 6th, 2008 05:15 am (UTC)

But *how* the merge is done is dependent on the data storage model of the DSCM, and Bzr, Hg, and git all have different data storage models. So if the merge is done locally, then the "universal client" by definition needs to understand every DSCM's data storage model, and the merge algorithm used by each DSCM. In particular, BitKeeper, Bzr, and git use *different* merge algorithms, all different from the simple, stupid 3-way merge.

So how would you unify the different merge algorithms in your "universal client"? People are already innovating in client-side merge algorithms, and people like Larry McVoy will tell you that this is one of the most important parts of BitKeeper. You could have a least-common-denominator "universal client", but it wouldn't be able to handle directory renames, for example. And if you try to do a superset of features, and different DSCMs store the information differently, it effectively means that the universal client needs to implement the union of all the DSCMs' clients.

If your goal is that you're tired of needing to learn 3 or more different DSCMs, then it's really more about UI integration than anything else, and maybe it wouldn't be hard to put a front-end shim in front of git, bzr, and hg, as long as the developer only needs to use the least-common-denominator features. That wouldn't be hard to do, and it's far easier than trying to design the theoretical universal client.
