Branches - Subversion Conversion to Mercurial, Part 2
After reading about how the hg convert extension can convert the trunk of a Subversion repository to Mercurial, you’re probably thinking: “But we have more than just a single line of development! We branch our code! We merge it! We tie it into knots! It’s like a great monster!” Of course it is. You wouldn’t be good developers if it weren’t. If a simple trunk is like a single snake, unbroken from head to tail, then any actively developed repository is more like the Hydra, with more heads than you can count, and poisonous breath besides. Even worse, the Hydra is immortal, and so cannot be killed - sounds like a legacy codebase to me! Heracles couldn’t kill the Hydra, but he could bring it into submission, and put it to his own uses.
So let’s look at how hg convert can take the branches in a Subversion repository and bring them into a Mercurial repository, thus taming the beast.
Finding the Hydra’s Heads
Getting the head revision when we were just dealing with the trunk was easy - just find the latest revision under the trunk’s path. But now we’re dealing with the Hydra - we need to keep track of many heads, one for each branch. The source converter object (remember that one? It’s Subversion specific) does this work. After getting the head of the trunk, it lists the contents of the branches directory. This directory is either detected as a child of the source URL passed into the hg convert command, or it is specified in the convert configuration section, typically on the command line using the --config parameter. For each child of the branches directory, the source converter object checks whether it’s a directory, and finds the latest revision in that directory. Unless that latest revision is the one that created the branch (i.e. the branch has no changes in it), it adds that latest revision to the set of heads.
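The head-finding steps above can be sketched in a few lines of Python. This is a toy model, not the convert extension's actual code: the HISTORY dictionary, its paths, and its revision numbers are all invented for illustration, and find_heads is a hypothetical helper name.

```python
# Toy model of a Subversion repository: each path maps to the revision
# numbers that touched it, oldest first. All of this data is invented.
HISTORY = {
    "trunk": [1, 2, 5, 9],
    "branches/feature-a": [3, 4, 7],  # created in r3, changed in r4 and r7
    "branches/untouched": [6],        # created in r6, never changed
    "branches/bugfix": [8, 10],
}


def find_heads(history, trunk="trunk", branches="branches"):
    """Return {path: head revision} for the trunk and every changed branch."""
    heads = {}
    if history.get(trunk):
        heads[trunk] = history[trunk][-1]
    prefix = branches + "/"
    for path, revs in sorted(history.items()):
        if not path.startswith(prefix):
            continue
        created, latest = revs[0], revs[-1]
        # Skip a branch whose latest revision is the one that created it.
        if latest != created:
            heads[path] = latest
    return heads


print(find_heads(HISTORY))
```

In this toy data the untouched branch contributes no head, while the trunk and the two changed branches each contribute one.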
As with the trunk, we then need to follow the parents of each head back to track down all of the revisions that need to be included in the convert. And as with the trunk, if hg convert has been run before against the same source, then we only track back till we find changes that have already been converted into the destination repository.
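As a rough illustration of that walk back through the parents, here is a minimal Python sketch. It assumes each revision has a single parent and that we already know which revisions were converted on an earlier run; the PARENTS table and the name revisions_to_convert are made up for the example.

```python
# Invented history: each revision number maps to its single parent.
PARENTS = {10: 8, 8: 5, 9: 5, 7: 4, 5: 2, 4: 3, 3: 2, 2: 1, 1: None}


def revisions_to_convert(heads, parents, already_converted):
    """Follow each head back through its parents, collecting new revisions."""
    pending = set()
    for head in heads:
        rev = head
        # Stop at the root, at a revision converted on an earlier run,
        # or at one we have already reached from another head.
        while rev is not None and rev not in already_converted and rev not in pending:
            pending.add(rev)
            rev = parents[rev]
    return sorted(pending)


# First run: nothing converted yet, so everything reachable is collected.
print(revisions_to_convert([9, 7, 10], PARENTS, already_converted=set()))
# Later run: r1 through r3 are already in the destination repository.
print(revisions_to_convert([9, 7, 10], PARENTS, already_converted={1, 2, 3}))
```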
Sorting Things Out
The biggest challenge in fighting the Hydra isn’t the poisonous breath - Heracles overcame that with a simple cloth over his mouth and nose. No, it’s that when you cut off one of its heads, it grows back two more. So one of the first things you need to know is whether its heads will grow back in parallel or one after the other. When converting Subversion repositories to Mercurial, this concept corresponds to the sort order.
Now that we’re going to be converting multiple branches, the order in which we sort the changes to be imported is important. The hg convert extension offers three types of sorts: branchsort, datesort, and sourcesort. We can ignore sourcesort, because that only applies when importing from a Mercurial repository. The default for Subversion repositories is branchsort. This means that when importing from Subversion, the algorithm essentially orders the changes as a depth-first search: it imports one branch all the way to its head revision, then goes back and imports the next branch. In other words, the Hydra grows back one head, then grows a second. This is in contrast to datesort, which imports each revision in date order, so that the Hydra’s heads grow in parallel. Datesort is more intuitive, and leads to repositories that are organized the way we expect them to be, with development going on in the trunk and in branches at the same time. Branchsort, however, actually creates smaller Mercurial repositories, because the diffs to files tend to be much smaller when they’re organized by branch, rather than intermingled.
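To make the difference between the two orders concrete, here is a simplified Python sketch. It is not the convert extension's real sorting code, just a small topological sort with two different rules for picking the next revision; the REVS table and the function names are invented for the example.

```python
# Invented history: rev -> (parent, branch, date). r3 branches off after r2.
REVS = {
    1: (None, "trunk", 1),
    2: (1, "trunk", 2),
    3: (2, "feature", 3),
    4: (2, "trunk", 4),
    5: (3, "feature", 5),
    6: (4, "trunk", 6),
}


def toposort(revs, pick):
    """Emit every revision after its parent; pick() chooses among ready ones."""
    done = []
    ready = [r for r, (parent, _, _) in revs.items() if parent is None]
    while ready:
        rev = pick(ready, done[-1] if done else None)
        ready.remove(rev)
        done.append(rev)
        ready.extend(r for r, (parent, _, _) in revs.items() if parent == rev)
    return done


def branchsort(revs):
    """Depth-first: keep following the line of development just emitted."""
    def pick(ready, previous):
        for rev in ready:
            if revs[rev][0] == previous:
                return rev
        return ready[0]
    return toposort(revs, pick)


def datesort(revs):
    """Interleaved: always take the oldest revision that is ready."""
    return toposort(revs, lambda ready, previous: min(ready, key=lambda r: revs[r][2]))


print("branchsort:", branchsort(REVS))  # [1, 2, 3, 5, 4, 6]
print("datesort:  ", datesort(REVS))    # [1, 2, 3, 4, 5, 6]
```

With branchsort one line of development is imported all the way to its head before the other is picked up; with datesort the two lines are interleaved.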
Given that disk space is cheap, you will be happiest if you specify datesort, and only rerun hg convert with branchsort if you experience any problems with the resulting repository size.
Hacking at the Hydra
Heracles defeated the Hydra by cutting off its heads, then having his nephew cauterize the wounds so that new heads would not grow back. Finally he placed its one remaining immortal head under a heavy rock, trapping it.
The trick to defeating the Hydra is to limit the number of heads you’re dealing with. Unfortunately, normal conversions from Subversion to Mercurial will often leave more heads than expected. Remember, this is the Hydra - we should expect more heads than expected. In this case, it is because the convert extension cannot recognize Subversion merges. Doing so is a tricky problem because Subversion merges are so flexible. So after the conversion, a merge doesn’t produce a nice revision with two parents. Rather, it looks like any other revision with one parent, while the other parent is left dangling. Because Subversion branches are often closed when they are merged back into trunk development, this means that we’re leaving an extra head in the converted repository that need not be there. Even if the Subversion branch was used further after the merge, we’ve still lost an important piece of history by not recording the revision as a merge in the new repository. Fortunately, the convert extension does provide a manual workaround to this limitation: the splicemap. Like Heracles’ firebrand-wielding nephew, the splicemap can safely eliminate the Hydra’s heads. The splicemap does this by allowing you to specify the parents of any given revision.
As you might imagine, you can easily shoot yourself in the foot with this ability. You could rearrange revisions in any number of ways, creating an odd tree, or switching revisions around in ways that significantly increase the size of the converted repository. But rather than hacking up and grafting together a totally new Hydra, let’s just use it to get rid of a few of the Hydra’s heads. The splicemap’s most obvious benefit is that it lets you specify the two parents of any merge operation, in most cases eliminating one head from the converted repository. It can also be used to bring together two disparate lines of development, which may occasionally be useful, e.g. when you realize that two separate repositories should really be combined into one.
The hg convert extension implements the splicemap using a simple lookup of revisions based on the ids specified. It then replaces the parents on a commit that has been retrieved from the source converter object, before having the destination converter object put the commit into the destination repository.
One trick to using the splicemap is understanding the revision format used in the splicemap file. For Subversion repositories, it is important to get this right, or it will be as if the splicemap hadn’t even been specified. A Subversion repository has its revision in the splicemap formatted like so:
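The lookup-and-replace step can be sketched as follows. This is an illustrative Python model rather than the extension's own code: the Commit class, the apply_splicemap function, and the shortened revision ids are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Stand-in for a commit handed over by the source converter object."""
    rev: str
    parents: list = field(default_factory=list)


def apply_splicemap(commit, splicemap):
    """Replace the commit's parents if the splicemap has an entry for it."""
    if commit.rev in splicemap:
        commit.parents = list(splicemap[commit.rev])
    return commit


# Invented ids: trunk@9 merged the feature branch, but the conversion
# only sees its trunk parent.
splicemap = {"trunk@9": ["trunk@5", "feature@7"]}

merge = apply_splicemap(Commit("trunk@9", ["trunk@5"]), splicemap)
plain = apply_splicemap(Commit("trunk@5", ["trunk@2"]), splicemap)

print(merge.parents)  # ['trunk@5', 'feature@7']
print(plain.parents)  # ['trunk@2']
```

Only the commit named in the splicemap changes; every other commit keeps the parents reported by the source.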
svn:<uuid>/path/to/module@<revnum>
So for example, revision 931750 in the trunk of the official Subversion repository would be specified like this:
svn:13f79535-47bb-0310-9956-ffa450edef68/subversion/trunk@931750
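Since a typo in one of these ids silently disables the entry, it can help to generate them. The Python sketch below builds ids in the format shown above and joins them into a splicemap line of the form "child parent1, parent2". The helper names are hypothetical, and the branch path and the revision numbers in the second example are invented purely for illustration.

```python
def svn_revid(uuid, module, revnum):
    """Format a revision id as svn:<uuid>/path/to/module@<revnum>."""
    if not module.startswith("/"):
        module = "/" + module
    return "svn:%s%s@%d" % (uuid, module, revnum)


def splicemap_line(child, parents):
    """One splicemap entry: the child id, a space, then its parents."""
    return "%s %s" % (child, ", ".join(parents))


UUID = "13f79535-47bb-0310-9956-ffa450edef68"

# Reproduces the example id given above.
print(svn_revid(UUID, "/subversion/trunk", 931750))

# Hypothetical merge: the branch name and revision numbers are made up.
child = svn_revid(UUID, "/subversion/trunk", 120)
parents = [
    svn_revid(UUID, "/subversion/trunk", 119),
    svn_revid(UUID, "/subversion/branches/example", 118),
]
print(splicemap_line(child, parents))
```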
How Many Hydras?
In taming the Hydra of a Subversion repository, we have an option that Heracles did not have: rather than having to eliminate Hydra heads by chopping them off and cauterizing the wounds, it’s as if Heracles could chop the Hydra up into n Hydras, each with one head, so each is really more like a snake. Instead of a many-headed Hydra, we end up with a bunch of one-headed snakes. Indiana Jones might not like that solution, but it’s much easier for a hero like Heracles to tackle, because he can just deal with one at a time.
Likewise, instead of removing heads using the splicemap, we can split the repository into multiple repositories, each with one head. To understand the differences between having a bunch of named branches in one repository and having a separate repository for each branch, it is helpful to read Steve Losh’s excellent guide to branching in Mercurial.
If you decide to just create named branches in the destination repository, the source converter object records the branch that a given revision is on, and the destination converter object creates a commit with that named branch. Nothing too spectacular here.
Creating cloned branches, where there is a separate Mercurial repository for each Subversion branch, takes a little more work. For you, the little more work is just to enable the convert.hg.clonebranches option, for example with --config convert.hg.clonebranches=true on the command line. The converter, for its part, needs to make sure that each revision goes into the right repository. First, when copying each revision from the source to the destination, the converter object finds the branches that the revision’s parents are on. It then tells the destination converter object which branch the child revision is on, as well as the branches that the parent revisions are on. The destination converter object first selects the correct repository to commit the revision to. If it doesn’t exist yet (i.e. the revision is the first in this branch), then the destination repository is created. Then it needs to make sure that the destination repository has all of the revisions leading up to the one being copied, so it pulls all of the appropriate revisions in from each of the parents’ branches. Finally it is ready to commit the child revision to the appropriate branch repository.
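The routing described above might be modeled like this. It is a deliberately simplified Python sketch, not the destination converter's code: each "repository" is just an ordered list of revision numbers, history is assumed to be linear within a branch, and the BranchRepos class and its sample data are invented.

```python
class BranchRepos:
    """Toy model: one 'repository' (an ordered list of revisions) per branch."""

    def __init__(self):
        self.repos = {}

    def commit(self, rev, branch, parents):
        """parents is a list of (parent_rev, parent_branch) pairs."""
        # Select the branch's repository, creating it on first use.
        repo = self.repos.setdefault(branch, [])
        for parent_rev, parent_branch in parents:
            if parent_branch == branch:
                continue
            # "Pull" everything leading up to the parent from its branch repo.
            source = self.repos[parent_branch]
            for ancestor in source[: source.index(parent_rev) + 1]:
                if ancestor not in repo:
                    repo.append(ancestor)
        repo.append(rev)


dest = BranchRepos()
dest.commit(1, "trunk", [])
dest.commit(2, "trunk", [(1, "trunk")])
dest.commit(3, "feature", [(2, "trunk")])  # first revision on the branch
dest.commit(4, "trunk", [(2, "trunk")])
dest.commit(5, "feature", [(3, "feature")])

print(dest.repos)  # {'trunk': [1, 2, 4], 'feature': [1, 2, 3, 5]}
```

The feature repository ends up with the trunk history it branched from, but not with the later trunk revision r4.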
Cleaning Up the Mess
Fighting the Hydra can be messy. Here are some things we can do to clean up once we’re done. One thing to note when doing Subversion to Mercurial conversions is that you’ll want to eliminate empty revisions. Typically, these occur because the Subversion revision either only changes Subversion properties (and not files), or because it only creates the directory at the root of the trunk or one of the branches. They can also occur when a filemap is specified. However, the hg convert extension only tries to eliminate empty revisions when a filemap has been specified; without one, it leaves them in place. So the rule when doing conversions should be to always specify a filemap file, even if you just leave it empty. This will make sure that the hg convert extension still tries to eliminate empty revisions.
You may also want to eliminate branches from the history, preserving only the merge commit. The easiest way to do this is to not use the splicemap to merge the branch into development, and then strip that branch from the Mercurial repository after the conversion is done. The trunk will still have the proper changes from the revision that performed the merge, but all of the history of how that revision was created in the branch will be gone.
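Following that rule, a conversion script might always create a filemap, even an empty one, before calling hg convert. The Python sketch below only builds and prints the command line; it does not run it. The source URL, the destination name, and the helper function are placeholders invented for the example, and the date-based sort is included only because it was recommended earlier.

```python
import tempfile
from pathlib import Path


def build_convert_command(source, dest, filemap, datesort=True):
    """Build (but do not run) an hg convert command line as a list of args."""
    command = ["hg", "convert"]
    if datesort:
        command.append("--datesort")
    command += ["--filemap", str(filemap), source, dest]
    return command


with tempfile.TemporaryDirectory() as workdir:
    filemap = Path(workdir) / "filemap.txt"
    # Deliberately empty: no include, exclude, or rename rules.
    filemap.write_text("")
    # Placeholder source URL and destination directory.
    command = build_convert_command("http://svn.example.org/repo", "repo-hg", filemap)
    print(" ".join(command))
```

Passing the resulting list to something like subprocess.run would perform the actual conversion, provided the filemap file still exists at that point.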
Victorious
Hopefully, that’s enough information both to understand how conversion of branches works and to successfully convert the unique repositories you’ve got on hand. Once you’re done you’ll still have to deal with the immortal Hydra, i.e. all your existing code. But hopefully you’ve made things more manageable along the way, making it easier to leverage all the goodness in that code. Like Heracles, you should now be able to go forth and conquer other monsters using the Hydra’s venomous poison.