Tags - Subversion Conversion to Mercurial, Part 3
Well, if converting the trunk was a necessary appetizer, and converting branches was the meat and potatoes of Subversion to Mercurial conversion, then surely converting the tags is the dessert. We now understand all of the basics of converting the trunk and branches of a Subversion repository to Mercurial. All we need to do is pull over the tags that label specific revisions. Before we dive into dessert, let’s admire it a little by examining how tags are stored in both Subversion and Mercurial.
What Are Tags?
First, in Subversion, revisions are stored as a set of pointers to the files that make up a revision, so a tag is just a set of pointers to the files at specific states in their history. This is done by making a copy of the file pointers we want in the tag at another location in the repository directory structure. Of course, because Subversion creates tags in this generic way, tags might not only be tags. They could be branches, if further changes were made to the copied files. Additionally, a tag need not include all of the files in the trunk or in a branch.
Mercurial, on the other hand, records a tag as just a name associated with a revision id. That revision id specifies the entire state of the repository at that time, so by definition a tag includes all files in a repository. Additionally, tags in Mercurial are versioned, as any other file is. This means that your repository won’t know about tags created in another repository unless you’ve pulled the changes that create those tags, even though you may have the revisions specified by the tags.
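To make the "name associated with a revision id" idea concrete, here is a minimal sketch of reading a .hgtags file, where each line holds a 40-character hexadecimal changeset id followed by a tag name. The function name parse_hgtags and the sample data are made up for illustration; this is not Mercurial's own parsing code.

```python
def parse_hgtags(text):
    """Return {tag_name: node_hex}; later lines override earlier ones."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each entry is "<changeset id> <tag name>"; tag names may contain spaces.
        node, name = line.split(" ", 1)
        tags[name] = node
    return tags


# Hypothetical file contents: v1.0 was later moved to a different revision.
sample = (
    "a" * 40 + " v1.0\n"
    + "b" * 40 + " v1.1\n"
    + "c" * 40 + " v1.0\n"
)
tags = parse_hgtags(sample)
```

Because the file is versioned like any other, a tag only becomes visible once the changeset that adds its line has been pulled.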
Generically
As you can imagine, these fundamental differences make converting tags a very ambiguous process. Not any more ambiguous than converting branches, of course, but not any less either. The generic part of the algorithm that hg convert uses for converting tags, that is, the work done no matter what the types of the source and destination are, is fairly straightforward. The converter asks the source converter object for the tags, as a simple dictionary from the tag name to the revision id. The source revision ids are then mapped to destination revision ids, as long as that revision wasn’t skipped in the conversion process, e.g. by a file mapping that excluded it. The destination converter object then records the tags in the destination repository. The converter then updates the revision map, so that future conversions parent new children to the revision that creates the tags, rather than its parent. This avoids the problem of branching to create the tags. It also means that all tags are created after all other revisions have been converted into the new repository, which may seem odd. Unfortunately, it’s the least ambiguous way to convert the tags.
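The mapping step described above can be sketched roughly as follows. This is an illustration only, not the hg convert source: the function name map_tags, the shape of revmap, and the "SKIP" marker used for skipped revisions are all assumptions made for the example.

```python
SKIPPED = "SKIP"  # assumed marker for revisions dropped during conversion


def map_tags(source_tags, revmap):
    """Translate {tag: source_rev} into {tag: destination_rev}.

    Tags whose source revision is unknown or was skipped are left out.
    """
    converted = {}
    for name, source_rev in source_tags.items():
        dest_rev = revmap.get(source_rev)
        if dest_rev is None or dest_rev == SKIPPED:
            continue
        converted[name] = dest_rev
    return converted


# Hypothetical data: Subversion revision ids on the left, Mercurial ids on the right.
revmap = {"svn:r4": "1f" * 20, "svn:r5": SKIPPED}
source_tags = {"v1.0": "svn:r4", "v1.1": "svn:r5", "v2.0": "svn:r9"}
dest_tags = map_tags(source_tags, revmap)
```

Here only v1.0 survives: v1.1 points at a skipped revision, and v2.0 points at a revision the map has never seen.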
Finding the tags
Of course, this high-level description abstracts away the real work - getting the tags from the Subversion repository and putting them into the Mercurial repository. So let’s dive into the Subversion side first. Because of the way tags are recorded in Subversion repositories, as well as all of the non-tag things you can do with and to the files there, this is a rather tricky process that can easily miss tags if the Subversion repository is in any way unconventional. Additionally, the algorithm may change if someone comes up with better heuristics for determining the tags in a Subversion repository. This comment before the algorithm for getting tags sums things up well:
# svn tags are just a convention, project branches left in a
# 'tags' directory. There is no other relationship than
# ancestry, which is expensive to discover and makes them hard
# to update incrementally. Worse, past revisions may be
# referenced by tags far away in the future, requiring a deep
# history traversal on every calculation. Current code
# performs a single backward traversal, tracking moves within
# the tags directory (tag renaming) and recording a new tag
# everytime a project is copied from outside the tags
# directory. It also lists deleted tags, this behaviour may
# change in the future.
The Subversion converter object goes through the svn log from the latest revision to the revision specified as the start revision in the converter config section (defaults to 0). From each log entry, it finds changes that are copies from one location to another, and sorts those copies from more specific to more general. Then it looks to see if the most generic copy is actually copying the tags directory itself. If so, we make sure that as we continue backward through the log we start looking in the old tags directory, rather than where it was moved to. Then, for each copy we check to see if the file(s) copied are copied into the tags directory.
After that check, we then look through all the copies seen so far, to see if the current copy is just a rename of an existing tag. If so, we update our record of the tag. The Subversion converter object then checks each of the files added in the revision against the pending tags. If a pending tag turns out to tag files from different branches of the repository (i.e. files from the trunk as well as files from a branch), then that tag is discarded, since it cannot be represented in Mercurial.
Finally, the source converter object goes through each pending tag and determines its name. If the tag is a rename of another tag, the converter leaves it in the pending list and continues to the next tag. Otherwise, it gets the revision id for the tag by looking at the source revision from which the tag was created. If the tag hasn’t yet been added to the official tags list, it is added.
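A heavily simplified sketch of this backward traversal might look like the following. It assumes a single-project layout with one tags directory, ignores deleted tags, moves of the tags directory itself, and mixed-branch tags, and uses a made-up log format of (revision, copies) pairs. The name find_tags and the data are hypothetical; the real converter handles many more cases.

```python
def find_tags(log_entries, tags_dir="tags"):
    """Walk the log newest-first and return {tag_name: source_revision}.

    log_entries: list of (rev, copies), newest first, where copies is a
    list of (dest_path, src_path, src_rev) tuples.
    """
    prefix = tags_dir + "/"
    tags = {}
    # Paths inside the tags directory that we are still tracing back,
    # mapped to the final name the tag ends up with.
    pending = {}
    for rev, copies in log_entries:
        for dest, src, src_rev in copies:
            if not dest.startswith(prefix):
                continue
            rest = dest[len(prefix):]
            if "/" in rest:
                continue  # a copy into an existing tag, not a new tag
            name = pending.pop(dest, rest)
            if src.startswith(prefix):
                # Rename within the tags directory: keep tracing the old path.
                pending[src] = name
            elif name not in tags:
                # Copied from outside the tags directory: this creates the tag.
                tags[name] = src_rev
    return tags


# Hypothetical history: v1 is tagged from trunk, then renamed to release-1.
log_entries = [
    (8, [("tags/release-1", "tags/v1", 7)]),
    (6, [("tags/v2", "branches/stable", 5)]),
    (5, [("tags/v1", "trunk", 4)]),
    (3, [("branches/stable", "trunk", 2)]),
]
found = find_tags(log_entries)
```

The renamed tag is reported under its final name, release-1, but points at the trunk revision from which v1 was originally copied.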
Tagging the new repository
The work to put these tags into the new repository is much simpler. But first, one limitation of the tags conversion code is that all converted tags must be placed in a single cloned branch. This limitation exists because tags are not converted in line with other revisions, so the challenge of ensuring that changes to the .hgtags file are in all the right cloned branch repos is a tricky one. Each cloned branch could determine which tags apply to it, but then each cloned branch would have a different set of changes to the .hgtags file at its tip after the conversion, all of which would have to be merged when doing merges between these cloned branches.
So first, the destination converter object determines in which repository to place the updated tags. This is necessary if the convert.hg.clonebranches configuration option was enabled (for example, with --config on the command line); otherwise there is only one repository to put them in. It then loads up the old tags from the .hgtags file, and creates the full list of entries to be placed in that file from the tags dictionary retrieved from Subversion. If no new tags have been created, then it returns. If some have, then it saves the full list of entries to the file and commits the changes to the repository.
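The "only commit if something changed" logic can be sketched like this. It is a simplification under stated assumptions: the function name merge_hgtags is made up, entries are treated as an unordered set and written back sorted, and the actual commit is left out entirely.

```python
def merge_hgtags(old_text, new_tags):
    """Return new .hgtags contents, or None if there is nothing to commit.

    old_text: current contents of .hgtags.
    new_tags: {tag_name: node_hex} as produced by the conversion.
    """
    old_lines = {line for line in old_text.splitlines() if line.strip()}
    new_lines = {f"{node} {name}" for name, node in new_tags.items()}
    merged = old_lines | new_lines
    if merged == old_lines:
        return None  # no new tags, so no commit is needed
    return "".join(line + "\n" for line in sorted(merged))


# Hypothetical data: one tag already recorded, one newly converted.
old_text = "a" * 40 + " v1.0\n"
unchanged = merge_hgtags(old_text, {"v1.0": "a" * 40})
updated = merge_hgtags(old_text, {"v1.1": "b" * 40})
```

In the first call nothing is new, so the function returns None; in the second, the returned text contains both entries and would be written to .hgtags and committed.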
Summing Up
Now we’ve gone over all the basics of converting a repository from Subversion to Mercurial: the trunk, the branches, and the tags. But we’ve just barely touched on the many different tricks of converting repositories, and cleaning them up after the fact. Tools like svnadmin dump with svndumpfilter, the mq extension to Mercurial, hg histedit, and the hgsubversion extension, not to mention the option of going through another VCS such as Git in the process, are all worth exploring when you run into issues with conversion. Though I don’t have concrete plans for writing about each of these, I will occasionally share tips and tricks as I learn about them. In the meantime, happy coding!