Migrating the CDO SVN repository to Git

The day has finally come. After months of living with the inconveniences of SVN (which, actually, we used to praise and worship after the days of CVS...), we decided to finally migrate the CDO repositories from SVN to Git.

In this blog entry, I will describe the steps we took to perform the migration. Before starting, however, let me summarize the history of the CDO repository, which explains some of the peculiarities we had to deal with during the migration.

CDO was initially created in a CVS repository and lived there for a long time. Thus, we worked with CVS branches and tags during a long period of our development. Then CVS was replaced by SVN, and the repository was migrated and restructured to fit the new SVN scheme. To make everything cleaner, we also renamed and restructured branches and tags, and we made use of SVN's capability to organize branches and tags hierarchically (e.g. /branches/maintenance/2.0 or /tags/drops/S20100523-1540). Unfortunately, it became clear that hierarchical branches and tags would cause problems in the build and release infrastructure, so the structure was once again reworked into a flat one, this time with a canonical naming scheme to keep tags and branches manageable.

The challenge here is that SVN remembers tags and branches as they were at any point in time, even if they no longer exist in that form in a later revision. In other words, if an SVN tag (or branch) exists in a particular revision, it will always exist when that revision is checked out later, even though the tag or branch might have been renamed, moved, or deleted in a later revision (and is, therefore, not "visible" in the repository browser of the current SVN repository). For example, consider a branch in our repository which existed first as sw-rangebased-step1, then as swinkler/rangebased-step1, and finally as swinkler-rangebased-step1. Depending on which revision you check out, the branch is still known by one of its older names. For the migration from SVN to Git, this means that the migrated Git repository will end up containing all three branches. The same applies to tags.

In addition to this, at some point three artificial branches were created in the CDO SVN repository: INFRASTRUCTURE, INCUBATING, and DEPRECATED. The purpose of these branches was to keep projects in the repository that should not interfere with the main code, for example because they contain deprecated code. An additional goal of the migration to Git was to factor out these branches into separate repositories.

But enough talk about challenges, let's dive into the migration process. This process should be executable both locally and in a remote shell. For the CDO migration, a Linux server on the Internet was used (to have a suitably fast connection for the SVN access). So let's go ...

Step 1: Initializing the Git repository

We initialize an empty repository with

git svn init --no-metadata -s https://dev.eclipse.org/svnroot/modeling/org.eclipse.emf.cdo

This just creates a .git directory in the target directory which contains the config file. The initial configuration looks like this:

[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
        autocrlf = false
[svn-remote "svn"]
        noMetadata = 1
        url = https://dev.eclipse.org/svnroot/modeling/org.eclipse.emf.cdo
        fetch = trunk:refs/remotes/trunk
        branches = branches/*:refs/remotes/*
        tags = tags/*:refs/remotes/tags/*

Because of the deep branches and tags directory structure present in the history of our SVN repository, we also add additional mappings. We could have specified these on the command line above, but that leads to a rather long command, and it's easier to edit the config file with a text editor. We append the following lines to the svn-remote section:

        tags = tags/bugs/*:refs/remotes/tags/svn-bugs/*
        tags = tags/drops/*:refs/remotes/tags/svn-drops/*
        tags = tags/estepper/*:refs/remotes/tags/svn-estepper/*
        tags = tags/smcduff/*:refs/remotes/tags/svn-smcduff/*
        tags = tags/swinkler/*:refs/remotes/tags/svn-swinkler/*
        branches = branches/bugs/*:refs/remotes/svn-bugs/*
        branches = branches/cdegroot/*:refs/remotes/svn-cdegroot/*
        branches = branches/swinkler/*:refs/remotes/svn-swinkler/*
        branches = branches/estepper/*:refs/remotes/svn-estepper/*
        branches = branches/mfluegge/*:refs/remotes/svn-mfluegge/*
        branches = branches/mtaal/*:refs/remotes/svn-mtaal/*
        branches = branches/scmduff/*:refs/remotes/svn-smcduff/*
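
Because most of these extra mappings follow the same pattern, they can also be generated instead of typed. The following is only a sketch: print_mappings is a helper name made up for this post, not part of git-svn, and the directory names are simply the ones listed above. The last branches entry, whose SVN directory name differs from its Git prefix, does not fit the pattern and still has to be added by hand. The function only prints the lines, so they can be reviewed before they are pasted into .git/config.

```shell
#!/bin/sh
# print_mappings KIND NAME...
# KIND is "tags" or "branches"; each NAME is a subdirectory below
# /tags or /branches in the SVN repository.
# (print_mappings is a hypothetical helper, not part of git-svn.)
print_mappings() {
    kind=$1
    shift
    for name in "$@"; do
        if [ "$kind" = "tags" ]; then
            # SVN tags end up below refs/remotes/tags/ in the Git clone
            printf '        tags = tags/%s/*:refs/remotes/tags/svn-%s/*\n' "$name" "$name"
        else
            printf '        branches = branches/%s/*:refs/remotes/svn-%s/*\n' "$name" "$name"
        fi
    done
}

print_mappings tags bugs drops estepper smcduff swinkler
print_mappings branches bugs cdegroot swinkler estepper mfluegge mtaal
```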

Next, we have to create a file that maps SVN committers to Git identities. This mapping file is called the authors file. It contains entries like this:

(no author) = estepper <estepper's email address>
estepper = estepper <estepper's email address>
swinkler = swinkler <swinkler's email address>

The first line maps all anonymous commits (which were created, e.g., by the CVS-to-SVN migration scripts) to Eike's identity. The other lines just add the email address to the committer user IDs. The authors file must contain an entry for each committer in the SVN repository; otherwise, the fetch operation in the next step will fail. To make the authors file known to Git, we have to issue

git config svn.authorsfile authors
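
To find out which committers need an entry, the user names can be pulled out of the SVN log. The sketch below assumes the usual svn log --quiet output, in which every revision line looks like "r123 | user | date". The function name authors_skeleton and the example.invalid addresses are placeholders of mine; the real email addresses, and the "(no author)" line, still have to be edited by hand.

```shell
#!/bin/sh
# authors_skeleton: read `svn log --quiet` output on stdin and print
# one authors-file line per distinct committer.
# The @example.invalid addresses are placeholders, not real addresses.
authors_skeleton() {
    awk -F '|' '
        /^r[0-9]+ / {
            user = $2
            gsub(/^ +| +$/, "", user)      # trim the padding around the name
            print user " = " user " <" user "@example.invalid>"
        }
    ' | sort -u
}

# Typical use (needs network access to the SVN server):
#   svn log --quiet https://dev.eclipse.org/svnroot/modeling/org.eclipse.emf.cdo | authors_skeleton > authors
```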

Step 2: Initially importing the SVN repository to Git

Now it's time to import the SVN history into the Git repo. This step involves the simple command

git svn fetch

and a long wait (around 12 hours for the CDO repository). Git goes through the SVN history from the first revision to the latest and commits each revision to the Git repository one by one. If you are doing this on a remote server, you had better run it in a screen session so that it is immune to lost network connections.
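
If the fetch is interrupted anyway, git svn fetch can simply be started again and continues with the revisions that are still missing. A minimal sketch of a retry wrapper follows; retry is a hypothetical helper name, and the five attempts and the 60-second pause in the usage comment are arbitrary choices.

```shell
#!/bin/sh
# retry MAX DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most MAX times, sleeping DELAY
# seconds between attempts. Returns 0 on success, 1 if all attempts fail.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
        # pause only if another attempt will follow
        [ "$attempt" -le "$max" ] && sleep "$delay"
    done
    return 1
}

# Inside the screen session:
#   retry 5 60 git svn fetch
```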

Step 3: Adjust SVN tags and branches

The git-svn module we used to populate our repository creates more or less a 1:1 clone of the SVN structures in the Git repository. By default, the connection between this clone and the upstream SVN repository is even bidirectional: you can work locally in your Git repository and perform SVN commits using git svn dcommit, and you can still use git svn fetch or git svn rebase to update your local Git repository.

The downside of this is that our new Git repository does not actually look like a plain Git repository: The branches are still remote refs, the SVN tags are represented as git branches, and our main branch is called trunk. Therefore, we need to

convert all remote branches to local branches (for each SVN branch execute git checkout $branch; git checkout -b $branch)
convert all SVN tag branches to native Git tags (for each SVN tag execute git checkout $tagBranch; git tag $tag)
convert the branch trunk to master (git checkout trunk; git branch -D master; git checkout -f -b master)
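
For illustration, the three conversions can be written as a small loop. This sketch is not what svn2git does internally: instead of checking out each ref first, it creates the tags and branches directly from the remote refs, and it only prints the Git commands so that they can be reviewed before anything is changed. The function name plan_conversion is hypothetical, and the ref layout (refs/remotes/trunk, refs/remotes/tags/..., refs/remotes/...) is the one configured in Step 1.

```shell
#!/bin/sh
# plan_conversion: read full ref names (one per line) on stdin and print
# the Git command that turns each git-svn remote ref into a native ref.
plan_conversion() {
    while IFS= read -r ref; do
        case "$ref" in
            refs/remotes/trunk)
                # trunk becomes master (created or reset, then checked out)
                echo "git checkout -B master $ref" ;;
            refs/remotes/tags/*)
                # SVN tag branch -> lightweight Git tag
                echo "git tag ${ref#refs/remotes/tags/} $ref" ;;
            refs/remotes/*)
                # remote branch -> local branch of the same name
                echo "git branch --no-track ${ref#refs/remotes/} $ref" ;;
        esac
    done
}

# Typical use inside the git-svn clone:
#   git for-each-ref --format='%(refname)' refs/remotes | plan_conversion
# Once the printed commands look right, pipe them into sh.
```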

As this can be a lot of typing with many tags and branches, there are already scripts to perform these steps. These scripts are usually called svn2git and are written in Ruby or in Perl. I took mine (a perl script by Michael C. Schwern) from https://github.com/schwern/svn2git.git and invoked

svn2git --no-clone

Note that the --no-clone option skips the repository cloning, as we already have a cloned repository. (The reason I did not use svn2git to clone the repository was the complex branch structure described initially. The svn2git cloning might work well for you, in which case you could replace the previous steps with a simple call to svn2git.)

After this step is done, it is a good idea to make a backup of the complete repository by simply copying the directory to some other place.

Step 4: Creating the factored-out repositories for infrastructure, incubating, and deprecated

Now is the time to factor out the three branches INFRASTRUCTURE, INCUBATING, and DEPRECATED into separate repositories.

Performing this step is quite easy with Git: we initialize a new, empty repository and pull the desired branch into this repository:

git init org.eclipse.emf.cdo.deprecated.git
cd org.eclipse.emf.cdo.deprecated.git/
git pull ../org.eclipse.emf.cdo/ DEPRECATED
rm .git/FETCH_HEAD # remove the trace to the original repo

That's it: we now have a new repository containing only the history of the DEPRECATED branch from SVN. The repository is already prepared for the final wrap-up steps (see Step 7). (Of course, the same steps have to be done for INFRASTRUCTURE and INCUBATING as well.)
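
Since the same commands are repeated for each of the three branches, they can be wrapped in a function. This is a sketch under two assumptions: the repository names for INFRASTRUCTURE and INCUBATING follow the same pattern as the one shown for DEPRECATED, and the converted main repository lives in org.eclipse.emf.cdo next to the new ones. Both helper names are hypothetical.

```shell
#!/bin/sh
# repo_name_for BRANCH: derive the directory name of the new repository,
# e.g. DEPRECATED -> org.eclipse.emf.cdo.deprecated.git
# (the naming pattern for the other two branches is an assumption)
repo_name_for() {
    suffix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'org.eclipse.emf.cdo.%s.git\n' "$suffix"
}

# factor_out BRANCH: create the new repository and pull only BRANCH
# from the main repository into it.
factor_out() {
    repo=$(repo_name_for "$1")
    git init "$repo" &&
    (
        cd "$repo" &&
        git pull ../org.eclipse.emf.cdo/ "$1" &&
        rm -f .git/FETCH_HEAD    # remove the trace to the original repo
    )
}

# Run from the directory that contains org.eclipse.emf.cdo:
#   for b in INFRASTRUCTURE INCUBATING DEPRECATED; do factor_out "$b"; done
```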

Let's come back to our main repository:

Step 5: Remove the SVN remote

Now everything we need is contained in our local Git repository. So it is time to cut the umbilical cord and remove the references to the SVN repository. Once again we open the .git/config file in an editor and remove the svn-remote and svn sections including all their options. Also, the authors file created in Step 1 is no longer needed, so we can delete it from the filesystem as well.

Furthermore, the SVN branches are still present as remote refs. As we also have them in our local Git repository thanks to the svn2git script, we can get rid of the remote refs:

git branch -rd `git branch -r` # please mind the backticks!

Step 6: Clean up and restructure branches and tags

Because the CDO repository had been restructured multiple times (as described initially), the Git repository contains several obsolete branches and tags. Additionally, we want to have a new canonical and hierarchical naming scheme for our branches and tags, namely

drops/Xxxxxxxxx-xxxx for build tags
committers/<committerName>/xxxx for committer tags
bugs/nnnnnn for feature branches and bug fixes
streams/n.n-maintenance for maintenance branches
committers/<committerName>/xxxx for committer branches

To easily perform the cleanup, I have created a Perl script which reads one file listing the branches and one listing the tags, and which performs the necessary delete and rename actions. This is the Perl source code of the script:

cleanup-gitmig.pl

#!/usr/bin/perl
use strict;
 
my $file            = "";
my @field           = ();
 
delete_tags();
delete_branches();
rename_tags();
rename_branches();
 
print "Finished successfully.\n";
 
exit;
 
sub run {
    print ">> @_\n";
    system @_;
 
    my $exit = $? >>8;
    die "@_ exited with $exit" if $exit;
 
    return 1;
}
 
 
sub delete_branches {
    open( INFILE, "branches.csv" )
      or die("Can not open input file: $!");
 
    while ( $file = <INFILE> ) {
        @field = parse_csv($file);
        chomp(@field);
 
        my $branch = $field[0];
        my $newBranch = $field[1];
 
        if (($newBranch eq "DEL" )) {
            print "Deleting branch $branch\n";
            run("git","branch","-D",$branch);
        }
    }
 
    close(INFILE);
}
 
sub rename_branches {
    open( INFILE, "branches.csv" )
      or die("Can not open input file: $!");
 
    while ( $file = <INFILE> ) {
        @field = parse_csv($file);
        chomp(@field);
 
        my $branch = $field[0];
        my $newBranch = $field[1];
 
        if (not ($newBranch eq "DEL" )) {
            print "Renaming branch $branch to branch $newBranch\n";
            run("git","branch","-m",$branch,$newBranch);
        }
    }
 
    close(INFILE);
 
}
 
sub delete_tags {    
    open( INFILE, "tags.csv" )
      or die("Can not open input file: $!");
 
    while ( $file = <INFILE> ) {
        @field = parse_csv($file);
        chomp(@field);
 
        my $tag = $field[0];
        my $newTag = $field[1];
 
        if (($newTag eq "DEL" )) {
            print "Deleting tag $tag\n";
            run("git","tag","-d","$tag");
        }
    }
 
    close(INFILE);
}
 
sub rename_tags {    
    open( INFILE, "tags.csv" )
      or die("Can not open input file: $!");
 
    while ( $file = <INFILE> ) {
        @field = parse_csv($file);
        chomp(@field);
 
        my $tag = $field[0];
        my $newTag = $field[1];
 
        if (not ($newTag eq "DEL" )) {
            print "Renaming tag $tag to tag $newTag\n";
            run("git","tag",$newTag,$tag);
            run("git","tag","-d","$tag");
        }    
    }
 
    close(INFILE);
}
 
sub parse_csv {
    my $text = shift;
    my @new  = ();
    push( @new, $+ ) while $text =~ m{
       "([^\"\\]*(?:\\.[^\"\\]*)*)",?
           |  ([^,]+),?
           | ,
       }gx;
    push( @new, undef ) if substr( $text, -1, 1 ) eq ',';
    return @new;
}

To produce the input files for this script, we let git give us a list of all branches and tags:

git tag > tags.csv
git branch > branches.csv

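Then we edit the files. In branches.csv, we have to make sure to delete the "* master" line, as we don't want to touch the master branch, and to remove the two-space indentation that git branch prints before every other name, because the script passes the names to Git exactly as they appear in the file. To each remaining line we append a comma and either the magic string DEL if we want the tag/branch deleted, or the new name if we want the tag/branch renamed.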
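
The first part of this editing can be automated. git branch marks the current branch with "* " and indents all other lines by two spaces, so a short filter can strip that prefix and drop master. A minimal sketch (the function name clean_branch_list is made up for this post):

```shell
#!/bin/sh
# clean_branch_list: read `git branch` output on stdin, strip the
# two-character prefix ("* " or two spaces) and drop the master line.
clean_branch_list() {
    sed -e 's/^[* ] //' | grep -v -x 'master'
}

# Typical use:
#   git branch | clean_branch_list > branches.csv
```

The second column (DEL or the new name) still has to be added to each line afterwards.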

For CDO, part of the tags file looks as follows:

drop-M20111007-0410,/drops/M20111007-0410
drop-S20110923-0630,/drops/S20110923-0630
drop-S20110927-0522,/drops/S20110927-0522
drops,DEL
eike-initial001,DEL
eike-initial002,DEL
estepper-2.0-end-of-maintenance,/committers/estepper/2.0-end-of-maintenance
estepper-before-revision-holder,/committers/estepper/before-revision-holder
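
For names that already follow the flat scheme, the second column can be pre-filled and then corrected by hand. The sketch below covers only the two patterns visible in the excerpt above (drop-... and estepper-...); every other name gets an empty second column, which has to be completed before the Perl script is run. suggest_tag_name is a hypothetical helper. It writes the new names as listed in the naming scheme of this step, i.e. without a leading slash, since the git check-ref-format documentation states that ref names cannot begin with a slash.

```shell
#!/bin/sh
# suggest_tag_name: read tag names on stdin and print "old,new" lines.
# Only two patterns are handled; unknown names get an empty second
# column that has to be filled in (or set to DEL) by hand.
suggest_tag_name() {
    while IFS= read -r tag; do
        case "$tag" in
            drop-*)
                echo "$tag,drops/${tag#drop-}" ;;
            estepper-*)
                echo "$tag,committers/estepper/${tag#estepper-}" ;;
            *)
                echo "$tag," ;;
        esac
    done
}

# Typical use:
#   git tag | suggest_tag_name > tags.csv
```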

After finishing the files, we just invoke the perl script:

./cleanup-gitmig.pl

and voilà: we have a nice and clean repository.

Step 7: Wrap up everything and deploy to git.eclipse.org

At this point we have four repositories, which are basically ready to be used. Before deploying them to git.eclipse.org, we should convert them to bare repositories. To do this, I have followed the steps mentioned in http://stackoverflow.com/questions/2199897/git-convert-normal-to-bare-repository:

mv .git ..
rm -rf *
mv ../.git/* .
rmdir ../.git
git config --bool core.bare true
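
Because rm -rf * is unforgiving when run in the wrong directory, the steps can be wrapped in a function that first checks where it is. This is just a sketch: to_bare is a hypothetical name, and the two checks at the top are my own additions to the commands above.

```shell
#!/bin/sh
# to_bare DIR: convert the non-bare repository in DIR into a bare one,
# using the same commands as above, but refuse to run if DIR is not a
# Git working copy or if the parent directory already contains .git.
to_bare() {
    repo=$1
    if [ ! -d "$repo/.git" ]; then
        echo "to_bare: no .git directory in $repo" >&2
        return 1
    fi
    if [ -e "$repo/../.git" ]; then
        echo "to_bare: parent of $repo already contains .git" >&2
        return 1
    fi
    (
        cd "$repo" || exit 1
        mv .git .. &&
        rm -rf ./* &&            # remove the working copy files
        mv ../.git/* . &&
        rmdir ../.git &&
        git config --bool core.bare true
    )
}

# Typical use:
#   to_bare org.eclipse.emf.cdo
```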

And, as a last step, we should set a description for our repositories:

echo Git repository of the org.eclipse.emf.cdo project > description

Now we can zip the repositories, upload them to a suitable location and ask the Eclipse Webmasters nicely to deploy the new repositories (as we have done in Bug 360970).

Conclusion

In this blog entry, I have described the steps we took to migrate the CDO repository from SVN to Git. Your problems or requirements may be different, but I hope that one or two steps help you in migrating your project.

However, we are not entirely done yet. We are still working on our initial workspace setup workflow and our build system. But the basic migration is done. Hooray!

Details: Category: Eclipse; Published: 15 October 2011

Dr. Stefan Winkler
Freelance software developer and IT consultant

Stefan Winkler's Blog