The day has finally come. After many months of living with the inconveniences of SVN (which, admittedly, we used to praise and worship after the days of CVS...), we finally decided to migrate the CDO repositories from SVN to Git.
In this blog entry, I will describe the steps we took to perform the migration. Before starting, however, let me summarize the history of the CDO repository, which explains some of the peculiarities we had to deal with during the migration.
CDO was initially created in a CVS repository and lived there for a long time. Thus, we worked with CVS branches and tags for a long period of our development. Then, CVS was replaced by SVN, and the repository was migrated and restructured to fit the new SVN scheme. Also, to make everything cleaner, we renamed and restructured branches and tags. We also made use of SVN's capability to organize branches and tags hierarchically (e.g. /branches/maintenance/2.0 or /tags/drops/S20100523-1540). Unfortunately, it became clear that hierarchical branches and tags would cause problems in the build and release infrastructure, so the structure was reworked once again into a flat structure, but this time with a canonical naming scheme to keep tags and branches manageable.
The challenge here is that SVN remembers tags and branches as they were at any point in time, even if they no longer exist in that form in a later revision. In other words, if an SVN tag (or branch) exists in a particular revision, it will always exist when this revision is checked out later, even though the tag or branch might have been renamed, moved, or deleted in a later revision (and is, therefore, not "visible" in the repository browser of the current SVN repository). For example, consider a branch in our repository that existed first as sw-rangebased-step1, then as swinkler/rangebased-step1, and finally as swinkler-rangebased-step1. Depending on which revision you check out, the branch is still known by one of its older names. For the migration from SVN to Git, this means that the migrated Git repository will end up containing all three branches. The same applies to tags.
In addition, at some point three artificial branches were created in the CDO SVN repository: INFRASTRUCTURE, INCUBATING, and DEPRECATED. The purpose of these branches was to keep projects in the repository that should not interfere with the main repository, for example, because they contain deprecated code. An additional goal of the migration to Git was to factor out these branches into separate repositories.
But enough talk about challenges; let's dive into the migration process. This process should be executable both locally and in a remote shell. For the CDO migration, we used a Linux server on the Internet (to have a suitably fast connection for the SVN access). So let's go ...
Step 1: Initializing the Git repository
We initialize an empty repository with
git svn init --no-metadata -s https://dev.eclipse.org/svnroot/modeling/org.eclipse.emf.cdo
This just creates a .git directory in the target directory which contains the config file. The initial configuration looks like this:
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	autocrlf = false
[svn-remote "svn"]
	noMetadata = 1
	url = https://dev.eclipse.org/svnroot/modeling/org.eclipse.emf.cdo
	fetch = trunk:refs/remotes/trunk
	branches = branches/*:refs/remotes/*
	tags = tags/*:refs/remotes/tags/*
Because of the deep branches and tags directory structure present in the history of our SVN repository, we also add additional mappings. We could have specified these at the command line above, but that leads to a rather large command and it's easier to edit the config file with a text editor. We append the following lines to the svn-remote section:
	tags = tags/bugs/*:refs/remotes/tags/svn-bugs/*
	tags = tags/drops/*:refs/remotes/tags/svn-drops/*
	tags = tags/estepper/*:refs/remotes/tags/svn-estepper/*
	tags = tags/smcduff/*:refs/remotes/tags/svn-smcduff/*
	tags = tags/swinkler/*:refs/remotes/tags/svn-swinkler/*
	branches = branches/bugs/*:refs/remotes/svn-bugs/*
	branches = branches/cdegroot/*:refs/remotes/svn-cdegroot/*
	branches = branches/swinkler/*:refs/remotes/svn-swinkler/*
	branches = branches/estepper/*:refs/remotes/svn-estepper/*
	branches = branches/mfluegge/*:refs/remotes/svn-mfluegge/*
	branches = branches/mtaal/*:refs/remotes/svn-mtaal/*
	branches = branches/scmduff/*:refs/remotes/svn-smcduff/*
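Instead of using a text editor, the same entries can presumably also be appended with git config --add, which stores each mapping as one more value of the multi-valued tags or branches key. The following is only a sketch: the function name add_svn_mapping and its argument order are my own invention, and the optional fourth argument exists only because one SVN directory (scmduff) is mapped to a differently named ref prefix (svn-smcduff) in the list above.

```shell
# Hypothetical helper: append one extra git-svn tag or branch mapping.
#   add_svn_mapping <config-file> tags|branches <svn-subdir> [<ref-prefix>]
# <ref-prefix> defaults to <svn-subdir>; refs end up under .../svn-<ref-prefix>/.
add_svn_mapping() {
  local cfg=$1 kind=$2 dir=$3 name=${4:-$3}
  case $kind in
    tags)
      git config --file "$cfg" --add svn-remote.svn.tags \
        "tags/$dir/*:refs/remotes/tags/svn-$name/*" ;;
    branches)
      git config --file "$cfg" --add svn-remote.svn.branches \
        "branches/$dir/*:refs/remotes/svn-$name/*" ;;
    *)
      echo "unknown mapping kind: $kind" >&2
      return 1 ;;
  esac
}
```

For example, add_svn_mapping .git/config tags drops adds the line tags = tags/drops/*:refs/remotes/tags/svn-drops/*, and add_svn_mapping .git/config branches scmduff smcduff reproduces the last line of the list above.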
Next, we have to create a mapping file to map SVN committers to Git identities. This mapping file is called the authors file. It contains entries like this:
(no author) = estepper <
estepper = estepper <
swinkler = swinkler <
The first line maps all anonymous commits (which were, e.g., created by the CVS-to-SVN migration scripts) to Eike's identity. The other lines just add the email address to the committer user IDs. The authors file must contain an entry for each committer in the SVN repository; otherwise, the fetch operation in the next step will fail. To make the authors file known to Git, we have to issue
git config svn.authorsfile authors
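Because the fetch aborts at the first committer that is missing from the authors file, it can help to generate a skeleton file up front. The sketch below assumes the usual output of svn log -q, in which each revision line looks like "r1234 | estepper | <date>"; the function name make_authors_skeleton and the <TODO> placeholder are hypothetical, and the real e-mail addresses still have to be filled in by hand.

```shell
# Hypothetical helper: read `svn log -q` output on stdin and print one
# authors-file line per distinct committer, e.g. "estepper = estepper <TODO>".
make_authors_skeleton() {
  # Revision lines look like "r1234 | estepper | 2011-10-07 ...";
  # the dashed separator lines do not match the pattern and are skipped.
  awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
    LC_ALL=C sort -u |
    while IFS= read -r user; do
      # <TODO> is a placeholder for the committer's real e-mail address.
      printf '%s = %s <TODO>\n' "$user" "$user"
    done
}
```

A possible invocation is svn log -q https://dev.eclipse.org/svnroot/modeling/org.eclipse.emf.cdo | make_authors_skeleton > authors, after which the placeholders are edited by hand.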
Step 2: Initially importing the SVN repository to Git
Now it's time to import the SVN history into the Git repo. This step involves the simple command
git svn fetch
and a long wait (around 12 hours for the CDO repository). Git goes through the SVN history from the first revision to the latest and commits each revision to the Git repository one by one. If you are doing this on a remote server, you had better run it in a screen session so that it is immune to network connection losses.
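Since git svn fetch continues from the last revision it has already imported, an interrupted run can simply be restarted. If the connection to the SVN server is flaky, a small retry wrapper saves some babysitting. This is a generic sketch, not part of git-svn; the function name retry and its parameters (maximum number of attempts, delay in seconds) are my own choice.

```shell
# Generic sketch (not part of git-svn): re-run a command until it succeeds.
#   retry <max-attempts> <delay-seconds> <command> [<args>...]
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying in ${delay}s (attempt $attempt of $max): $*" >&2
    sleep "$delay"
  done
}
```

Inside the screen session, this could be used as retry 20 60 git svn fetch.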
Step 3: Adjust SVN tags and branches
The git-svn module we used to populate our repository more or less creates a 1:1 clone of the SVN structures in the Git repository. By default, the connection between this clone and the upstream SVN repository is even bidirectional: you can work locally in your Git repository and commit back to SVN using git svn dcommit. In the other direction, you can still use git svn fetch or git svn rebase to update your local Git repository.
The downside of this is that our new Git repository does not actually look like a plain Git repository: The branches are still remote refs, the SVN tags are represented as git branches, and our main branch is called trunk. Therefore, we need to
- convert all remote branches to local branches (for each SVN branch, execute git checkout $branch; git checkout -b $branch)
- convert all SVN tag branches to native Git tags (for each SVN tag, execute git checkout $tagBranch; git tag $tag)
- convert the branch trunk to master (git checkout trunk; git branch -D master; git checkout -f -b master)
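The three conversions can be sketched with plain Git commands as follows. This is not the svn2git script mentioned below, only a rough illustration that assumes the ref layout produced by the configuration from Step 1 (trunk in refs/remotes/trunk, SVN tags under refs/remotes/tags/, all other SVN branches directly under refs/remotes/); the function name convert_svn_refs is hypothetical. It passes an explicit start point to git tag and git branch, so only the final master has to be checked out.

```shell
# Hypothetical sketch: turn git-svn remote refs into plain Git branches/tags.
convert_svn_refs() {
  local ref
  # 1. SVN "tag branches" under refs/remotes/tags/ become lightweight tags.
  git for-each-ref --format='%(refname)' refs/remotes/tags/ |
    while IFS= read -r ref; do
      git tag "${ref#refs/remotes/tags/}" "$ref" || return 1
    done || return 1
  # 2. All other remote refs except trunk become local branches.
  git for-each-ref --format='%(refname)' refs/remotes/ |
    while IFS= read -r ref; do
      case $ref in
        refs/remotes/tags/*|refs/remotes/trunk) continue ;;
      esac
      git branch "${ref#refs/remotes/}" "$ref" || return 1
    done || return 1
  # 3. trunk becomes master (created or reset, then checked out).
  git checkout -q -B master refs/remotes/trunk
}
```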
As this can be a lot of typing with many tags and branches, there are already scripts that perform these steps. These scripts are usually called svn2git and are written in Ruby or Perl. I took mine (a Perl script by Michael C. Schwern) from https://github.com/schwern/svn2git.git and invoked
svn2git --no-clone
Note that the --no-clone option skips cloning the repository, as we already have a cloned repository. (The reason I did not use svn2git to clone the repository was the complex branch structure described initially. The svn2git cloning might work well for you, in which case you could replace the previous steps with a single call to svn2git.)
After this step is done, it is a good idea to make a backup of the complete repository by simply copying the directory to some other place.
Step 4: Creating the factored-out repositories for infrastructure, incubating, and deprecated
Now it is time to factor out the three branches INFRASTRUCTURE, INCUBATING, and DEPRECATED into separate repositories.
Performing this step is quite easy with git: We initialize a new, empty repository and pull the desired branch into this repository:
git init org.eclipse.emf.cdo.deprecated.git
cd org.eclipse.emf.cdo.deprecated.git/
git pull ../org.eclipse.emf.cdo/ DEPRECATED
rm .git/FETCH_HEAD # remove the trace to the original repo
That's it: we have a new repository containing only the history of the DEPRECATED branch from SVN. The repository is already prepared for the final wrap-up steps (see Step 7). (Of course, the same steps have to be done for INFRASTRUCTURE and INCUBATING as well.)
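Since the same commands are needed for each of the three branches, they can be wrapped in a small function. This is a sketch; factor_out_branch is a hypothetical name, and the path of the main repository is turned into an absolute path so that it still resolves after changing into the new repository.

```shell
# Hypothetical helper: create <target> containing only the history of
# <branch> from the main repository <main>.
#   factor_out_branch <main> <branch> <target>
factor_out_branch() {
  local main branch=$2 target=$3
  main=$(cd "$1" && pwd) || return 1   # absolute path, valid after the cd below
  git init -q "$target" || return 1
  (
    cd "$target" &&
    git pull -q "$main" "$branch" &&
    rm -f .git/FETCH_HEAD              # remove the trace to the original repo
  )
}
```

All three repositories could then be created with for b in DEPRECATED INCUBATING INFRASTRUCTURE; do factor_out_branch org.eclipse.emf.cdo "$b" "org.eclipse.emf.cdo.$(printf '%s' "$b" | tr '[:upper:]' '[:lower:]').git"; done, where the .incubating.git and .infrastructure.git names are assumed by analogy with the DEPRECATED repository.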
Let's come back to our main repository:
Step 5: Remove the SVN remote
Now everything we need is contained in our local Git repository. So it is time to cut the umbilical cord and remove the references to the SVN repository. Once again, we open the .git/config file in an editor and remove the svn-remote and svn sections, including all their options. Also, the authors file created in Step 1 is no longer needed, so we can delete it from the filesystem as well.
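If you prefer not to edit .git/config by hand, git config can remove both sections directly. A minimal sketch; the helper name drop_svn_config is hypothetical, and it defaults to the .git/config of the current repository.

```shell
# Hypothetical helper: remove the git-svn sections from a Git config file.
#   drop_svn_config [<config-file>]   (default: .git/config)
drop_svn_config() {
  local cfg=${1:-.git/config}
  # --remove-section fails if the section does not exist; ignore that case.
  git config --file "$cfg" --remove-section svn-remote.svn 2>/dev/null || true
  git config --file "$cfg" --remove-section svn 2>/dev/null || true
}
```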
Furthermore, the SVN branches are still present as remote refs. As we also have them in our local Git repository thanks to the svn2git script, we can get rid of the remote refs:
git branch -rd `git branch -r` # please mind the backticks!
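The backtick version relies on the human-readable output of git branch -r. A variant that does not depend on that formatting lists the remote refs with git for-each-ref and deletes them one by one with git update-ref -d; this is a sketch, and drop_remote_refs is a made-up name.

```shell
# Hypothetical helper: delete every ref under refs/remotes/ in the
# current repository (local branches and tags are left alone).
drop_remote_refs() {
  git for-each-ref --format='%(refname)' refs/remotes/ |
    while IFS= read -r ref; do
      git update-ref -d "$ref" || return 1
    done
}
```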
Step 6: Clean up and restructure branches and tags
Because the CDO repository had been restructured multiple times (as described initially), the Git repository contains several obsolete branches and tags. Additionally, we want to have a new canonical and hierarchical naming scheme for our branches and tags, namely
- drops/Xxxxxxxxx-xxxx for build tags
- committers/<committerName>/xxxx for committer tags
- bugs/nnnnnn for feature branches and bug fixes
- streams/n.n-maintenance for maintenance branches
- committers/<committerName>/xxxx for committer branches
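To double-check the result of the cleanup later on, the scheme can be expressed as a regular expression. The patterns below are my own reading of the list above (for example, a drop tag is assumed to be one capital letter, eight digits, a dash, and four digits, as in M20111007-0410), and nonconforming_refs is a hypothetical helper; master is allowed explicitly.

```shell
# Hypothetical helper: print every ref name read from stdin (one per line)
# that does not follow the naming scheme; "master" is allowed.
nonconforming_refs() {
  grep -Ev '^(master|drops/[A-Z][0-9]{8}-[0-9]{4}|committers/[^/]+/.+|bugs/[^/]+|streams/[0-9]+\.[0-9]+-maintenance)$' || true
}
```

Once the cleanup is complete, { git tag; git for-each-ref --format='%(refname:short)' refs/heads/; } | nonconforming_refs should print nothing.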
To perform the cleanup easily, I created a Perl script that reads one file of branches and one file of tags and performs the necessary delete and rename actions. This is the Perl source code of the script:
cleanup-gitmig.pl:
#!/usr/bin/perl
# cleanup-gitmig.pl: deletes and renames Git tags and branches as listed in
# tags.csv and branches.csv (one "oldName,newName" or "oldName,DEL" per line).
use strict;
use warnings;

delete_tags();
delete_branches();
rename_tags();
rename_branches();
print "Finished successfully.\n";
exit;

# Echo a command, run it, and abort the script if it fails.
sub run {
    print ">> @_\n";
    system @_;
    my $exit = $? >> 8;
    die "@_ exited with $exit" if $exit;
    return 1;
}

# Read a CSV file into a list of [oldName, newName] pairs. Surrounding
# whitespace (e.g. the indentation in the output of `git branch`) is removed;
# a missing new name becomes the empty string.
sub read_mapping {
    my ($csv) = @_;
    open( my $in, '<', $csv ) or die("Can not open input file $csv: $!");
    my @pairs;
    while ( my $line = <$in> ) {
        chomp($line);
        next if $line =~ /^\s*$/;    # skip blank lines
        my @field = ( parse_csv($line) )[ 0, 1 ];
        for my $f (@field) {
            $f = '' unless defined $f;
            $f =~ s/^\s+|\s+$//g;
        }
        push( @pairs, [@field] );
    }
    close($in);
    return @pairs;
}

sub delete_branches {
    for my $pair ( read_mapping("branches.csv") ) {
        my ( $branch, $newBranch ) = @$pair;
        next unless $newBranch eq "DEL";
        print "Deleting branch $branch\n";
        run( "git", "branch", "-D", $branch );
    }
}

sub rename_branches {
    for my $pair ( read_mapping("branches.csv") ) {
        my ( $branch, $newBranch ) = @$pair;
        next if $newBranch eq "" or $newBranch eq "DEL";    # nothing to rename
        print "Renaming branch $branch to branch $newBranch\n";
        run( "git", "branch", "-m", $branch, $newBranch );
    }
}

sub delete_tags {
    for my $pair ( read_mapping("tags.csv") ) {
        my ( $tag, $newTag ) = @$pair;
        next unless $newTag eq "DEL";
        print "Deleting tag $tag\n";
        run( "git", "tag", "-d", $tag );
    }
}

sub rename_tags {
    for my $pair ( read_mapping("tags.csv") ) {
        my ( $tag, $newTag ) = @$pair;
        next if $newTag eq "" or $newTag eq "DEL";    # nothing to rename
        print "Renaming tag $tag to tag $newTag\n";
        run( "git", "tag", $newTag, $tag );    # create the new tag first ...
        run( "git", "tag", "-d", $tag );       # ... then delete the old one
    }
}

# Split one CSV line into its fields (quoted fields may contain commas).
sub parse_csv {
    my $text = shift;
    my @new  = ();
    push( @new, $+ ) while $text =~ m{ "([^\"\\]*(?:\\.[^\"\\]*)*)",? | ([^,]+),? | , }gx;
    push( @new, undef ) if substr( $text, -1, 1 ) eq ',';
    return @new;
}
To produce the input files for this script, we let git give us a list of all branches and tags:
git tag > tags.csv
git branch > branches.csv
Then we edit the files (making sure to delete the "* master" line from branches.csv, as we don't want to touch the master branch) and append to each line a comma followed by either the magic string DEL, if we want the tag/branch deleted, or the new name, if we want the tag/branch renamed.
For CDO, part of the tags file looks as follows:
drop-M20111007-0410,/drops/M20111007-0410
drop-S20110923-0630,/drops/S20110923-0630
drop-S20110927-0522,/drops/S20110927-0522
drops,DEL
eike-initial001,DEL
eike-initial002,DEL
estepper-2.0-end-of-maintenance,/committers/estepper/2.0-end-of-maintenance
estepper-before-revision-holder,/committers/estepper/before-revision-holder
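Much of such a file follows simple patterns (drop-<id> becomes drops/<id>, <committer>-<name> becomes committers/<committer>/<name>), so a first draft can be produced with sed and then completed by hand. In this sketch, draft_tag_mapping is a hypothetical helper, the committer IDs are passed as arguments, the new names are written in the form of the naming scheme listed above, and every name that no rule covers is marked TODO, which has to be replaced by DEL or a new name before the Perl script is run.

```shell
# Hypothetical helper: turn tag names (stdin) into a first draft of tags.csv.
#   git tag | draft_tag_mapping estepper swinkler > tags.csv
draft_tag_mapping() {
  # Join the committer IDs into an alternation such as "estepper|swinkler".
  local committers
  committers=$(IFS='|'; echo "$*")
  sed -E \
    -e 's#^drop-(.+)$#&,drops/\1#' \
    -e "s#^(${committers})-(.+)\$#&,committers/\\1/\\2#" \
    -e '/,/!s/$/,TODO/'
}
```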
After finishing the files, we just invoke the perl script:
./cleanup-gitmig.pl
and voilà: we have a nice and clean repository.
Step 7: Wrap up everything and deploy to git.eclipse.org
At this point, we have four repositories, which are basically ready to be used. Before deploying them to git.eclipse.org, we should convert them to bare repositories. To do this, I followed the steps described at http://stackoverflow.com/questions/2199897/git-convert-normal-to-bare-repository:
mv .git ..
rm -rf *
mv ../.git/* .
rmdir ../.git
git config --bool core.bare true
And, as a last step, we should set a description for our repositories:
echo Git repository of the org.eclipse.emf.cdo project > description
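As an alternative to moving the .git directory around, a bare copy can also be created with git clone --bare, which copies all branches and tags. The sketch below does that, removes the origin entry that the clone records (it would point at the local working repository), and writes the description. This is a swapped-in alternative, not the procedure we used; make_bare is a hypothetical name.

```shell
# Hypothetical helper: create a bare copy <dst> of <src> with a description.
#   make_bare <src> <dst> <description>
make_bare() {
  local src=$1 dst=$2 desc=$3
  git clone -q --bare "$src" "$dst" || return 1
  # The bare clone remembers <src> as "origin"; drop that entry if present.
  git --git-dir="$dst" remote remove origin 2>/dev/null || true
  printf '%s\n' "$desc" > "$dst/description"
}
```

For example, make_bare org.eclipse.emf.cdo org.eclipse.emf.cdo.git "Git repository of the org.eclipse.emf.cdo project" leaves the working repository untouched.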
Now we can zip the repositories, upload them to a suitable location and ask the Eclipse Webmasters nicely to deploy the new repositories (as we have done in Bug 360970).
Conclusion
In this blog entry, I have described the steps we took to migrate the CDO repository from SVN to Git. Your problems or requirements may be different, but I hope that one or two of these steps help you migrate your project.
However, we are not entirely done yet. We are still working on our initial workspace setup workflow and our build system. But the basic migration is done. Hooray!