A code migration adventure: moving a 25 GB repo from SVN to GitHub

We’ve had our code hosted in an internal SVN repository for a long time, running on the stereotypical machine under someone’s desk. In a small team with limited branching, SVN has worked very well for us. If you serve SVN over HTTP with port forwarding, people can connect to the repository from within or outside the company network with little local reconfiguration.

Despite the advantages of this model, I recently moved our local SVN repository to GitHub. The main reasons to do this were twofold:

1. Better support for distributed development: GitHub is better at serving requests from across the world than a single machine

2. Better backup (hopefully): GitHub is less likely to lose our data than we are

At $7 per month, moving seemed like an obvious choice, but the process was more complicated than I expected. This was primarily because our repository was large (25 GB, with commits going back 3 years), and I wanted to maintain revision history.

Here are the steps I followed. If you have a similar large SVN repository, these steps should provide a useful guide on moving it to GitHub.

Export, compress and move the SVN repo

Export the SVN repository to a dump file using svnadmin dump:

svnadmin dump /path/to/repo > myRepo.dump

This exports *all revisions* of the repository, which can result in a very large file: 25 GB for our repo. I was shutting down our SVN server machine anyway, so I compressed the file and transferred it to a new machine:

tar cvzf myRepo.tar.gz myRepo.dump

My trusty Western Digital MyBook did a great job of moving the tar file to other computers.
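With a file this size, it is worth confirming that the copy arrived intact before shutting down the old server. The following is a minimal sketch, not the exact commands used above: pack_dump and verify_dump are hypothetical helper names, and it uses plain gzip (tar adds nothing for a single file) together with the POSIX cksum utility.

```shell
# Hypothetical helper: gzip a dump file and record a checksum of the original.
pack_dump() {
  dump_file=$1
  gzip -c "$dump_file" > "$dump_file.gz" || return 1
  cksum < "$dump_file" > "$dump_file.cksum"
}

# Hypothetical helper: after copying the .gz and .cksum files to the new
# machine, confirm the decompressed data matches the recorded checksum.
# Returns 0 when the contents match, non-zero otherwise.
verify_dump() {
  dump_file=$1
  gzip -dc "$dump_file.gz" | cksum | cmp -s - "$dump_file.cksum"
}
```

Run pack_dump myRepo.dump on the old machine, copy myRepo.dump.gz and myRepo.dump.cksum across, then run verify_dump myRepo.dump on the new one. A zero exit status means the decompressed data matches the original.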


Extract to a working SVN repository (on a Mac)

All the decompression programs I tried on a Windows machine (7zip, Winzip, PowerArchiver) were able to uncompress the gzip file, but failed with an error when trying to untar the resulting tar file. This is probably due to the size of the file.

On a Mac, I was able to simply use the built-in Archive Utility program, which really means just double-clicking the file.

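When this is done you should have the original myRepo.dump file. Load this into a working SVN repository on the Mac. svnadmin load needs an existing repository, so create an empty one first:

svnadmin create /path/to/repo/parent/myRepo

svnadmin load /path/to/repo/parent/myRepo < myRepo.dump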

Start the SVN daemon:

sudo svnserve --daemon --root /path/to/repo/parent

The repository is now accessible via the svn:// protocol:

svn://localhost/myRepo

Use svn2git to convert into a Git repository

svn2git is a tool for converting an SVN repo to a Git repo while maintaining branches and tags.

Install svn2git:

$ sudo gem install svn2git
Successfully installed svn2git-2.2.2
1 gem installed
Installing ri documentation for svn2git-2.2.2...
Installing RDoc documentation for svn2git-2.2.2...

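Create an authors.txt file that maps each person who committed to your repo to a name and email. I used the following awk script from http://awesomegeekness.com/blog/?p=17. Run it inside an SVN working copy, or pass the repository URL (svn://localhost/myRepo) to svn log, and replace yourdomain.com with your own domain: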

$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); \
sub(" $", "", $2); print $2" = "$2" <"$2"@yourdomain.com>"}' \
| sort -u > authors.txt
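The one-liner is easier to check if it reads the log from standard input. Below is a sketch of the same transformation wrapped in a function. The name authors_from_log and its domain argument are my own additions, and it assumes the usual "rN | user | date" layout of svn log -q output.

```shell
# Hypothetical helper: read `svn log -q` output on stdin and print one
# "user = user <user@domain>" line per distinct committer.
authors_from_log() {
  domain=$1
  awk -F '|' -v domain="$domain" '
    /^r[0-9]+ / {
      name = $2
      gsub(/^ +| +$/, "", name)   # trim the spaces around the username
      if (name != "")
        print name " = " name " <" name "@" domain ">"
    }' | sort -u
}
```

It would be used as: svn log -q svn://localhost/myRepo | authors_from_log yourdomain.com > authors.txt. Check the resulting file by hand, since the generated email addresses are only guesses based on the SVN usernames.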

Make a directory in which you want the Git repo to live, and move into it:

$ mkdir myGitRepo

$ cd myGitRepo

Then run svn2git and tell it to use the authors.txt file (adjust the path if you created authors.txt outside this directory):

svn2git svn://localhost/myRepo --authors authors.txt --verbose

(This will probably take a long time.)

When it’s done, verify that the git repo is set up by doing a git log on one of your files:

$ git log myFile.txt

commit 361a778a6d3aed906d26acc6c7b93d079fa29aeb
Author: ABC <abc@yourdomain.com>
Date: Sat Jun 5 07:26:40 2013 +0000

help_movies file is moved to its parent directory for the web-based application

...

If you get similar output, you’ve successfully converted your SVN repository into a local Git repo.

Push up to GitHub

Log in to your GitHub account and create a new, empty repo through the web interface. Add the new repo as an SSH “remote” to your local repository:

git remote add origin git@github.com:MyGitUsername/MyGitRepo.git

(GitHub has a handy guide to working with repositories using SSH.)

I used SSH rather than HTTPS since it seems to be the more commonly recommended option for the initial push of a large repository.

The next step is to try naively pushing the entire repository to the remote server:

git push -u origin master

Because the repository is large, this is likely to fail with the following error:

fatal: pack exceeds maximum allowed size

The only solution I found was to push a smaller range of commits at a time. In Git parlance, HEAD is the name for the current “tip” of your local repository, and “HEAD~n” means “n commits before the current HEAD”. So to push a series of commits to the remote server, you might do something like:


$ git push -u origin HEAD~1000:refs/heads/master

$ git push -u origin HEAD~500:master

$ git push -u origin master

This pushes all revisions from the start up to (tip-1000) to the remote server, then revisions from (tip-999) to (tip-500), and then finally the last 500 revisions.

Why refs/heads/master for the first push? If you don’t do that, and simply use git push -u origin HEAD~1000:master, you’ll get the following error:

error: unable to push to unqualified destination: master
The destination refspec neither matches an existing ref on the remote nor
begins with refs/, and we are unable to guess a prefix based on the source ref. 

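This happens because the source (HEAD~1000) is a commit rather than a branch name, and master does not yet exist on the remote, so Git cannot tell whether the destination should be a branch or a tag. Fully qualifying the destination for the first push creates the remote branch, which allows you to use plain master for subsequent pushes.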
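If the repository needs more than a handful of chunks, the offsets can be generated rather than typed by hand. This is a rough sketch under a few assumptions: chunk_refspecs and push_in_chunks are hypothetical helper names, the chunk size is counted in first-parent commits (which is what the ~n notation walks), and every destination is written as refs/heads/... so the first push never hits the unqualified-destination error.

```shell
# Print one refspec per line, oldest chunk first.
#   total  - number of first-parent commits on the branch
#   step   - how many first-parent commits to add per push
#   branch - local branch name, also used on the remote
chunk_refspecs() {
  total=$1
  step=$2
  branch=$3
  # Largest multiple of step that is still a valid ~n offset (n <= total-1).
  n=$(( (total - 1) / step * step ))
  while [ "$n" -gt 0 ]; do
    echo "$branch~$n:refs/heads/$branch"
    n=$(( n - step ))
  done
  echo "$branch:refs/heads/$branch"
}

# Push the branch to the remote one chunk at a time, stopping on failure.
push_in_chunks() {
  remote=$1
  branch=$2
  step=$3
  total=$(git rev-list --first-parent --count "$branch") || return 1
  for refspec in $(chunk_refspecs "$total" "$step" "$branch"); do
    git push "$remote" "$refspec" || return 1
  done
}
```

For example, push_in_chunks origin master 500 would push master in steps of 500 first-parent commits. If a chunk still exceeds the size limit, rerun with a smaller step; commits already on the remote are not sent again.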

Pull the repo down from GitHub to another machine

The repo is large enough that trying to pull it down with Git defaults on another machine may fail with the following error:

The remote end hung up unexpectedly

If you see this, increase the value of a setting called http.postBuffer:

git config --global http.postBuffer 524288000

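http.postBuffer sets the maximum size, in bytes, of the buffer that Git’s smart HTTP transport uses when sending data to the remote (524288000 bytes is 500 MB). It has no effect on SSH remotes.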
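The value is given in bytes, which is easy to get wrong by a factor of 1024. Here is a small sketch of the arithmetic, which also applies the setting to a single clone instead of changing the global configuration. The helper names are hypothetical; git -c is the standard way to pass a one-off configuration value.

```shell
# Convert mebibytes to bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: clone a URL with http.postBuffer set for this one
# command only. Usage: clone_with_post_buffer 500 https://example.com/repo.git
clone_with_post_buffer() {
  git -c http.postBuffer="$(mib_to_bytes "$1")" clone "$2"
}
```

mib_to_bytes 500 prints 524288000, the value used above.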

Congratulations! You should now have a working GitHub repository with all your revision history intact, and with local mirrors on a couple of machines. Another data loss disaster averted!
