Tuesday, March 8, 2011

Removing and Subsetting Repositories in git versus svn

If you ever decide to take a big repository apart and make smaller ones, do it in git. In subversion, you first dump the whole repository, then create a new one, run the dump through svndumpfilter to keep only the paths you want, and load the result with svnadmin:
svnadmin dump svnroot/ > svn.dump
svnadmin create svnroot-fixed
cd svnroot-fixed
svndumpfilter include what-to-include < ../svn.dump | svnadmin load ./

This works all right, but it's pretty slow, and any kind of complex subsetting or removal takes multiple passes. In the end, since I was transitioning things to git anyway, I gave up after a little while.

Git, although it doesn't make the process entirely trivial, is a bit better. The workflow is about the same, but it's faster and more flexible. You clone the repository and then use 'git filter-branch' to rewrite its history. For example, to remove a directory (or specific files) from every commit:
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD
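Note that this won't yet free any disk space: filter-branch keeps the original history reachable under refs/original/, and the reflog still points at the old commits, so the objects stay in the repository. To totally delete them forever, David Underhill made a nice little script.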
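
For reference, here is a rough sketch of the cleanup steps, loosely following the checklist in the git filter-branch manual. It is not David Underhill's script, and the function name purge_rewritten_history is made up for this example. It assumes you run it inside the rewritten repository and that you really do want the old history gone for good:

```shell
# Hypothetical helper: drop everything that still references the pre-rewrite
# history in the current repository, then garbage-collect the orphaned objects.
purge_rewritten_history() {
    # filter-branch saves the old branch tips under refs/original/
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # reflog entries also keep the old commits reachable
    git reflog expire --expire=now --all
    # remove unreachable objects now instead of after the default grace period
    git gc --prune=now --quiet
}
```

Other clones of the repository still carry the old objects, so this only shrinks the copy you run it in.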

To make a copy of a repository containing only one directory (which becomes the root of the new repository):
git clone $PARENT $NEW_REPO
cd $NEW_REPO
git filter-branch --subdirectory-filter $CHILD -- --all
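
One way to sanity-check the result, sketched here under the assumption that both repositories are still on disk: after the rewrite, the root tree of the new repository's HEAD should be the very same tree object as the subdirectory at HEAD in the parent. The helper name same_tree_as_subdir and its argument order are invented for this example.

```shell
# Hypothetical helper: succeeds when the root tree of new_repo's HEAD is the
# same tree object as the child directory at HEAD in the parent repository.
same_tree_as_subdir() {
    st_parent=$1
    st_child=$2
    st_new_repo=$3
    # tree object of the subdirectory in the parent's current commit
    st_expected=$(git -C "$st_parent" rev-parse "HEAD:$st_child") || return 1
    # root tree of the rewritten repository's current commit
    st_actual=$(git -C "$st_new_repo" rev-parse "HEAD^{tree}") || return 1
    [ "$st_expected" = "$st_actual" ]
}
```

Called as: same_tree_as_subdir $PARENT $CHILD $NEW_REPO && echo "split looks right"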
And in case it wasn't obvious from the variable names, I made a quick script to promote all the subdirectories of a repository to their own repository:
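#!/bin/bash
# Usage: run from the directory that contains the parent repository,
# passing the repository's relative path as the only argument.
WRKDIR=$(pwd)
PARENT=${1%/}   # strip a trailing slash so the new repo names come out clean
echo "$PARENT"

# List only the top-level directories (-d), not plain files
cd "$PARENT" || exit 1
CHILDREN=$(git ls-tree -d --name-only HEAD)

# Note: directory names containing whitespace are not handled
for CHILD in $CHILDREN; do
cd "$WRKDIR" || exit 1
NEW_REPO="$PARENT-$CHILD"
git clone "$PARENT" "$NEW_REPO"
cd "$NEW_REPO" || exit 1
git filter-branch --subdirectory-filter "$CHILD" -- --all
done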

That's all for now.
