Sitaram Chamarty 0316baf726 mirror code learns receive.fsckObjects

2010-10-26 20:30:10 +05:30

11 KiB


mirroring a gitolite setup

Mirroring git repos is essentially a one-liner. For each mirror you want to update, you just add a post-receive hook that says

#!/bin/bash
git push --mirror slave_user@mirror.host:/path/to/repo.git

But life is never that simple...

This document has been tested using a 3-server setup, all installed using the "non-root" method (see doc/1-INSTALL.mkd). However, the process is probably not going to be very forgiving of human error -- like anything that is this deep in "system admin" territory, errors are likely to be costly. If you're the kind who hits enter first and then thinks about what he typed, you're in for some fun times ;-)

On the plus side, everything we do is done using git commands, so things are never really lost until you do a git gc.

In this document:

RULE NUMBER ONE!
things that will NOT be mirrored by this process
conventions in this document
setting up mirroring
efficiency versus paranoia
syncing the mirrors the first time
switching over
the return of foo
- switching back
- making foo a slave
URLs that your users will use

RULE NUMBER ONE!

RULE OF GIT MIRRORING: users should push directly to only one server! All the other machines (the slaves) should be updated by the master server.

If a user pushes directly to one of the slaves, those changes will get wiped out on the next mirror push from the real master server.

Corollary: if the primary went down and you effected a changeover, you must make sure that the primary does not come up in a push-enabled mode when it recovers.
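To see the rule in action, here is a small throwaway demonstration that uses only local bare repos in a temporary directory. The names master.git, slave.git, work and the branch "rogue" are made up for the demo, and nothing here touches gitolite itself.

```shell
# Throwaway demo of rule number one; all paths and names are hypothetical.
demo=$(mktemp -d)
git init -q --bare "$demo/master.git"
git init -q --bare "$demo/slave.git"

# a working repo that pushes one commit to the master
git init -q "$demo/work"
git -C "$demo/work" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"
git -C "$demo/work" push -q "$demo/master.git" HEAD:refs/heads/master

# the master updates the slave, as the post-receive hook would
git --git-dir="$demo/master.git" push -q --mirror "$demo/slave.git"

# someone breaks the rule and pushes a branch directly to the slave
git -C "$demo/work" push -q "$demo/slave.git" HEAD:refs/heads/rogue
before=$(git --git-dir="$demo/slave.git" for-each-ref \
    --format='%(refname:short)' refs/heads/rogue)

# the next mirror push from the master removes what the master does not have
git --git-dir="$demo/master.git" push -q --mirror "$demo/slave.git"
after=$(git --git-dir="$demo/slave.git" for-each-ref \
    --format='%(refname:short)' refs/heads/rogue)

echo "rogue branch on slave before the mirror push: '$before'"
echo "rogue branch on slave after the mirror push:  '$after'"
```

The commit behind the deleted branch is still in the slave's object store until a git gc prunes it, but no ref points to it any more, so as far as users are concerned it is gone.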

things that will NOT be mirrored by this process

Let's get this out of the way. This procedure will only mirror your git repositories, using git push --mirror. Therefore, certain files will not be mirrored:

gitolite log files
"gl-creator" and "gl-perms" files
"projects.list", "description", and entries in the "config" files within each repo

None of these affect actual repo contents of course, but they could be important (especially the gl-creator files, although if your wildcard pattern had "CREATOR" in it you can recreate those files easily enough anyway).

Your best bet is to use rsync for the log files, and tar for the others, at regular intervals.
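A minimal sketch of the tar half of that advice follows. backup_repo_metadata is a hypothetical helper, not something gitolite ships, and the paths in the commented example (including the log directory and the host name "backuphost") are assumptions you should check against your own ~/.gitolite.rc. The projects.list file is not picked up because it normally lives outside the repo directories; add its path by hand if you need it.

```shell
# Sketch: archive the per-repo files that "git push --mirror" leaves behind.
# backup_repo_metadata is a hypothetical helper, not part of gitolite.
backup_repo_metadata() {
    repo_base=$1    # directory holding the *.git repos, e.g. ~/repositories
    archive=$2      # absolute path of the tar file to create
    (
        cd "$repo_base" || exit 1
        # only the files named above, and only directly inside a *.git directory
        find . -type f \( -path '*.git/gl-creator' -o -path '*.git/gl-perms' \
                       -o -path '*.git/description' -o -path '*.git/config' \) |
            sort > "$archive.list"
        # -T reads the list of files to archive from the list file
        tar -cf "$archive" -T "$archive.list"
    )
}

# Example (commented out; every path and host here is an assumption):
#   backup_repo_metadata ~/repositories ~/backup/repo-metadata.tar
#   rsync -a ~/.gitolite/logs/ backuphost:gitolite-logs/
```

Run something like this from cron on the master, and copy the resulting archive wherever you keep the log files.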

conventions in this document

The userid hosting gitolite is gitolite on all machines. The servers are foo, bar, and baz. At the beginning, foo is the master, the other 2 are slaves.

setting up mirroring

install gitolite on all servers

before running the final step in the install sequence, make sure you go to the hooks/common directory and rename post-receive.mirrorpush to post-receive. See doc/hook-propagation.mkd if you're not sure where you should look for hooks/common.
if the server already has gitolite installed, use the normal methods to make sure this hook gets in.
Use the same "admin key" on all the machines, so that the same person has gitolite-admin access to all of them.

generate keypairs

Each server will be potentially logging on to one or more of the other servers, so first generate keypairs for all of them (ssh-keygen) and copy the .pub files to all other servers, named appropriately. So foo will have bar.pub and baz.pub, etc.

setup the mirror-shell on each server

If you installed gitolite using the "from-client" method, run the following:

# on foo
export GL_ADMINDIR=` cd $HOME;perl -e 'do ".gitolite.rc"; print $GL_ADMINDIR'`
cat bar.pub baz.pub |
    sed -e 's,^,command="'$GL_ADMINDIR'/src/gl-mirror-shell" ,' >> ~/.ssh/authorized_keys

If you installed using any of the other 3 methods do this:

cat bar.pub baz.pub |
    sed -e 's,^,command="'$(which gl-mirror-shell)'" ,' >> ~/.ssh/authorized_keys

Also do the same thing on the other machines.
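The commands above append blindly, so running them twice leaves duplicate lines in authorized_keys. If that worries you, a sketch like the following adds a key only when the exact line is not already there. add_mirror_key is a hypothetical helper, and it assumes each .pub file holds exactly one key on one line.

```shell
# add_mirror_key is a hypothetical helper: it adds one server's public key
# with the forced command, but only if that exact line is not already present.
add_mirror_key() {
    pubfile=$1       # e.g. bar.pub (one key, one line)
    shell_path=$2    # full path to gl-mirror-shell on this server
    authkeys=${3:-$HOME/.ssh/authorized_keys}

    line="command=\"$shell_path\" $(cat "$pubfile")"
    touch "$authkeys"
    # -x: whole-line match, -F: fixed string, so key material is not a regex
    grep -qxF -e "$line" "$authkeys" || printf '%s\n' "$line" >> "$authkeys"
}

# Example (commented out), for an install where gl-mirror-shell is in $PATH:
#   add_mirror_key bar.pub "$(which gl-mirror-shell)"
#   add_mirror_key baz.pub "$(which gl-mirror-shell)"
```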

Now test this access:

# on foo
ssh gitolite@bar pwd
    # should print /home/gitolite/repositories
ssh gitolite@bar uname -a
    # should print the appropriate info for that server

Similarly test the other combinations.

set slaves to slave mode

Set slave mode on all the slave servers by setting $GL_SLAVE_MODE = 1 in ~/.gitolite.rc (uncommenting the line if necessary).

Leave the master server's file as is.
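If you would rather not hand-edit the rc file every time a server changes role, a small sketch like this can flip the setting. set_slave_mode is a hypothetical helper; it assumes the rc file already has a "$GL_SLAVE_MODE = ...;" line at the start of a line, possibly commented out, and it refuses to do anything otherwise.

```shell
# set_slave_mode is a hypothetical helper. It assumes the rc file already
# contains a "$GL_SLAVE_MODE = ...;" line, possibly commented out.
set_slave_mode() {
    value=$1                       # 1 for a slave, 0 for the master
    rc=${2:-$HOME/.gitolite.rc}

    grep -q '^[#[:space:]]*\$GL_SLAVE_MODE[[:space:]]*=' "$rc" || {
        echo "no GL_SLAVE_MODE line in $rc; edit it by hand" >&2
        return 1
    }
    tmp=$(mktemp) || return 1
    # uncomment the line if needed and set the new value
    sed 's/^[#[:space:]]*\$GL_SLAVE_MODE[[:space:]]*=.*/$GL_SLAVE_MODE = '"$value"';/' \
        "$rc" > "$tmp" && cat "$tmp" > "$rc"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (commented out): on bar and baz
#   set_slave_mode 1
```

Look at the file afterwards anyway; a stray comment line that happens to match the pattern would be rewritten too.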

set slave server lists

On the master (foo), set the names of the slaves by editing the ~/.gitolite.rc to contain:

$ENV{GL_SLAVES} = 'gitolite@bar gitolite@baz';

Note the syntax well; this is critical:

this must be in single quotes (or you must remember to escape the @)
the variable is an ENV var, not a plain perl var
the values are space separated
each value represents the userid and hostname for one server

The basic idea is that each value in this string should be usable in both of the following syntaxes:

git clone gitolite@bar:repo
ssh gitolite@bar pwd
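To make the syntax rules concrete, here is a dry run showing how a shell hook could consume such a value. It is only an illustration, not the actual post-receive.mirrorpush code; the repo name "testing" and the assumption that repo paths on the slave are relative to the repositories directory are mine.

```shell
# Dry run of how a space-separated GL_SLAVES value can be consumed.
# This is an illustration only, not the actual post-receive.mirrorpush code.
GL_SLAVES='gitolite@bar gitolite@baz'

print_mirror_pushes() {
    repo=$1    # repo name relative to the repositories directory (assumption)
    # $GL_SLAVES is deliberately unquoted so the shell splits it on spaces
    for slave in $GL_SLAVES; do
        echo "git push --mirror $slave:$repo.git"
    done
}

print_mirror_pushes testing
```

This is why the values must be space separated and why each one has to work as the "user@host" part of both a git URL and an ssh command line.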

You can also use ssh host aliases. Let's say server "bar" has a non-standard port number:

# in ~/.ssh/config on foo
host mybar
    hostname bar
    user gitolite
    port 2222

# in ~/.gitolite.rc on foo
$ENV{GL_SLAVES} = 'mybar gitolite@baz';

And that's really all there is, unless...

efficiency versus paranoia

If you're paranoid enough to use mirrors, you should be paranoid enough to like the receive.fsckObjects setting we now default to :-) However, informal tests indicate a 40-50% CPU overhead from this. If you don't like that, remove that line from the post-receive code.

Please also note that we only set it on mirrors, and only at the time the mirrored repo is created. This means that when you start using your old "main" server as a mirror (see later sections on switching over to a mirror, etc.), its repos do not have this setting. Repos created by previous versions of gitolite also will not have this setting.

Personally, I just set git config --global receive.fsckObjects true, since those servers aren't doing anything else anyway, and are idle for long stretches of time. It's up to you what you want to do here.
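If you prefer a per-repo setting over the global one (for example on an old master that is about to become a mirror), a sketch like this sets it in every existing repo. enable_fsck_everywhere is a hypothetical helper, and the default path assumes the usual ~/repositories location.

```shell
# enable_fsck_everywhere is a hypothetical helper, not part of gitolite.
enable_fsck_everywhere() {
    repo_base=${1:-$HOME/repositories}   # or wherever $REPO_BASE is
    # -prune stops find from descending into each repo once it is matched
    find "$repo_base" -type d -name '*.git' -prune -print |
    while read -r r; do
        git --git-dir="$r" config receive.fsckObjects true
    done
}

# Example (commented out):
#   enable_fsck_everywhere ~/repositories
```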

syncing the mirrors the first time

The setup above is all you need if you're setting up everything from scratch. But if your master server already had some repos with commits on them, you have to manually sync them up once.

# on foo
gl-mirror-sync gitolite@bar
    # path to "sync" program is ~/.gitolite/src if "from-client" install
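With more than one slave you have to repeat this for each of them. A sketch of a wrapper follows; sync_all_slaves and SYNC_CMD are hypothetical names, and it assumes you have set GL_SLAVES in your shell to the same value you put in ~/.gitolite.rc (the rc file is perl, so your login shell does not see it on its own).

```shell
# sync_all_slaves is a hypothetical wrapper around the sync program.
# SYNC_CMD defaults to gl-mirror-sync; for a "from-client" install, point it
# at ~/.gitolite/src/gl-mirror-sync instead.
sync_all_slaves() {
    failed=""
    for slave in $GL_SLAVES; do
        echo "syncing $slave" >&2
        ${SYNC_CMD:-gl-mirror-sync} "$slave" || failed="$failed $slave"
    done
    # report every slave that failed instead of stopping at the first one
    [ -z "$failed" ] || { echo "sync failed for:$failed" >&2; return 1; }
}

# Example (commented out), on foo:
#   GL_SLAVES='gitolite@bar gitolite@baz' sync_all_slaves
```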

switching over

Let's say foo goes down. You want to make bar the main server, and continue to have "baz" be a slave.

on bar, edit ~/.gitolite.rc and set

$GL_SLAVE_MODE = 0;
$ENV{GL_SLAVES} = 'gitolite@baz';

sanity check: go to your gitolite-admin clone, add a remote for "bar", fetch it, and make sure they are the same:

git remote add bar gitolite@bar:gitolite-admin
git fetch bar
git branch -a -v
    # check that all SHAs are the same

inform everyone of the new URL for their repos (see next section for more on this)
make sure that if "foo" does come up, it will not immediately start serving requests. You'll be in trouble if (a) foo comes up as it was before, and (b) some developer still had the old URL lying around and started pushing changes to it.

You could jump in quickly and set $GL_SLAVE_MODE = 1 as soon as the system comes up. Better still, use external means to block incoming connections from normal users (out of scope for this document).

the return of foo

switching back

Switching back is fairly easy.

synchronise all repos from bar to foo. This may take some time, depending on how long foo was down.

# on bar
gl-mirror-sync gitolite@foo
    # path to "sync" program is ~/.gitolite/src if "from-client" install

turn off pushes on "bar" by setting slave mode to 1
run the sync once again; this should complete quickly

double check by comparing some of the repos on both sides if needed. You could run the following snippet on all servers for a quick check; the checksum it prints should be the same everywhere:

cd ~/repositories   # or wherever $REPO_BASE is
find . -type d -name "*.git" | sort |
while read -r r
do
    # print the repo name, then its refs in a stable order
    echo "$r"
    git ls-remote "$r" | sort
done | md5sum

on foo, set the slave list (or check that it is correct)
on foo, set slave mode off
tell everyone to switch back

making foo a slave

If "foo" does come up in a controlled manner, you might not want to switch back right away. Unless you're doing DNS tricks, users may be peeved at having to do 2 switches.

If you want to make foo a slave, you know the drill by now:

set slave mode to 1 on foo

on bar, add foo as a slave

# in ~/.gitolite.rc on bar
$ENV{GL_SLAVES} = 'gitolite@foo gitolite@baz';

I think that should cover pretty much everything. I have tested most of this, but YMMV.

URLs that your users will use

Unless you play DNS tricks, it is more than likely that your users would have to change the URLs they use to access their repos if you change the server they push to.

I cannot speak for the plethora of git client software out there but for normal git, this problem can be mitigated somewhat by doing this:

in ~/.ssh/config on my workstation, I have

host gl
    hostname=primary.server.ip
    user=gitolite

all my git clone commands use gl:reponame as the URL
if the primary goes down, and I have to access the secondary, I just change the hostname line in ~/.ssh/config.

That's it. Every clone of every repo used anywhere in this userid is now changed.
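If you switch often, the one-line edit can be scripted. The following is a sketch only: switch_gl_host is a hypothetical helper, it assumes the alias is written as "host gl" as shown above, and it rewrites only the hostname line inside that block.

```shell
# switch_gl_host is a hypothetical helper that repoints the "gl" ssh alias.
switch_gl_host() {
    new_host=$1
    cfg=${2:-$HOME/.ssh/config}
    tmp=$(mktemp) || return 1
    awk -v new="$new_host" '
        # every "host" line starts a new block; remember whether it is "gl"
        tolower($1) == "host" { in_gl = ($2 == "gl") }
        # inside the gl block, replace the hostname line and nothing else
        in_gl && tolower($0) ~ /^[ \t]*hostname[ \t=]/ {
            print "    hostname=" new
            next
        }
        { print }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (commented out; the address is a placeholder):
#   switch_gl_host secondary.server.ip
```

Other host blocks in the file are left alone, so it should be safe to run on a config that also describes unrelated servers.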

To repeat, this may or may not work with all the git clients that exist (like jgit, or any of the GUI tools, and especially if you're on Windows).

If anyone has a better idea, something that works more universally, I'd love to hear it.
