12 KiB
mirroring a gitolite setup
Mirroring git repos is essentially a one-liner. For each mirror you want to update, you just add a post-receive hook that says
#!/bin/bash
git push --mirror slave_user@mirror.host:/path/to/repo.git
But life is never that simple...
This document has been tested using a 3-server setup, all installed using the "non-root" method (see doc/1-INSTALL.mkd). However, the process is probably not going to be very forgiving of human error -- like anything that is this deep in "system admin" territory, errors are likely to be costly. If you're the kind who hits enter first and then thinks about what he typed, you're in for some fun times ;-)
On the plus side, everything we do is done using git commands, so things are
never really lost until you do a git gc
.
Update 2011-03-10: I wrote this with a typical "corporate" setup in mind
where all the servers involved are owned and administered by the same group of
people. As a result, the scripts assume the servers trust each other
completely. If that is not your situation, you will have to add code into
gl-mirror-shell
to limit the commands the remote may send. Patches welcome
:-)
In this document:
- RULE NUMBER ONE!
- things that will NOT be mirrored by this process
- conventions in this document
- setting up mirroring
- efficiency versus paranoia
- syncing the mirrors the first time
- switching over
- the return of foo
- URLs that your users will use
RULE NUMBER ONE!
RULE OF GIT MIRRORING: users should push directly to only one server! All the other machines (the slaves) should be updated by the master server.
If a user pushes directly to one of the slaves, those changes will get wiped out on the next mirror push from the real master server.
Corollary: if the primary went down and you effected a changeover, you must make sure that the primary does not come up in a push-enabled mode when it recovers.
things that will NOT be mirrored by this process
Let's get this out of the way. This procedure will only mirror your git
repositories, using git push --mirror
. Therefore, certain files will not be
mirrored:
- gitolite log files
- "gl-creator" and "gl-perms" files
- "projects.list", "description", and entries in the "config" files within each repo
None of these affect actual repo contents of course, but they could be important, (especially the gl-creator, although if your wildcard pattern had "CREATOR" in it you can recreate those files easily enough anyway).
Your best bet is to use rsync for the log files, and tar for the others, at regular intervals.
conventions in this document
The userid hosting gitolite is gitolite
on all machines. The servers are
foo, bar, and baz. At the beginning, foo is the master, the other 2 are
slaves.
setting up mirroring
install gitolite on all servers
-
before running the final step in the install sequence, make sure you go to the
hooks/common
directory and renamepost-receive.mirrorpush
topost-receive
. See doc/hook-propagation.mkd if you're not sure where you should look forhooks/common
. -
if the server already has gitolite installed, use the normal methods to make sure this hook gets in.
-
Use the same "admin key" on all the machines, so that the same person has gitolite-admin access to all of them.
generate keypairs
Each server will be potentially logging on to one or more of the other
servers, so first generate keypairs for all of them (ssh-keygen
) and copy
the .pub
files to all other servers, named appropriately. So foo will have
bar.pub and baz.pub, etc.
setup the mirror-shell on each server
XXX review this document after testing mirroring...
If you installed gitolite using the from client method, run the following:
# on foo
export GL_BINDIR=$HOME/.gitolite/src
cat bar.pub baz.pub |
sed -e 's,^,command="'$GL_BINDIR'/gl-mirror-shell" ,' >> ~/.ssh/authorized_keys
If you installed using any of the other 3 methods do this:
# on foo
export GL_BINDIR=`gl-query-rc GL_BINDIR`
cat bar.pub baz.pub |
sed -e 's,^,command="'$GL_BINDIR'/gl-mirror-shell" ,' >> ~/.ssh/authorized_keys
Also do the same thing on the other machines.
Now test this access:
# on foo
ssh gitolite@bar pwd
# should print /home/gitolite/repositories
ssh gitolite@bar uname -a
# should print the appropriate info for that server
Similarly test the other combinations.
set slaves to slave mode
Set slave mode on all the slave servers by setting $GL_SLAVE_MODE = 1
(uncommenting the line if necessary).
Leave the master server's file as is.
set slave server lists
On the master (foo), set the names of the slaves by editing the
~/.gitolite.rc
to contain:
$ENV{GL_SLAVES} = 'gitolite@bar gitolite@baz';
Note the syntax well; this is critical:
- this must be in single quotes (or you must remember to escape the
@
) - the variable is an ENV var, not a plain perl var
- the values are space separated
- each value represents the userid and hostname for one server
The basic idea is that this string, should be usable in both the following syntaxes:
git clone gitolite@bar:repo
ssh gitolite@bar pwd
You can also use ssh host aliases. Let's say server "bar" has a non-standard port number:
# in ~/.ssh/config on foo
host mybar
hostname bar
user gitolite
port 2222
# in ~/.gitolite.rc on foo
$ENV{GL_SLAVES} = 'bar gitolite@baz';
And that's really all there is, unless...
efficiency versus paranoia
If you're paranoid enough to use mirrors, you should be paranoid enough to
like the receive.fsckObjects
setting we now default to :-) However, informal
tests indicate a 40-50% CPU overhead from this. If you don't like that,
remove that line from the post-receive code.
Please also note that we only set it on mirrors, and that too at the time the mirrored repo is created. This means, when you start using your old "main" server as a mirror (see later sections on switching over to a mirror, etc.), it's repos do not have this setting. Repos created by previous versions of gitolite also will not have this setting.
Personally, I just set git config --global receive.fsckObjects true
, since
those servers aren't doing anything else anyway, and are idle for long
stretches of time. It's upto you what you want to do here.
syncing the mirrors the first time
This is fine if you're setting up everything from scratch. But if your master server already had some repos with commits on them, you have to manually sync them up once.
# on foo
gl-mirror-sync gitolite@bar
# path to "sync" program is ~/.gitolite/src if "from-client" install
switching over
Let's say foo goes down. You want to make bar the main server, and continue to have "baz" be a slave.
-
on bar, edit
~/.gitolite.rc
and set$GL_SLAVE_MODE = 0; $ENV{GL_SLAVES} = 'gitolite@baz';
-
sanity check: go to your gitolite-admin clone, add a remote for "bar", fetch it, and make sure they are the same:
git remote add bar gitolite@bar:gitolite-admin git fetch bar git branch -a -v # check that all SHAs are the same
-
inform everyone of the new URL for their repos (see next section for more on this)
-
make sure that if "foo" does come up, it will not immediately start serving requests. You'll be in trouble if (a) foo comes up as it was before, and (b) some developer still had the old URL lying around and started pushing changes to it.
You could jump in quickly and set
$GL_SLAVE_MODE = 1
as soon as the system comes up. Better still, use extraneous means to block incoming connections from normal users (out of scope for this document).
the return of foo
switching back
Switching back is fairly easy.
-
synchronise all repos from bar to foo. This may take some time, depending on how long foo was down.
# on bar gl-mirror-sync gitolite@foo # path to "sync" program is ~/.gitolite/src if "from-client" install
-
turn off pushes on "bar" by setting slave mode to 1
-
run the sync once again; this should complete quickly
-
double check by comparing some the repos on both sides if needed. You could run the following snippet on all servers for a quick check:
cd ~/repositories # or wherever $REPO_BASE is find . -type d -name "*.git" | sort | while read r do echo $r git ls-remote $r | sort done | md5sum
-
on foo, set the slave list (or check that it is correct)
-
on foo, set slave mode off
-
tell everyone to switch back
making foo a slave
If "foo" does come up in a controlled manner, you might not want to switch back right away. Unless you're doing DNS tricks, users may be peeved at having to do 2 switches.
If you want to make foo a slave, you know the drill by now:
-
set slave mode to 1 on foo
-
on bar, add foo as a slave
# in ~/.gitolite.rc on bar $ENV{GL_SLAVES} = 'gitolite@foo gitolite@baz';
I think that should cover pretty much everything. I have tested most of this, but YMMV.
URLs that your users will use
Unless you play DNS tricks, it is more than likely that your users would have to change the URLs they use to access their repos if you change the server they push to.
I cannot speak for the plethora of git client software out there but for normal git, this problem can be mitigated somewhat by doing this:
-
in
~/.ssh/config
on my workstation, I havehost gl hostname=primary.server.ip user=gitolite
-
all my
git clone
commands usegl:reponame
as the URL -
if the primary goes down, and I have to access the secondary, I just change the
hostname
line in~/.ssh/config
.
That's it. Every clone of every repo used anywhere in this userid is now changed.
To repeat, this may or may not work with all the git clients that exist (like jgit, or any of the GUI tools, and especially if you're on Windows).
If anyone has a better idea, something that works more universally, I'd love to hear it.