## mirroring a gitolite setup Mirroring git repos is essentially a one-liner. For each mirror you want to update, you just add a post-receive hook that says #!/bin/bash git push --mirror slave_user@mirror.host:/path/to/repo.git But life is never that simple... **This document has been tested using a 3-server setup, all installed using the "non-root" method (see doc/1-INSTALL.mkd). However, the process is probably not going to be very forgiving of human error -- like anything that is this deep in "system admin" territory, errors are likely to be costly. If you're the kind who hits enter first and then thinks about what he typed, you're in for some fun times ;-)** **On the plus side, everything we do is done using git commands, so things are never *really* lost until you do a `git gc`**. ---- **Update 2011-03-10**: I wrote this with a typical "corporate" setup in mind where all the servers involved are owned and administered by the same group of people. As a result, the scripts assume the servers trust each other completely. If that is not your situation, you will have to add code into `gl-mirror-shell` to limit the commands the remote may send. Patches welcome :-) ---- In this document: * RULE NUMBER ONE! * things that will NOT be mirrored by this process * conventions in this document * setting up mirroring * install gitolite on all servers * generate keypairs * setup the mirror-shell on each server * set slaves to slave mode * set slave server lists * efficiency versus paranoia * syncing the mirrors the first time * switching over * the return of foo * switching back * making foo a slave * URLs that your users will use ### RULE NUMBER ONE! **RULE OF GIT MIRRORING: users should push directly to only one server**! All the other machines (the slaves) should be updated by the master server. If a user pushes directly to one of the slaves, those changes will get wiped out on the next mirror push from the real master server. Corollary: if the primary went down and you effected a changeover, you must make sure that the primary does not come up in a push-enabled mode when it recovers. ### things that will NOT be mirrored by this process Let's get this out of the way. This procedure will only mirror your git repositories, using `git push --mirror`. Therefore, certain files will not be mirrored: * gitolite log files * "gl-creator" and "gl-perms" files * "projects.list", "description", and entries in the "config" files within each repo None of these affect actual repo contents of course, but they could be important, (especially the gl-creator, although if your wildcard pattern had "CREATOR" in it you can recreate those files easily enough anyway). Your best bet is to use rsync for the log files, and tar for the others, at regular intervals. ### conventions in this document The userid hosting gitolite is `gitolite` on all machines. The servers are foo, bar, and baz. At the beginning, foo is the master, the other 2 are slaves. ### setting up mirroring #### install gitolite on all servers * before running the final step in the install sequence, make sure you go to the `hooks/common` directory and rename `post-receive.mirrorpush` to `post-receive`. See doc/hook-propagation.mkd if you're not sure where you should look for `hooks/common`. * if the server already has gitolite installed, use the normal methods to make sure this hook gets in. * Use the same "admin key" on all the machines, so that the same person has gitolite-admin access to all of them. #### generate keypairs Each server will be potentially logging on to one or more of the other servers, so first generate keypairs on each of them (`ssh-keygen`) and copy the `.pub` files to all other servers, named appropriately. So foo will have bar.pub and baz.pub, etc. #### setup the mirror-shell on each server XXX review this document after testing mirroring... If you installed gitolite using the from client method, run the following: # on foo export GL_BINDIR=$HOME/.gitolite/src cat bar.pub baz.pub | sed -e 's,^,command="'$GL_BINDIR'/gl-mirror-shell" ,' >> ~/.ssh/authorized_keys If you installed using any of the other 3 methods do this: # on foo export GL_BINDIR=`gl-query-rc GL_BINDIR` cat bar.pub baz.pub | sed -e 's,^,command="'$GL_BINDIR'/gl-mirror-shell" ,' >> ~/.ssh/authorized_keys Also do the same thing on the other machines. Now test this access: # on foo ssh gitolite@bar pwd # should print /home/gitolite/repositories ssh gitolite@bar uname -a # should print the appropriate info for that server Similarly test the other combinations. #### set slaves to slave mode Set slave mode on all the *slave* servers by setting `$GL_SLAVE_MODE = 1` (uncommenting the line if necessary). Leave the master server's file as is. #### set slave server lists On the master (foo), set the names of the slaves by editing the `~/.gitolite.rc` to contain: $ENV{GL_SLAVES} = 'gitolite@bar gitolite@baz'; **Note the syntax well; this is critical**: * **this must be in single quotes** (or you must remember to escape the `@`) * the variable is an ENV var, not a plain perl var * the values are *space separated* * each value represents the userid and hostname for one server The basic idea is that this string, should be usable in both the following syntaxes: git clone gitolite@bar:repo ssh gitolite@bar pwd You can also use ssh host aliases. Let's say server "bar" has a non-standard port number: # in ~/.ssh/config on foo host mybar hostname bar user gitolite port 2222 # in ~/.gitolite.rc on foo $ENV{GL_SLAVES} = 'bar gitolite@baz'; And that's really all there is, unless... ### efficiency versus paranoia If you're paranoid enough to use mirrors, you should be paranoid enough to like the `receive.fsckObjects` setting we now default to :-) However, informal tests indicate a 40-50% CPU overhead from this. If you don't like that, remove that line from the post-receive code. Please also note that we only set it on mirrors, and that too at the time the mirrored repo is *created*. This means, when you start using your old "main" server as a mirror (see later sections on switching over to a mirror, etc.), it's repos do not have this setting. Repos created by previous versions of gitolite also will not have this setting. Personally, I just set `git config --global receive.fsckObjects true`, since those servers aren't doing anything else anyway, and are idle for long stretches of time. It's upto you what you want to do here. ### syncing the mirrors the first time This is fine if you're setting up everything from scratch. But if your master server already had some repos with commits on them, you have to manually sync them up once. # on foo gl-mirror-sync gitolite@bar # path to "sync" program is ~/.gitolite/src if "from-client" install ### switching over Let's say foo goes down. You want to make bar the main server, and continue to have "baz" be a slave. * on bar, edit `~/.gitolite.rc` and set $GL_SLAVE_MODE = 0; $ENV{GL_SLAVES} = 'gitolite@baz'; * **sanity check**: go to your gitolite-admin clone, add a remote for "bar", fetch it, and make sure they are the same: git remote add bar gitolite@bar:gitolite-admin git fetch bar git branch -a -v # check that all SHAs are the same * inform everyone of the new URL for their repos (see next section for more on this) * make sure that if "foo" does come up, it will not immediately start serving requests. You'll be in trouble if (a) foo comes up as it was before, and (b) some developer still had the old URL lying around and started pushing changes to it. You could jump in quickly and set `$GL_SLAVE_MODE = 1` as soon as the system comes up. Better still, use extraneous means to block incoming connections from normal users (out of scope for this document). ### the return of foo #### switching back Switching back is fairly easy. * synchronise all repos from bar to foo. This may take some time, depending on how long foo was down. # on bar gl-mirror-sync gitolite@foo # path to "sync" program is ~/.gitolite/src if "from-client" install * turn off pushes on "bar" by setting slave mode to 1 * run the sync once again; this should complete quickly * **double check by comparing some the repos on both sides if needed**. You could run the following snippet on all servers for a quick check: cd ~/repositories # or wherever $REPO_BASE is find . -type d -name "*.git" | sort | while read r do echo $r git ls-remote $r | sort done | md5sum * on foo, set the slave list (or check that it is correct) * on foo, set slave mode off * tell everyone to switch back #### making foo a slave If "foo" does come up in a controlled manner, you might not want to switch back right away. Unless you're doing DNS tricks, users may be peeved at having to do 2 switches. If you want to make foo a slave, you know the drill by now: * set slave mode to 1 on foo * on bar, add foo as a slave # in ~/.gitolite.rc on bar $ENV{GL_SLAVES} = 'gitolite@foo gitolite@baz'; I think that should cover pretty much everything. I *have* tested most of this, but YMMV. ---- ### URLs that your users will use Unless you play DNS tricks, it is more than likely that your users would have to change the URLs they use to access their repos if you change the server they push to. I cannot speak for the plethora of git client software out there but for normal git, this problem can be mitigated somewhat by doing this: * in `~/.ssh/config` on my workstation, I have host gl hostname=primary.server.ip user=gitolite * all my `git clone` commands use `gl:reponame` as the URL * if the primary goes down, and I have to access the secondary, I just change the `hostname` line in `~/.ssh/config`. That's it. Every clone of every repo used anywhere in this userid is now changed. To repeat, this may or may not work with all the git clients that exist (like jgit, or any of the GUI tools, and especially if you're on Windows). If anyone has a better idea, something that works more universally, I'd love to hear it.