maintaining a partial copy of a repo...

...with gl-pre-git and update.secondary hooks
Sitaram Chamarty 2011-11-08 20:08:44 +05:30
parent 5858ecb56e
commit f3eae5e170
6 changed files with 448 additions and 20 deletions

contrib/partial-copy/gl-pre-git Executable file

@ -0,0 +1,51 @@
#!/usr/bin/perl
use strict;
use warnings;
# called from gitolite before any git operations are run
# "we", "our repo" => the partial copy
# "main", "pco" => the one which we are a "partial copy of"
my $main=`git config --file $ENV{GL_REPO_BASE_ABS}/$ENV{GL_REPO}.git/config --get gitolite.partialCopyOf`;
chomp ($main);
exit 0 unless $main;
die "ENV GL_RC not set\n" unless $ENV{GL_RC};
die "ENV GL_BINDIR not set\n" unless $ENV{GL_BINDIR};
unshift @INC, $ENV{GL_BINDIR};
require gitolite or die "parse gitolite.pm failed\n";
gitolite->import;
# go to the main repo. Find a list of all the refs it has, and for each one,
# check if this user is allowed to read that ref from our repo. If he is, add
# it to a list.
my %allowed;
wrap_chdir("$ENV{GL_REPO_BASE_ABS}/$main.git");
for my $ref (`git for-each-ref refs/heads '--format=%(refname)'`) {
    chomp($ref);
    my $ret = check_access($ENV{GL_REPO}, $ref, 'R', 1);
    $allowed{$ref} = 1 unless $ret =~ /DENIED/;
}
# now go to our repo and...
wrap_chdir("$ENV{GL_REPO_BASE_ABS}/$ENV{GL_REPO}.git");
# delete all existing refs that are not "allowed" (e.g., refs that were
# previously allowed but now are not, due to config file/rules change)
for my $ref (`git for-each-ref refs '--format=%(refname)'`) {
    chomp($ref);
    next if $allowed{$ref};
    system("git", "update-ref", "-d", $ref);
}
# now copy all allowed branches (and their tags, implicitly)
for my $ref (sort keys %allowed) {
    system("git", "fetch", "-f", "$ENV{GL_REPO_BASE_ABS}/$main.git", "$ref:$ref");
}
# now allow the git operation to proceed
exit 0


@ -0,0 +1,130 @@
# F=partialcopy maintaining a partial copy of a repo
The regular documentation on basic access control mentions [here][rpr_] that
it is easy to maintain two repositories if you need a (set of) branch(es) to
be "secret", with one repo that has everything, and another that has
everything but the secret branches.
Here's how gitolite can help do that sanely, with minimal hassles for all
concerned. This will ensure the right branches propagate correctly when
people pull/push -- you don't have to do anything manually after setting it up
unless the rules change.
To start with, here's a **NON-WORKING** config that merely describes what
we're **trying** to achieve:

    # THIS WILL NOT WORK!
    repo foo
        -   secret-1$       =   wally
        RW+ dev/USER/       =   wally
        RW+                 =   dilbert alice ashok wally

We want Wally the slacker to not be able to see the "secret-1" branch.
The only way to do this is to have two repos -- one with and the other without
the secret branch.
<font color="gray">These two repos cannot share git objects (to save disk
space) using hardlinks etc. Doing so would cause a data leak if Wally decides
to stop slacking and start hacking. See my conversation with Shawn
[here][gitlog1] for more on this, but it basically involves Wally finding out
the SHA of one of the secret branches, pushing a branch that he claims to have
built on that SHA, then fetching that branch again.
It requires a serious understanding of the git transport protocol (how objects
are sent and received, how thin packs are created, etc.) to implement it, or
even to convince yourself that someone else's implementation is correct.
Meanwhile, the method described here, once you accept the disk space cost, is
quite understandable to mere mortals like me :-)</font>
In the above example you had 2 sets of read access -- (1) all branches (2) all
branches except secret-1. If you end up with one more set (say, "all branches
except secret-2") then you need one more repo to handle it. If you can afford
the storage, the following recipe can certainly make it *manageable*.
[gitlog1]: http://colabti.org/irclogger/irclogger_log/git?date=2010-09-17#l2710
## first, as usual, the caveats!
* if you change the config to disallow something that used to be allowed,
  any tags pointing to objects that Wally's repo acquired before the change
will keep coming back! That is, branch B1 had a tag T1 within it. Later,
B1 was disallowed for Wally. However, Wally's repo will still retain the
tag T1!
So, if you ever disallow a branch that used to be allowed, it's best to
purge Wally's repo manually and let it get rebuilt on the next access.
  Just delete it from the disk, push the gitolite-admin config to force it
  to be re-created, then access it as a legitimate user.
* this recipe has not been, and will not be, tested with smart http.
* it probably won't play well with wildcard repos either; not tested.
* finally, mirroring support for such repos has not been tested either.
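The purge in the first caveat is easy to script. Below is a rough sketch, not something that ships with gitolite: `purge_partial_copy` is a made-up helper, and its arguments are assumptions about your layout (`~/repositories` is the repo base that `t.sh` uses; `~/gitolite-admin` is your admin clone). The demo at the bottom runs it against throwaway directories, since it can't talk to a real gitolite server.

```shell
#!/bin/bash
# Sketch of the manual purge from the first caveat. purge_partial_copy() is a
# hypothetical helper; its three arguments describe your layout, for example
#   purge_partial_copy ~/repositories foo-partialcopy-1 ~/gitolite-admin
set -e
purge_partial_copy() {   # usage: purge_partial_copy REPO_BASE REPO ADMIN_CLONE
    local base=$1 repo=$2 admin=$3
    # 1. remove the partial copy from disk, stale objects and tags included
    rm -rf "$base/$repo.git"
    # 2. push the admin repo; the empty commit makes sure there is something
    #    to push, so gitolite processes its config and re-creates the repo
    git -C "$admin" commit -q --allow-empty -m "re-create $repo"
    git -C "$admin" push -q
    # 3. the next read by a permitted user refills it via gl-pre-git
    echo "now access $repo once as a user who is allowed to read it"
}

# demo against throwaway directories instead of a real gitolite server
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@example.com
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@example.com
demo=$(mktemp -d)
mkdir -p "$demo/repositories"
git init -q --bare "$demo/repositories/foo-partialcopy-1.git"
git init -q --bare "$demo/admin.git"
git clone -q "$demo/admin.git" "$demo/gitolite-admin" 2>/dev/null
git -C "$demo/gitolite-admin" commit -q --allow-empty -m start
git -C "$demo/gitolite-admin" push -q -u origin HEAD
purge_partial_copy "$demo/repositories" foo-partialcopy-1 "$demo/gitolite-admin"
```

On a real server, the last step is what actually refills the repo: the first access after the push runs gl-pre-git, which fetches only the branches that are currently allowed.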
## the basic idea
The basic idea is very simple.
* one repo is the "main" one. It contains all the branches, and is the one
that people with full access will use.
* the other repo (or all the other repos, if you have more than one set, as
described above) is a "partial copy", with only a subset of the branches
in the main repo.
* every time someone accesses the partial copy, the branches that that user
is allowed to read are fetched from the main repo. **See note in example
below**.
* every time someone pushes to the partial copy, the branch being pushed is
sent back to the main repo before the update succeeds.
The main repo is always the canonical/current one. The others may or may not
be up to date.
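To make the read side concrete, here's roughly what happens on each access, reduced to plain git commands in throwaway repos. This is a sketch, not the hook: `is_allowed` is a hypothetical stand-in for gitolite's `check_access`, hard-wired to deny any ref ending in `secret-1`, and the partial copy is pre-filled with every branch to mimic a repo that was synced before `secret-1` became off-limits.

```shell
#!/bin/bash
# Sketch of the gl-pre-git sync step using plain git in throwaway repos.
# Assumption: is_allowed() is a hypothetical stand-in for gitolite's
# check_access(); it simply denies any ref ending in "secret-1".
set -e
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@example.com
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@example.com

tmp=$(mktemp -d)
main=$tmp/foo.git               # the "main" repo, with every branch
pc=$tmp/foo-partialcopy-1.git   # the partial copy

# build a main repo with master, next and secret-1
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m m1
git branch next
git checkout -q -b secret-1
git commit -q --allow-empty -m s1
git checkout -q master
git clone -q --bare "$tmp/work" "$main"

# pretend the partial copy was filled back when secret-1 was still allowed
git clone -q --bare "$main" "$pc"

is_allowed() {
    case "$1" in
        *secret-1) return 1 ;;  # like "- secret-1$ = wally"
        *) return 0 ;;          # like the blanket "R = wally"
    esac
}

# 1. in the main repo, list the branches this user may read
allowed=$(git --git-dir="$main" for-each-ref refs/heads --format='%(refname)' |
    while read -r ref; do
        if is_allowed "$ref"; then echo "$ref"; fi
    done)

# 2. in the partial copy, delete every ref that is not on that list
for ref in $(git --git-dir="$pc" for-each-ref refs --format='%(refname)'); do
    if ! printf '%s\n' "$allowed" | grep -qxF "$ref"; then
        git --git-dir="$pc" update-ref -d "$ref"
    fi
done

# 3. fetch every allowed branch from the main repo (tags follow implicitly)
for ref in $allowed; do
    git --git-dir="$pc" fetch -q -f "$main" "$ref:$ref"
done

git --git-dir="$pc" for-each-ref refs/heads --format='%(refname)'
```

It should print only `refs/heads/master` and `refs/heads/next`. Note that step 2 deletes refs but not objects, which is exactly why the tag caveat above exists.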
## the config file
Here's what we actually need to put in the config file. The repo names can,
of course, be whatever you want.

    repo foo
        RW+                 =   dilbert alice ashok

    repo foo-partialcopy-1
        -   secret-1$       =   wally
        R                   =   wally
        RW+ dev/USER/       =   wally
        config gitolite.partialCopyOf = foo

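gl-pre-git finds the main repo by reading this `config` value back out of the partial copy's own `config` file. A small sketch of that lookup follows; `repo_base` stands in for `$GL_REPO_BASE_ABS`, and the key is written by hand here only to simulate what gitolite does with the `config` line (the rc file must allow `gitolite.partialCopyOf` for that to happen).

```shell
#!/bin/bash
# Sketch: how gl-pre-git finds the main repo of a partial copy.
# Assumption: repo_base stands in for $GL_REPO_BASE_ABS; we write the key
# ourselves instead of letting gitolite compile it from gitolite.conf.
set -e
repo_base=$(mktemp -d)
GL_REPO=foo-partialcopy-1

git init -q --bare "$repo_base/foo.git"
git init -q --bare "$repo_base/$GL_REPO.git"
git config --file "$repo_base/$GL_REPO.git/config" gitolite.partialCopyOf foo

# the same lookup the hook performs; "|| true" because a missing key exits 1
main=$(git config --file "$repo_base/$GL_REPO.git/config" --get gitolite.partialCopyOf || true)
echo "main repo: $main"

# a repo without the key (foo itself) yields an empty answer
plain=$(git config --file "$repo_base/foo.git/config" --get gitolite.partialCopyOf || true)
echo "foo is a partial copy of: '$plain'"
```

An empty result is what makes the hook `exit 0` straight away for every ordinary repo.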
**Important notes**:
* Wally must not have any access to "foo". Absolutely none at all.
* Wally's rules for `foo-partialcopy-1` must be written such that restricted
branches are denied. You could list only the branches he's allowed to
read, or you could deny the ones he's not allowed and add a blanket "R"
for the others later, as in this example.
Note that this is the same [deny][] logic that is normally used for write
operations, but applied to "read" in this case. All we're doing is using
this logic to determine what branches from `foo` are allowed to propagate
to the partial copy repo. *This is NOT being used by git to restrict
reads; at the risk of repetition, git does NOT have that capability*.
* All the other users with access to `foo-partialcopy-1` must be under the
  same restrictions as Wally. So if, say, Ashok is not allowed to view a
  branch called "USCO", that needs to be defined in yet another partial
  copy repo, and `ashok` must be removed from the access list for `foo`.
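Here's a toy evaluator to show why the order of the deny and the blanket "R" matters. It is a hypothetical illustration, not gitolite's code, and it assumes the rules are tried top to bottom: a matching `-` denies, a matching rule that includes `R` allows, and a ref that matches nothing is denied. The refexes are written out in full because gitolite prefixes `refs/heads/` to refexes that don't start with `refs/`; the rule with no refex is written as `refs/.*` (all refs).

```shell
#!/bin/bash
# Toy first-match evaluator for "R" checks. Hypothetical, NOT gitolite's code.
# Assumptions: RULES holds "PERM REFEX" lines, tried top to bottom; refexes
# are full ref names matched from the start of the ref; a matching "-" denies,
# a matching rule whose PERM contains R allows, no match at all denies.
check_read() {   # usage: check_read REF RULES
    ref=$1
    while read -r perm refex; do
        [ -n "$perm" ] || continue            # skip blank lines
        if printf '%s\n' "$ref" | grep -Eq "^$refex"; then
            if [ "$perm" = "-" ]; then echo DENIED; return; fi
            case "$perm" in *R*) echo allowed; return ;; esac
        fi
    done <<EOF
$2
EOF
    echo "DENIED (fallthru)"
}

# wally's rules for foo-partialcopy-1, refexes written out in full
good='- refs/heads/secret-1$
R refs/.*
RW+ refs/heads/dev/wally/'
# the same two key rules in the wrong order
reordered='R refs/.*
- refs/heads/secret-1$'

for r in refs/heads/master refs/heads/secret-1 refs/heads/secret-10; do
    printf '%-20s as written: %-8s reordered: %s\n' "$r" \
        "$(check_read "$r" "$good")" "$(check_read "$r" "$reordered")"
done
```

With the rules as written, `secret-1` is denied while `master` and `secret-10` are allowed (the `$` keeps `secret-10` from matching the deny). Swap the two rules and the blanket `R` matches first, so `secret-1` would be propagated too.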
## the hooks
The code for both hooks (`gl-pre-git` and `update.secondary`) is included in the
source directory `contrib/partial-copy`. Note that this is all done *without*
touching gitolite core at all -- we use only two hooks, both described in the [hooks][]
section. A pictorial representation of all the stuff gitolite runs is
[here][flow]; it may help you understand the role that these two hooks are
playing in this scenario.

contrib/partial-copy/t.sh Executable file

@ -0,0 +1,203 @@
#!/bin/bash
# test script for partial copy feature
# WARNING 1: will wipe out your gitolite.conf file (you can recover it using the
# usual git methods if you need to, of course).
# WARNING 2: will wipe out (completely) the following directories:
rm -rf ~/repositories/{foo,foo-pc}.git ~/td
# REQUIRED 1: please make sure rc file allows config 'gitolite.partialCopyOf'.
# REQUIRED 2: please make sure you copied the 2 hooks in contrib/partial-copy
# and installed them into gitolite
# REQUIRED 3: the 'git-test' command from my 'git-tools' project
# ----
set -e
mkdir ~/td
# ----
cd ~/gitolite-admin
cat << EOF1 > conf/gitolite.conf
repo gitolite-admin
RW+ = tester
repo testing
RW+ = @all
EOF1
git test '## setup base conf' 'add conf' 'commit -m start' 'commit-empty' 'ok' 'push -f' 'ok'
cat << EOF2 >> conf/gitolite.conf
repo foo
RW+ = u1 u2
repo foo-pc
- secret-1$ = u4
R = u4 # marker 01
RW next = u4
RW+ dev/USER/ = u4
RW refs/tags/USER/ = u4
config gitolite.partialCopyOf = foo
EOF2
git test << SETUP
## setup partial-repos conf
add conf; commit -m partial-repos; commit-empty; ok;
# /master.*partial-repos/
push; ok;
/Init.*empty.*foo\\.git/
/Init.*empty.*foo-pc\\.git/
/u3.*u5.*u6/; !/u1/; !/u2/; !/u4/
SETUP
cd ~/td; rm -rf foo foo-pc
git test << FOO
## populate repo foo, by user u1
# create foo with a bunch of branches and tags
clone u1:foo
/appear.*cloned/
cd foo
a1; a2
checkout -b dev/u1/foo; f1; f2
checkout master; m1; m2
checkout master; checkout -b next; n1; n2; tag nt1
checkout -b secret-1; s11; s12; tag s1t1
checkout next; checkout -b secret-2; s21; s22; tag s2t1
push --all
/new branch/; /secret-1/; /secret-2/
push --tags
/new tag/; /s1t1/; /s2t1/
FOO
git test << FOOPC
## user u4 tries foo, fails, tries foo-pc
cd $HOME/td
clone u4:foo foo4; !ok
/R access for foo DENIED to u4/
clone u4:foo-pc ; ok;
/Cloning into foo-pc/
/new branch.* dev/u1/foo .* dev/u1/foo/
/new branch.* master .* master/
/new branch.* next .* next/
/new branch.* secret-2 .* secret-2/
!/new branch.* secret-1 .* secret-1/
/new tag.* nt1 .* nt1/
/new tag.* s2t1 .* s2t1/
!/new tag.* s1t1 .* s1t1/
FOOPC
git test << FOOPC2
## user u4 pushes to foo-pc
cd $HOME/td/foo-pc
checkout master
u4m1; u4m2; push; !ok
/W refs/heads/master foo-pc u4 DENIED by fallthru/
/hook declined to update refs/heads/master/
/To u4:foo-pc/
/remote rejected/
/failed to push some refs to 'u4:foo-pc'/
checkout next
u4n1; u4n2
push origin next; ok
/To /home/gl-test/repositories/foo.git/
/new branch\] ca3787119b7e8b9914bc22c939cefc443bc308da -> br-\d+/
/u4:foo-pc/
/52c7716..ca37871 next -> next/
tag u4/nexttag; push --tags
/To u4:foo-pc/
/\[new tag\] u4/nexttag -> u4/nexttag/
/\[new branch\] ca3787119b7e8b9914bc22c939cefc443bc308da -> br-\d+/
checkout master
checkout -b dev/u4/u4master
devu4m1; devu4m2
push origin HEAD; ok
/To /home/gl-test/repositories/foo.git/
/new branch\] 228353950557ed1eb13679c1fce4d2b4718a2060 -> br-\d+/
/u4:foo-pc/
/new branch.* HEAD -> dev/u4/u4master/
FOOPC2
git test << FOO2
## user u1 gets u4's updates, makes some more
cd $HOME/td/foo
git remote update
/Fetching origin/
/From u1:foo/
/new branch\] dev/u4/u4master -> origin/dev/u4/u4master/
/new tag\] u4/nexttag -> u4/nexttag/
/52c7716..ca37871 next -> origin/next/
checkout master; u1ma1; u1ma2;
/\[master 8ab1ff5\] u1ma2 at Thu Jul 7 06:23:20 2011/
tag mt2; push-om; ok
checkout secret-1; u1s1b1; u1s1b2
/\[secret-1 5f96cb5\] u1s1b2 at Thu Jul 7 06:23:20 2011/
tag s1t2; push origin HEAD; ok
checkout secret-2; u1s2b1; u1s2b2
/\[secret-2 1ede682\] u1s2b2 at Thu Jul 7 06:23:20 2011/
tag s2t2; push origin HEAD; ok
push --tags; ok
git ls-remote origin
/8ab1ff512faf5935dc0fbff357b6f453b66bb98b\trefs/tags/mt2/
/5f96cb5ff73c730fb040eb2d01981f7677ca6dba\trefs/tags/s1t2/
/1ede6829ec7b75a53cd6acb7da64e5a8011e6050\trefs/tags/s2t2/
FOO2
git test << FOOPC3
## u4 gets updates but without the tag in secret-1
cd $HOME/td/foo-pc
git ls-remote origin;
!/ refs/heads/secret-1/; !/s1t1/; !/s1t2/
/8ab1ff512faf5935dc0fbff357b6f453b66bb98b\tHEAD/
/8ced4a374b3935bac1a5ba27ef8dd950bd867d47\trefs/heads/dev/u1/foo/
/228353950557ed1eb13679c1fce4d2b4718a2060\trefs/heads/dev/u4/u4master/
/8ab1ff512faf5935dc0fbff357b6f453b66bb98b\trefs/heads/master/
/ca3787119b7e8b9914bc22c939cefc443bc308da\trefs/heads/next/
/1ede6829ec7b75a53cd6acb7da64e5a8011e6050\trefs/heads/secret-2/
/8ab1ff512faf5935dc0fbff357b6f453b66bb98b\trefs/tags/mt2/
/52c7716c6b029963dd167c647c1ff6222a366499\trefs/tags/nt1/
/01f04ece6519e7c0e6aea3d26c7e75e9c4e4b06d\trefs/tags/s2t1/
/1ede6829ec7b75a53cd6acb7da64e5a8011e6050\trefs/tags/s2t2/
git remote update
/3ea704d..8ab1ff5 master -> origin/master/
/01f04ec..1ede682 secret-2 -> origin/secret-2/
/\[new tag\] mt2 -> mt2/
/\[new tag\] s2t2 -> s2t2/
!/ refs/heads/secret-1/; !/s1t1/; !/s1t2/
FOOPC3
git ls-remote u4:foo-pc
cd ~/gitolite-admin
perl -ni -e 'print unless /marker 01/' conf/gitolite.conf
git test 'add conf' 'commit -m erdel' 'ok' 'push -f' 'ok'
git ls-remote u4:foo-pc
cat <<RANT
This is where things go all screwy. Because we still have the *objects*
pointed to by these tags, we still get them back from the main repo.
<sigh>
RANT


@ -0,0 +1,32 @@
#!/usr/bin/perl
use strict;
use warnings;
# called from gitolite's update hook (as update.secondary), once for each ref
# being pushed to the partial copy
# "we", "our repo" => the partial copy
# "main", "pco" => the one which we are a "partial copy of"
my $main=`git config --file $ENV{GL_REPO_BASE_ABS}/$ENV{GL_REPO}.git/config --get gitolite.partialCopyOf`;
chomp ($main);
exit 0 unless $main;
die "ENV GL_RC not set\n" unless $ENV{GL_RC};
die "ENV GL_BINDIR not set\n" unless $ENV{GL_BINDIR};
unshift @INC, $ENV{GL_BINDIR};
require gitolite or die "parse gitolite.pm failed\n";
gitolite->import;
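my ($ref, $old, $new) = @ARGV;
# push the new objects into the main repo under a throwaway branch name,
# skipping the main repo's own gitolite update hook for this internal push
my $rand = int(rand(100000000));
$ENV{GL_BYPASS_UPDATE_HOOK} = 1;
system("git", "push", "-f", "$ENV{GL_REPO_BASE_ABS}/$main.git", "$new:refs/heads/br-$rand") and die "FATAL: failed to send $new\n";
# in the main repo, drop the throwaway branch (the objects stay) and move the
# real ref, but only if it still points at $old
wrap_chdir("$ENV{GL_REPO_BASE_ABS}/$main.git");
system("git", "update-ref", "-d", "refs/heads/br-$rand");
system("git", "update-ref", $ref, $new, $old) and die "FATAL: update-ref for $ref failed\n";
exit 0;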


@ -98,11 +98,11 @@ branch or tag on it.
Wally can only read the repo. Alice and Ashok can push but not rewind; only
Sitaram and Dilbert can do that.
R master = wally # MEANINGLESS! WILL NOT DO WHAT YOU THINK IT DOES!!
And now, a common misunderstanding:
This won't work. You can only restrict "read" access at the repo level, not
at the branch level. This is a git issue, not a gitolite issue. Go bother
them, or switch to gerrit.
R master = wally # WILL NOT DO WHAT YOU THINK IT DOES!!
This won't work. Please see [here][rpr_] for more on this.
repo foo
RW master$ = dilbert alice


@ -86,22 +86,6 @@ tag with a new value.
In a later section you'll see some more advanced permissions.
<font color="gray">
Side note: apparently it needs to be spelled out that "R" permissions can only
apply to the entire repo and not to individual branches/tags. Mention was
made of a certain popular Linux distribution named after animals with
adjectives, chosen merely for alliterative purposes, prefixed to their names,
and of their users not being clueful enough to know that this (the "read"
thing, not the alliterative adjective thing, in case you lost track) is an
inherent git characteristic.
Meanwhile, people who *desperately* need this are directed to gerrit, which
can do this because they have their own git stack and don't use the one written
by Linus and currently maintained by Junio.
</font>
### how rules are matched
It's important to understand that there are two levels at which access control
@ -229,6 +213,34 @@ documentation for [`~/.gitolite.rc`][rc].
When used as a reponame, it includes all repos.
### F=rpr_ side note: "R" permissions for refs
You can control "read" access only at the repo level, not at the branch level.
For example, this **won't** limit Wally to reading only the master branch:

    repo foo
        R   master  =   wally   # WILL NOT DO WHAT YOU THINK IT DOES!!

and this **won't** prevent him from reading it:

    repo foo
        -   master  =   wally   # WILL NOT DO WHAT YOU THINK IT DOES!!
        R           =   wally

This (inability to distinguish one ref from another during a read operation)
is a git issue, not a gitolite issue.
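You can see this without gitolite at all. In the throwaway repo below, anyone able to read it is shown the secret branch by `git ls-remote` and gets it in a plain clone; nothing in a plain repo marks one branch as readable and another as not.

```shell
#!/bin/bash
# Demonstration only, no gitolite involved: git has no per-ref read check,
# so whoever can read a repo at all can list and fetch every branch in it.
set -e
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@example.com
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@example.com

tmp=$(mktemp -d)
git init -q "$tmp/foo"
cd "$tmp/foo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m m1
git branch secret-1               # the branch we would like to keep from Wally

git ls-remote "$tmp/foo"          # lists HEAD, refs/heads/master, refs/heads/secret-1
git clone -q "$tmp/foo" "$tmp/wally"
git -C "$tmp/wally" branch -r     # origin/secret-1 is among the remote branches
```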
There are 3 ways around this, though:
* switch to gerrit, which has its own git stack, its own sshd, and God knows
what else. All written in Java, the COBOL of the internet era ;-)
* bug the git people to add this feature in ;-)
* use a separate repo for Wally.
Using separate repos is not that hard with gitolite. Here's how to maintain a
[partial copy][partialcopy] of the main repo and keep it synced (while not
allowing the secret branches into it).
## F=aac advanced access control
The previous section is sufficient for most common needs, but gitolite can go