maintaining a partial copy of a repo...

...with gl-pre-git and update.secondary hooks
2011-11-08 20:08:44 +05:30 · 2011-11-08 20:08:44 +05:30 · f3eae5e170
commit f3eae5e170
parent 5858ecb56e
6 changed files with 448 additions and 20 deletions
--- a/contrib/partial-copy/gl-pre-git
+++ b/contrib/partial-copy/gl-pre-git
@@ -0,0 +1,51 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+
+# called from gitolite before any git operations are run
+
+# "we", "our repo"  =>  the partial copy
+# "main", "pco"     =>  the one which we are a "partial copy of"
+
+my $main=`git config --file $ENV{GL_REPO_BASE_ABS}/$ENV{GL_REPO}.git/config --get gitolite.partialCopyOf`;
+chomp ($main);
+
+exit 0 unless $main;
+
+die "ENV GL_RC not set\n" unless $ENV{GL_RC};
+die "ENV GL_BINDIR not set\n" unless $ENV{GL_BINDIR};
+
+unshift @INC, $ENV{GL_BINDIR};
+require gitolite or die "parse gitolite.pm failed\n";
+gitolite->import;
+
+# go to the main repo.  Find a list of all the refs it has, and for each one,
+# check if this user is allowed to read that ref from our repo.  If he is, add
+# it to a list.
+
+my %allowed;
+wrap_chdir("$ENV{GL_REPO_BASE_ABS}/$main.git");
+for my $ref (`git for-each-ref refs/heads '--format=%(refname)'`) {
+    chomp($ref);
+    my $ret = check_access($ENV{GL_REPO}, $ref, 'R', 1);
+    $allowed{$ref} = 1 unless $ret =~ /DENIED/;
+}
+
+# now go to our repo and...
+wrap_chdir("$ENV{GL_REPO_BASE_ABS}/$ENV{GL_REPO}.git");
+
+# delete all existing refs that are not "allowed" (e.g., refs that were
+# previously allowed but now are not, due to config file/rules change)
+for my $ref (`git for-each-ref refs '--format=%(refname)'`) {
+    chomp($ref);
+    next if $allowed{$ref};
+    system("git", "update-ref", "-d", $ref);
+}
+
+# now copy all allowed branches (and their tags, implicitly)
+for my $ref (sort keys %allowed) {
+    system("git", "fetch", "-f", "$ENV{GL_REPO_BASE_ABS}/$main.git", "$ref:$ref");
+}
+
+# now allow the git operation to proceed
+exit 0
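The pruning and copying steps in gl-pre-git boil down to a set difference between two ref listings. Here is a minimal dry-run sketch of that idea in plain shell. The two lists are made-up stand-ins for the `git for-each-ref` output, `MAIN` is a placeholder for the main repo's path, and nothing here calls git or gitolite; it only prints the commands the hook would run.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two ref listings the hook gets from git.
# "allowed"  = branch refs in the main repo that the user may read
# "existing" = every ref currently present in the partial copy
allowed='refs/heads/master
refs/heads/next'
existing='refs/heads/master
refs/heads/secret-1
refs/tags/s1t1'

# Print every existing ref that is not in the allowed list.
# -F: fixed strings, -x: match whole lines, -v: invert the match.
stale_refs() {
    printf '%s\n' "$existing" | grep -Fxv -e "$allowed"
}

# Dry run: show the deletions, then the fetches, without touching any repo.
stale_refs | while read -r ref; do
    echo "git update-ref -d $ref"
done
printf '%s\n' "$allowed" | while read -r ref; do
    echo "git fetch -f MAIN $ref:$ref"
done
```

Note that `refs/tags/s1t1` is listed as stale simply because the allowed list only ever holds branch refs. As the comment in the hook says, tags are expected to come back implicitly with the fetch of the branches they belong to.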
--- a/contrib/partial-copy/partial-copy.mkd
+++ b/contrib/partial-copy/partial-copy.mkd
@@ -0,0 +1,130 @@
+# F=partialcopy maintaining a partial copy of a repo
+
+The regular documentation on basic access control mentions [here][rpr_] that
+it is easy to maintain two repositories if you need a (set of) branch(es) to
+be "secret", with one repo that has everything, and another that has
+everything but the secret branches.
+
+Here's how gitolite can help do that sanely, with minimal hassles for all
+concerned.  This will ensure the right branches propagate correctly when
+people pull/push -- you don't have to do anything manually after setting it up
+unless the rules change.
+
+To start with, here's a **NON-WORKING** config that merely describes what
+we're **trying** to achieve:
+
+                    # THIS WILL NOT WORK!
+    repo foo
+            -   secret-1$       =   wally
+            RW+ dev/USER/       =   wally
+            RW+                 =   dilbert alice ashok wally
+
+We want Wally the slacker to not be able to see the "secret-1" branch.
+
+The only way to do this is to have two repos -- one with and the other without
+the secret branch.
+
+<font color="gray">These two repos cannot share git objects (to save disk
+space) using hardlinks etc.  Doing so would cause a data leak if Wally decides
+to stop slacking and start hacking.  See my conversation with Shawn
+[here][gitlog1] for more on this, but it basically involves Wally finding out
+the SHA of one of the secret branches, pushing a branch that he claims to have
+built on that SHA, then fetching that branch again.
+
+It requires a serious understanding of the git transport protocol, how objects
+are sent/received, how thin packs are created, etc., to implement it.  Or to
+convince yourself that someone's implementation is correct.
+
+Meanwhile, the method described here, once you accept the disk space cost, is
+quite understandable to mere mortals like me :-)</font>
+
+In the above example you had 2 sets of read access -- (1) all branches (2) all
+branches except secret-1.  If you end up with one more set (say, "all branches
+except secret-2") then you need one more repo to handle it.  If you can afford
+the storage, the following recipe can certainly make it *manageable*.
+
+[gitlog1]: http://colabti.org/irclogger/irclogger_log/git?date=2010-09-17#l2710
+
+## first, as usual, the caveats!
+
+  * if you change the config to disallow something that used to be allowed,
+    any tags pointing to objects that Wally's repo acquired before the change
+    will keep coming back!  For example, say branch B1 had a tag T1 on it and
+    B1 was later disallowed for Wally.  Wally's repo will still get the tag
+    T1, because it still has the objects that T1 points to.
+
+    So, if you ever disallow a branch that used to be allowed, it's best to
+    purge Wally's repo manually and let it get rebuilt on the next access.
+    Just delete it from the disk, push the gitolite-admin config to force it
+    to re-create, then access it as a legitimate user.
+
+  * this recipe has not been, and will not be, tested with smart http.
+
+  * it probably won't play well with wildcard repos either; not tested.
+
+  * finally, mirroring support for such repos has not been tested either.
+
+## the basic idea
+
+The basic idea is very simple.
+
+  * one repo is the "main" one.  It contains all the branches, and is the one
+    that people with full access will use.
+
+  * the other repo (or all the other repos, if you have more than one set, as
+    described above) is a "partial copy", with only a subset of the branches
+    in the main repo.
+
+  * every time someone accesses the partial copy, the branches that that user
+    is allowed to read are fetched from the main repo.  **See note in example
+    below**.
+
+  * every time someone pushes to the partial copy, the branch being pushed is
+    sent back to the main repo before the update succeeds.
+
+The main repo is always the canonical/current one.  The others may or may not
+be up to date.
+
+## the config file
+
+Here's what we actually need to put in the config file.  Note that the
+reponames can be whatever you want of course.
+
+    repo foo
+            RW+                 =   dilbert alice ashok
+
+    repo foo-partialcopy-1
+            -   secret-1$       =   wally
+            R                   =   wally
+            RW+ dev/USER/       =   wally
+
+            config gitolite.partialCopyOf = foo
+
+**Important notes**:
+
+  * Wally must not have any access to "foo".  Absolutely none at all.
+
+  * Wally's rules for `foo-partialcopy-1` must be written such that restricted
+    branches are denied.  You could list only the branches he's allowed to
+    read, or you could deny the ones he's not allowed and add a blanket "R"
+    for the others later, as in this example.
+
+    Note that this is the same [deny][] logic that is normally used for write
+    operations, but applied to "read" in this case.  All we're doing is using
+    this logic to determine what branches from `foo` are allowed to propagate
+    to the partial copy repo.  *This is NOT being used by git to restrict
+    reads; at the risk of repetition, git does NOT have that capability*.
+
+  * All the other users with access to `foo-partialcopy-1` must be under the
+    same restrictions as Wally.  So let's say Ashok is not allowed to view a
+    branch called "USCO".  That needs to be defined in yet another partial
+    copy repo, and `ashok` must be removed from the access list for `foo`.
+
+## the hooks
+
+The code for both hooks is included in the source directory
+`contrib/partial-copy`.  Note that this is all done *without* touching
+gitolite core at all -- we only use two hooks; both described in the [hooks][]
+section.  A pictorial representation of all the stuff gitolite runs is
+[here][flow]; it may help you understand the role that these two hooks are
+playing in this scenario.
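To make the deny logic in the "Important notes" concrete, here is a rough model of it in shell. It assumes that rules are tried in config order and that the first rule whose refex matches decides, with the blanket `R` rule matching every ref. The `RULES` list and the `can_read` helper are invented for this illustration and are not gitolite code; the real decision is made by gitolite's `check_access`.

```shell
#!/bin/sh
# Simplified, hypothetical model of first-match rule evaluation for reads.
# Each line is "perm refex", in config order ("-" means deny).
RULES='- refs/heads/secret-1$
R refs/.*'

# can_read REF: succeed if the first rule whose refex matches REF is not a deny.
can_read() {
    candidate=$1
    # A here-document feeds the loop so that "return" acts on this function.
    while read -r perm refex; do
        if printf '%s\n' "$candidate" | grep -Eq "^$refex"; then
            [ "$perm" != "-" ]
            return          # status of the test above: 0 = allowed, 1 = denied
        fi
    done <<EOF
$RULES
EOF
    return 1                # no rule matched at all: treat as denied
}

for r in refs/heads/master refs/heads/secret-1 refs/heads/secret-2; do
    if can_read "$r"; then echo "copy $r"; else echo "skip $r"; fi
done
```

Under these assumptions only `secret-1` is skipped, which is the outcome the config example above is aiming for: the deny rule comes first, and the blanket `R` picks up everything else.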
--- a/contrib/partial-copy/t.sh
+++ b/contrib/partial-copy/t.sh
@@ -0,0 +1,203 @@
+#!/bin/bash
+
+# test script for partial copy feature
+
+# WARNING 1: will wipe out your gitolite.conf file (you can recover it by the
+# usual git methods if you need to, of course).
+
+# WARNING 2: will wipe out (completely) the following directories:
+
+    rm -rf ~/repositories/{foo,foo-pc}.git ~/td
+
+# REQUIRED 1: please make sure rc file allows config 'gitolite.partialCopyOf'.
+
+# REQUIRED 2: please make sure you copied the 2 hooks in contrib/partial-copy
+# and installed them into gitolite
+
+# REQUIRED 3: the 'git-test' command from my 'git-tools' project
+
+# ----
+
+set -e
+mkdir ~/td
+
+# ----
+
+cd ~/gitolite-admin
+
+cat << EOF1 > conf/gitolite.conf
+    repo    gitolite-admin
+            RW+     =   tester
+
+    repo    testing
+            RW+     =   @all
+EOF1
+
+git test '## setup base conf' 'add conf' 'commit -m start' 'commit-empty' 'ok' 'push -f' 'ok'
+
+cat << EOF2 >> conf/gitolite.conf
+
+    repo foo
+            RW+                 =   u1 u2
+
+    repo foo-pc
+            -   secret-1$       =   u4
+            R                   =   u4  # marker 01
+            RW  next            =   u4
+            RW+ dev/USER/       =   u4
+            RW  refs/tags/USER/ =   u4
+
+            config gitolite.partialCopyOf = foo
+
+EOF2
+
+git test << SETUP
+    ## setup partial-repos conf
+    add conf; commit -m partial-repos; commit-empty; ok;
+    # /master.*partial-repos/
+    push;  ok;
+        /Init.*empty.*foo\\.git/
+        /Init.*empty.*foo-pc\\.git/
+        /u3.*u5.*u6/; !/u1/; !/u2/; !/u4/
+
+SETUP
+
+cd ~/td; rm -rf foo foo-pc
+
+git test << FOO
+    ## populate repo foo, by user u1
+    # create foo with a bunch of branches and tags
+    clone u1:foo
+        /appear.*cloned/
+    cd foo
+    a1; a2
+    checkout -b dev/u1/foo; f1; f2
+    checkout master; m1; m2
+    checkout master; checkout -b next; n1; n2; tag nt1
+    checkout -b secret-1; s11; s12; tag s1t1
+    checkout next; checkout -b secret-2; s21; s22; tag s2t1
+    push --all
+        /new branch/; /secret-1/; /secret-2/
+    push --tags
+        /new tag/; /s1t1/; /s2t1/
+FOO
+
+git test << FOOPC
+    ## user u4 tries foo, fails, tries foo-pc
+    cd $HOME/td
+    clone u4:foo foo4; !ok
+        /R access for foo DENIED to u4/
+    clone u4:foo-pc ; ok;
+        /Cloning into foo-pc/
+        /new branch.* dev/u1/foo .* dev/u1/foo/
+        /new branch.* master .* master/
+        /new branch.* next .* next/
+        /new branch.* secret-2 .* secret-2/
+        !/new branch.* secret-1 .* secret-1/
+        /new tag.* nt1 .* nt1/
+        /new tag.* s2t1 .* s2t1/
+        !/new tag.* s1t1 .* s1t1/
+
+FOOPC
+
+git test << FOOPC2
+    ## user u4 pushes to foo-pc
+    cd $HOME/td/foo-pc
+    checkout master
+    u4m1; u4m2; push; !ok
+        /W refs/heads/master foo-pc u4 DENIED by fallthru/
+        /hook declined to update refs/heads/master/
+        /To u4:foo-pc/
+        /remote rejected/
+        /failed to push some refs to 'u4:foo-pc'/
+
+    checkout next
+    u4n1; u4n2
+    push origin next; ok
+        /To /home/gl-test/repositories/foo.git/
+        /new branch\]      ca3787119b7e8b9914bc22c939cefc443bc308da -> br-\d+/
+        /u4:foo-pc/
+        /52c7716..ca37871  next -> next/
+    tag u4/nexttag; push --tags
+        /To u4:foo-pc/
+        /\[new tag\]         u4/nexttag -> u4/nexttag/
+        /\[new branch\]      ca3787119b7e8b9914bc22c939cefc443bc308da -> br-\d+/
+
+    checkout master
+    checkout -b dev/u4/u4master
+    devu4m1; devu4m2
+    push origin HEAD; ok
+        /To /home/gl-test/repositories/foo.git/
+        /new branch\]      228353950557ed1eb13679c1fce4d2b4718a2060 -> br-\d+/
+        /u4:foo-pc/
+        /new branch.* HEAD -> dev/u4/u4master/
+
+FOOPC2
+
+git test << FOO2
+    ## user u1 gets u4's updates, makes some more
+    cd $HOME/td/foo
+    git remote update
+        /Fetching origin/
+        /From u1:foo/
+        /new branch\]      dev/u4/u4master -> origin/dev/u4/u4master/
+        /new tag\]         u4/nexttag -> u4/nexttag/
+        /52c7716..ca37871  next       -> origin/next/
+    checkout master; u1ma1; u1ma2;
+        /\[master 8ab1ff5\] u1ma2 at Thu Jul  7 06:23:20 2011/
+    tag mt2; push-om; ok
+    checkout secret-1; u1s1b1; u1s1b2
+        /\[secret-1 5f96cb5\] u1s1b2 at Thu Jul  7 06:23:20 2011/
+    tag s1t2; push origin HEAD; ok
+    checkout secret-2; u1s2b1; u1s2b2
+        /\[secret-2 1ede682\] u1s2b2 at Thu Jul  7 06:23:20 2011/
+    tag s2t2; push origin HEAD; ok
+    push --tags; ok
+
+    git ls-remote origin
+        /8ab1ff512faf5935dc0fbff357b6f453b66bb98b\trefs/tags/mt2/
+        /5f96cb5ff73c730fb040eb2d01981f7677ca6dba\trefs/tags/s1t2/
+        /1ede6829ec7b75a53cd6acb7da64e5a8011e6050\trefs/tags/s2t2/
+FOO2
+
+git test << FOOPC3
+    ## u4 gets updates but without the tag in secret-1
+    cd $HOME/td/foo-pc
+    git ls-remote origin;
+        !/ refs/heads/secret-1/; !/s1t1/; !/s1t2/
+        /8ab1ff512faf5935dc0fbff357b6f453b66bb98b\tHEAD/
+        /8ced4a374b3935bac1a5ba27ef8dd950bd867d47\trefs/heads/dev/u1/foo/
+        /228353950557ed1eb13679c1fce4d2b4718a2060\trefs/heads/dev/u4/u4master/
+        /8ab1ff512faf5935dc0fbff357b6f453b66bb98b\trefs/heads/master/
+        /ca3787119b7e8b9914bc22c939cefc443bc308da\trefs/heads/next/
+        /1ede6829ec7b75a53cd6acb7da64e5a8011e6050\trefs/heads/secret-2/
+        /8ab1ff512faf5935dc0fbff357b6f453b66bb98b\trefs/tags/mt2/
+        /52c7716c6b029963dd167c647c1ff6222a366499\trefs/tags/nt1/
+        /01f04ece6519e7c0e6aea3d26c7e75e9c4e4b06d\trefs/tags/s2t1/
+        /1ede6829ec7b75a53cd6acb7da64e5a8011e6050\trefs/tags/s2t2/
+
+    git remote update
+        /3ea704d..8ab1ff5  master     -> origin/master/
+        /01f04ec..1ede682  secret-2   -> origin/secret-2/
+        /\[new tag\]         mt2        -> mt2/
+        /\[new tag\]         s2t2       -> s2t2/
+        !/ refs/heads/secret-1/; !/s1t1/; !/s1t2/
+
+FOOPC3
+
+git ls-remote u4:foo-pc
+
+cd ~/gitolite-admin
+perl -ni -e 'print unless /marker 01/' conf/gitolite.conf
+git test 'add conf' 'commit -m erdel' 'ok' 'push -f' 'ok'
+
+git ls-remote u4:foo-pc
+
+cat <<RANT
+
+This is where things go all screwy.  Because we still have the *objects*
+pointed to by these tags, we still get them back from the main repo.
+
+<sigh>
+
+RANT
--- a/contrib/partial-copy/update.secondary
+++ b/contrib/partial-copy/update.secondary
@@ -0,0 +1,32 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+
+# called from gitolite's update hook, once for each ref being pushed
+
+# "we", "our repo"  =>  the partial copy
+# "main", "pco"     =>  the one which we are a "partial copy of"
+
+my $main=`git config --file $ENV{GL_REPO_BASE_ABS}/$ENV{GL_REPO}.git/config --get gitolite.partialCopyOf`;
+chomp ($main);
+
+exit 0 unless $main;
+
+die "ENV GL_RC not set\n" unless $ENV{GL_RC};
+die "ENV GL_BINDIR not set\n" unless $ENV{GL_BINDIR};
+
+unshift @INC, $ENV{GL_BINDIR};
+require gitolite or die "parse gitolite.pm failed\n";
+gitolite->import;
+
+my ($ref, $old, $new) = @ARGV;
+my $rand = int(rand(100000000));
+
+$ENV{GL_BYPASS_UPDATE_HOOK} = 1;
+system("git", "push", "-f", "$ENV{GL_REPO_BASE_ABS}/$main.git", "$new:refs/heads/br-$rand") and die "FATAL: failed to send $new\n";
+
+wrap_chdir("$ENV{GL_REPO_BASE_ABS}/$main.git");
+system("git", "update-ref", "-d", "refs/heads/br-$rand");
+system("git", "update-ref", $ref, $new, $old) and die "FATAL: update-ref for $ref failed\n";
+
+exit 0;
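The two-step hand-off in update.secondary (push the new commit under a throwaway branch name, then move the real ref in the main repo) can be tried outside gitolite with two scratch repos. This is a minimal sketch, assuming `git` and `mktemp` are on the PATH. The repo names, the committer identity and the `br-$$` temp name are all made up for the illustration, and it runs from an ordinary clone rather than from inside a hook.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q --bare main.git      # stands in for the main repo
git init -q copy                 # stands in for the partial copy
cd copy
git symbolic-ref HEAD refs/heads/next
git -c user.name=u4 -c user.email=u4@example.com \
    commit -q --allow-empty -m 'u4n1'

new=$(git rev-parse HEAD)
old=0000000000000000000000000000000000000000   # all zeros: ref must not exist yet
tmp="refs/heads/br-$$"                         # throwaway name (the hook uses a random number)

# step 1: carry the objects over to the main repo under the throwaway name
git push -q -f ../main.git "$new:$tmp"

# step 2: in the main repo, drop the throwaway ref, then move the real ref,
# checking that it still has the old value we expect
cd ../main.git
git update-ref -d "$tmp"
git update-ref refs/heads/next "$new" "$old"

git for-each-ref --format='%(refname)' refs/heads
```

Passing the old value as the third argument to `git update-ref` makes the final step fail if the ref in the main repo is not where the pusher thought it was.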
--- a/doc/gitolite.conf-by-example.mkd
+++ b/doc/gitolite.conf-by-example.mkd
@@ -98,11 +98,11 @@ branch or tag on it.
 Wally can only read the repo.  Alice and Ashok can push but not rewind; only
 Sitaram and Dilbert can do that.

-            R master    =   wally       # MEANINGLESS!  WILL NOT DO WHAT YOU THINK IT DOES!!
+And now, a common misunderstanding:

-This won't work.  You can only restrict "read" access at the repo level, not
-at the branch level.  This is a git issue, not a gitolite issue.  Go bother
-them, or switch to gerrit.
+            R master    =   wally       # WILL NOT DO WHAT YOU THINK IT DOES!!
+
+This won't work.  Please see [here][rpr_] for more on this.

    repo    foo
            RW      master$             =   dilbert alice
--- a/doc/gitolite.conf.mkd
+++ b/doc/gitolite.conf.mkd
@@ -86,22 +86,6 @@ tag with a new value.

 In a later section you'll see some more advanced permissions.

-<font color="gray">
-
-Side note: apparently it needs to be spelled out that "R" permissions can only
-apply to the entire repo and not to individual branches/tags.  Mention was
-made of a certain popular Linux distribution named after animals with
-adjectives, chosen merely for alliterative purposes, prefixed to their names,
-and of their users not being clueful enough to know that this (the "read"
-thing, not the alliterative adjective thing, in case you lost track) is an
-inherent git characteristic.
-
-Meanwhile, people who *desperately* need this are directed to gerrit, which
-can do this because they have their own git stack and dont use the one written
-by Linus and currently maintained by Junio.
-
-</font>
-
 ### how rules are matched

 It's important to understand that there're two levels at which access control
@@ -229,6 +213,34 @@ documentation for [`~/.gitolite.rc`][rc].

 When used as a reponame, it includes all repos.

+### F=rpr_ side note: "R" permissions for refs
+
+You can control "read" access only at the repo level, not at the branch level.
+For example, this **won't** limit Wally to reading only the master branch:
+
+    repo foo
+        R master    =   wally       # WILL NOT DO WHAT YOU THINK IT DOES!!
+
+and this **won't** prevent him from reading it:
+
+    repo foo
+        - master    =   wally       # WILL NOT DO WHAT YOU THINK IT DOES!!
+        R           =   wally
+
+This (inability to distinguish one ref from another during a read operation)
+is a git issue, not a gitolite issue.
+
+There are 3 ways around this, though:
+
+  * switch to gerrit, which has its own git stack, its own sshd, and God knows
+    what else.  All written in Java, the COBOL of the internet era ;-)
+  * bug the git people to add this feature in ;-)
+  * use a separate repo for Wally.
+
+Using separate repos is not that hard with gitolite.  Here's how to maintain a
+[partial copy][partialcopy] of the main repo and keep it synced (while not
+allowing the secret branches into it).
+
 ## F=aac advanced access control

 The previous section is sufficient for most common needs, but gitolite can go