gitolite/doc/3-faq-tips-etc.mkd

# assorted faqs, tips, and notes on gitolite

In this document:

  * common errors and mistakes
  * git version dependency
  * other errors, warnings, notes...
  * getting a tar file from a clone
  * differences from gitosis
      * simpler syntax
      * two levels of access rights checking
      * error checking the config file
      * delegating parts of the config file
      * easier to specify gitweb/daemon access
      * better logging
      * one user, many keys
      * support for git installed outside default PATH
      * who am I?
  * other cool things
      * "personal" branches
  * design choices
      * keeping the parser and the access control separate
      * why we don't do "excludes"

### common errors and mistakes

  * forgetting to suffix `.git` to the end of the reponame in the `git clone`.
    This suffix is *not* used in the gitolite config file for the sake of
    clarity and cleaner syntax, but don't let that fool you.  It's a
    convention in the git world that **bare repos** end with `.git`.

  * adding `repositories/` at the start of the repo name in the `git clone`.
    This error is typically made by the *admin* himself -- because he knows
    what `$REPO_BASE` is set to and thinks he has to provide that prefix on
    the client side also :-)  In fact gitolite prepends `$REPO_BASE` when it
    is required anyway, so you shouldn't do the same thing!

### git version dependency

Here's a workaround for a version dependency that the normal flow of gitolite
has.

When you edit your config file to create a new repo, and push the changes to
the server, gitolite creates an empty, bare repo for you.  Normally, you're
expected to clone this on the client side, and start working -- make your
first commit(s), then push, etc.

However, cloning an empty repo requires a server side git version that is at
least 1.6.2.  Gitolite detects this when creating a repo, and warns you.

The workaround is to use the older (gitosis-style) method on the client:
create an empty repo locally, make a commit or two, set an "origin" remote,
and then push.  Something like:

    mkdir my-new-project
    cd    my-new-project
    git init
    git commit --allow-empty -m 'Initial repository'
    # or, if your client side git is too old for --allow-empty, just make some
    # files, "git add" them, then "git commit"
    git remote add origin git@gitolite-server:my-new-project.git
    git push origin master:master

Once this is done, the repo is available for cloning by anyone else in the
normal way, since it's not empty anymore.

### other errors, warnings, notes...

  * cloning an empty repo is only possible with clients greater than 1.6.2.
    So at least one of your clients needs to have a recent git.  Once at least
    one commit has been made, older clients can also use it

  * when you clone an empty repo, git seems to complain about the remote
    hanging up or something.  I have no idea what that is, but it doesn't seem
    to hurt anything.  This happens even in normal git, not just gitolite.
    [Update 2009-09-14; this has been fixed in git 1.6.4.3]

  * gitweb not able to read your repos?  You can change the umask for newly
    created repos to something more relaxed -- see the `~/.gitolite.rc` file

### getting a tar file from a clone

You can clone the repo from github or indefero, then execute a make command to
extract a tar file of the branch you want.  Please use the make command, not a
plain "git archive", because the Makefile adds a file called
`.GITOLITE-VERSION` that will help you identify which version you are using.

    git clone git://github.com/sitaramc/gitolite.git
            # (OR)
    git clone git://sitaramc.indefero.net/sitaramc/gitolite.git
    cd gitolite
    make master.tar
    # or maybe "make rebel.tar" or "make pu.tar"

### differences from gitosis

Apart from the big ones listed in the top level README, and subjective ones
like "better config file format", there are some small, but significant and
concrete, differences from gitosis.

#### simpler syntax

The basic syntax is simpler and cleaner but it goes beyond that: **you can
specify access in bits and pieces**, even if they overlap.

Some access needs are best grouped by repo, some by username, and some by
both.  So just do all of them, and gitolite will combine all the access lists!
Here's an example:

    # define groups of people
    @bosses     = phb1 phb2 phb3
    @devs       = dev1 dev2 dev3
    @interns    = int1 int2 int3

    # define groups of projects
    @open       = git gitolite linux rakudo
    @closed     = c1 c2 c3
    @topsecret  = ts1 ts2 ts3

    # all bosses have read access to all projects
    repo @open @closed @topsecret
        R   =   @bosses

    # everyone has read access to "open" projects
    repo @open
        R   =   @bosses @devs @interns

    [...or any other combination you want...]

    # later in the file:

    # specify access for individual repos (like RW, RW+, etc)
    repo c1
        [...]

    [...etc...]

If you notice that `@bosses` are given read access to `@open` via both rules,
do not worry that this causes some duplication or inefficiency.  It doesn't
:-)

See the "specify gitweb/daemon access" section below for one more example.

#### two levels of access rights checking

Gitolite has two levels of access checks.  The **first check** is what I will
call the **pre-git** level (this is the only check that gitosis has).  At this
stage, the `gl-auth-command` has been invoked by `sshd`, and it knows just
three things:

  * who,
  * what repository, and
  * what type of access (R or W)

Note that at this point no git program has entered the picture, and we have no
way of knowing what **ref** (branch, tag, etc) he is trying to update, even if
it is a "write" operation.

For a "read" operation to pass this check, the username (or `@all`) must be
mentioned on some line in the config for this repo.

For a "write" operation, there is an additional restriction: lines specifying
only `R` (read access) don't count.  *The user must have write access to
**some** ref in the repo in order to pass this stage!*

The **second check** is via a git `update hook`.  This check only happens for
write operations.  By this time we know what "ref" he is trying to update, as
well as the old and the new SHAs of that ref (by which we can also deduce
whether it's a rewind or not).  This is where the "per-branch" permissions
come into play.

Each refex that allows `W` access (or `+` if this is a rewind) for *this*
user, on *this* repo, is matched against the actual refname being updated.  If
any of the refexes match, the push succeeds.  If none of them match, it fails.

#### error checking the config file

gitosis does not do any.  I just found out that if you mis-spell `members` as
`member`, gitosis will silently ignore it, and leave you wondering why access
was denied.

Gitolite "compiles" the config file first and keyword typos *are* caught so
you know right away.

#### delegating parts of the config file

You can now split up the config file and delegate the authority to specify
access control for their own pieces.  See
[doc/5-delegation.mkd](http://github.com/sitaramc/gitolite/blob/pu/doc/5-delegation.mkd)
for details.

#### easier to specify gitweb/daemon access

Specifying gitweb and/or daemon access for a repo is simple: give "read"
permissions to two special usernames: `gitweb` and `daemon`.

You can also keep these pieces separate from the detailed, branch level access
for each repo, if you like, since you can write the access control specs in
bits and pieces.  Here's an example, using short repo names for convenience:

    # maybe near the top of the file, for ease of access:

    @only_web       = r1 r2 r3
    @only_daemon    = r4 r5 r6
    @web_and_daemon = r7 r8 r9

    repo @only_web
        R   = gitweb
    repo @only_daemon
        R   = daemon
    repo @web_and_daemon
        R   = gitweb
        R   = daemon

    # ...maybe much later in the file:

    repo r1
        # normal developer access lists for r1 and its branches/tags in the
        # usual way

    repo r2
    # ...and so on...

#### better logging

If you have been too liberal with the permission to rewind, it has built-in
logging as an emergency fallback if someone goes too far, or for audit
purposes [`*`].  The logfile names and location are configurable, and can
include the year/month/day etc in the filename for easy archival or further
processing.  The log file even tells you which pattern in the config file
matched to allow that specific access to proceed.

>   [`*`] setting `core.logAllRefUpdates true` does provide a safety net
>   against over-zealous rewinds, but it does not tell you "who".  And
>   strangely, management does not seem to share the view that "blame" is just
>   a synonym for "annotate" ;-)]

The log lines look like this:

    2009-09-19.10:24:37  +  b4e76569659939  4fb16f2a88d8b5  myrepo refs/heads/master       user2   refs/heads/master

The "+" at the start indicates a non-fast forward update, in this case from
b4e76569659939 to 4fb16f2a88d8b5.  So b4e76569659939 is the one to restore!
Can it get easier?

The other parts of the log line are the name of the repo, the refname being
updated, the user updating it, and the refex pattern (from the config file)
that matched, in case you need to debug the config file itself.

#### one user, many keys

I have a laptop and a desktop I need to access the server from.  I have
different private keys on them, but as far as gitolite is concerned both of
them should be treated as "sitaram".  How does this work?

In gitosis, the admin creates a single "sitaram.pub" containing one line for
each of my pubkeys.  In gitolite, we keep them separate: "sitaram@laptop.pub"
and "sitaram@desktop.pub".  The part before the "@" is the username, so
gitolite knows these two keys belong to the same person.

Note that you don't say "sitaram@laptop" and so on in the **config** file --
as far as the config file is concerned there's just **one** user called
"sitaram" -- so you only say "sitaram" there.  Only the **pubkey files** have
the extra "@" stuff.

I think this is easier to maintain if you have to delete or change one of
those keys.

#### support for git installed outside default PATH

The normal solution is to add to the system default PATH somehow, either by
munging `/etc/profile` or by enabling `PermitUserEnvironment` in
`/etc/ssh/sshd_config` and then setting the PATH in `~/.ssh/.environment`.
All these are security risks because they allow a lot more than just you and
your git install :-)

And if you don't have root, you can't do this anyway.

The only solution till now has been to ask every client to set the config
parameters `remote.<name>.receivepack` and `remote.<name>.uploadpack`.  But
telling *every* client to do so is a pain...

Gitolite lets you specify the directory in which git binaries are to be found,
via a new variable (`$GIT_PATH`) in the "rc" file.  If this variable is
non-empty, it will be appended to the PATH environment variable before
attempting to run git stuff.

Very easy, very simple, and completely transparent to the users :-)

#### who am I?

As a developer, I send a file called `id_rsa.pub` to the gitolite admin.  He
would rename it to "sitaram.pub" and put it in the key directory.  Then he'd
add "sitaram" to the config file for the repos which I have access to.

But he could have called me "foobar" instead of "sitaram" -- as long as he
uses it consistently, it'll all work the same and look the same to me, because
the public key is all that matters.

So do I have no reason to know what the admin named me?  Well -- maybe (see
next section for one possible use).  Anyway how do I find out?

In gitolite, it's simple: just ask nicely :-)

    $ ssh git@my.gitolite.server
    PTY allocation request failed on channel 0
    no SSH_ORIGINAL_COMMAND?  I'm not a shell, sitaram!

### other cool things

#### "personal" branches

"personal" branches are great for corporate environments, where
unauthenticated pull/clone is a no-no.  Since a dev workstation cannot do
authentication, even work shared just between 2 devs has to go *via* the
server.  This causes the same branch name clutter as in a centralised VCS,
plus setting up permissions for this becomes a chore for the admin.

gitolite lets you define a "personal" or "scratch" namespace prefix for
each developer (e.g., `refs/personal/<devname>/*`), with full
permissions for that dev and read-only for everyone else.  And you get
this without adding a single line to the access config file -- pretty
much fire and forget as far as the admin is concerned, even if there is
constant churn in the project teams.

Not bad for something that took just *one* line of code to implement.
And that's one clean, readable, line, by the way ;-)

The admin would set `$PERSONAL_BRANCH_PREFIX` in the rc file and communicate
this to all users.  It could be something like `refs/heads/personal`, which
means all such branches will show up in `git branch` lookups and `git clone`
will fetch them.  Or he could use, say, `refs/personal`, which means it won't
show up in any normal "branch-y" commands and stuff, and generally be much
less noisy.

**Note that a user who has NO write access cannot have personal branches**; if
you read the section (above) on "two levels of access rights checking" you'll
understand why.

For instance, in the following example, `user3` cannot push to any
`refs/heads/personal/user3/*` branches because the first level check stops him
cold:

    # assume $PERSONAL = 'refs/heads/personal' in ~/.gitolite.rc
    repo myrepo
        RW+ master      = sitaram
        RW+ release     = qa_guy
        RW              = user1 user2
        R               = user3

If we relax that check, *any* access becomes *write* access.  Yes it will be
caught later, by the hook, but it's good practice to catch things in multiple
places.

If you want `user3` to have his own personal branch, but without write access
to any of the "real" branches (like "master", "release", etc.), just use a
dummy branch.  Choose a name that will never exist in practice, or even if
someone creates it, we don't care.  For example, this will get him past the
first check:

        RW dummy        = user3

Just don't *show* the user this config file; it might sound insulting :-)

### design choices

#### keeping the parser and the access control separate

There are two programs concerned with access control:

  * `gl-auth-command`, the program that is run via `~/.ssh/authorized_keys`;
    this decides whether git should even be allowed to run (basic R/W/no
    access).  (This one cannot decide on the branch-level access; it is not
    known at this point what branch is being accessed)
  * the update-hook on each repo, which decides the per-branch permissions

I have chosen to keep the relatively complex task of parsing the config file
out of them to keep them simpler (and faster).  So any changes to the config
have to be first "compiled", and the access control programs use this
"compiled" version of the config.  (The compile step also refreshes
`~/.ssh/authorized_keys`).

If you choose the "easy install" method, all this is quite transparent to you
anyway.  If you cannot use the easy install and must install manually, I have
clear instructions on how to set it up.

#### why we don't do "excludes"

[umm... having said all this, I implemented it anyway; see the "rebel"
branch!]

I found an error in the example conf file.  This snippet *seems* to say that
"bruce" can write versioned tags (`refs/tags/v[0-9].*`), but the other
staffers can't:

        @staff = bruce whitfield martin
                [... and later ...]
        RW refs/tags/v[0-9].*   = bruce
        RW refs/tags            = @staff

But that's not how the matching works.  As long as any refex matches the
refname being updated, it's a "yes".  So the second refex lets anyone on
`@staff` create versioned tags, not just Bruce.

One way to fix this is to allow "excludes" -- some changes in syntax, combined
with a rigorous, ordered, interpretation would do it.

But if you're ever played with squid ACLs, or the include/exclude rules for
rsync, or rdiff-backup, or even git's own ignore mechanism, you'll see why I
won't do this.  It bloats the code and the docs, and, despite all the docs,
*still* confuses people, which may then *reduce* security!

Squid, rsync, gitignore, and all *need* the feature and so tolerate all this;
but we don't need it.  All we need to do is make the refexes *disjoint* in
what they match (i.e., ensure that no refname can be matched by more than one
refex):

        RW refs/tags/v[0-9].*   = bruce
        RW refs/tags/staff/     = @staff

In general, you probably want to control the refnames writable by devs anyway,
if at least to maintain some sanity, so being forced to make the refexes
disjoint is not a big problem.  Here's an example: only the `project_lead` can
make arbitrarily named refs, while the rest have to stay within their assigned
namespaces:

        RW+                     =   project_lead
        RW  refs/tags/qa/       =   @qa_team
        RW  bugID/              =   @dev_team
        RW  trac/               =   @dev_team

The lack of overlap between refexes ensures ***no confusion*** in specifying,
understanding, and ***auditing***, what is allowed and what is not.

And in security, "no confusion" is a good thing :-)