gitolite/doc/big-config.mkd
2010-10-26 20:30:10 +05:30

what is a "big-config"

when/why do we need it?

A "big config" is anything that has a few thousand users and a few thousand repos, organised into groups that are much smaller in number (say, a few hundred repo groups and a few dozen user groups).

So let's say you have

@wbr    =   lynx firefox
@devs   =   alice bob

repo @wbr
    RW+     next    =   @devs
    RW    master    =   @devs

Gitolite internally translates this to

repo lynx firefox
    RW+     next    =   alice bob
    RW    master    =   alice bob

Not just that -- it now generates the actual config rules once for each user-repo-ref combination (there are 8 combinations above: 2 repos x 2 users x 2 refs). The compiled config file looks partly like this:

%repos = (
  'firefox' => {
    'R' => {
      'alice' => 1,
      'bob' => 1
    },
    'W' => {
      'alice' => 1,
      'bob' => 1
    },
    'alice' => [
      {
        'refs/heads/next' => 'RW+'
      },
      {
        'refs/heads/master' => 'RW'
      }
    ],
    'bob' => [
      {
        'refs/heads/next' => 'RW+'
      },
      {
        'refs/heads/master' => 'RW'
      }
    ]
  },
  'lynx' => {
    'R' => {
      'alice' => 1,
      'bob' => 1
    },
    'W' => {
      'alice' => 1,
      'bob' => 1
    },
    'alice' => [
      {
        'refs/heads/next' => 'RW+'
      },
      {
        'refs/heads/master' => 'RW'
      }
    ],
    'bob' => [
      {
        'refs/heads/next' => 'RW+'
      },
      {
        'refs/heads/master' => 'RW'
      }
    ]
  }
);

Phew!

You can imagine what that does when you have 10,000 users and 10,000 repos. Let's just say it's not pretty :)
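To put a rough number on that, here is a small sketch of the arithmetic. The function name `rule_count` is made up for this illustration, and the second call assumes the worst case, where every user has rules on every repo with two refs each, as in the example above.

```shell
# rule_count REPOS USERS REFS
# Number of user-repo-ref combinations when nothing is grouped.
# (hypothetical helper, only for illustration)
rule_count() {
    echo $(( $1 * $2 * $3 ))
}

rule_count 2 2 2            # the lynx/firefox example: prints 8
rule_count 10000 10000 2    # worst case at 10,000 x 10,000: prints 200000000
```

Real configs rarely give every user access to every repo, so treat the second figure as an upper bound rather than a prediction.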

how do we use it?

Now, if you had all those 10,000 users and repos explicitly listed (no groups), nothing can help. But if you use groups, as in the example above, there is hope.

Just set

$GL_BIG_CONFIG = 1;

in the ~/.gitolite.rc file on the server (see next section for more variables). When you do that, and push this configuration, the compiled file looks like this:

%repos = (
  '@wbr' => {
    '@devs' => [
      {
        'refs/heads/next' => 'RW+'
      },
      {
        'refs/heads/master' => 'RW'
      }
    ],
    'R' => {
      '@devs' => 1
    },
    'W' => {
      '@devs' => 1
    }
  },
);
%groups = (
  '@devs' => {
    'alice' => 'master',
    'bob' => 'master'
  },
  '@wbr' => {
    'firefox' => 'master',
    'lynx' => 'master'
  }
);

That's a lot smaller, and allows orders of magnitude more repos and groups to be supported.
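A rough way to see the difference is to count entries the same way for the grouped form: one rule entry per repogroup-usergroup-ref combination, plus one membership entry per repo and per user. This is only a sketch. The name `big_config_entries` is hypothetical, the count ignores the small 'R' and 'W' hashes, and 300 and 30 are arbitrary picks for "a few hundred" and "a few dozen".

```shell
# big_config_entries REPOGROUPS USERGROUPS REFS REPOS USERS
# Approximate entry count in the grouped (big-config) form:
# rule entries per group pair, plus one membership entry per repo and user.
# (hypothetical helper, only for illustration)
big_config_entries() {
    echo $(( $1 * $2 * $3 + $4 + $5 ))
}

big_config_entries 1 1 2 2 2              # the @wbr/@devs example: prints 6
big_config_entries 300 30 2 10000 10000   # every group pair has rules: prints 38000
```

Even in this worst case the grouped form stays in the tens of thousands of entries, where listing every user against every repo runs into the hundreds of millions.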

other optimisations

disabling various defaults

The default RC file contains the following lines (we've already discussed the first one):

$GL_BIG_CONFIG = 0;
$GL_NO_DAEMON_NO_GITWEB = 0;
$GL_NO_CREATE_REPOS = 0;
$GL_NO_SETUP_AUTHKEYS = 0;

GL_NO_DAEMON_NO_GITWEB is a very useful optimisation that you must enable if you have a large number of repositories and do not use gitolite's support for gitweb or git-daemon access (see "easier to specify gitweb description and gitweb/daemon access" for details). This will save a lot of time when you push the gitolite-admin repo with changes. This variable also controls whether "git config" lines (such as config hooks.emailprefix = "[gitolite]") will be processed or not.

Setting this is relatively harmless to a normal installation, unlike the next two variables :-) GL_NO_CREATE_REPOS and GL_NO_SETUP_AUTHKEYS are meant for installations where some backend system already exists that does all the actual repo creation, and all the authentication setup (ssh auth keys), respectively.

Summary: Please leave those two variables alone unless your initials are "JK" ;-)

Also note that using all 3 of the GL_NO_* variables will result in everything after the config compile being skipped. In other words, gitolite is being used only for its access control language.

optimising the authkeys file

Sshd does a linear scan of the ~/.ssh/authorized_keys file when an incoming connection shows up. This means that keys found near the top get served faster than keys near the bottom. On my laptop, it takes about 2500 keys before I notice the delay; on a typical server it could be double that, so don't worry about all this unless your user-count is in that range.

One way to deal with 5000+ keys is to use customised, database-backed ssh daemons, but many people are uncomfortable with taking non-standard versions of such a critical piece of the security infrastructure. In addition, most distributions do not make it painless to use them.

So what do you do?

The following trick uses the Pareto principle (a.k.a. the "80-20 rule") to get an immediate boost in response for the most frequent or prolific developers. It can allow you to ignore the problem until the next big increase in your user count!

Here's how:

  • create subdirectories of keydir/ called 0, 1, (maybe 2, 3, etc., also), and 9.
  • in 0/, put in the pubkeys of the most frequent users
  • in 1/, add the next most important set of users, and so on for 2, 3, etc.
  • finally, put all the rest in 9/

Make sure "9" contains at least 70-90% of the total number of pubkeys, otherwise this doesn't really help.
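To check that last condition, something like the following sketch may help. The function name `share_in_last_bucket` is made up, and it assumes every pubkey is a file ending in `.pub` somewhere under the keydir you pass in.

```shell
# share_in_last_bucket KEYDIR
# Print the percentage (rounded down) of *.pub files under KEYDIR
# that live in KEYDIR/9. (hypothetical helper, only for illustration)
share_in_last_bucket() {
    keydir=$1
    # $(( ... )) also strips the padding some wc implementations add
    total=$(( $(find "$keydir" -type f -name '*.pub' | wc -l) ))
    last=$(( $(find "$keydir/9" -type f -name '*.pub' 2>/dev/null | wc -l) ))
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $(( last * 100 / total ))
}
```

Run it against the keydir in your gitolite-admin clone; if it prints much less than 70, move more keys down into 9/.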

You can easily determine who your top users are by running something like this (note the date command, which needs GNU date: it picks the log file for the month 30 days ago, which is normally last month's)

cut -f2 .gitolite/logs/gitolite-$(date -d '-30 days' +%Y-%m).log |
    sort | uniq -c | sort -n -r
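The ranked list can then drive the split into subdirectories. Here is a minimal sketch using only 0/ and 9/. The name `bucket_keys` is hypothetical, and it assumes each user's key sits directly in the keydir as `user.pub` (sites with several keys per user, or other naming schemes, will need to adapt it). It uses plain `mv`; in a gitolite-admin clone you would still have to add, commit, and push the result.

```shell
# bucket_keys KEYDIR TOP_N
# Reads "count user" lines on stdin (busiest first, as printed by
# "uniq -c | sort -n -r"), moves the TOP_N busiest users' keys into
# KEYDIR/0 and every other key still in KEYDIR itself into KEYDIR/9.
# (hypothetical helper, only for illustration)
bucket_keys() {
    keydir=$1
    top_n=$2
    mkdir -p "$keydir/0" "$keydir/9"
    # second column of the first TOP_N lines is the user name
    awk -v n="$top_n" 'NR <= n { print $2 }' | while read -r user; do
        if [ -f "$keydir/$user.pub" ]; then
            mv "$keydir/$user.pub" "$keydir/0/"
        fi
    done
    # whatever is left at the top level goes to the last bucket
    for key in "$keydir"/*.pub; do
        if [ -f "$key" ]; then
            mv "$key" "$keydir/9/"
        fi
    done
}
```

Pipe the output of the ranking command into it, for example `... | sort -n -r | bucket_keys keydir 50`, where 50 is whatever cut-off keeps 0/ small.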

what are the downsides?

There is one minor issue.

If you use the delegation feature, you can no longer define or extend @groups in a fragment, for security reasons. A fragment also cannot use any group other than @fragname itself; in particular, groups containing a subset of the allowed @fragname, which would normally work, no longer do.

(If you didn't understand all that, you're probably not using delegation, so feel free to ignore it!)

storing usergroup information outside gitolite (like in LDAP)

[Please NOTE: this is all about user groups, not repo groups]

[WARNING: the earlier method of doing this has been discontinued; please see the commit message for details]

Gitolite now allows usergroup information to be stored outside its own config file. We'll see "why" first, then the "how".

why

Large sites often have LDAP servers that already contain user and group information, including group membership details. Such sites may prefer that gitolite just pick up that info instead of having to redundantly put it in gitolite's config file.

Consider this example config for one repo:

repo foo
    RW+ =   @lead_devs
    RW  =   @devs
    R   =   @interns

Normally, you would also need to specify:

@lead_devs  =   dilbert alice
@devs       =   wally
@interns    =   ashok

However, if the corporate LDAP server already tags these people correctly, and if there is some way of getting that information out at run time, that would be cool.

how

All you need is a script that, given a username, queries your LDAP or similar server, and returns a space-separated list of all the groups she is a member of. If an invalid user name is sent in, or the user is valid but is not part of any groups, it should print nothing.

This script will probably be specific to your site. [Help wanted: I don't know LDAP, so if someone wants to contribute some sample code I'd be happy to put it in contrib/, with credit of course!]

Then set the $GL_GET_MEMBERSHIPS_PGM variable in the rc file to the full path to this program, set $GL_BIG_CONFIG to 1, and that will be that.
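To show the shape such a program could take, here is a minimal sketch that looks the user up in a flat file instead of LDAP. Everything site-specific in it is an assumption: the function name `get_memberships`, the file path `/etc/gitolite-groups`, and the file format (one `groupname: user1 user2 ...` line per group). It prints bare group names; whether your setup expects a leading "@" is not covered here, so check before relying on it.

```shell
# get_memberships USER [GROUP_FILE]
# Print a space-separated list of the groups USER belongs to, or nothing
# if the user is unknown or in no group.
# GROUP_FILE is a stand-in for a real LDAP query; path and format are
# assumptions made for this sketch.
get_memberships() {
    user=$1
    group_file=${2:-/etc/gitolite-groups}   # hypothetical default path
    if [ ! -r "$group_file" ]; then
        return 0
    fi
    awk -v u="$user" '
        {
            grp = $1
            sub(/:$/, "", grp)               # strip the trailing colon
            for (i = 2; i <= NF; i++)
                if ($i == u)
                    out = out (out == "" ? "" : " ") grp
        }
        END { if (out != "") print out }
    ' "$group_file"
}
```

A real version would be a standalone script that replaces the awk lookup with a query against your directory server and calls it on its first argument.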