this is so the make-gh-pages (not part of gitolite) script can boldface the ones which have "# title"
13 KiB
what is a "big-config"
In this document:
- when/why do we need it?
- how do we use it?
- other optimisations
- what are the downsides?
- storing usergroup information outside gitolite (like in LDAP)
- implementation notes
when/why do we need it?
A "big config" is anything that has a few thousand users and a few thousand repos, resulting in a very large 'compiled' config file.
To understand the problem, consider what happens if you have something like this in your gitolite conf file:
@wbr = lynx firefox
@devs = alice bob
repo @wbr
RW+ next = @devs
RW master = @devs
Without the 'big config' setting, gitolite internally translates this to:
repo lynx firefox
RW+ next = alice bob
RW master = alice bob
and then generates the actual config rules once for each user-repo-ref combination (there are 8 combinations above); the compiled config file looks somewhat like this:
%repos = (
'firefox' => {
'R' => {
'alice' => 1,
'bob' => 1
},
'W' => {
'alice' => 1,
'bob' => 1
},
'alice' => [
[
0,
'refs/heads/next',
'RW+'
],
[
4,
'refs/heads/master',
'RW'
]
],
'bob' => [
[
1,
'refs/heads/next',
'RW+'
],
[
5,
'refs/heads/master',
'RW'
]
]
},
'lynx' => {
'R' => {
'alice' => 1,
'bob' => 1
},
'W' => {
'alice' => 1,
'bob' => 1
},
'alice' => [
[
2,
'refs/heads/next',
'RW+'
],
[
6,
'refs/heads/master',
'RW'
]
],
'bob' => [
[
3,
'refs/heads/next',
'RW+'
],
[
7,
'refs/heads/master',
'RW'
]
]
}
);
Phew!
Of course, the output is the same whether you used groups (like @wbr
and
@devs
in the example above) or listed the repos directly on the 'repo'
lines.
Anyway, you can imagine what that does when you have 10,000 users and 10,000 repos. Let's just say it's not pretty :)
how do we use it?
Just set
$GL_BIG_CONFIG = 1;
in the ~/.gitolite.rc
file on the server (see next section for more
variables). When you do that, and push this configuration, one of two things
happens.
access rules for groups
If you used group names in the 'repo' lines (as in repo @wbr
), then the
compiled config looks like this:
%repos = (
'@wbr' => {
'@devs' => [
[
0,
'refs/heads/next',
'RW+'
],
[
1,
'refs/heads/master',
'RW'
]
],
'R' => {
'@devs' => 1
},
'W' => {
'@devs' => 1
}
}
);
%groups = (
'@devs' => {
'alice' => 'master',
'bob' => 'master'
},
'@wbr' => {
'firefox' => 'master',
'lynx' => 'master'
}
);
That's a lot smaller, and allows orders of magintude more repos and groups to be supported.
access rules for individual repos (split config)
If, on the other hand, you had the repos listed individually, (as in repo lynx firefox
), then the main config file would now look like this:
%repos = ();
%split_conf = (
'firefox' => 1,
'lynx' => 1
);
And each individual repo's configuration would go its own directory. For
instance, ~/repositories/lynx.git/gl-conf
would look like this:
%one_repo = (
'lynx' => {
'R' => {
'alice' => 1,
'bob' => 1
},
'W' => {
'alice' => 1,
'bob' => 1
},
'alice' => [
[
0,
'refs/heads/next',
'RW+'
],
[
4,
'refs/heads/master',
'RW'
]
],
'bob' => [
[
1,
'refs/heads/next',
'RW+'
],
[
5,
'refs/heads/master',
'RW'
]
]
}
);
That does not reduce the overall size of the repo config (because you did not group the repos), but the main repo config is now even smaller!
other optimisations
disabling various defaults
The default RC file contains the following lines (we've already discussed the first one):
$GL_BIG_CONFIG = 0;
$GL_NO_DAEMON_NO_GITWEB = 0;
$GL_NO_CREATE_REPOS = 0;
$GL_NO_SETUP_AUTHKEYS = 0;
GL_NO_DAEMON_NO_GITWEB
is a very useful optimisation that you must enable
if you do have a large number of repositories, and do not use gitolite's
support for gitweb or git-daemon access (see "this" for details). This will save a
lot of time when you push the gitolite-admin repo with changes. This variable
also controls whether "git config" lines (such as config hooks.emailprefix = "[gitolite]"
) will be processed or not.
You should be a lot more careful with GL_NO_CREATE_REPOS
and
GL_NO_SETUP_AUTHKEYS
. These are meant for installations where some backend
system already exists that does all the actual repo creation, (including
setting up the proper hooks -- very important for access control), and all the
authentication setup (ssh auth keys), respectively.
Summary: Please leave those two variables alone unless you're initials are "JK" ;-)
optimising the authkeys file
Sshd does a linear scan of the ~/.ssh/authorized_keys
file when an incoming
connection shows up. This means that keys found near the top get served
faster than keys near the bottom. On my laptop, it takes about 2500 keys
before I notice the delay; on a typical server it could be double that, so
don't worry about all this unless your user-count is in that range.
One way to deal with 5000+ keys is to use customised, database-backed ssh daemons, but many people are uncomfortable with taking non-standard versions of such a critical piece of the security infrastructure. In addition, most distributions do not make it painless to use them.
So what do you do?
The following trick uses the Pareto principle (a.k.a the "80-20 rule") to get an immediate boost in response for the most frequent or prolific developers. It can allow you to ignore the problem until the next big increase in your user counts!
Here's how:
- create subdirectories of keydir/ called 0, 1, (maybe 2, 3, etc., also), and 9.
- in 0/, put in the pubkeys of the most frequent users
- in 1/, add the next most important set of users, and so on for 2, 3, etc.
- finally, put all the rest in 9/
Make sure "9" contains at least 70-90% of the total number of pubkeys, otherwise this doesn't really help.
You can easily determine who your top users are by runnning something like this (note the clever date command that always gets you last months log file!)
cat .gitolite/logs/gitolite-`date +%Y-%m -d -30days`.log |
cut -f2 | sort | uniq -c | sort -n -r
what are the downsides?
There are some downsides. The first one applies in all cases:
-
If you use the delegation feature, you can no longer define or extend @groups in a fragment, for security reasons. It will also not let you use any group other than the @fragname itself (specifically, groups which contained a subset of the allowed @fragname, which would work normally, do not work now).
(If you didn't understand all that, you're probably not using delegation, so feel free to ignore it!)
The following apply if individual ("split") conf files are written, which in
turn only happens if you used repo names instead of group names on the repo
lines:
-
the compile (gitolite-admin push) is now slower, because it potentially has to write a few thousand small files instead of one large one. Since the compile should be relatively infrequent compared to developer access, this is ok -- the main config file is parsed much faster now, so every hit to the server will benefit.
-
we can no longer distinguish 'repo not found on disk' from 'you dont have access'. They both now look like 'you dont have access'.
storing usergroup information outside gitolite (like in LDAP)
[Please NOTE: this is all about user groups, not repo groups]
[WARNING: the earlier method of doing this has been discontinued; please see the commit message for details]
Gitolite now allows usergroup information to be stored outside its own config file. We'll see "why" first, then the "how".
why
Large sites often have LDAP servers that already contain user and group information, including group membership details. Such sites may prefer that gitolite just pick up that info instead of having to redundantly put it in gitolite's config file.
Consider this example config for one repo:
repo foo
RW+ = @lead_devs
RW = @devs
R = @interns
Normally, you would also need to specify:
@lead_devs = dilbert alice
@devs = wally
@interns = ashok
However, if the corporate LDAP server already tags these people correctly, and if there is some way of getting that information out at run time, that would be cool.
how
All you need is a script that, given a username, queries your LDAP or similar server, and returns a space-separated list of all the groups she is a member of. If an invalid user name is sent in, or the user is valid but is not part of any groups, it should print nothing.
This script will probably be specific to your site. (See contrib/ldap for some example scripts that were contributed by the Nokia MeeGo team.)
Then set the $GL_GET_MEMBERSHIPS_PGM
variable in the rc file to the full
path of this program, set $GL_BIG_CONFIG
to 1, and that will be that.
implementation notes
To understand how big-config works (at least when you're using grouped repos), we'll first look at how it works without this setting. Think back to the example at the top, and assume 'alice' is accessing the 'lynx' repo. The various rights are governed by the following hash elements:
# for the first level checks
$repos{'lynx'}{'R'}{'alice'} = 1
$repos{'lynx'}{'W'}{'alice'} = 1
# for the second level checks
$repos{'lynx'}{'alice'}{'refs/heads/master'} = 'RW';
$repos{'lynx'}{'alice'}{'refs/heads/next'} = 'RW+';
Those elements are explicitly specified in the compiled hash, as you can see (you don't need to know perl too much to read a hash; just make some educated guesses if needed!)
Now look at the compiled hash produced when GL_BIG_CONFIG
is set. In place
of both 'firefox' and 'lynx' you have '@wbr', and similarly '@devs' for both
'alice' and 'bob'. In addition, there is a group hash at the bottom that
lists each group and its members.
When 'alice' tries to access the 'lynx' repo, gitolite collects all the group names that these names belong to, so '@devs' is added to the list of 'user' names that 'alice' inherits permissions from, and '@wbr' is added to the list of 'repo' names that 'lynx' inherits from. This means that the final access inherits all permissions pertaining to the following combinations:
alice, lynx
alice, @wbr
@devs, lynx
@devs, @wbr
(Actually there are 3 more... try and guess what they may be!)
Anyway, all ACL rules for these combinations are clubbed together to make the composite set of rules that 'alice' accessing 'lynx' is subject to.