gitolite/doc/big-config.mkd

338 lines
11 KiB
Markdown
Raw Normal View History

2010-05-14 14:50:57 +02:00
# what is a "big-config"
In this document:
* <a href="#_when_why_do_we_need_it_">when/why do we need it?</a>
* <a href="#_how_do_we_use_it_">how do we use it?</a>
* <a href="#_other_optimisations">other optimisations</a>
2010-10-26 16:26:51 +02:00
* <a href="#_disabling_various_defaults">disabling various defaults</a>
* <a href="#_optimising_the_authkeys_file">optimising the authkeys file</a>
* <a href="#_what_are_the_downsides_">what are the downsides?</a>
* <a href="#_storing_usergroup_information_outside_gitolite_like_in_LDAP_">storing usergroup information outside gitolite (like in LDAP)</a>
* <a href="#_why">why</a>
* <a href="#_how">how</a>
2010-12-24 06:31:28 +01:00
* <a href="#_implementation_notes">implementation notes</a>
<a name="_when_why_do_we_need_it_"></a>
2010-05-14 14:50:57 +02:00
### when/why do we need it?
A "big config" is anything that has a few thousand users and a few thousand
repos, organised into groups that are much smaller in number (like maybe a few
hundreds of repogroups and a few dozens of usergroups).
2010-05-14 14:50:57 +02:00
So let's say you have
@wbr = lynx firefox
@devs = alice bob
repo @wbr
RW+ next = @devs
RW master = @devs
Gitolite internally translates this to
repo lynx firefox
RW+ next = alice bob
RW master = alice bob
Not just that -- it now generates the actual config rules once for each
user-repo-ref combination (there are 8 combinations above; the compiled config
file looks partly like this:
%repos = (
'firefox' => {
'R' => {
'alice' => 1,
'bob' => 1
},
'W' => {
'alice' => 1,
'bob' => 1
},
'alice' => [
{
'refs/heads/next' => 'RW+'
},
{
'refs/heads/master' => 'RW'
}
],
'bob' => [
{
'refs/heads/next' => 'RW+'
},
{
'refs/heads/master' => 'RW'
}
]
},
'lynx' => {
'R' => {
'alice' => 1,
'bob' => 1
},
'W' => {
'alice' => 1,
'bob' => 1
},
'alice' => [
{
'refs/heads/next' => 'RW+'
},
{
'refs/heads/master' => 'RW'
}
],
'bob' => [
{
'refs/heads/next' => 'RW+'
},
{
'refs/heads/master' => 'RW'
}
]
}
);
Phew!
You can imagine what that does when you have 10,000 users and 10,000 repos.
Let's just say it's not pretty :)
<a name="_how_do_we_use_it_"></a>
2010-05-14 14:50:57 +02:00
### how do we use it?
Now, if you had all those 10,000 users and repos explicitly listed (no
groups), then there is no help. But if, like the above example, you had
groups like we used above, there is hope.
Just set
$GL_BIG_CONFIG = 1;
in the `~/.gitolite.rc` file on the server (see next section for more
variables). When you do that, and push this configuration, the compiled file
looks like this:
2010-05-14 14:50:57 +02:00
%repos = (
'@wbr' => {
'@devs' => [
{
'refs/heads/next' => 'RW+'
},
{
'refs/heads/master' => 'RW'
}
],
'R' => {
'@devs' => 1
},
'W' => {
'@devs' => 1
}
},
);
%groups = (
'@devs' => {
'alice' => 'master',
'bob' => 'master'
},
'@wbr' => {
'firefox' => 'master',
'lynx' => 'master'
}
);
That's a lot smaller, and allows orders of magintude more repos and groups to
be supported.
<a name="_other_optimisations"></a>
### other optimisations
2010-10-26 16:26:51 +02:00
<a name="_disabling_various_defaults"></a>
#### disabling various defaults
The default RC file contains the following lines (we've already discussed the
first one):
$GL_BIG_CONFIG = 0;
$GL_NO_DAEMON_NO_GITWEB = 0;
$GL_NO_CREATE_REPOS = 0;
$GL_NO_SETUP_AUTHKEYS = 0;
`GL_NO_DAEMON_NO_GITWEB` is a very useful optimisation that you *must* enable
if you *do* have a large number of repositories, and do *not* use gitolite's
support for gitweb or git-daemon access (see "[easier to specify gitweb
description and gitweb/daemon access][gwd]" for details). This will save a
lot of time when you push the gitolite-admin repo with changes. This variable
also control whether "git config" lines (such as `config hooks.emailprefix =
"[gitolite]"`) will be processed or not.
Setting this is relatively harmless to a normal installation, unlike the next
two variables :-) `GL_NO_CREATE_REPOS` and `GL_NO_SETUP_AUTHKEYS` are meant
for installations where some backend system already exists that does all the
actual repo creation, and all the authentication setup (ssh auth keys),
respectively.
Summary: Please **leave those two variables alone** unless you're initials are
"JK" ;-)
Also note that using all 3 of the `GL_NO_*` variables will result in
*everything* after the config compile being skipped. In other words, gitolite
is being used **only** for its access control language.
2010-10-26 16:26:51 +02:00
<a name="_optimising_the_authkeys_file"></a>
#### optimising the authkeys file
Sshd does a linear scan of the `~/.ssh/authorized_keys` file when an incoming
connection shows up. This means that keys found near the top get served
faster than keys near the bottom. On my laptop, it takes about 2500 keys
before I notice the delay; on a typical server it could be double that, so
don't worry about all this unless your user-count is in that range.
One way to deal with 5000+ keys is to use customised, database-backed ssh
daemons, but many people are uncomfortable with taking non-standard versions
of such a critical piece of the security infrastructure. In addition, most
distributions do not make it painless to use them.
So what do you do?
The following trick uses the Pareto principle (a.k.a the "80-20 rule")
to get an immediate boost in response for the most frequent or prolific
developers. It can allow you to ignore the problem until the next big
increase in your user counts!
Here's how:
* create subdirectories of keydir/ called 0, 1, (maybe 2, 3, etc., also),
and 9.
* in 0/, put in the pubkeys of the most frequent users
* in 1/, add the next most important set of users, and so on for 2, 3, etc.
* finally, put all the rest in 9/
Make sure "9" contains at least 70-90% of the total number of pubkeys,
otherwise this doesn't really help.
You can easily determine who your top users are by runnning something like
this (note the clever date command that always gets you last months log file!)
cat .gitolite/logs/gitolite-`date +%Y-%m -d -30days`.log |
cut -f2 | sort | uniq -c | sort -n -r
<a name="_what_are_the_downsides_"></a>
2010-05-14 14:50:57 +02:00
### what are the downsides?
2010-05-18 14:20:58 +02:00
There is one minor issue.
2010-05-14 14:50:57 +02:00
2010-05-18 14:20:58 +02:00
If you use the delegation feature, you can no longer define or extend
2010-05-14 14:50:57 +02:00
@groups in a fragment, for security reasons. It will also not let you use any
group other than the @fragname itself (specifically, groups which contained a
subset of the allowed @fragname, which would work normally, do not work now).
(If you didn't understand all that, you're probably not using delegation, so
feel free to ignore it!)
<a name="_storing_usergroup_information_outside_gitolite_like_in_LDAP_"></a>
### storing usergroup information outside gitolite (like in LDAP)
2010-05-14 14:50:57 +02:00
[Please NOTE: this is all about *user* groups, not *repo* groups]
[WARNING: the earlier method of doing this has been discontinued; please see
the commit message for details]
2010-05-14 14:50:57 +02:00
Gitolite now allows usergroup information to be stored outside its own config
file. We'll see "why" first, then the "how".
2010-05-14 14:50:57 +02:00
<a name="_why"></a>
2010-05-14 14:50:57 +02:00
#### why
2010-05-14 14:50:57 +02:00
Large sites often have LDAP servers that already contain user and group
information, including group membership details. Such sites may prefer that
gitolite just pick up that info instead of having to redundantly put it in
gitolite's config file.
2010-05-14 14:50:57 +02:00
Consider this example config for one repo:
2010-05-14 14:50:57 +02:00
repo foo
RW+ = @lead_devs
RW = @devs
R = @interns
2010-05-14 14:50:57 +02:00
Normally, you would also need to specify:
2010-05-14 14:50:57 +02:00
@lead_devs = dilbert alice
@devs = wally
@interns = ashok
2010-05-14 14:50:57 +02:00
However, if the corporate LDAP server already tags these people correctly, and
if there is some way of getting that information out **at run time**, that
would be cool.
2010-05-14 14:50:57 +02:00
<a name="_how"></a>
2010-05-14 14:50:57 +02:00
#### how
2010-05-14 14:50:57 +02:00
All you need is a script that, given a username, queries your LDAP or similar
server, and returns a space-separated list of all the groups she is a member
of. If an invalid user name is sent in, or the user is valid but is not part
of any groups, it should print nothing.
This script will probably be specific to your site. [**Help wanted**: I don't
know LDAP, so if someone wants to contribute some sample code I'd be happy to
put it in contrib/, with credit of course!]
Then set the `$GL_GET_MEMBERSHIPS_PGM` variable in the rc file to the full
path to this program, set `$GL_BIG_CONFIG` to 1, and that will be that.
[gwd]: http://github.com/sitaramc/gitolite/blob/pu/doc/3-faq-tips-etc.mkd#gwd
2010-12-24 06:31:28 +01:00
<a name="_implementation_notes"></a>
### implementation notes
To understand how big-config works, we'll first look at how it works without
this setting. Think back to the example at the top, and assume 'alice' is
accessing the 'lynx' repo. The various rights are governed by the following
hash elements:
# for the first level checks
$repos{'lynx'}{'R'}{'alice'} = 1
$repos{'lynx'}{'W'}{'alice'} = 1
# for the second level checks
$repos{'lynx'}{'alice'}{'refs/heads/master'} = 'RW';
$repos{'lynx'}{'alice'}{'refs/heads/next'} = 'RW+';
Those elements are explicitly specified in the compiled hash, as you can see
(you don't need to know perl too much to read a hash; just make some educated
guesses if needed!)
Now look at the compiled hash produced when `GL_BIG_CONFIG` is set. In place
of both 'firefox' and 'lynx' you have '@wbr', and similarly '@devs' for both
'alice' and 'bob'. In addition, there is a group hash at the bottom that
lists each group and its members.
When 'alice' tries to access the 'lynx' repo, gitolite collects all the group
names that these names belong to, so '@devs' is added to the list of 'user'
names that 'alice' inherits permissions from, and '@wbr' is added to the list
of 'repo' names that 'lynx' inherits from. This means that the final access
inherits all permissions pertaining to the following combinations:
alice, lynx
alice, @wbr
@devs, lynx
@devs, @wbr
(Actually there are 3 more... try and guess what they may be!)
Anyway, all ACL rules for these combinations are clubbed together to make the
composite set of rules that 'alice' accessing 'lynx' is subject to.