incubator-general mailing list archives

From Erik Abele <>
Subject Re: The Incubator and Infrastructure
Date Fri, 02 Sep 2005 17:24:54 GMT
On 02.09.2005, at 17:25, Niclas Hedhman wrote:

> On Friday 02 September 2005 20:23, Erik Abele wrote:
> I honestly don't feel like "fueling" this thread, so please don't hesitate
> to say I am outright stupid and don't know what I am talking about, and
> I'll shut up as a good citizen... My intention is not to "whine".

Your last msg was quite fueling, no? ;-)

>> Why isn't that working for others?
> Yes. Why?
> My take is that only one in 300 committers have what it takes to "get
> thru". I was not one of them... Can that ratio be improved? If it takes 10
> people to keep the ship afloat, (as a manager) I would plan for at least
> one person leaving every quarter, and that would then set the minimum
> recruitment pace.

Uhm, 'recruitment' (in the managerial sense) doesn't really work in
a volunteer-driven organization with collaborative and meritocratic
development processes... you can of course encourage people, but
unfortunately that doesn't seem to work for the unsexy jobs of infra@.

>> The infra repo isn't the almighty tool everyone needs. Most material
>> in there (if not all by now) isn't instantly useful if you are not on
>> top of the different setups.
> Ok. So I don't need to bother about the docs, since they may confuse me
> even more? Good start.

I didn't say that; all I said is that the infra repo is not our
primary documentation place for beginners. It's just the place where
we keep configuration files like crontabs, DNS zone files, the httpd
config and so on. There is also a limited set of documentation, but I
wonder why an infra newcomer has to know things like how to access the
terminal server or how the network in the colo is set up.

>> Furthermore you can find nearly
>> everything on the machines themselves, mostly world-readable;
> I noticed the plural, "machines". AFAIK, only minotaur is "world
> accessible".

Yep and that should be enough to show that you know what you are  
doing and that what you are doing is goodness...

>> a) overload is self-inflicted
>> Uh oh, just consider the following example: account requests.
> How long can it possibly take?
> Let me make a guess ~1 minute, perhaps 2. Let's say I spend half an hour a
> day, that makes it 15 a day, and several thousand per year. Apparently,
> this can't be a bottleneck.

You snipped the most important failures, so trust me that it isn't
done within 1 minute. But that just shows the ignorance (not
necessarily willful) we are facing, see below.

>> - pmc votes in new committer,
>> - makes him send in a CLA;
>> - the PMC chair watches for the receipt of the CLA and if it gets recorded,
>>   he sends a single email to root@ (cc'ing the PMC) in a pre-defined
>>   format and waits till the account is created.
> Great. So it is not a problem, anymore?

If everybody followed this scheme it would be nice, but as I said,
nobody is doing so. Well, to be correct and fair, nobody *was* doing
so, it is really better now with respect to account requests - but  
that was just one of a bunch of examples to make my point clear: even  
if we have a process and documentation in place, we are still facing  
a lot of people not following the process, ignoring the documentation  
and whining and pestering to get their work done :(
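
For illustration only, the quoted account request steps can be written down as an ordered checklist. This is a minimal sketch, not an actual infra@ tool: the names `Step`, `ORDER` and `next_step` are hypothetical, and the pre-defined mail format mentioned above is deliberately left out since it is not given here.

```python
from enum import Enum, auto


class Step(Enum):
    """Stages of the account request process as quoted above (names are hypothetical)."""
    PMC_VOTE = auto()         # PMC votes in the new committer
    CLA_SENT = auto()         # new committer sends in a CLA
    CLA_RECORDED = auto()     # PMC chair sees that the CLA was received and recorded
    REQUEST_MAILED = auto()   # chair sends one email to root@, cc'ing the PMC
    ACCOUNT_CREATED = auto()  # the account has been created


ORDER = list(Step)  # Enum iteration preserves definition order


def next_step(completed):
    """Return the first step not yet completed, or None when all are done.

    Raises ValueError if a later step is marked done while an earlier one
    is still missing, i.e. the process was not followed in order.
    """
    done = set(completed)
    pending = None
    for step in ORDER:
        if step in done:
            if pending is not None:
                raise ValueError(f"{step.name} done before {pending.name}")
        elif pending is None:
            pending = step
    return pending
```

A request mailed to root@ before the CLA is recorded would raise here, which is the kind of out-of-order case the paragraph above complains about.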

>> b) being disorganized
>> Maybe, but keep in mind that we are all volunteers and that not only
>> is the ASF growing tremendously, our hardware/infrastructure needs
>> are doing so too. Old systems and services have to be kept running
>> for projects that still want to use them, new systems and services have
>> to be put in place (and administered) because projects are begging
>> for them. The complexity is growing daily.
> I recognize that. And I happen to be of the opinion that it is
> self-inflicted.
> Leo wrote a humorous mail about it two months ago "Why we say no.". And
> just like projects don't have a choice of CVS, such policy could be
> introduced for Jira/Bugzilla/Scarab (any other?) as well, if it is seen
> as taking up precious time.

Yes, infra could say 'no' more often or could simply shut down the  
services they don't want to administer. To be honest, I'd be fine  
with this (and its consequences (people/projects leaving, flamewars,  
what-have-you)) but since I don't want to discuss this in hundreds of  
emails, I'll simply leave and let others take over. No harm done.

>> c) non-transparent
>> Hmm, IMO infra is *not* non-transparent; it's just that the bar is
>> pretty high (knowledge-wise and confidence-wise (in the sense of
>> trust)). Please give me an example of what is so non-transparent; I'm
>> willing to help you here.
> Example 1. You said it yourself -> docs are "shaky", but I could live with
> that. The problem is "everyone knows they are not good" and it has been
> hinted that a lot of material is outright wrong. That makes it even worse.

Okay, but that is not 'non-transparent' - the bar is just higher. I
agree that it'd be nice to have more docs, but OTOH people also have to
read them; see my example with the documented account request
process: nobody was following it, even after several pmc-wide
emails :)

> Example 2. Most requests come in as either a mail or a Jira issue. Some
> time later, someone like yourself marks it as "done". If I was overworked
> and wanted others to get involved, I would spend more time explaining
> what I did to make it "done" than I did to "do it". *In detail*.
> Over "my time", that rarely happened, and I took it as "they don't want
> help with that".

I agree and I always tried (*) to explain what I did to a) show the
other infra guys that I knew what I was doing and b) to educate
others. I learnt a lot by just reading infra mails. Ah, and some time  
ago, Leo even started a tool to help with this, but I have to admit  
that I'm not aware of the specifics right now.

*: note the past tense, I'm not doing it anymore and I know that it's  
bad but here you'll have to bear with me, sorry, I got lazy over time

> Example 3. I think that most resources are turned off by default, and
> only after long considerations, made accessible (read and/or write) to a
> wider audience. That is natural security awareness kicking in, but little
> discussion is going on, about how to make more info available. Can other
> people watch this configuration? I have always been of the opinion that
> ASF is more secretive than the situation calls for.
> The fact that many services live on machines that are not accessible,
> makes it difficult to peek around to get an idea of how things are set
> up, without "bothering" the peeps who do the work, since it is likely I
> won't be able to help "in that particular area" right now.

IMO this is not the case. There are certainly parts which are only
accessible to members, but that also has legal reasons. Everything
else is more or less open, at least to a degree where you can show
that it warrants more karma for you...

I'm still missing any concrete examples of issues which can't be  
solved because of too restrictive access.

>> d) "put out fire by hand"
>> Well, that's the occasional hdd failure or worm attack or svn wedge
>> or ... . It's pretty hard to come up with automated solutions to
>> every problem so administering a system always means to baby-sit it
>> in some way. If it were solvable by a click on a fancy button,
>> the managers could do it and we wouldn't need any sysadmins anymore :)
> I get the impression by your response that there are no problems, or
> overload at the infra@ team.

Huh? 'occasional' in terms of 10 machines and millions of users
may mean 'every second day'.

> Catastrophic events can't be automated, but they happen rarely.

Catastrophic events may happen rarely but that's not all. See the  
list below.

> All the 'bulk' is already streamlined, and shouldn't take much time.

Is it? No, it is not, unfortunately...

> So what is it? Full time staff is needed, so there must be something.

How about the following list:
- creating, tweaking, moving, deleting of different project resources  
like mailing lists, svn repositories, user accounts etc.
- recovering from crashes of different sorts, ranging from hw (hdd,  
network, ...) to sw (rsyncs, repositories, ...)
- bearing with occasional events like virus attacks or malfunctioning  
mass downloaders etc.
- tweaking all sorts of things in response to user requests caused by
other users' failures ('can someone chmod these files please')
- answering questions
- thinking about and discussing improvements, changes, etc.
- keeping systems and services up to date, testing updates/changes
- putting new systems/resources in place
- caring about backups, hw orders and other boring things like security
- reading a lot of emails (heh, just the nightly cronjobs are 15  
emails alone plus numerous other alert/info mails)
- there are numerous more events which I don't want to bore you with...

And remember: this is not why we (the infra team) originally came to
the ASF. The reason was developing software like HTTPD, Tomcat, Ant
or Foo - so all of this is subtracting from the time we have for that.
It's not that we are paid to do it or have the greatest fun doing it;
it's a necessity!

I guess for most people it was just *fun* getting involved with OS  
software development (remember: some people don't even have IT as  
their day job!) and now we are stuck with keeping the ship afloat in  
the hope of getting it to a state, where we have enough time to get  
back again to the things for which we came here... hah, what a mess :)

>> Thanks, I'm nearly outta here too - it's far easier to support my
>> own systems which only have to take care of a couple hundred users
>> per second and not millions and, ah, making a living out of it
>> instead of just fighting with a huge amount of whining people,
>> materializing in hundreds of emails :|
> Erik, in case no one has expressed it before; A Big Thank You!!!

Oh, don't thank me (although I appreciate it) - I'm only a very small  
lightbulb in the flashing universe of infrastructure :)

> You, Noel, Leo, Justin and everyone else are providing a wonderful
> service. That is the external interface, and I think you manage that well.

Thanks :)

And thanks for getting me to think about this another time. I have not
really been actively involved in infrastructure issues for the last two
months (except for an occasional helping hand on IRC) and with that  
distance I realize now how much time and effort it took for me, oh  

> If mails are a problem, disable the mailing list and require Jira to be
> used as the medium to communicate with the infra@ team.

I think that would be the wrong way, but OTOH emails are sometimes
actually an issue for me. Like many others, I'm not a native English
speaker, so writing an email probably takes twice (if not more)
the amount of time... especially these long ones, sorry, I'll shut up
now since I think _in the end_ we are basically on the same page :)

