incubator-general mailing list archives

From Josh Wills <>
Subject Re: [DISCUSS] Crunch to join the Apache Incubator
Date Fri, 25 May 2012 19:00:22 GMT
Hi Steve,

Thank you for your thoughtful comments. Replies inlined below.

On Fri, May 25, 2012 at 2:39 AM, Steve Loughran
<> wrote:
> On 23 May 2012 19:35, Josh Wills <> wrote:
>> Hey Jakob,
>> This was a tough one-- you know that I've been talking about Crunch
>> w/Joe Adler for a few weeks now, and I personally am really looking
>> forward to working with you guys. That said, the team did feel
>> strongly about keeping the initial committers to people who had
>> already added major pieces of functionality to Crunch, and adding
>> Vinod was about his expertise on the MR2 internals, which we think
>> will be critical to Crunch's success. We are going to put the Crunch
>> proposal up for a vote with the current team in place.
>> We are, of course, very eager to grow the list of committers through
>> the normal Apache process.
> I'd go for pulling Jakob in for tactical and strategic reasons
> 1. He's using it at work, so represents the end users.

A super-majority of the initial committers are also end users. I use
Crunch on my own projects (e.g., and ), Cloudera solutions architects
use Crunch on client projects, Robert is building tools on top of
Crunch at WibiData, and Gabriel and Chris use it for building
pipelines at TomTom. I can't speak for Tom and Vinod, but of course,
they have other positive qualities. :)

> 2. His code is always of high quality

I in no way meant to disparage Jakob or his coding. The objective of
my reply was to say "no" in the most apologetic, obsequious way possible
while not going so far over the top as to sound insincere. Having
LinkedIn on board would be a tremendous PR boost for the project. It
was painful to say no.

I am in no way savvy in the ways of Apache or the politics of the ASF.
I understand that smart people who I respect a great deal think that
this is the wrong decision. But I think that it takes something really
great for someone to see a project like Crunch, play around with it, and
then take the time to make some contributions to it without any
expectation of recognition, in the form of an Apache committership or
anything else. That was what Gabriel and Chris and Robert did over the
past few months. I really admire that, and I think that it deserves
some special recognition, however small. I'm willing to have some
people not like me or think I'm dumb if that's the price of giving
that to them.

> 3. Given the ongoing discussion on diversity w.r.t. Flume, I think it would
> be wise not to follow that project's example, and to try to get broader
> involvement from the outset.

I agree that it is critical to have broad involvement at the outset.
Both S4 and Flume started out with at least 50% of their initial
committers from a single company, and no single company constitutes a
majority of the initial committers to Crunch (Cloudera has three,
TomTom has two, WibiData has one, and Hortonworks has one). That de
jure diversity mirrors the de facto diversity in Crunch's commit logs
over the past several months:

There is nothing more important than increasing that de facto
diversity over time. I fully expect that my role during the incubator
process is to be the best documenter, repository maintainer, and
recruiter of new contributors that I can be.


Director of Data Science
Twitter: @josh_wills
