incubator-general mailing list archives

From Steve Loughran
Subject Re: [PROPOSAL] Accumulo for the Apache Incubator
Date Tue, 06 Sep 2011 15:09:44 GMT
On 04/09/11 17:39, Billie J Rinaldi wrote:
> Bernd,
> We would divide the derived code into two categories: that which we modified only slightly
> (for example to allow us to extend it) and that which we modified heavily.  Now that we are
> able to interact openly, we hope to supply much of that back to the original projects.  There
> is a detailed overview below.  We identified these by searching for "copyright" in our code.
> The total count came to just over 14,000 lines.  We use "heavily" as a qualitative assessment
> of how much we modified, but we could certainly come up with quantitative assessments.
> 5400 lines: slightly modified versions of Hadoop BCFile and related classes
>              (our current file format extends BCFile)
> 4300 lines: heavily modified versions of MapFile and SequenceFile
>              (no longer our default file format, but still included for backward compatibility)

Internal compatibility or external? If internal only I'd keep that out 
of the public codebase.

> 2000 lines: heavily modified versions of HBase BlockCache and related files
>              (Adam didn't count the tests when he said 1500 lines)

+1 for more tests.

> 1300 lines: heavily modified versions of Hadoop BloomFilters

-any plan to contribute back to hadoop-core, or are they too 
incompatible now?

> 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
> 325 lines: our Value is an immutable version of Hadoop BytesWritable

-any plan to contribute back to hadoop-core?

> 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

classloaders scare me. If we had an ASF-certified-classloader-hacker 
proposal where only approved people could write CLs for ASF code I'd be 
+1 for it, even though I'd fail the test myself.

I understand why you've forked off your own versions of some of the 
Hadoop and HBase core -it is not only your right, it gets the changes in 
on your schedule. I have been known to do this myself.

Ideally those things have to get back to a (future) version of Hadoop, 
which people like Doug and Owen can help with. Having forked code in the 
ASF codebase is something to avoid. Again, I speak from experience.

I think the proposal ought to consider how Accumulo fits in with BigTop too, 
so it can be part of the full Apache Hadoop stack deploy/test process.

I also think that the roadmap for the system may want to think about 
MR-279 integration; would that architecture be a better way to run 
Accumulo code within a Hadoop cluster?


(BTW: I'm not going to volunteer as a mentor/committer, my focus is on 
getting back into Hadoop core coding without distractions)
