incubator-general mailing list archives

From Adam P Fuchs <>
Subject Re: [PROPOSAL] Accumulo for the Apache Incubator
Date Tue, 06 Sep 2011 22:42:35 GMT
Hey Steve,

We would like to be able to contribute back where appropriate. We think that our BloomFilter
improvements and some of our MapFile improvements are generally useful, and those should be
pretty natural contributions back to Hadoop. Other modifications may not be so obviously generally
useful, such as hard-coded optimizations for Accumulo. However, it is certainly our goal to
reduce unnecessary code forks.

The classloader project was a challenge, and it took us several attempts to get it right.
It sure is cool now that it works. We still have a number of tickets on our todo list in this
area, like more convenient distribution mechanisms for user-defined functions (i.e. Iterators
or Coprocessors) across a Hadoop cluster.

Thanks for the pointers to BigTop and MR-279. Those certainly look promising for better integration
with the Apache brand. I'm looking forward to lots of great contributions from the community
to the roadmap as Accumulo moves into incubation.


----- Original Message -----
From: Steve Loughran <>
Sent: Tue, 06 Sep 2011 15:09:44 -0000
Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator

On 04/09/11 17:39, Billie J Rinaldi wrote:
> Bernd,
> We would divide the derived code into two categories: that which we modified only slightly
> (for example to allow us to extend it) and that which we modified heavily.  Now that we are
> able to interact openly, we hope to supply much of that back to the original projects.  There
> is a detailed overview below.  We identified these by searching for "copyright" in our code.
> The total count came to just over 14,000 lines.  We use "heavily" as a qualitative assessment
> of how much we modified, but we could certainly come up with quantitative assessments.
> 5400 lines: slightly modified versions of Hadoop BCFile and related classes
>              (our current file format extends BCFile)
> 4300 lines: heavily modified versions of MapFile and SequenceFile
>              (no longer our default file format, but still included for backward compatibility)

Internal compatibility or external? If internal only I'd keep that out 
of the public codebase.

> 2000 lines: heavily modified versions of HBase BlockCache and related files
>              (Adam didn't count the tests when he said 1500 lines)

+1 for more tests.

> 1300 lines: heavily modified versions of Hadoop BloomFilters

-any plan to contribute back to hadoop-core, or are they too 
incompatible now?

> 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
> 325 lines: our Value is an immutable version of Hadoop BytesWritable

-any plan to contribute back to hadoop-core?

> 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

classloaders scare me. If we had an ASF-certified-classloader-hacker 
proposal where only approved people could write CLs for ASF code I'd be 
+1 for it, even though I'd fail the test myself.

I understand why you've forked off your own versions of some of the 
Hadoop and HBase core -it is not only your right, it gets the changes in 
on your schedule. I have been known to do this myself.

Ideally those things have to get back to a (future) version of Hadoop, 
which people like Doug and Owen can help with. Having forked code in the 
ASF codebase is something to avoid. Again, I speak from experience.

I think the proposal ought to consider how they fit in with BigTop too, 
so it can be part of the full apache hadoop stack deploy/test process.

I also think that the roadmap for the system may want to think about 
MR-279 integration; would that architecture be a better way to run 
Accumulo code within a Hadoop cluster?


(BTW: I'm not going to volunteer as a mentor/committer, my focus is on 
getting back into Hadoop core coding without distractions)

To unsubscribe, e-mail:
For additional commands, e-mail: