incubator-general mailing list archives

From Billie J Rinaldi <>
Subject Re: [PROPOSAL] Accumulo for the Apache Incubator
Date Sun, 04 Sep 2011 16:39:16 GMT

We would divide the derived code into two categories: code we modified only slightly
(for example, to allow us to extend it) and code we modified heavily. Now that we are
able to interact openly, we hope to contribute much of that work back to the original
projects. A detailed breakdown is below. We identified these files by searching our code
for "copyright"; the total came to just over 14,000 lines. "Heavily" is a qualitative
assessment of how much we modified the code, but we could certainly come up with
quantitative measures.
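A short scan can approximate the search described above. This is a minimal sketch, not necessarily the exact procedure used. It makes two assumptions:

- The search is a case-insensitive match over .java files only.
- The project's own files carry no header that mentions copyright. This matters because a standard ASF source header does mention "copyright ownership", so once such headers are added, every file would match.

The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper approximating "searching for 'copyright' in our code".
// Assumes project-owned files have no header mentioning copyright, so any
// match points at code derived from a third party.
public class CopyrightScan {

    // Returns every .java file under root whose text mentions "copyright" (any case).
    static List<Path> filesMentioningCopyright(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .filter(CopyrightScan::mentionsCopyright)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // ISO-8859-1 decodes any byte sequence, so an oddly encoded file cannot
    // abort the scan; "copyright" is plain ASCII either way.
    static boolean mentionsCopyright(Path p) {
        try {
            String text = new String(Files.readAllBytes(p), StandardCharsets.ISO_8859_1);
            return text.toLowerCase(Locale.ROOT).contains("copyright");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sums the line counts of the matched files.
    static long totalLines(List<Path> files) throws IOException {
        long total = 0;
        for (Path p : files) {
            try (Stream<String> lines = Files.lines(p, StandardCharsets.ISO_8859_1)) {
                total += lines.count();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> matches = filesMentioningCopyright(root);
        matches.forEach(System.out::println);
        System.out.println("total lines: " + totalLines(matches));
    }
}
```

Run it with the source root as the only argument. It prints each matching file, then the summed line count of those files.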

5400 lines: slightly modified versions of Hadoop BCFile and related classes
            (our current file format extends BCFile)
4300 lines: heavily modified versions of MapFile and SequenceFile
            (no longer our default file format, but still included for backward compatibility)
2000 lines: heavily modified versions of HBase BlockCache and related files
            (Adam didn't count the tests when he said 1500 lines)
1300 lines: heavily modified versions of Hadoop BloomFilters
419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
325 lines: our Value is an immutable version of Hadoop BytesWritable
142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader
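As a cross-check, the itemized figures above can be tallied by degree of modification. This sketch groups the last three entries as UNSPECIFIED, because the list does not say whether they were modified slightly or heavily. The class, enum, and field names are hypothetical.

The entries sum to 13,886 lines, a little under the "just over 14,000" stated above. The larger entries are round figures, which may account for the difference.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical tally of the itemized derived-code figures by degree of change.
public class DerivedCodeTally {

    // UNSPECIFIED marks entries the list does not classify as slight or heavy.
    enum Degree { SLIGHT, HEAVY, UNSPECIFIED }

    static final class Item {
        final String name;
        final int lines;
        final Degree degree;

        Item(String name, int lines, Degree degree) {
            this.name = name;
            this.lines = lines;
            this.degree = degree;
        }
    }

    static final Item[] ITEMS = {
        new Item("Hadoop BCFile and related classes", 5400, Degree.SLIGHT),
        new Item("MapFile and SequenceFile", 4300, Degree.HEAVY),
        new Item("HBase BlockCache and related files", 2000, Degree.HEAVY),
        new Item("Hadoop BloomFilters", 1300, Degree.HEAVY),
        new Item("TeraSortIngest", 419, Degree.UNSPECIFIED),
        new Item("Value (from BytesWritable)", 325, Degree.UNSPECIFIED),
        new Item("ClassLoader (from ReloadingClassLoader)", 142, Degree.UNSPECIFIED),
    };

    // Sums line counts per degree; EnumMap keeps declaration order for printing.
    static Map<Degree, Integer> byDegree(Item[] items) {
        Map<Degree, Integer> totals = new EnumMap<>(Degree.class);
        for (Degree d : Degree.values()) {
            totals.put(d, 0);
        }
        for (Item it : items) {
            totals.merge(it.degree, it.lines, Integer::sum);
        }
        return totals;
    }

    // Sums line counts across all entries.
    static int total(Item[] items) {
        int sum = 0;
        for (Item it : items) {
            sum += it.lines;
        }
        return sum;
    }

    public static void main(String[] args) {
        byDegree(ITEMS).forEach((d, n) -> System.out.println(d + ": " + n));
        System.out.println("total: " + total(ITEMS));
    }
}
```

By this grouping, the slightly modified BCFile code accounts for 5,400 lines and the heavily modified entries for 7,600.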


----- Original Message -----
From: "Bernd Fondermann" <>
Sent: Sunday, September 4, 2011 3:41:09 AM
Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator

On Saturday, September 3, 2011, Adam P Fuchs <> wrote:
> Hi Bernd,
> The latest stable release of Accumulo contains roughly 200,000 lines of
> code, of which about 85,000 are machine-generated Thrift code. Of the
> remaining code, about 15,000 lines are derived from other Apache projects,
> and about 1,500 of those are derived from HBase code. The code derived from
> HBase comprises a query caching layer (block cache, index cache, multi-level
> LRU logic, etc.).

So, you are saying more than 10% of the non-generated code base (and you are
not counting lib-style uses/JARs here, right?) is derived from other Apache
code? That seems unusual. Just curious: could you elaborate a bit on why
you did that and what kind of code it is? Thank you.
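The "more than 10%" figure follows from Adam's round numbers. Subtracting the 85,000 generated lines from the 200,000 total leaves about 115,000 non-generated lines, and 15,000 of those is roughly 13%. A minimal sketch of that arithmetic, with hypothetical class and method names:

```java
// Hypothetical check of the derived-code share using the round figures above.
public class DerivedShare {

    // Fraction of non-generated lines that are derived from other projects.
    static double derivedShare(long totalLines, long generatedLines, long derivedLines) {
        long nonGenerated = totalLines - generatedLines;
        return (double) derivedLines / nonGenerated;
    }

    public static void main(String[] args) {
        double share = derivedShare(200_000, 85_000, 15_000);
        System.out.printf("%.1f%%%n", share * 100); // prints 13.0%
    }
}
```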

