incubator-general mailing list archives

From <>
Subject Re: [VOTE] Accumulo to join the Incubator
Date Fri, 09 Sep 2011 19:23:16 GMT
Qualifying: +1 (non-binding).

I would also like to repeat what Marvin Humphrey said:

"I've been impressed by how the Accumulo representatives have conducted
themselves during this week of discussion, and I believe that they will
be valuable and productive participants within Apache."

- milind

Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)

On 9/9/11 9:33 AM, "Bhandarkar, Milind" <> wrote:

>+1 !
>- milind
>On 9/9/11 9:22 AM, "Doug Cutting" <> wrote:
>>It's been a week since the Accumulo proposal was submitted for
>>discussion.  A few questions were asked, and the proposal was clarified
>>in response.  Sufficient mentors have volunteered.  I thus feel we are
>>now ready for a vote.
>>The latest proposal can be found at the end of this email and at:
>>The discussion regarding the proposal can be found at:
>>Please cast your votes:
>>[  ] +1 Accept Accumulo for incubation
>>[  ] +0 Indifferent to Accumulo incubation
>>[  ] -1 Reject Accumulo for incubation
>>This vote will close 72 hours from now.
>>= Accumulo Proposal =
>>== Abstract ==
>>Accumulo is a distributed key/value store that provides expressive,
>>cell-level access labels.
>>== Proposal ==
>>Accumulo is a sorted, distributed key/value store based on Google's
>>BigTable design.  It is built on top of Apache Hadoop, Zookeeper, and
>>Thrift.  It features a few novel improvements on the BigTable design in
>>the form of cell-level access labels and a server-side programming
>>mechanism that can modify key/value pairs at various points in the data
>>management process.
>>== Background ==
>>Google published the design of BigTable in 2006.  Several other open
>>source projects have implemented aspects of this design including HBase,
>>CloudStore, and Cassandra.  Accumulo began its development in 2008.
>>== Rationale ==
>>There is a need for a flexible, high performance distributed key/value
>>store that provides expressive, fine-grained access labels.  The
>>communities we expect to be most interested in such a project are
>>government, health care, and other industries where privacy is a
>>concern.  We have made much progress in developing this project over the
>>past 3 years and believe both the project and the interested communities
>>would benefit from this work being openly available and having open
>>== Current Status ==
>>=== Meritocracy ===
>>We intend to strongly encourage the community to help with and
>>contribute to the code.  We will actively seek potential committers and
>>help them become familiar with the codebase.
>>=== Community ===
>>A strong government community has developed around Accumulo and training
>>classes have been ongoing for about a year.  Hundreds of developers use
>>=== Core Developers ===
>>The developers are mainly employed by the National Security Agency, but
>>we anticipate interest developing among other companies.
>>=== Alignment ===
>>Accumulo is built on top of Hadoop, Zookeeper, and Thrift.  It builds
>>with Maven.  Due to the strong relationship with these Apache projects,
>>the incubator is a good match for Accumulo.
>>== Known Risks ==
>>=== Orphaned Products ===
>>There is only a small risk of being orphaned.  The community is
>>committed to improving the codebase of the project due to its fulfilling
>>needs not addressed by any other software.
>>=== Inexperience with Open Source ===
>>The codebase has been treated internally as an open source project since
>>its beginning, and the initial Apache committers have been involved with
>>the code for multiple years.  While our experience with public open
>>source is limited, we do not anticipate difficulty in operating under
>>Apache's development process.
>>=== Homogeneous Developers ===
>>The committers have multiple employers and it is expected that
>>committers from different companies will be recruited.
>>=== Reliance on Salaried Developers ===
>>The initial committers are all paid by their employers to work on
>>Accumulo and we expect such employment to continue.  Some of the initial
>>committers would continue as volunteers even if no longer employed to do
>>=== Relationships with Other Apache Products ===
>>Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang,
>>-net, -io, -jci, -collections, -configuration, -logging, and -codec.
>>=== Relationship to HBase ===
>>Accumulo and HBase are both based on the design of Google's BigTable, so
>>there is a danger that potential users will have difficulty
>>distinguishing the two.  Some of the key areas in which Accumulo differs
>>from HBase are discussed below.  It may be possible to incorporate the
>>desired features of Accumulo into HBase.  However, the amount of work
>>required would slow development of HBase and Accumulo considerably.  We
>>believe this warrants a podling for Accumulo at the current time.  We
>>expect active cross-pollination will occur between HBase and podling
>>Accumulo and it is possible that the codebases and projects will
>>ultimately converge.
>>==== Access Labels ====
>>Accumulo has an additional portion of its key that sorts after the
>>column qualifier and before the timestamp.  It is called column
>>visibility and enables expressive cell-level access control.
>>Authorizations are passed with each query to control what data is
>>returned to the user.  The column visibilities are boolean AND and OR
>>combinations of arbitrary strings (such as "(A&B)|C") and authorizations
>>are sets of strings (such as {C,D}).
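To make the label semantics concrete, here is a minimal Python sketch of how a column visibility such as "(A&B)|C" could be checked against an authorization set such as {C,D}. It is a toy evaluator and not Accumulo's implementation. The function name visible, the restriction of labels to alphanumeric tokens, the left-to-right handling of mixed & and | at one nesting level, and the treatment of an empty label as visible to everyone are all assumptions made for illustration.

```python
import re

# Tokens: parentheses, & (AND), | (OR), or an alphanumeric label.
_TOKEN = re.compile(r"\s*([()&|]|[A-Za-z0-9_]+)")
_OPERATORS = {"&", "|"}


def _tokenize(expr):
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"bad character at offset {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr, auths):
    """Return True if the authorization set satisfies the expression."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # assumption: an empty label is visible to everyone
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok == ")" or tok in _OPERATORS:
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare label is true when the user holds it

    def expression():
        nonlocal pos
        value = term()
        # Simplification: mixed operators are applied left to right.
        while pos < len(tokens) and tokens[pos] in _OPERATORS:
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result
```

With the examples from the proposal, visible("(A&B)|C", {"C", "D"}) is True because C alone satisfies the OR, while visible("(A&B)|C", {"A", "D"}) is False.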
>>==== Iterators ====
>>Accumulo has a novel server-side programming mechanism that can modify
>>the data written to disk or returned to the user.  This mechanism can be
>>configured for any of the scopes where data is read from or written to
>>disk.  It can be used to perform joins on data within a single tablet.
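The idea of a server-side mechanism that rewrites key/value pairs as they stream past can be sketched with Python generators. This is only a model of the concept: real Accumulo iterators are Java classes running inside the tablet server, and the names age_off, sum_by_cell and scan below are hypothetical. Entries are assumed to arrive with all versions of one cell adjacent.

```python
from itertools import groupby

# Each entry is ((row, column, timestamp), value); entries for the same
# (row, column) cell are assumed to be adjacent in the input.


def age_off(entries, min_timestamp):
    """Filtering stage: drop entries older than min_timestamp."""
    for key, value in entries:
        if key[2] >= min_timestamp:
            yield key, value


def sum_by_cell(entries):
    """Combining stage: collapse all versions of a cell into one sum."""
    for (row, column), group in groupby(entries, key=lambda e: e[0][:2]):
        versions = list(group)
        newest = max(key[2] for key, _ in versions)
        yield (row, column, newest), sum(value for _, value in versions)


def scan(entries, *stages):
    """Apply the stages in order, the way a stack of iterators would."""
    stream = iter(entries)
    for stage in stages:
        stream = stage(stream)
    return list(stream)
```

For example, scanning [(("r1", "count", 1), 5), (("r1", "count", 2), 7), (("r1", "count", 9), 1), (("r2", "count", 3), 4)] with age_off at timestamp 2 followed by sum_by_cell yields [(("r1", "count", 9), 8), (("r2", "count", 3), 4)]. The same stack could in principle be applied when data is written out rather than when it is read, which is the flexibility the proposal describes.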
>>==== Flexibility ====
>>HBase requires the user to specify the set of column families to be used
>>up front.  Accumulo places no restrictions on the column families.
>>Also, each column family in HBase is stored separately on disk.
>>Accumulo allows column families to be grouped together on disk, as does
>>BigTable.  This enables users to configure how their data is stored,
>>potentially providing improvements in compression and lookup speeds.  It
>>gives Accumulo a row/column hybrid nature, while HBase is currently
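Grouping column families on disk can be pictured as follows. This Python sketch is an illustration only, under the assumption that a table is configured with a mapping from group names to sets of column families; the function partition_by_group and the "default" group name are hypothetical and do not reflect Accumulo's configuration syntax or file layout.

```python
from collections import defaultdict


def partition_by_group(entries, groups, default_group="default"):
    """Split (row, family, value) entries into per-group sorted lists.

    groups maps a group name to the set of column families stored together.
    Families not named in any group fall into default_group.
    """
    family_to_group = {}
    for name, families in groups.items():
        for family in families:
            if family in family_to_group:
                raise ValueError(f"family {family!r} is in more than one group")
            family_to_group[family] = name

    stores = defaultdict(list)
    for row, family, value in entries:
        group = family_to_group.get(family, default_group)
        stores[group].append((row, family, value))
    # Each group is kept sorted on its own, like a separate on-disk section.
    return {name: sorted(rows) for name, rows in stores.items()}
```

A read that touches only the "name" family would then need to look at a single group's data, and values of a similar kind sit next to each other, which is where the potential compression and lookup gains mentioned above would come from.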
>>==== Testing ====
>>Accumulo has testing frameworks that have resulted in its achieving a
>>high level of correctness and performance.  We have observed that under
>>some configurations and conditions Accumulo will outperform HBase and
>>provide greater data integrity.
>>==== Logging ====
>>HBase uses a write-ahead log on the Hadoop Distributed File System.
>>Accumulo has its own logging service that does not depend on
>>communication with the HDFS NameNode.
>>==== Storage ====
>>Accumulo has a relative key file format that improves compression.
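The proposal does not describe the file format itself. One common way a relative key encoding saves space, shown here purely as an assumption, is to store each sorted key as the length of the prefix it shares with the previous key plus the remaining suffix. The function names below are hypothetical, and keys are modeled as plain strings.

```python
def encode_relative(keys):
    """Encode sorted string keys as (shared_prefix_length, suffix) pairs."""
    encoded, previous = [], ""
    for key in keys:
        shared = 0
        limit = min(len(previous), len(key))
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def decode_relative(encoded):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys, previous = [], ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys
```

For the keys ["row1:colA", "row1:colB", "row2:colA"] the encoding is [(0, "row1:colA"), (8, "B"), (3, "2:colA")]. Because neighbouring keys in a sorted store tend to share long prefixes, most entries shrink to a few characters.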
>>==== Areas in which HBase features improvements over Accumulo ====
>>in memory tables, upserts, coprocessors, connections to other projects
>>such as Cascading and Pig
>>=== Expectations ===
>>There is a risk that Accumulo will be criticized for not providing
>>adequate security.  The access labels in Accumulo do not in themselves
>>provide a complete security solution, but are a mechanism for labeling
>>each piece of data with the authorizations that are necessary to see it.
>>=== Apache Brand ===
>>Our interest in releasing this code as an Apache incubator project is
>>due to its strong relationship with other Apache projects, i.e. Accumulo
>>has dependencies on Hadoop, Zookeeper, and Thrift and has complementary
>>goals to HBase.
>>== Documentation ==
>>There is not currently documentation about Accumulo on the web, but a
>>fair amount of documentation and training materials exists and will be
>>provided on the Accumulo wiki at  Also, a paper discussing
>>YCSB results for Accumulo will be presented at the 2011 Symposium on
>>Cloud Computing.
>>== Initial Source ==
>>Accumulo has been in development since spring 2008.  There are hundreds
>>of developers using it and tens of developers have contributed to it.
>>The core codebase consists of 200,000 lines of code (mainly Java) and
>>100s of pages of documentation.  There are also a few projects built on
>>top of Accumulo that may be added to its contrib in the future.  These
>>include support for Hive, Matlab, YCSB, and graph processing.
>>== Source and Intellectual Property Submission Plan ==
>>Accumulo core code, examples, documentation, and training materials will
>>be submitted by the National Security Agency.
>>We will also be soliciting contributions of further plugins from MIT
>>Lincoln Labs, Carnegie Mellon University, and others.
>>Accumulo has been developed by a mix of government employees and private
>>companies under government contract.  Material developed by government
>>employees is in the public domain and no U.S. copyright exists in works
>>of the federal government.  For the contractor developed material in the
>>initial submission, the U.S. Government has sufficient authority per the
>>ICLA from the copyright owner to contribute the Accumulo code to the
>>There has been some discussion regarding accepting contributions from US
>>Government sources on We
>>propose that the NSA will sign an ICLA/CCLA if that document could be
>>slightly modified to explicitly address copyright in works of government
>>employees. Specifically, we propose that the definition of "You" be
>>modified to include "the copyright owner, the owner of a Contribution
>>not subject to copyright, or legal entity authorized by the copyright
>>owner that is making this Agreement." In addition, we propose that
>>section 2, the copyright license grant, be modified to add, after "You
>>hereby grant", either "to the extent authorized by law" or "to the
>>extent copyright exists in the Contribution."  These changes will
>>permit US Government employee developed work to be included.
>>One proposed solution is to form a Collaborative Research and
>>Development Agreement (CRADA) between the Apache Software Foundation and
>>the US Government, but this will not solve the underlying problem that
>>U.S. law does not grant copyright to works of government employees.  At
>>this time a CRADA is not necessary but should it be determined that a
>>CRADA is necessary, we would like to work through that process during
>>the incubation phase of Accumulo rather than before acceptance as this
>>may take time to enter into an agreement.
>>== External Dependencies ==
>>jetty (Apache and EPL), jline (BSD), jfreechart (LGPL), jcommon (LGPL),
>>slf4j (MIT), junit (CPL)
>>== Cryptography ==
>>== Required Resources ==
>> * Mailing Lists
>>   * accumulo-private
>>   * accumulo-dev
>>   * accumulo-commits
>>   * accumulo-user
>> * Subversion Directory
>>   *
>> * Issue Tracking
>>   * JIRA Accumulo (ACCUMULO)
>> * Continuous Integration
>>   * Jenkins builds on
>> * Web
>>   *
>>   * wiki at or
>>== Initial Committers ==
>> * Aaron Cordova (aaron at cordovas dot org)
>> * Adam Fuchs (adam.p.fuchs at ugov dot gov)
>> * Eric Newton (ecn at swcomplete dot com)
>> * Billie Rinaldi (billie.j.rinaldi at ugov dot gov)
>> * Keith Turner (keith.turner at ptech-llc dot com)
>> * John Vines (john.w.vines at ugov dot gov)
>> * Chris Waring (christopher.a.waring at ugov dot gov)
>>== Affiliations ==
>> * Aaron Cordova, The Interllective
>> * Adam Fuchs, National Security Agency
>> * Eric Newton, SW Complete Incorporated
>> * Billie Rinaldi, National Security Agency
>> * Keith Turner, Peterson Technology LLC
>> * John Vines, National Security Agency
>> * Chris Waring, National Security Agency
>>== Sponsors ==
>> * Champion: Doug Cutting
>>== Nominated Mentors ==
>> * Benson Margulies
>> * Alan Cabrera
>> * Bernd Fondermann
>> * Owen O'Malley
>>== Sponsoring Entity ==
>> * Apache Incubator
