incubator-general mailing list archives

From Mark Struberg <>
Subject Re: [VOTE] Accept Howl as an Incubator Project
Date Wed, 23 Feb 2011 08:13:21 GMT

Sounds like a welcome addition to the Hadoop space.


--- On Wed, 2/23/11, Alan Gates <> wrote:

> From: Alan Gates <>
> Subject: [VOTE] Accept Howl as an Incubator Project
> To:
> Date: Wednesday, February 23, 2011, 12:20 AM
> I would like to call a vote on
> accepting Howl as an Incubator project.  The proposal
> is available at 
> You can see the discussion from the proposal thread at
> Alan.
> ----------------------
> Abstract
> Howl is a table and storage management service for data
> created using Apache Hadoop.
> Proposal
> The vision of Howl is to provide table management and
> storage management layers for Apache Hadoop. This includes:
>     • Providing a shared schema and data
> type mechanism.
>     • Providing a table abstraction so
> that users need not be concerned with where or how their
> data is stored.
>     • Providing interoperability across
> data processing tools such as Pig, Map Reduce, Streaming,
> and Hive.
> Background
> Data processors using Apache Hadoop have a common need for
> table management services. The goal of a table management
> service is to track data that exists in a Hadoop grid and
> present that data to users in a tabular format. Such a table
> management service needs to provide a single input and
> output format to users so that individual users need not be
> concerned with the storage formats that are chosen for
> particular data sets. As part of having a single format, the
> data will need to be described by one type of schema and
> have a single datatype system.
> Additionally, users should be free to choose the best tools
> for their use cases. The Hadoop project includes Map Reduce,
> Streaming, Pig, and Hive, and additional tools exist such as
> Cascading. Each of these tools has users who prefer it, and
> there are use cases best addressed by each of these tools.
> Two users on the same grid who need to share data should not
> be constrained to use the same tool but rather should be
> free to choose the best tool for their use case. A table
> management service that presents data in the same way to all
> of the tools can alleviate this problem by providing
> interfaces to each of the data processing tools.
> There are also a few other features a table management
> service should provide, such as notification of when data
> arrives.
> A couple of developers at Yahoo! started the project. It is
> based on the Hive MetaStore component. There is a good amount
> of interest in such a service from Yahoo!,
> Facebook, LinkedIn, and others. We are therefore proposing
> to place Howl in the Apache incubator and to build an open
> source community around it.
> Rationale
> There is a strong need for a table management service,
> especially for large grids with petabytes of data, and where
> the data volume is increasing by the day. Hadoop users need
> to find data to read and have a place to store their
> data. Currently users must understand the location of data
> to read, the storage format, compression techniques used,
> etc. To write data they need to understand where on HDFS
> their data belongs, the best compression format to use, how
> their data should be serialized, etc.
> Most users do not want to be concerned with these issues.
> They want these managed for them.
> Becoming an Apache open source project will greatly
> benefit Howl by giving it access to the large
> community that currently uses Hadoop and the other products
> built around Hadoop (like Pig, Hive, etc.). Users of the
> Hadoop ecosystem can influence Howl’s roadmap, and
> contribute to it. Looking at it in another way, we believe
> having Howl as part of the Hadoop ecosystem will be a great
> benefit to the current Hadoop/Pig/Hive community too.
> Current Status
> Meritocracy
> Our intent with this incubator proposal is to start
> building a diverse developer community around Howl following
> the Apache meritocracy model. We have wanted to make the
> project open source and encourage contributors from multiple
> organizations from the start. We plan to provide plenty of
> support to new developers and to quickly recruit those who
> make solid contributions to committer status.
> Community
> Howl is currently being used by developers at Yahoo!, and
> LinkedIn and Facebook have expressed interest in it.
> Yahoo! also plans to deploy the current version of
> Howl in production soon. We hope to extend the user and
> developer base further in the future. The current developers
> and users are all interested in building a solid open source
> community around Howl.
> To work towards an open source community, we have started
> using the GitHub issue tracker and mailing lists at Yahoo!
> for development discussions within our group.
> Core Developers
> Howl is currently being developed by four engineers from
> Yahoo! - Devaraj Das, Ashutosh Chauhan, Sushanth Sowmyan,
> and Mac Yang. All the engineers have deep expertise in
> Hadoop and the Hadoop Ecosystem in general.
> Alignment
> The ASF is a natural host for Howl given that it is already
> the home of Hadoop, Pig, HBase, Cassandra, and other
> emerging cloud software projects. Howl was designed to
> support Hadoop from the beginning in order to solve data
> management challenges in Hadoop clusters. Howl complements
> the existing Apache cloud computing projects by providing a
> unified way to manage data.
> Known Risks
> Orphaned Products
> The core developers plan to work full time on the project.
> There is very little risk of Howl getting orphaned since
> large companies like Yahoo! are planning to deploy this in
> their production Hadoop clusters. We believe we can build an
> active developer community around Howl (companies like
> Facebook and LinkedIn have also expressed interest).
> Inexperience with Open Source
> All of the core developers are active users and followers
> of open source. Devaraj Das is an Apache Hadoop committer
> and Apache Hadoop PMC member, and has experience with the
> Apache infrastructure and development process. Ashutosh
> Chauhan is an Apache Pig committer and Apache Pig PMC
> member. Sushanth Sowmyan and Mac Yang made contributions to
> the Apache Hive and the Apache Chukwa projects.
> Homogeneous Developers
> The current core developers are all from Yahoo! However, we
> hope to establish a developer community that includes
> contributors from several corporations, and we are starting
> to work towards this with Facebook and LinkedIn.
> Reliance on Salaried Developers
> Currently, the developers are paid to do work on Howl.
> However, once the project has a community built around it,
> we expect to get committers and developers from outside the
> current core developers. Companies like Yahoo! are invested
> in Howl being a solution to the data management problem in
> Hadoop clusters, and that is not likely to change.
> Relationships with Other Apache Products
> Howl is going to be used by users of Hadoop, Pig, and Hive.
> See section Initial Source below for more information about
> Howl's relationship to Hive.
> An Excessive Fascination with the Apache Brand
> While we respect the reputation of the Apache brand and
> have no doubts that it will attract contributors and users,
> our interest is primarily to give Howl a solid home as an
> open source project following an established development
> model. We have also given reasons in the Rationale and
> Alignment sections.
> Documentation
> Information about Howl can be found at The following
> may be useful to start with:
>     • The GitHub site:
>     • The roadmap:
> Initial Source
> Howl has been under development since Summer 2010 by a team
> of engineers at Yahoo! It is currently hosted on GitHub
> under an Apache license at
> The initial development of Howl has consisted of:
>     • maintaining a branch of the entire
> Hive codebase
>     • getting Howl-related patches
> committed to Hive
>     • developing Howl-specific plugins and
> wrappers to customize Hive behavior
> At runtime, Howl executes Hive code for metastore and
> CLI+DDL, disabling anything related to Hadoop map/reduce
> execution. It also makes use of the RCFile storage format
> contained in Hive.
> This approach was taken as a first step in order to
> validate the required functionality and get a production
> version working. However, in the long-term, maintaining a
> clone of Hive is undesirable. One possible resolution is to
> factor the metastore+CLI+DDL components out of Hive and move
> them into Howl (making Hive dependent on Howl). Another
> possible resolution is to remove the copy of Hive from Howl
> and do the build/release engineering necessary to make Howl
> depend on Hive. As part of the incubation process, we plan
> to work towards resolution of these issues.
> External Dependencies
> The dependencies all have Apache compatible licenses.
> Cryptography
> Not applicable.
> Required Resources
> Mailing Lists
>     • howl-private for private PMC
> discussions (with moderated subscriptions)
>     • howl-dev
>     • howl-commits
>     • howl-user
> Subversion Directory
> Issue Tracking
> JIRA Howl (HOWL)
> Other Resources
> The existing code already has unit tests, so we would like
> a Hudson instance to run them whenever a new patch is
> submitted. This can be added after project creation.
> Initial Committers
>     • Devaraj Das
>     • Ashutosh Chauhan
>     • Sushanth Sowmyan
>     • Mac Yang
>     • Paul Yang
>     • Alan Gates
> A CLA is already on file for Sushanth.
> Affiliations
>     • Devaraj Das (Yahoo!)
>     • Ashutosh Chauhan (Yahoo!)
>     • Sushanth Sowmyan (Yahoo!)
>     • Mac Yang (Yahoo!)
>     • Paul Yang (Facebook)
>     • Alan Gates (Yahoo!)
> Sponsors
> Champion
> Owen O’Malley
> Nominated Mentors
>     • Olga Natkovich (Pig PMC member and
> Apache VP for Pig)
>     • Alan Gates (Pig PMC member)
>     • John Sichi (Hive PMC member)
> Sponsoring Entity
> We are requesting the Incubator to sponsor this project.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:


