incubator-general mailing list archives

From Ralph Goers <>
Subject Re: [VOTE] Accept Howl as an Incubator Project
Date Thu, 24 Feb 2011 15:32:41 GMT


On Feb 22, 2011, at 4:20 PM, Alan Gates wrote:

> I would like to call a vote on accepting Howl as an Incubator project.  The proposal
is available at  You can see the discussion
from the proposal thread at
> Alan.
> ----------------------
> Abstract
> Howl is a table and storage management service for data created using Apache Hadoop.
> Proposal
> The vision of Howl is to provide table management and storage management layers for Apache
Hadoop. This includes:
> 	• Providing a shared schema and data type mechanism.
> 	• Providing a table abstraction so that users need not be concerned with where or
how their data is stored.
> 	• Providing interoperability across data processing tools such as Pig, Map Reduce,
Streaming, and Hive.
> Background
> Data processors using Apache Hadoop have a common need for table management services.
The goal of a table management service is to track data that exists in a Hadoop grid and present
that data to users in a tabular format. Such a table management service needs to provide a
single input and output format to users so that individual users need not be concerned with
the storage formats that are chosen for particular data sets. As part of having a single format,
the data will need to be described by one type of schema and have a single datatype system.
> Additionally, users should be free to choose the best tools for their use cases. The
Hadoop project includes Map Reduce, Streaming, Pig, and Hive, and additional tools exist such
as Cascading. Each of these tools has users who prefer it, and there are use cases best addressed
by each of these tools. Two users on the same grid who need to share data should not be constrained
to use the same tool but rather should be free to choose the best tool for their use case.
A table management service that presents data in the same way to all of the tools can alleviate
this problem by providing interfaces to each of the data processing tools.
> There are also a few other features a table management service should provide, such as
notification of when data arrives.
> A couple of developers at Yahoo! started the project. It is based on the Hive MetaStore
component. There is a good amount of interest in such a service from Yahoo!, Facebook,
LinkedIn, and others. We are therefore proposing to place Howl in the Apache incubator and
to build an open source community around it.
> Rationale
> There is a strong need for a table management service, especially for large grids with
petabytes of data, and where the data volume is increasing by the day. Hadoop users need to
find data to read and have a place to store their data. Currently users must understand the
location of data to read, the storage format, compression techniques used, etc. To write data
they need to understand where on HDFS their data belongs, the best compression format to use,
how their data should be serialized, etc.
> Most users do not want to be concerned with these issues. They want these managed for
> Being an Apache open source project will greatly benefit Howl by exposing it to the large
community that currently uses Hadoop and the other products built around Hadoop (like Pig,
Hive, etc.). Users of the Hadoop ecosystem can influence Howl’s roadmap and contribute to it.
Conversely, we believe having Howl as part of the Hadoop ecosystem will greatly benefit the
current Hadoop/Pig/Hive community.
> Current Status
> Meritocracy
> Our intent with this incubator proposal is to start building a diverse developer community
around Howl following the Apache meritocracy model. We have wanted to make the project open
source and encourage contributors from multiple organizations from the start. We plan to provide
plenty of support to new developers and to quickly recruit those who make solid contributions
to committer status.
> Community
> Howl is currently being used by developers at Yahoo! and there has been an expressed
interest from LinkedIn and Facebook. Yahoo! also plans to deploy the current version of Howl
in production soon. We hope to extend the user and developer base further in the future. The
current developers and users are all interested in building a solid open source community
around Howl.
> To work towards an open source community, we have started using the GitHub issue tracker
and mailing lists at Yahoo! for development discussions within our group.
> Core Developers
> Howl is currently being developed by four engineers from Yahoo! - Devaraj Das, Ashutosh
Chauhan, Sushanth Sowmyan, and Mac Yang. All the engineers have deep expertise in Hadoop and
the Hadoop Ecosystem in general.
> Alignment
> The ASF is a natural host for Howl given that it is already the home of Hadoop, Pig,
HBase, Cassandra, and other emerging cloud software projects. Howl was designed to support
Hadoop from the beginning in order to solve data management challenges in Hadoop clusters.
Howl complements the existing Apache cloud computing projects by providing a unified way to
manage data.
> Known Risks
> Orphaned Products
> The core developers plan to work full time on the project. There is very little risk
of Howl getting orphaned since large companies like Yahoo! are planning to deploy this in
their production Hadoop clusters. We believe we can build an active developer community around
Howl (companies like Facebook and LinkedIn have also expressed interest).
> Inexperience with Open Source
> All of the core developers are active users and followers of open source. Devaraj Das
is an Apache Hadoop committer and Apache Hadoop PMC member, and has experience with the Apache
infrastructure and development process. Ashutosh Chauhan is an Apache Pig committer and Apache
Pig PMC member. Sushanth Sowmyan and Mac Yang made contributions to the Apache Hive and the
Apache Chukwa projects.
> Homogeneous Developers
> The current core developers are all from Yahoo! However, we hope to establish a developer
community that includes contributors from several corporations, and we are starting to work
towards this with Facebook and LinkedIn.
> Reliance on Salaried Developers
> Currently, the developers are paid to do work on Howl. However, once the project has
a community built around it, we expect to get committers and developers from outside the current
core developers. Companies like Yahoo! are invested in Howl being a solution to the data management
problem in Hadoop clusters, and that is not likely to change.
> Relationships with Other Apache Products
> Howl will be used by users of Hadoop, Pig, and Hive. See the Initial Source section
below for more information about Howl's relationship to Hive.
> An Excessive Fascination with the Apache Brand
> While we respect the reputation of the Apache brand and have no doubts that it will attract
contributors and users, our interest is primarily to give Howl a solid home as an open source
project following an established development model. We have also given reasons in the Rationale
and Alignment sections.
> Documentation
> Information about Howl can be found at The following
sources may be useful to start with:
> 	•
> The GitHub site:
> 	•
> The roadmap:
> Initial Source
> Howl has been under development since Summer 2010 by a team of engineers at Yahoo!. It
is currently hosted on GitHub under an Apache license at
> The initial development of Howl has consisted of:
> 	• maintaining a branch of the entire Hive codebase
> 	• getting Howl-related patches committed to Hive
> 	• developing Howl-specific plugins and wrappers to customize Hive behavior
> At runtime, Howl executes Hive code for metastore and CLI+DDL, disabling anything related
to Hadoop map/reduce execution. It also makes use of the RCFile storage format contained in
> This approach was taken as a first step in order to validate the required functionality
and get a production version working. However, in the long-term, maintaining a clone of Hive
is undesirable. One possible resolution is to factor the metastore+CLI+DDL components out
of Hive and move them into Howl (making Hive dependent on Howl). Another possible resolution
is to remove the copy of Hive from Howl and do the build/release engineering necessary to
make Howl depend on Hive. As part of the incubation process, we plan to work towards resolution
of these issues.
> External Dependencies
> The dependencies all have Apache compatible licenses.
> Cryptography
> Not applicable.
> Required Resources
> Mailing Lists
> 	• howl-private for private PMC discussions (with moderated subscriptions)
> 	• howl-dev
> 	• howl-commits
> 	• howl-user
> Subversion Directory
> Issue Tracking
> JIRA Howl (HOWL)
> Other Resources
> The existing code already has unit tests, so we would like a Hudson instance to run them
whenever a new patch is submitted. This can be added after project creation.
> Initial Committers
> 	• Devaraj Das
> 	• Ashutosh Chauhan
> 	• Sushanth Sowmyan
> 	• Mac Yang
> 	• Paul Yang
> 	• Alan Gates
> A CLA is already on file for Sushanth.
> Affiliations
> 	• Devaraj Das (Yahoo!)
> 	• Ashutosh Chauhan (Yahoo!)
> 	• Sushanth Sowmyan (Yahoo!)
> 	• Mac Yang (Yahoo!)
> 	• Paul Yang (Facebook)
> 	• Alan Gates (Yahoo!)
> Sponsors
> Champion
> Owen O’Malley
> Nominated Mentors
> 	• Olga Natkovich (Pig PMC member and Apache VP for Pig)
> 	• Alan Gates (Pig PMC member)
> 	• John Sichi (Hive PMC member)
> Sponsoring Entity
> We are requesting the Incubator to sponsor this project.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
