incubator-general mailing list archives

From dsh <>
Subject Re: [PROPOSAL] Propose Howl as an Apache Incubator project
Date Sun, 13 Feb 2011 18:04:15 GMT
There is also a Zeroconf implementation called Howl, and I am
pretty certain that some other projects use that name too.


On Sun, Feb 13, 2011 at 6:57 PM, Mohammad Nour El-Din
<> wrote:
> Good catch, but allow me to disagree with you. Howl here is a name,
> while OW2 HOWL is an acronym for "High-speed ObjectWeb Logger" and on
> [1] it is all written in capital letters.
> [1] -
> On Sun, Feb 13, 2011 at 5:52 PM, Brian McCallister <> wrote:
>> The proposal looks fine, but the name collides with
>> -Brian
>> On Thu, Feb 10, 2011 at 1:37 PM, Alan Gates <> wrote:
>>> I would like to propose Howl as an Apache Incubator project.  Howl is a
>>> table and storage management service for data created using Apache Hadoop.
>>>  The proposal is on the Incubator wiki at
>>> and is pasted below.  Thanks.
>>> Alan.
>>> == Abstract ==
>>> Howl is a table and storage management service for data created using Apache
>>> Hadoop.
>>> == Proposal ==
>>> The vision of Howl is to provide table management and storage management
>>> layers for Apache Hadoop.  This includes:
>>>  * Providing a shared schema and data type mechanism.
>>>  * Providing a table abstraction so that users need not be concerned with
>>> where or how their data is stored.
>>>  * Providing interoperability across data processing tools such as Pig, Map
>>> Reduce, Streaming, and Hive.
>>> == Background ==
>>> Data processors using Apache Hadoop have a common need for table management
>>> services.  The goal of a table management service is to track data that
>>> exists in a Hadoop grid and present that data to users in a tabular format.
>>>  Such a table management service needs to provide a single input and output
>>> format to users so that individual users need not be concerned with the
>>> storage formats that are chosen for particular data sets.  As part of having
>>> a single format, the data will need to be described by one type of schema
>>> and have a single datatype system.
>>> Additionally, users should be free to choose the best tools for their use
>>> cases.  The Hadoop project includes Map Reduce, Streaming, Pig, and Hive,
>>> and additional tools exist such as Cascading.  Each of these tools has users
>>> who prefer it, and there are use cases best addressed by each of these
>>> tools.  Two users on the same grid who need to share data should not be
>>> constrained to use the same tool but rather should be free to choose the
>>> best tool for their use case.  A table management service that presents data
>>> in the same way to all of the tools can alleviate this problem by providing
>>> interfaces to each of the data processing tools.
>>> There are also a few other features a table management service should
>>> provide, such as notification of when data arrives.
>>> A couple of developers at Yahoo! started the project. It is based on the
>>> Hive !MetaStore component. There is a good amount of interest in such a
>>> service from Yahoo!, Facebook, !LinkedIn, and others. We are
>>> therefore proposing to place Howl in the Apache incubator and to build an
>>> open source community around it.
>>> == Rationale ==
>>> There is a strong need for a table management service, especially for large
>>> grids with petabytes of data, and where the data volume is increasing by the
>>> day. Hadoop users need to find data to read and have a place to store their
>>> data.  Currently users must understand the location of data to read, the
>>> storage format, compression techniques used, etc.  To write data they need
>>> to understand where on HDFS their data belongs, the best compression format
>>> to use, how their data should be serialized, etc.
>>> Most users do not want to be concerned with these issues.  They want these
>>> managed for them.
>>> Becoming an Apache open source project will benefit Howl by giving it
>>> access to the large community that currently uses Hadoop and the other
>>> products built around Hadoop (like Pig, Hive, etc.). Users of the Hadoop
>>> ecosystem can influence Howl’s roadmap and contribute to it. Conversely,
>>> we believe having Howl as part of the Hadoop ecosystem will be a great
>>> benefit to the current Hadoop/Pig/Hive community.
>>> == Current Status ==
>>> === Meritocracy ===
>>> Our intent with this incubator proposal is to start building a diverse
>>> developer community around Howl following the Apache meritocracy model. We
>>> have wanted to make the project open source and encourage contributors from
>>> multiple organizations from the start. We plan to provide plenty of support
>>> to new developers and to quickly recruit those who make solid contributions
>>> to committer status.
>>> === Community ===
>>> Howl is currently being used by developers at Yahoo!, and !LinkedIn and
>>> Facebook have expressed interest. Yahoo! also plans to deploy
>>> the current version of Howl in production soon. We hope to extend the user
>>> and developer base further in the future. The current developers and users
>>> are all interested in building a solid open source community around Howl.
>>> To work towards an open source community, we have started using the !GitHub
>>> issue tracker and mailing lists at Yahoo! for development discussions within
>>> our group.
>>> === Core Developers ===
>>> Howl is currently being developed by four engineers from Yahoo! - Devaraj
>>> Das, Ashutosh Chauhan, Sushanth Sowmyan, and Mac Yang. All the engineers
>>> have deep expertise in Hadoop and the Hadoop Ecosystem in general.
>>> === Alignment ===
>>> The ASF is a natural host for Howl given that it is already the home of
>>> Hadoop, Pig, HBase, Cassandra, and other emerging cloud software projects.
>>> Howl was designed to support Hadoop from the beginning in order to solve
>>> data management challenges in Hadoop clusters. Howl complements the existing
>>> Apache cloud computing projects by providing a unified way to manage data.
>>> == Known Risks ==
>>> === Orphaned Products ===
>>> The core developers plan to work full time on the project. There is very
>>> little risk of Howl getting orphaned since large companies like Yahoo! are
>>> planning to deploy this in their production Hadoop clusters. We believe we
>>> can build an active developer community around Howl (companies like Facebook
>>> and !LinkedIn have also expressed interest).
>>> === Inexperience with Open Source ===
>>> All of the core developers are active users and followers of open source.
>>> Devaraj Das is an Apache Hadoop committer and Apache Hadoop PMC member, and
>>> has experience with the Apache infrastructure and development process.
>>> Ashutosh Chauhan is an Apache Pig committer and Apache Pig PMC member.
>>>  Sushanth Sowmyan and Mac Yang made contributions to the Apache Hive and the
>>> Apache Chukwa projects.
>>> === Homogeneous Developers ===
>>> The current core developers are all from Yahoo! However, we hope to
>>> establish a developer community that includes contributors from several
>>> corporations, and we are starting to work towards this with Facebook and
>>> !LinkedIn.
>>> === Reliance on Salaried Developers ===
>>> Currently, the developers are paid to work on Howl. However, once the
>>> project has a community built around it, we expect to get committers and
>>> developers from outside the current core developers. Companies like Yahoo!
>>> are invested in Howl being a solution to the data management problem in
>>> Hadoop clusters, and that is not likely to change.
>>> === Relationships with Other Apache Products ===
>>> Howl is going to be used by users of Hadoop, Pig, and Hive. See section
>>> Initial Source below for more information about Howl's relationship to Hive.
>>> === An Excessive Fascination with the Apache Brand ===
>>> While we respect the reputation of the Apache brand and have no doubts that
>>> it will attract contributors and users, our interest is primarily to give
>>> Howl a solid home as an open source project following an established
>>> development model. We have also given reasons in the Rationale and Alignment
>>> sections.
>>> == Documentation ==
>>> Information about Howl can be found at The
>>> following sources may be useful to start with:
>>>  * The !GitHub site:
>>>  * The roadmap:
>>> == Initial Source ==
>>> Howl has been under development since Summer 2010 by a team of engineers at
>>> Yahoo!.  It is currently hosted on !GitHub under an Apache license at
>>> The initial development of Howl has consisted of:
>>>  * maintaining a branch of the entire Hive codebase
>>>  * getting Howl-related patches committed to Hive
>>>  * developing Howl-specific plugins and wrappers to customize Hive behavior
>>> At runtime, Howl executes Hive code for metastore and CLI+DDL, disabling
>>> anything related to Hadoop map/reduce execution.  It also makes use of the
>>> RCFile storage format contained in Hive.
>>> This approach was taken as a first step in order to validate the required
>>> functionality and get a production version working.  However, in the
>>> long-term, maintaining a clone of Hive is undesirable.  One possible
>>> resolution is to factor the metastore+CLI+DDL components out of Hive and
>>> move them into Howl (making Hive dependent on Howl).  Another possible
>>> resolution is to remove the copy of Hive from Howl and do the build/release
>>> engineering necessary to make Howl depend on Hive.  As part of the
>>> incubation process, we plan to work towards resolution of these issues.
>>> == External Dependencies ==
>>> The dependencies all have Apache compatible licenses.
>>> == Cryptography ==
>>> Not applicable.
>>> == Required Resources ==
>>> === Mailing Lists ===
>>>  * howl-private for private PMC discussions (with moderated subscriptions)
>>>  * howl-dev
>>>  * howl-commits
>>>  * howl-user
>>> === Subversion Directory ===
>>> === Issue Tracking ===
>>> JIRA Howl (HOWL)
>>> === Other Resources ===
>>> The existing code already has unit tests, so we would like a Hudson instance
>>> to run them whenever a new patch is submitted. This can be added after
>>> project creation.
>>> == Initial Committers ==
>>>  * Devaraj Das
>>>  * Ashutosh Chauhan
>>>  * Sushanth Sowmyan
>>>  * Mac Yang
>>>  * Paul Yang
>>>  * Alan Gates
>>> A CLA is already on file for Sushanth.
>>> == Affiliations ==
>>>  * Devaraj Das (Yahoo!)
>>>  * Ashutosh Chauhan (Yahoo!)
>>>  * Sushanth Sowmyan (Yahoo!)
>>>  * Mac Yang (Yahoo!)
>>>  * Paul Yang (Facebook)
>>>  * Alan Gates (Yahoo!)
>>> == Sponsors ==
>>> === Champion ===
>>> Owen O’Malley
>>> === Nominated Mentors ===
>>>  * Olga Natkovich (Pig PMC member and Apache VP for Pig)
>>>  * Alan Gates (Pig PMC member)
>>>  * John Sichi (Hive PMC member)
>>> === Sponsoring Entity ===
>>> We are requesting the Incubator to sponsor this project.
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
> --
> Thanks
> - Mohammad Nour
>   Author of (WebSphere Application Server Community Edition 2.0 User Guide)
> - LinkedIn:
> - Blog:
> ----
> "Life is like riding a bicycle. To keep your balance you must keep moving"
> - Albert Einstein
> "Writing clean code is what you must do in order to call yourself a
> professional. There is no reasonable excuse for doing anything less
> than your best."
> - Clean Code: A Handbook of Agile Software Craftsmanship
> "Stay hungry, stay foolish."
> - Steve Jobs
