incubator-general mailing list archives

From Vadim Zaliva
Subject re: [PROPOSAL] Whirr Project
Date Fri, 07 May 2010 01:15:39 GMT
Tom White wrote:

> I would like to propose Whirr as an incubator proposal.
> Whirr will be a set of libraries for running cloud services, such as
> Hadoop or Cassandra. The initial code (for Hadoop) is hosted as a
> Hadoop contrib module, but I believe it would flourish as its own
> project with its own community.
> The proposal is on the incubator wiki at

I think this would certainly be very useful. I have recently been working on a project (HAMAKE)
that requires automatic launching of Hadoop tasks and Pig scripts, plus task lifecycle
management, on both a regular Hadoop cluster and Amazon EMR. We were trying to make it
transparent to the user which cloud a Hadoop task runs in, so something like Whirr would be
of great benefit to projects like this.

The proposal currently talks mostly about libraries. We need libraries, and more language
bindings are better, but those come after the APIs. Having a set of well-thought-out,
well-defined APIs for managing services in the cloud is the first step, IMHO. Then we can start
developing server-side bindings as well as client-side libraries. I hope these APIs will become
a standard for many cloud providers and eventually be supported natively. Meanwhile, we can take
on the task of developing an intermediate layer that provides these APIs on top of the "native"
APIs currently offered by Hadoop, EMR and others. To sum up my point: the API definition effort
should not be downplayed. I think it is as important as the actual library code development.

Taking Hadoop tasks as an example, I would like to see APIs not just for provisioning and
launching tasks but for a full set of lifecycle management capabilities: monitoring, profiling,
terminating, getting access to debug information (logs, counters), etc. I think this was implied
in the proposal, but I just want to underline it.
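
For illustration only, the lifecycle capabilities listed above could be gathered into one
provider-neutral interface. The sketch below is not part of the Whirr proposal or of any Hadoop
or EMR API: every name in it (TaskLifecycle, TaskState, InMemoryBackend, the method names) is
hypothetical, and the in-memory backend merely stands in for a real cluster adapter.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict, List


class TaskState(Enum):
    """Hypothetical task states; a real API would likely need more."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    KILLED = "killed"


class TaskLifecycle(ABC):
    """Hypothetical provider-neutral lifecycle API (not a real Whirr interface)."""

    @abstractmethod
    def launch(self, job_spec: Dict[str, str]) -> str:
        """Start a task and return its identifier."""

    @abstractmethod
    def status(self, task_id: str) -> TaskState:
        """Monitoring: report the current state of a task."""

    @abstractmethod
    def terminate(self, task_id: str) -> None:
        """Stop a task."""

    @abstractmethod
    def logs(self, task_id: str) -> List[str]:
        """Debug information: log lines collected so far."""

    @abstractmethod
    def counters(self, task_id: str) -> Dict[str, int]:
        """Debug information: named counters for the task."""


class InMemoryBackend(TaskLifecycle):
    """Toy backend that keeps task records in a dict instead of calling a cluster."""

    def __init__(self) -> None:
        self._tasks: Dict[str, dict] = {}

    def launch(self, job_spec: Dict[str, str]) -> str:
        task_id = "task-%d" % (len(self._tasks) + 1)
        name = job_spec.get("name", "unnamed")
        self._tasks[task_id] = {
            "state": TaskState.RUNNING,
            "logs": ["launched " + name],
            "counters": {"attempts": 1},
        }
        return task_id

    def status(self, task_id: str) -> TaskState:
        return self._tasks[task_id]["state"]

    def terminate(self, task_id: str) -> None:
        record = self._tasks[task_id]
        record["state"] = TaskState.KILLED
        record["logs"].append("terminated by user")

    def logs(self, task_id: str) -> List[str]:
        return list(self._tasks[task_id]["logs"])

    def counters(self, task_id: str) -> Dict[str, int]:
        return dict(self._tasks[task_id]["counters"])
```

A client written against TaskLifecycle would not need to know whether the backend is a regular
Hadoop cluster or EMR; only the adapter class would change.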

So basically, my take on this project is that it is a chance for us to define a baseline
service-based architecture for some cloud services, abstracting each cluster as a set of
services exposed via well-defined APIs, along with a reference implementation of the access
libraries.
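
As a rough sketch of what "a cluster as a set of services" might mean in code, the example
below models a cluster as a registry of named services that clients look up by name. It is an
assumption made for illustration, not a design taken from the proposal: Cluster, Service,
StaticService and all method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Service(Protocol):
    """Hypothetical minimal contract: a service has a name and a health check."""

    name: str

    def is_healthy(self) -> bool:
        ...


@dataclass
class StaticService:
    """Toy service whose health is fixed at construction time."""

    name: str
    healthy: bool = True

    def is_healthy(self) -> bool:
        return self.healthy


@dataclass
class Cluster:
    """A cluster viewed purely as the set of services it exposes."""

    cluster_id: str
    services: Dict[str, Service] = field(default_factory=dict)

    def expose(self, service: Service) -> None:
        # Register a service under its own name, replacing any previous entry.
        self.services[service.name] = service

    def service(self, name: str) -> Service:
        # Look up a service by name; fail with a descriptive error if absent.
        try:
            return self.services[name]
        except KeyError:
            raise LookupError(
                "cluster %r exposes no service %r" % (self.cluster_id, name)
            ) from None

    def healthy_services(self) -> List[str]:
        # Names of all services that currently report healthy, sorted.
        return sorted(n for n, s in self.services.items() if s.is_healthy())
```

Access libraries in any language would then only need to agree on the service contract, not on
how a given provider provisions the cluster.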

I hope these thoughts are in line with other people's vision of the project. I will be
glad to participate in the project: to write code and help design the APIs.

Vadim Zaliva
