incubator-general mailing list archives

From Steve Loughran <>
Subject Re: Slider Proposal
Date Sat, 12 Apr 2014 10:27:57 GMT
On 10 April 2014 16:28, Andrew Purtell <> wrote:

> Hi Steve,
> Does Slider target the deployment and management of components/projects in
> the Hadoop project itself? Not just the ecosystem examples mentioned in the
> proposal? I don't see this mentioned in the proposal.


That said, some of the work I'm prototyping on a service registry should
be usable by existing code: there's no reason why a couple of ZooKeeper
arguments shouldn't be enough to look up the bindings for HDFS, YARN, etc.

I've not done much there yet (I'm currently evaluating how well Curator's
service discovery works), so assistance would be welcome.
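Here is a minimal sketch of what "a couple of ZooKeeper arguments" might buy a client. It assumes a hypothetical registry layout in which each service publishes its bindings under /services/<cluster>/<service>. A plain dict stands in for the ZK tree, so the example runs without a live quorum or a Curator/kazoo client. The paths, service names and hostnames are all made up for illustration.

```python
# Hypothetical registry layout: /services/<cluster>/<service> -> binding map.
# A dict stands in for the ZooKeeper tree; a real client would read znodes.
FAKE_ZK_TREE = {
    "/services/prod/hdfs": {"ipc": "nn1.example.org:8020", "web": "nn1.example.org:50070"},
    "/services/prod/yarn": {"ipc": "rm1.example.org:8032", "web": "rm1.example.org:8088"},
}


def lookup_bindings(tree, cluster, service):
    """Return a copy of the binding map for one service, given cluster and service names."""
    path = f"/services/{cluster}/{service}"
    if path not in tree:
        raise KeyError(f"no bindings registered at {path}")
    return dict(tree[path])


if __name__ == "__main__":
    hdfs = lookup_bindings(FAKE_ZK_TREE, "prod", "hdfs")
    print(hdfs["ipc"])  # nn1.example.org:8020
```

The point of the layout is that the client needs to know nothing service-specific up front. The cluster and service names locate the record, and everything else comes out of it.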

> The reason I ask is I'm wondering how Slider differentiates from projects
> like Apache Twill or Apache Bigtop that are already existing vehicles for
> achieving the aims discussed in the Slider proposal.

Twill: handles all the ApplicationMaster (AM) logic for running new code
packaged as a JAR with an executor method.
Bigtop: stack testing.

> Tackling
> cross-component resource management issues could certainly be that, but
> only if core Hadoop services are also brought into the deployment and
> management model, because IO pathways extend over multiple layers and
> components. You mention HBase and Accumulo as examples. Both are HDFS
> clients. Would it be insufficient to reserve or restrict resources for e.g.
> the HBase RegionServer without also considering the HDFS DataNode?

IO quotas are a tricky one: you can't cgroup-throttle a container's HDFS
IO, as that IO takes place in the local and remote DataNode (DN) processes,
not in the container itself. Short of adding some priority queuing in the
DNs, the best we can hope for is some labelling of nodes in the YARN
cluster, so you can at least isolate the high-SLA apps from IO-intensive
but lower-priority code.
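Below is a minimal sketch of the node-labelling idea. It assumes each node carries a hypothetical label such as "high-sla" or "batch", and that an app may only be placed on nodes carrying its own label. This is not YARN's scheduler API. It only illustrates the isolation rule.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    host: str
    labels: set = field(default_factory=set)


def eligible_nodes(nodes, app_label):
    """Return the hosts whose labels include app_label.

    Hypothetical rule: an app may only run on nodes carrying its label, so
    IO-intensive 'batch' work never shares a node with 'high-sla' apps.
    """
    return [n.host for n in nodes if app_label in n.labels]


cluster = [
    Node("node1", {"high-sla"}),
    Node("node2", {"high-sla"}),
    Node("node3", {"batch"}),
    Node("node4", {"batch"}),
]

print(eligible_nodes(cluster, "high-sla"))  # ['node1', 'node2']
print(eligible_nodes(cluster, "batch"))     # ['node3', 'node4']
```

This only keeps the two classes of work off each other's nodes. HDFS IO served by remote DNs is still shared, which is why labelling gives "at least" some isolation rather than a real IO quota.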

> Do the
> HDFS DataNode and HBase RegionServer have exactly the same kind of
> deployment, recovery/restart, and dynamic scaling concerns?

DNs react to loss of the NameNode (NN) by spinning on the cached IP address
or, in HA, on the defined failover address. If we did support ZK lookup of
the NN IPC and web ports, we could consider an alternate failure mode in
which the DNs intermittently poll the ZK bindings during the spin cycle.
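The following is a minimal sketch of that alternate failure mode. The `try_connect` and `read_binding` callables are hypothetical and supplied by the caller. The loop keeps retrying the cached NN address, but every `poll_every` attempts it re-reads the binding from ZK and switches if the binding has changed. Real DataNode retry logic is different. This only shows the shape of the idea.

```python
import time


def spin_until_connected(cached_addr, try_connect, read_binding,
                         poll_every=3, max_attempts=20, delay=0.0):
    """Retry the cached NN address, re-reading the ZK binding every poll_every attempts.

    try_connect(addr) -> bool and read_binding() -> str are caller-supplied.
    Returns the address that finally connected, or None once max_attempts is used up.
    """
    addr = cached_addr
    for attempt in range(1, max_attempts + 1):
        if try_connect(addr):
            return addr
        if attempt % poll_every == 0:
            # Intermittent ZK poll: the NN may have come back on a new host or port.
            fresh = read_binding()
            if fresh and fresh != addr:
                addr = fresh
        time.sleep(delay)  # a real spin cycle would back off here
    return None


# Simulated outage: the old NN never answers; the ZK binding points at its replacement.
alive = {"nn2.example.org:8020"}
result = spin_until_connected(
    "nn1.example.org:8020",
    try_connect=lambda a: a in alive,
    read_binding=lambda: "nn2.example.org:8020",
)
print(result)  # nn2.example.org:8020
```

Polling only every few attempts keeps the load on the ZK quorum low while the DN is spinning. The cached address still gets first chance on every other retry.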

HBase and Accumulo already have their own ZK binding mechanisms, so they
don't really need a separate registry. But to work with their data you do
need the relevant client apps. I would like some standard for at least
publishing the core binding information in a way that any client app (CLI,
web UI, other in-cluster apps) could parse.
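Purely as an illustration, here is one possible shape for such a standard. It is a JSON document that lists each endpoint with its API name, protocol and address, so any client can parse it with a stock JSON library. The field names and hosts are hypothetical, not an existing Slider or Hadoop format.

```python
import json

# Hypothetical binding record, as a service might publish it to the registry.
RECORD = """
{
  "service": "hbase-prod",
  "endpoints": [
    {"api": "zookeeper", "protocol": "zk", "address": "zk1.example.org:2181"},
    {"api": "master-ui", "protocol": "http", "address": "http://hmaster.example.org/"}
  ]
}
"""


def endpoints_by_api(record_text):
    """Parse a binding record into an {api: address} map; no endpoints gives {}."""
    doc = json.loads(record_text)
    return {ep["api"]: ep["address"] for ep in doc.get("endpoints", [])}


if __name__ == "__main__":
    bindings = endpoints_by_api(RECORD)
    print(bindings["zookeeper"])  # zk1.example.org:2181
```

Keying endpoints by API name rather than by port lets a CLI, a web UI and an in-cluster app each pick out just the endpoint they care about from the same record.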

> Or are these
> sort of considerations outside the Slider proposal scope?

