incubator-general mailing list archives

From Naresh Agarwal <naresh.agar...@inmobi.com>
Subject Re: [DISCUSS] [PROPOSAL] HTrace for Apache Incubator
Date Mon, 03 Nov 2014 02:19:05 GMT
Just curious whether HTrace is aimed only at Hadoop infrastructure and Hadoop-based
applications, or whether it can be used in any Java-based system?

Thanks
Naresh

On Mon, Nov 3, 2014 at 1:34 AM, Andrew Purtell <apurtell@apache.org> wrote:

> Really great to see an incubation proposal for HTrace. If you need another
> mentor, please consider me.
>
> I don't think you need to list "HTrace is not the primary focus of any of
> the current list of contributors" as a risk. One can say that about many
> (perhaps the majority) of contributors to Apache projects. We would hope
> the incubation process develops a healthy community that sustains a level
> of contribution that keeps the project moving forward, as we would hope for
> all incubation candidates.
>
>
>
> On Fri, Oct 31, 2014 at 4:06 PM, Roman Shaposhnik <rvs@apache.org> wrote:
>
> > Hi!
> >
> > I would like to propose HTrace to be considered for
> > the Apache Incubator. The proposal is attached and
> > is also available on the wiki:
> >     https://wiki.apache.org/incubator/HTraceProposal
> >
> > Please let me know what you guys think, and also
> > don't hesitate to massage the proposal on the wiki
> > based on the feedback from this thread.
> >
> > Thanks,
> > Roman.
> >
> > == Abstract ==
> > HTrace is a tracing framework intended for use with distributed
> > systems written in Java.
> >
> > == Proposal ==
> > HTrace is an aid for understanding system behavior and for reasoning
> > about performance issues in distributed systems. HTrace is primarily a
> > low-impedance library that a Java distributed system can incorporate to
> > generate ‘breadcrumbs’ or ‘traces’ along the path of execution, even as
> > it crosses processes and machines. HTrace also includes various tools
> > and glue for collecting, processing and ‘visualizing’ captured execution
> > traces, for after-the-fact analysis of where time was spent and what
> > resources were consumed.
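> >
> > The ‘breadcrumb’ idea can be illustrated with a small, hedged sketch.
> > It is not HTrace code: the names TraceSketch, Span, TraceContext and
> > traced are hypothetical, and the sketch assumes a Dapper-style model in
> > which each span carries a trace id, its own span id, and its parent's
> > span id. The point it shows is that only the small TraceContext has to
> > cross a process boundary for the callee's spans to join the caller's
> > trace.
> >
> > ```java
> > import java.util.ArrayList;
> > import java.util.List;
> > import java.util.concurrent.atomic.AtomicLong;
> > import java.util.function.Consumer;
> >
> > // Hypothetical sketch of Dapper-style span propagation; these names are
> > // illustrative and are NOT the HTrace API.
> > public class TraceSketch {
> >     // One timed unit of work; parentId links spans of a trace into a tree.
> >     record Span(long traceId, long spanId, long parentId, String description,
> >                 long startNanos, long stopNanos) {}
> >
> >     // The "breadcrumb" that must travel with a request across a process
> >     // boundary (e.g. in an RPC header) so the callee can parent its spans.
> >     record TraceContext(long traceId, long parentSpanId) {}
> >
> >     // Stands in for a real sink (local file, Zipkin, HBase, ...).
> >     static final List<Span> SINK = new ArrayList<>();
> >
> >     // Ids come from a simple in-process counter here; real tracers
> >     // typically use random 64-bit ids so processes need not coordinate.
> >     static final AtomicLong NEXT_ID = new AtomicLong(1);
> >
> >     // Runs `work` inside a new span that is a child of `ctx` (or a new root
> >     // span when `ctx` is null) and passes the child's context to `work`.
> >     static void traced(String description, TraceContext ctx,
> >                        Consumer<TraceContext> work) {
> >         long traceId = (ctx == null) ? NEXT_ID.getAndIncrement() : ctx.traceId();
> >         long parentId = (ctx == null) ? 0L : ctx.parentSpanId();
> >         long spanId = NEXT_ID.getAndIncrement();
> >         long start = System.nanoTime();
> >         try {
> >             work.accept(new TraceContext(traceId, spanId));
> >         } finally {
> >             // Record the span even if the work throws.
> >             SINK.add(new Span(traceId, spanId, parentId, description,
> >                               start, System.nanoTime()));
> >         }
> >     }
> >
> >     public static void main(String[] args) {
> >         // A client request whose nested call simulates an RPC to a server
> >         // process; the server receives only the propagated TraceContext.
> >         traced("client.get", null, clientCtx ->
> >             traced("server.handleGet", clientCtx, serverCtx -> {
> >                 // ... server-side work would go here ...
> >             }));
> >         // Spans are recorded as they finish, so the server span comes first.
> >         for (Span s : SINK) {
> >             System.out.println(s.description() + " trace=" + s.traceId()
> >                 + " span=" + s.spanId() + " parent=" + s.parentId());
> >         }
> >     }
> > }
> > ```
> >
> > Running main prints the server span (parent=2) before the client span
> > (parent=0), because spans are recorded as they finish; a collector can
> > rebuild the request tree from the parent links.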
> >
> > == Background ==
> > Distributed systems are made up of multiple software components running
> > on multiple computers connected by networks. Debugging or profiling
> > operations run over non-trivial distributed systems (figuring out
> > execution paths, and which services, machines, and libraries
> > participated in the processing of a request) can be an involved task.
> >
> > == Rationale ==
> > Rather than have each distributed system build its own custom ‘tracing’
> > library, ideally all would use a single project that provides the
> > necessary primitives and saves each project from building its own
> > visualizations and processing tools anew.
> >
> > Google described “...[a] large-scale distributed systems tracing
> > infrastructure” in “Dapper, a Large-Scale Distributed Systems Tracing
> > Infrastructure”. The paper tells a compelling story of what is possible
> > when disparate systems standardize on a single tracing library and
> > cooperate, ‘passing the baton’ and filling out trace context as
> > executions cross systems.
> >
> > HTrace aims to provide a rough open source equivalent of the core Dapper
> > tools and library described there. As it is adopted by more projects,
> > there will be a ‘network effect’: HTrace will provide an increasingly
> > comprehensive view of activity on the cluster. For example, once HDFS
> > gets HTrace support, we can connect it with the HTrace support in HBase
> > to follow HBase requests as they enter HDFS.
> >
> > Given that the success of HTrace depends on its being integrated by many
> > projects, HTrace should be perceived as unhampered, free of any
> > commercial, political, or legal ‘taint’. Being an Apache project would
> > help in this regard.
> >
> > == Initial Goals ==
> > HTrace is a small project of narrow scope but with a grand vision:
> >   * Move the HTrace source and repository to Apache, a vendor-neutral
> > location. Currently HTrace resides at a Cloudera-hosted repository.
> >   * Add past contributors as committers and institute Apache governance.
> >   * Evangelize HTrace and encourage its adoption. Initially we will
> > continue to focus on the Hadoop space, since that is where most of the
> > initial contributors work and where HTrace has been initially deployed.
> >   * Build out the standalone visualization tool that ships with HTrace.
> >   * Build more community and add more committers.
> >
> > == Current Status ==
> > Currently HTrace has a viable Java trace library that can be inserted
> > into application code to create ‘traces’. The work that remains on this
> > library is mostly bug fixes, ease-of-use improvements, and performance
> > tweaks. In the future, we may add libraries for other languages besides
> > Java.
> >
> > HTrace has means of dumping traces to the filesystem, to Twitter’s
> > Zipkin (a tracing sink and visualization system developed by Twitter,
> > https://github.com/twitter/zipkin), or to Apache HBase. Executions can
> > be viewed either in Zipkin or in pygraph
> > (https://code.google.com/p/python-graph/).
> >
> > Since the initial sprint in the summer of 2012, which saw HTrace patches
> > proposed for Apache HDFS and committed to Apache HBase, development has
> > been sporadic: mostly one or two developers adding a feature or fixing a
> > bug. HTrace is currently undergoing a new “spurt” of development, with
> > the effort to add HTrace to Apache HDFS revived and a new standalone
> > viewing facility being added to HTrace itself.
> >
> > HTrace has been integrated by Apache Phoenix.
> >
> >
> > === Meritocracy ===
> > Up to now, HTrace has been run by Apache committers and PMC members. We
> > want to build out a diverse developer and user community and run the
> > HTrace project in the Apache way. Users and new contributors will be
> > treated with respect and welcomed; they will earn merit in the project
> > by submitting quality patches and support that move the project
> > forward. Those with a proven track record of support and quality patches
> > will be encouraged to become committers.
> >
> > === Community ===
> > There are just a few developers involved at the moment. If our project
> > is accepted by the Incubator, building community would be a primary
> > initial goal.
> >
> > === Core Developers ===
> >
> > Core developers include Apache members and members of the Hadoop and
> > HBase PMCs. All of those listed have contributed to HTrace. Three of the
> > seven are from Cloudera; the remainder are Hortonworks, NTTData, Google,
> > and Facebook employees.
> >
> > === Alignment ===
> > HTrace has been integrated into Apache HBase and Apache Phoenix.
> > Integration into Apache HDFS is currently being worked on. Apache YARN
> > would be a likely next integration target.
> >
> >
> > == Known Risks ==
> > As noted above, development has been sporadic up to now. It may remain
> > so.
> >
> > HTrace is not the primary focus of any of the current list of
> > contributors. It is a side effort for all of them. HTrace may lack
> > sufficient impetus in such a state of affairs.
> >
> > For HTrace to tell a compelling story, it needs to be taken up by the
> > significant projects that make up a traced distributed system. For
> > example, if YARN and HBase take on HTrace but HDFS does not, then the
> > HDFS portions of an end-to-end operation will be opaque, compromising
> > our ability to tell a good story around an execution. Because the
> > picture painted has gaps, HTrace may be set aside as ineffective.
> >
> > === Orphaned products ===
> > The proposers have a vested interest in making HTrace succeed, driving
> > its development and its insertion into projects we all work on. Its
> > spread will shine light on hard-to-understand interactions among the
> > various systems we all work on. A working, integrated HTrace will add a
> > useful debugging mechanism to the Apache projects we all work on.
> >
> >
> > === Inexperience with Open Source ===
> > The majority of the proposers here have day jobs that have them working
> > nearly full-time on (Apache) open source projects. A few of us have
> > helped carry other projects through the Incubator. HTrace to date has
> > been developed as an open source project.
> >
> > === Homogenous Developers ===
> > The initial group of committers is small, but we already have a healthy
> > diversity of participating companies. We are concentrated in the Bay
> > Area, but a Japanese contributor makes for a good counterbalance.
> >
> > === Reliance on Salaried Developers ===
> > Most of the contributors are paid to work in the Hadoop ecosystem.
> > While we might wander from our current employers, we probably won’t
> > go far from the Hadoop tree. Whoever the Hadoop employer, a successful
> > HTrace project is plainly in everyone’s interest. At least one of the
> > developers has already changed employers, but his interest in seeing
> > HTrace succeed prevails.
> >
> > === Relationships with Other Apache Products ===
> > For HTrace to succeed, it is critical that we build good relations with
> > other distributed systems projects. We intend to initially build on
> > relations we already have in place, mostly in the Hadoop space.
> >
> > The HTrace project has been incorporated by Apache HBase and
> > Apache Phoenix. It is currently being actively integrated into
> > Apache HDFS.
> >
> > We do not know of any equivalent or near-equivalent project
> > in the Apache space.
> >
> > The Dapper paper notes precedents, in particular the Berkeley
> > RAD Lab X-Trace project.
> >
> > ==== How HTrace relates to Zipkin ====
> > Zipkin is an Apache Licensed project from Twitter. It is a complete
> > tracing tool with trace collectors, trace viewers and tools to help
> > you generate traces. It is written in Scala. If your project is not
> > written in Scala, or if it is written in Java and cannot afford a Scala
> > dependency, you need, at a minimum, an alternate means of generating
> > traces. HTrace provides this facility for Java, as well as bridging
> > tools to feed traces to Zipkin for query and display.
> >
> > The projects complement each other.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > While we intend to leverage the Apache ‘branding’ when talking to other
> > projects as a testament to our project’s ‘neutrality’, we have no plans
> > for making use of Apache brand in press releases nor posting billboards
> > advertising acceptance of HTrace into Apache Incubator.
> >
> >
> > == Documentation ==
> > See [[http://htrace.org|htrace.org]] for the current state of the HTrace
> > project and documentation.
> >
> > How to enable tracing in
> > [[http://hbase.apache.org/book/tracing.html|HBase using HTrace]]
> >
> > Elliott Clark on
> > [[http://files.meetup.com/1350427/HBase%20Meetup%20-%20Zipkin.pptx|tracing in HBase]]
> >
> > == Initial Source ==
> > Jonathan Leavitt and Todd Lipcon built the first versions of HTrace in
> > the summer of 2012. Jonathan was Todd’s summer intern at Cloudera.
> >
> >
> > == Source and Intellectual Property Submission Plan ==
> > We know of no legal encumbrances in the way of transferring the source
> > to Apache.
> >
> > == External Dependencies ==
> > HTrace includes third-party libraries. These include Guava, Jetty,
> > JUnit, protobuf, HBase, and Thrift. All dependencies are Apache licensed
> > or under licenses that are palatable: e.g. JUnit is EPL (Eclipse Public
> > License v1.0) and protobuf is BSD licensed.
> >
> > == Cryptography ==
> > N/A
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >   * private@htrace.incubator.apache.org (moderated subscriptions)
> >   * commits@htrace.incubator.apache.org
> >   * dev@htrace.incubator.apache.org
> >   * issues@htrace.incubator.apache.org
> >   * user@htrace.incubator.apache.org
> >
> > === Git Repository ===
> > https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
> >
> > === Issue Tracking ===
> > JIRA HTrace (HTRACE)
> >
> > === Other Resources ===
> > A means of setting up regular builds for HTrace on builds.apache.org
> >
> > == Initial Committers ==
> >   * Colin McCabe (cmccabe@apache.org)
> >   * Elliott Clark (eclark@apache.org)
> >   * Jonathan Leavitt (jon.s.leavitt@gmail.com) -- CLA being submitted
> >   * Masatake Iwasaki (iwasakims@gmail.com) -- CLA being submitted
> >   * Michael Stack (stack@apache.org)
> >   * Nick Dimiduk (ndimiduk@apache.org)
> >   * Todd Lipcon (todd@apache.org)
> >
> >
> > == Affiliations ==
> >   * Colin McCabe - Cloudera
> >   * Elliott Clark - Facebook
> >   * Jonathan Leavitt - Google
> >   * Masatake Iwasaki - NTTData
> >   * Michael Stack - Cloudera
> >   * Nick Dimiduk - Hortonworks
> >   * Todd Lipcon - Cloudera
> >
> > == Sponsors ==
> >
> > === Champion ===
> > Roman Shaposhnik
> >
> > === Nominated Mentors ===
> >   * Michael Stack - Apache Member
> >   * Todd Lipcon - Apache Member
> >
> > We will be soliciting more mentors as part of the proposal process.
> >
> > === Sponsoring Entity ===
> > We would like to propose the Apache Incubator as the sponsor of this
> > project.
> >
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

