incubator-general mailing list archives

From Timothy Chen <tnac...@gmail.com>
Subject Re: [DISCUSS] Dr. Elephant Incubator Proposal
Date Wed, 07 Mar 2018 04:52:43 GMT
+1 as well, I think the work Dr. Elephant is doing can also be
potentially applied to more than Spark and Hadoop.

Tim

On Tue, Mar 6, 2018 at 8:38 PM, Kevin A. McGrail <kmcgrail@apache.org> wrote:
> I'm intrigued by the proposal and the product. I'm a 0.5+.
>
> I'd love to know more about why LI put it on GitHub and what problems it's
> having that are leading to a foundation.
>
> --
> Kevin A. McGrail
> Asst. Treasurer & VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
> On Tue, Mar 6, 2018 at 8:27 PM, Gangumalla, Uma <uma.gangumalla@intel.com>
> wrote:
>
>> I would +1 having it as a separate project instead of pushing it under
>> Hadoop. When a project can sustain itself, has the potential to build a
>> community on its own, and can run logically as an independent module, I
>> feel that’s good enough to start as a separate project.
>>
>> I cannot recall the discussions on the removal of the Vaidya package from
>> Hadoop. If someone remembers, it would be great to know the reasons for
>> removing that package from the Hadoop base [probably at the time of
>> mavenization?].
>>
>> Regards,
>> Uma
>>
>> On 3/6/18, 3:17 PM, "mdrob@cloudera.com on behalf of Mike Drob" <
>> mdrob@cloudera.com on behalf of mdrob@apache.org> wrote:
>>
>>     Why does Dr. Elephant make sense as a separate project instead of
>>     contributing to Hadoop directly?
>>
>>     What is the relationship between Dr. Elephant and the (now seemingly
>>     defunct) Hadoop Vaidya?
>>
>>     On Tue, Mar 6, 2018 at 5:08 PM, Carl Steinbach <cws@apache.org> wrote:
>>
>>     > Hi,
>>     >
>>     > I would like to propose Dr. Elephant as an Apache Incubator
>>     > project. The proposal is available as a draft at
>>     > https://wiki.apache.org/incubator/DrElephantProposal. I have also
>>     > included the text of the proposal below.
>>     >
>>     > Any feedback from the community is much appreciated.
>>     >
>>     > Thanks.
>>     >
>>     > - Carl
>>     >
>>     >
>>     > = ABSTRACT =
>>     >
>>     > Dr. Elephant is a performance monitoring and tuning service for Apache
>>     > Hadoop and Apache Spark jobs and workflows. While the system is
>>     > primarily aimed at developers, we have discovered that it is also
>>     > popular with cluster operators who use it to monitor the health of
>>     > workloads running on their clusters.
>>     >
>>     > = PROPOSAL =
>>     >
>>     > Dr. Elephant was open sourced by LinkedIn in 2016 and is currently
>>     > hosted on GitHub. We believe that being a part of the Apache Software
>>     > Foundation will improve the diversity and help form a strong community
>>     > around the project.
>>     >
>>     > LinkedIn submits this proposal to donate the code base to the Apache
>>     > Software Foundation. The code is already under Apache License 2.0.
>>     > Both the source code and documentation are hosted on GitHub.
>>     >
>>     >  * Code: http://github.com/linkedin/dr-elephant
>>     >  * Documentation: https://github.com/linkedin/dr-elephant/wiki
>>     >
>>     > = Background =
>>     >
>>     > Dr. Elephant is a service that helps users of Apache Hadoop and Apache
>>     > Spark understand, analyze, and improve the performance of jobs and
>>     > workflows running on their clusters. It automatically gathers metrics,
>>     > performs analysis, and presents the results along with actionable
>>     > advice. The goal of the project is to improve developer productivity
>>     > and increase cluster efficiency by reducing the time and domain
>>     > expertise required to diagnose and treat sick jobs. It analyzes Hadoop
>>     > and Spark jobs using a set of configurable, extensible, rule-based
>>     > heuristics that provide insights on job performance, and then uses
>>     > this information to provide recommendations about how to tune jobs to
>>     > make them run more efficiently.
>>     >
>>     > Dr. Elephant was open sourced in 2016 after two years of
>>     > successful production use at LinkedIn. In the time since, many new
>>     > features have been added, including support for the Oozie and Airflow
>>     > workflow schedulers, improved metrics, and enhancements to the Spark
>>     > history fetcher and Spark heuristics. It is also important to note
>>     > that many of these contributions came from developers outside of
>>     > LinkedIn. We have also been happy to see that many people have been
>>     > able to benefit from running Dr. Elephant, including companies like
>>     > Airbnb, Foursquare, Hulu, and Pinterest.
>>     >
>>     > = RATIONALE =
>>     >
>>     > Dr. Elephant's entry to the ASF will be beneficial to both the
>>     > Dr. Elephant and Apache communities. Dr. Elephant has greatly
>>     > benefited from its open source roots. Its community and adoption have
>>     > grown greatly as a result. More importantly, the feedback from the
>>     > community, whether through interactions at meetups or through the
>>     > mailing list, has allowed for a rich exchange of ideas. We believe a
>>     > partnership with the Apache Foundation is the logical next step. The
>>     > Dr. Elephant community will greatly benefit from the established
>>     > development and consensus processes that have worked well for other
>>     > projects. The Apache process has served many other open source
>>     > projects well and we believe that the Dr. Elephant community will
>>     > greatly benefit from these practices as well.
>>     >
>>     > = CURRENT STATUS =
>>     >
>>     > Dr. Elephant is currently open sourced under the Apache License
>>     > Version 2.0 and is available at github.com/linkedin/dr-elephant. All
>>     > of the development is done using GitHub Pull Requests.
>>     >
>>     > We are aware of at least 10 organizations that are running
>>     > Dr. Elephant, and many of these organizations have also contributed
>>     > code. Dr. Elephant has also been integrated into commercial products
>>     > such as Pepperdata's Application Profiler.
>>     >
>>     > = INITIAL GOALS =
>>     >
>>     > Our initial goals are as follows:
>>     >
>>     >  * Migrate the existing codebase to Apache
>>     >  * Study and integrate with the Apache development process
>>     >  * Ensure all dependencies are compliant with Apache License version 2.0
>>     >  * Incremental development and releases per Apache guidelines
>>     >  * Diversify the set of core developers and committers
>>     >
>>     > = MERITOCRACY =
>>     >
>>     > Following the Apache meritocracy model, we intend to build an open and
>>     > diverse community around Dr. Elephant. We will encourage the community to
>>     > contribute to discussions and the codebase.
>>     >
>>     > = COMMUNITY =
>>     >
>>     > The need for a simple and understandable performance monitoring and
>>     > tuning service for Hadoop and Spark is tremendous. Dr. Elephant is
>>     > currently being used by at least 10 organizations worldwide (some
>>     > examples are listed here). We hope to extend the contributor base
>>     > significantly by bringing Dr. Elephant into Apache.
>>     >
>>     > = CORE DEVELOPERS =
>>     >
>>     > Dr. Elephant was started by engineers at LinkedIn. Many other
>>     > individuals and organizations have contributed to the project, and
>>     > this diversity is reflected in the list of initial committers.
>>     >
>>     > = ALIGNMENT =
>>     >
>>     > Apache is the most natural home for Dr. Elephant because of its close
>>     > relationship to Apache Hadoop and Apache Spark, and its integration
>>     > with Apache Oozie and Apache Airflow (incubating).
>>     >
>>     > = KNOWN RISKS =
>>     >
>>     > == Orphaned products ==
>>     >
>>     > The risk of the Dr. Elephant project being abandoned is minimal. As
>>     > noted earlier, there are many organizations that have benefitted from
>>     > Dr. Elephant, and which are thus incentivized to continue
>>     > development. In addition, the software vendor Pepperdata has
>>     > integrated Dr. Elephant into their Application Profiler product.
>>     >
>>     > == Inexperience with Open Source ==
>>     >
>>     > Dr. Elephant has existed as a healthy open source project since
>>     > 2016. Any risks that we foresee are ones associated with scaling our
>>     > open source communication and operation process rather than with
>>     > inherent inexperience in operating as an open source project.
>>     >
>>     > == Homogenous Developers ==
>>     >
>>     > Apart from LinkedIn’s developers, Dr. Elephant has developers from
>>     > Airbnb, Pepperdata, Flipkart, Hulu, Foursquare, Altiscale, PayPal,
>>     > Evariant, Didi, Trivago, and Cardlytics.
>>     >
>>     > A lot of effort has been put into efficient communication between all
>>     > the developers. We have set up different forums for communication, like
>>     > GitHub issues, a Google Groups mailing list, Gitter chat, weekly
>>     > hangouts, and frequent meetups.
>>     >
>>     > == Reliance on Salaried Developers ==
>>     >
>>     > It is expected that Dr. Elephant development will occur on both
>>     > salaried time and on volunteer time, after hours. Many of the initial
>>     > committers are paid by their employer to contribute to this
>>     > project. However, they are all passionate about the project, and we
>>     > are confident that the project will continue even if no salaried
>>     > developers contribute to the project. We are committed to recruiting
>>     > additional committers including non-salaried developers.
>>     >
>>     > == An Excessive Fascination with the Apache Brand ==
>>     >
>>     > While we respect the reputation of the Apache brand and have no doubts
>>     > that it will attract contributors and users, we believe the ASF is the
>>     > right home for Dr. Elephant to foster a great community that will lead
>>     > to a better outcome in the long term.
>>     >
>>     > = Documentation =
>>     >
>>     > Dr. Elephant's developer wiki: https://github.com/linkedin/dr-elephant/wiki
>>     >
>>     > = Initial Source =
>>     >
>>     > Dr. Elephant's initial source contribution will come from
>>     > https://github.com/linkedin/dr-elephant
>>     >
>>     > The code is licensed under the Apache License V2.
>>     >
>>     > = Source and Intellectual Property Submission Plan =
>>     >
>>     > The Dr. Elephant codebase is currently hosted on GitHub. This is the
>>     > exact codebase that we would migrate to the Apache Software
>>     > Foundation. The Dr. Elephant source code is already licensed under
>>     > Apache License Version 2.0. Going forward, we will continue to have
>>     > all the contributions licensed directly to the Apache Software
>>     > Foundation through our signed Individual Contributor License
>>     > Agreements for all of the committers on the project.
>>     >
>>     > = External Dependencies =
>>     >
>>     > To the best of our knowledge all of Dr. Elephant’s dependencies are
>>     > distributed under Apache Software Foundation compatible licenses. Upon
>>     > acceptance to the incubator, we will begin a thorough analysis of all
>>     > transitive dependencies to verify this fact and introduce license
>>     > checking into the build and release process.
>>     >
>>     > = Cryptography =
>>     >
>>     > We do not expect Dr. Elephant to be a controlled export item due to
>>     > the use of encryption.
>>     >
>>     > = Required Resources =
>>     >
>>     > == Mailing lists ==
>>     >
>>     >  * private@drelephant.incubator.apache.org (moderated subscriptions)
>>     >  * commits@drelephant.incubator.apache.org
>>     >  * dev@drelephant.incubator.apache.org
>>     >  * issues@drelephant.incubator.apache.org
>>     >  * user@drelephant.incubator.apache.org
>>     >
>>     > == Git Repository ==
>>     >
>>     > Git is the preferred source control system:
>>     > git://git.apache.org/dr-elephant
>>     >
>>     > == Issue Tracking ==
>>     >
>>     > JIRA project DOCTOR
>>     >
>>     > == Other Resources ==
>>     >
>>     > The existing code already has unit and integration tests, so we would
>>     > like a Jenkins instance to run them whenever a new patch is
>>     > submitted. This can be added after project creation.
>>     >
>>     > = Initial Committers =
>>     >
>>     >  * Akshay Rai <akshayrai09 at gmail dot com>
>>     >  * Anant Nag <nntnag17 at gmail dot com>
>>     >  * Chetna Chaudhari <chetnachaudhari at gmail dot com>
>>     >  * Clemens Valiente <clemens dot valiente at gmail dot com>
>>     >  * Fangshi Li <shengzhixia at gmail dot com>
>>     >  * George Wu <georgieewuu at gmail dot com>
>>     >  * Krishna Puttaswamy <krishnaprasad dot pn at gmail dot com>
>>     >  * Maxime Kestemont <maxkestemont at hotmail dot com>
>>     >  * Noam Shaish <noamshaish at gmail dot com>
>>     >  * Paul Reed Bramsen <prb at paulbramsen dot com>
>>     >  * Ragesh K R <ragesh dot rajagopalan at gmail dot com>
>>     >  * Shankar Manian <shankar37 at gmail dot com>
>>     >  * Shahrukh Khan <shahrukhkhan489 at gmail dot com>
>>     >  * Shekhar Gupta <shkhrgpt at gmail dot com>
>>     >  * Shida Li <lishid at gmail dot com>
>>     >
>>     > == Affiliations ==
>>     >
>>     >  * Akshay Rai - LinkedIn
>>     >  * Anant Nag - LinkedIn
>>     >  * Chetna Chaudhari - SkyTv New Zealand
>>     >  * Clemens Valiente - trivago GmbH
>>     >  * Fangshi Li - LinkedIn
>>     >  * George Wu - Pinterest
>>     >  * Krishna Puttaswamy - Airbnb
>>     >  * Mark Wagner - LinkedIn
>>     >  * Maxime Kestemont - Criteo
>>     >  * Noam Shaish - Nordea Bank
>>     >  * Ragesh K R - LinkedIn
>>     >  * Shankar Manian - LinkedIn
>>     >  * Shahrukh Khan - Hortonworks
>>     >  * Shekhar Gupta - Pepperdata
>>     >  * Shida Li - Dynalist Inc.
>>     >
>>     > = Sponsors =
>>     > == Champion ==
>>     >  * Carl Steinbach
>>     >
>>     > == Nominated Mentors ==
>>     >   * Carl Steinbach (LinkedIn)
>>     >
>>     > == Sponsoring Entity ==
>>     > The Apache Incubator
>>     >
