incubator-general mailing list archives

From Kostas Sakellis <kos...@cloudera.com>
Subject Re: [PROPOSAL] Livy Proposal for Apache Incubator
Date Mon, 22 May 2017 21:27:14 GMT
On Mon, May 22, 2017 at 1:01 AM, Luciano Resende <luckbr1975@gmail.com>
wrote:

> +1
>
> Also, I see the proposal is short on mentors, so feel free to include me as
> a mentor for the project.
>
> Thanks
>

Thanks Luciano! Welcome onboard.


> On Fri, May 19, 2017 at 4:45 PM Sean Busbey <busbey@apache.org> wrote:
>
> > Dear Apache Incubator Community,
> >
> > I'm excited to present for discussion a proposal to move Livy into
> > incubation. Livy is a web service that exposes a REST interface for
> > managing long-running Apache Spark contexts in your cluster. With Livy,
> > new applications can be built on top of Apache Spark that require
> > fine-grained interaction with many Spark contexts.
> >
> > The proposal is on the wiki at the following page as well as copied in
> > the email below:
> >
> > https://wiki.apache.org/incubator/LivyProposal
> >
> > In addition to welcoming feedback on the proposal, we are actively
> > seeking one or more additional mentors. We also have included a section
> > for interested folks to ensure they get added to the mailing lists,
> > presuming Livy gets accepted for incubation.
> >
> > ---- LivyProposal
> >
> > = Abstract =
> >
> > Livy is a web service that exposes a REST interface for managing
> > long-running Apache Spark contexts in your cluster. With Livy, new
> > applications can be built on top of Apache Spark that require
> > fine-grained interaction with many Spark contexts.
> >
> > = Proposal =
> >
> > Livy is an open-source REST service for Apache Spark. Livy
> > enables applications to submit Spark applications and retrieve results
> > without a co-location requirement on the Spark cluster.
> >
> > We propose to contribute the Livy codebase and associated artifacts (e.g.
> > documentation, website content, etc.) to the Apache Software Foundation.
> >
> > = Background =
> >
> > Apache Spark is a fast and general-purpose distributed
> > compute engine, with a versatile API. It enables processing of large
> > quantities of static data distributed over a cluster of machines, as well
> > as processing of continuous streams of data. It is the preferred
> > distributed data processing engine for data engineering, stream
> > processing and data science workloads. Each Spark application uses a
> > construct called the SparkContext, which is the application’s connection
> > or entry point to the Spark engine. Each Spark application will have its
> > own SparkContext.
> >
> > Livy enables clients to interact with one or more Spark sessions through
> > the Livy Server, which acts as a proxy layer. Livy clients have
> > fine-grained control over the lifecycle of the Spark sessions, as well as
> > the ability to submit jobs and retrieve results, all over HTTP. Clients
> > have two modes of interaction:
> >
> >  * An RPC client API, available in Java and Python, which allows results
> >    to be retrieved as Java or Python objects. The serialization and
> >    deserialization of the results is handled by the Livy framework.
> >  * An HTTP-based API that allows submission of code snippets and
> >    retrieval of the results in different formats.
> >
> > Multi-tenant resource allocation and security: Livy enables multiple
> > independent Spark sessions to be managed simultaneously. Multiple clients
> > can also interact simultaneously with the same Spark session and share
> > the resources of that Spark session. Livy can also enforce secure,
> > authenticated communication between the clients and their respective
> > Spark sessions.
> >
> > More information on Livy can be found at the existing open source
> > website: http://livy.io/
> >
> > = Rationale =
> >
> > Users want to use Spark’s powerful processing engine and API
> > as the data processing backend for interactive applications. However, the
> > job submission and application interaction mechanisms built into Apache
> > Spark are insufficient and cumbersome for multi-user interactive
> > applications.
> >
> > The primary mechanism for applications to submit Spark jobs is via
> > spark-submit
> > (http://spark.apache.org/docs/latest/submitting-applications.html),
> > which is available as a command line tool as well as a programmatic API.
> > However, spark-submit has the following limitations that make it
> > difficult to build interactive applications:
> >
> >  * It is slow: each invocation of spark-submit involves a setup phase
> >    where cluster resources are acquired, new processes are forked, etc.
> >    This setup phase runs for many seconds, or even minutes, and hence is
> >    too slow for interactive applications.
> >  * It is cumbersome and lacks flexibility: application code and
> >    dependencies have to be pre-compiled and submitted as jars, and
> >    cannot be submitted interactively.
> >
> > Apache Spark comes with an ODBC/JDBC server, which can be used to submit
> > SQL queries to Spark. However, this solution is limited to SQL and does
> > not allow the client to leverage the rest of the Spark API, such as RDDs,
> > MLlib and Streaming.
> >
> > A third way of using Spark is via its command-line shell, which allows
> > the interactive submission of snippets of Spark code. However, the shell
> > entails running Spark code on the client machine and hence is not a
> > viable mechanism for remote clients to submit Spark jobs.
> >
> > Livy addresses the limitations of the above three mechanisms, and
> > provides the full Spark API as a multi-tenant service to remote clients.
> >
> > Since the open source release of Livy in late 2015, we have seen
> > tremendous interest among a diverse set of application developers and
> > ISVs that want to build applications with Apache Spark. To make Livy a
> > robust and flexible solution that will enable a broad and growing set of
> > applications, it is important to grow a large and varied community of
> > contributors.
> >
> > = Initial Goals =
> >
> >  * Move existing codebase, website, documentation and mailing lists to
> >    Apache-hosted infrastructure
> >  * Work with the infrastructure team to implement and approve our code
> >    review, build, and testing workflows in the context of the ASF
> >  * Incremental development and releases per Apache guidelines
> >
> > = Current Status =
> >
> > The Livy project began at Cloudera, as a part of the Hue
> > project. Cloudera soon realized the broad applicability of Livy, and
> > separated it out into an independent project in Nov 2015.
> >
> > == Releases ==
> >
> > Livy has undergone two public releases, tagged here:
> >
> > * https://github.com/cloudera/livy/releases/tag/v0.2.0
> > * https://github.com/cloudera/livy/releases/tag/v0.3.0
> >
> > Tarballs and zip files were created for each release and hosted on
> > GitHub. Upon joining the incubator, we will adopt a more typical ASF
> > release process.
> >
> > == Source ==
> >
> > Livy’s source is currently hosted on GitHub at:
> >
> > https://github.com/cloudera/livy
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> > == Code review ==
> >
> > Livy’s code reviews are currently public and hosted on GitHub as pull
> > request reviews at: https://github.com/cloudera/livy/pulls
> > The Livy developer community so far is happy with GitHub pull request
> > reviews and hopes to continue this after being admitted to the ASF.
> >
> > == Issue Tracking ==
> >
> > Livy’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/LIVY/summary
> > This JIRA instance contains bugs and development discussion dating back
> > one year and will provide an initial seed for the ASF JIRA.
> >
> > == Community Discussion ==
> >
> > Livy has several public discussion forums:
> >
> > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
> > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
> >
> > == Development Practices ==
> >
> > The Livy project follows a review-before-commit philosophy. Every commit
> > automatically runs through the unit tests and generates coverage reports
> > presented as a pull request comment. Our experience with this process
> > leads us to believe that it helps ease new contributors into the project.
> > They get feedback quickly on common mistakes, lowering the burden on
> > reviewers. Those same reviewers get to lead by example, showing the new
> > contributors that we value feedback within our community even when
> > changes are done by more experienced folks.
> >
> > == Meritocracy ==
> >
> > We believe strongly in meritocracy when electing committers and PMC
> > members. In the past few months, the project has added two new committers
> > from two different organisations, in recognition of their significant
> > contributions to the project. We will encourage contributions and
> > participation of all types, and ensure that contributors are
> > appropriately recognized.
> >
> > == Community ==
> >
> > Though Livy is relatively new as a standalone open source project, it has
> > already seen promising growth in its community across several
> > organizations:
> >
> >  * Cloudera is the original development sponsor for Livy.
> >  * Microsoft pushed the development of the interpreter, fixing high
> >    availability issues and adding features.
> >  * Hortonworks has contributed the security features to Livy, allowing
> >    Kerberos and impersonation to work with Spark.
> >  * IBM is starting to make contributions to the Livy project.
> >  * A number of other patches have been contributed by community members.
> >
> > Livy currently relies on Google Groups for mailing lists. These lists
> > have been active since the end of 2015/start of 2016. Currently, Livy’s
> > user mailing list has 173 subscribers and has hosted a total of 227 topic
> > threads. Livy’s developer list has 49 subscribers and has hosted 79 topic
> > threads.
> >
> > == Core Developers ==
> >
> > The early contributions to Livy were made by Cloudera engineers. In 2016,
> > engineers from Microsoft and Hortonworks joined the core developer
> > community.
> >
> > == Alignment ==
> >
> > Livy is built upon Apache Spark, and other Apache projects like Apache
> > Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
> > community connections, combined with our focus on development practices
> > that emphasize community engagement with a path to meritocratic
> > recognition, naturally align us with the ASF.
> >
> > = Known Risks =
> > == Orphaned Products ==
> >
> > The risk of Livy being abandoned is low because it is supported by three
> > major big-data software vendors.  Moreover, Livy is already used to power
> > multiple releases of services and products used in production.
> >
> > == Inexperience with Open Source ==
> >
> > Several of the initial committers are experienced open source developers,
> > several being committers and/or PMC members on other ASF projects (Spark,
> > YARN).
> >
> > == Homogenous Developers ==
> >
> > The project already has a diverse developer base. It has contributions
> > from 3 major organisations (Cloudera, Microsoft and Hortonworks), and is
> > used in diverse applications, in diverse settings (On-Prem and Cloud).
> >
> > == Reliance on salaried Developers ==
> >
> > The existing contributions to the Livy project have been made by salaried
> > engineers from Cloudera, Microsoft and Hortonworks. Since there are three
> > major organisations involved, the risk of reliance on a single group of
> > salaried developers is mitigated. The Livy user base is diverse, with
> > users from across the globe, including users from academic settings. We
> > aim to further diversify the Livy user and contributor base.
> >
> > == Relationships with other Apache projects ==
> >
> > Livy is closely tied to the Apache Spark project and currently addresses
> > the scenarios for a REST-based batch and interactive gateway for Spark
> > jobs on YARN. Given the growing number of integrations with Livy, keeping
> > it outside of Apache Spark aligns with the desire of the Apache Spark
> > community to reduce the number of external dependencies in the Spark
> > project. Specifically, the Apache Spark community has previously
> > expressed a desire to keep job servers independent from the project (see,
> > for example, the discussion of the Ooyala Spark Job Server in SPARK-818).
> > Furthermore, while Livy’s common usage is closely tied to Spark
> > deployments right now, its core building blocks can be reused elsewhere.
> > Livy’s Remote REPL could be used as a library for interactive scenarios
> > in non-Spark projects. In the future, integrations with cluster managers
> > like Apache Mesos and others could also be added.
> >
> > The features provided by Livy have already been integrated with existing
> > projects like Jupyter and Apache Zeppelin for their interactive Spark use
> > cases. This validates the need for a project like Livy and provides an
> > active downstream user base that the Livy community can interact with to
> > seed future interest in the project.
> >
> > Livy serves a similar purpose to Apache Toree (incubating) but differs in
> > making session management, security and impersonation a focal design
> > point.
> > == An Excessive Fascination with the Apache Brand ==
> >
> > The primary motivation for submitting Livy to the ASF is to grow a
> > diverse and strong community. We wish to encourage diverse organisations,
> > including ISVs, to adopt Livy and contribute to Livy without any concerns
> > about ownership or licensing.
> >
> > = Documentation =
> >
> >  * Documentation can be found on the Livy website: http://livy.io/
> >  * The Livy website is version controlled on the ‘gh-pages’ branch of the
> >    above repository.
> >  * Additional documentation is provided on the GitHub wiki:
> >    https://github.com/cloudera/livy/wiki
> >  * APIs are documented within the source code as JavaDoc-style
> >    documentation comments.
> >
> > = Initial Source =
> >
> > The initial source code for Livy is hosted at
> >
> > https://github.com/cloudera/livy
> >
> > = Source and Intellectual Property submission plan =
> >
> > The Livy codebase and website are currently hosted on GitHub and will be
> > transitioned to the ASF repositories during incubation. Livy is already
> > licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
> > CCLAs from all committers. There are, however, some recent contributions
> > from authors that have not signed the CCLA and ICLA. If necessary for a
> > successful SGA, we’ll seek the necessary documentation or replace the
> > contributions.
> >
> > The “Livy” name is not a registered trademark. We will need to do a
> > trademark search and make sure it is available for the Apache Foundation
> > prior to graduation.
> >
> > Cloudera currently owns the domain name: http://livy.io/ which will be
> > transferred to the ASF and redirected to the official page during
> > incubation.
> >
> > = External Dependencies =
> >
> > The list below covers the non-Apache dependencies of the project and
> > their licenses.
> >
> >  * Jetty: Apache 2.0
> >  * Dropwizard Metrics: Apache 2.0
> >  * FasterXML Jackson: Apache 2.0
> >  * Netty: Apache 2.0
> >  * Scala: BSD
> >  * Py4J: BSD
> >  * Scalatra: BSD
> >
> > Build/test-only dependencies:
> >
> >  * Mockito: MIT
> >  * JUnit: Eclipse
> >
> > = Required Resources =
> > == Mailing Lists ==
> >
> >  * private@livy.incubator.apache.org (PPMC)
> >  * dev@livy.incubator.apache.org (dev mailing list)
> >  * user@livy.incubator.apache.org (User questions)
> >  * commits@livy.incubator.apache.org (subscribers shouldn’t be able to
> >    post)
> >  * issues@livy.incubator.apache.org (subscribers shouldn’t be able to
> >    post)
> >
> > == Git Repository ==
> >
> > git://git.apache.org/livy
> >
> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit messages and code comments continue to
> > reference the appropriate bug numbers.
> >
> > = Initial Committers =
> >
> >  * Marcelo Vanzin (vanzin@cloudera.com)
> >  * Alex Man (alex@alexman.space)
> >  * Jeff Zhang (zjffdu@gmail.com)
> >  * Saisai Shao (sshao@hortonworks.com)
> >  * Kostas Sakellis (kostas@cloudera.com)
> >
> > = Affiliations =
> >
> > The initial set of committers includes people employed by Cloudera and
> > Hortonworks as well as one person currently unaffiliated with an
> > employer.
> >
> > = Additional Interested Contributors =
> >
> > Those interested in getting involved with the project as we enter
> > incubation are encouraged to list themselves here.
> >
> >  * < add here >
> >
> > = Sponsors =
> > == Champion ==
> >
> >  * Sean Busbey (busbey@apache.org)
> >
> > == Nominated Mentors ==
> >
> >  * Bikas Saha (bikas@apache.org)
> >  * Brock Noland (brock@phdata.io)
> >
> > == Sponsoring Entity ==
> >
> > We ask that the Incubator PMC sponsor this proposal.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
> > --
> Sent from my Mobile device
>
