incubator-general mailing list archives

From William GUO <guo...@outlook.com>
Subject Re: [PROPOSAL] Livy Proposal for Apache Incubator
Date Sat, 20 May 2017 00:24:56 GMT
+1 

Griffin needs Livy to access Spark context.


Thanks,
William

On 5/20/17, 7:45 AM, "Sean Busbey" <busbey@apache.org> wrote:

    Dear Apache Incubator Community,
    
    I'm excited to present for discussion a proposal to move Livy into
    incubation. Livy is a web service that exposes a REST interface for managing
    long-running Apache Spark contexts in your cluster. With Livy, new
    applications that require fine-grained interaction with many Spark contexts
    can be built on top of Apache Spark.
    
    The proposal is on the wiki at the following page as well as copied in the
    email below:
    
    https://wiki.apache.org/incubator/LivyProposal
    
    In addition to welcoming feedback on the proposal, we are actively seeking
    one or more additional mentors. We also have included a section for
    interested folks to ensure they get added to the mailing lists, presuming
    Livy gets accepted for incubation.
    
    ---- LivyProposal
    
    = Abstract =
    
    Livy is a web service that exposes a REST interface for managing
    long-running Apache Spark contexts in your cluster. With Livy, new
    applications that require fine-grained interaction with many Spark contexts
    can be built on top of Apache Spark.
    
    = Proposal =
    
    Livy is an open-source REST service for Apache Spark. Livy
    enables applications to submit Spark applications and retrieve results
    without a co-location requirement on the Spark cluster. 
    
    We propose to contribute the Livy codebase and associated artifacts (e.g.
    documentation, website content, etc.) to the Apache Software Foundation.
    
    = Background =
    
    Apache Spark is a fast and general purpose distributed
    compute engine, with a versatile API. It enables processing of large
    quantities of static data distributed over a cluster of machines, as well as
    processing of continuous streams of data. It is the preferred distributed
    data processing engine for data engineering, stream processing and data
    science workloads. Each Spark application uses a construct called the
    SparkContext, which is the application’s connection or entry point to the
    Spark engine. Each Spark application will have its own SparkContext.
    
    Livy enables clients to interact with one or more Spark sessions through the
    Livy Server, which acts as a proxy layer. Livy clients have fine-grained
    control over the lifecycle of the Spark sessions, as well as the ability to
    submit jobs and retrieve results, all over HTTP. Clients have two modes of
    interaction:
    
     * An RPC client API, available in Java and Python, which allows results to
       be retrieved as Java or Python objects. The serialization and
       deserialization of the results is handled by the Livy framework.
     * An HTTP-based API that allows submission of code snippets and retrieval
       of the results in different formats.
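    
    As a rough illustration of the HTTP-based mode, the sketch below builds (but
    does not send) the three requests a client might issue: create a session,
    submit a code snippet, and poll for its result. The proposal does not spell
    out endpoints, so the server address (localhost:8998), the paths, the JSON
    field names and the helper function names are assumptions made for
    illustration; check them against the Livy REST documentation before use.

```python
import json
from urllib.request import Request

# Assumption: a Livy server reachable at this address; adjust as needed.
LIVY_URL = "http://localhost:8998"


def build_request(path, payload=None):
    """Build (but do not send) an HTTP request against the Livy server.

    A request with a JSON payload is a POST; one without is a GET.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    method = "GET" if payload is None else "POST"
    return Request(
        LIVY_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_session(kind="spark"):
    """Request to start a new interactive session of the given kind."""
    return build_request("/sessions", {"kind": kind})


def submit_statement(session_id, code):
    """Request to run a code snippet inside an existing session."""
    return build_request(f"/sessions/{session_id}/statements", {"code": code})


def get_statement(session_id, statement_id):
    """Request to poll a statement for its state and output."""
    return build_request(f"/sessions/{session_id}/statements/{statement_id}")


if __name__ == "__main__":
    # The ids below are placeholders; a real client would read them from
    # the JSON responses returned by the server.
    for req in (
        create_session(),
        submit_statement(0, "sc.parallelize(1 to 10).sum()"),
        get_statement(0, 0),
    ):
        print(req.get_method(), req.full_url, req.data)
```

    Sending these requests with urllib.request.urlopen against a running Livy
    server would be the next step; the session and statement ids would then come
    from the server’s JSON responses rather than being hard-coded.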
    
    Multi-tenant resource allocation and security: Livy enables multiple
    independent Spark sessions to be managed simultaneously. Multiple clients
    can also interact simultaneously with the same Spark session and share the
    resources of that Spark session. Livy can also enforce secure, authenticated
    communication between the clients and their respective Spark sessions.
    
    More information on Livy can be found at the existing open source website:
    http://livy.io/
    
    = Rationale =
    
    Users want to use Spark’s powerful processing engine and API
    as the data processing backend for interactive applications. However, the
    job submission and application interaction mechanisms built into Apache
    Spark are insufficient and cumbersome for multi-user interactive
    applications.
    
    The primary mechanism for applications to submit Spark jobs is via
    spark-submit
    (http://spark.apache.org/docs/latest/submitting-applications.html), which is
    available as a command line tool as well as a programmatic API. However,
    spark-submit has the following limitations that make it difficult to build
    interactive applications:
    
     * It is slow: each invocation of spark-submit involves a setup phase where
       cluster resources are acquired, new processes are forked, etc. This setup
       phase runs for many seconds, or even minutes, and hence is too slow for
       interactive applications.
     * It is cumbersome and lacks flexibility: application code and dependencies
       have to be pre-compiled and submitted as jars, and cannot be submitted
       interactively.
    
    Apache Spark comes with an ODBC/JDBC server, which can be used to submit SQL
    queries to Spark. However, this solution is limited to SQL and does not
    allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
    and Streaming.
    
    A third way of using Spark is via its command-line shell, which allows the
    interactive submission of snippets of Spark code. However, the shell entails
    running Spark code on the client machine and hence is not a viable mechanism
    for remote clients to submit Spark jobs.
    
    Livy addresses the limitations of the above three mechanisms, and provides
    the full Spark API as a multi-tenant service to remote clients.
    
    Since the open source release of Livy in late 2015, we have seen tremendous
    interest among a diverse set of application developers and ISVs that want to
    build applications with Apache Spark. To make Livy a robust and flexible
    solution that will enable a broad and growing set of applications, it is
    important to grow a large and varied community of contributors.
    
    = Initial Goals =
    
     * Move existing codebase, website, documentation and mailing lists to
       Apache-hosted infrastructure
     * Work with the infrastructure team to implement and approve our code
       review, build, and testing workflows in the context of the ASF
     * Incremental development and releases per Apache guidelines
    
    = Current Status =
    
    The Livy project began at Cloudera, as a part of the Hue
    project. Cloudera soon realized the broad applicability of Livy, and
    separated it out into an independent project in Nov 2015.
    
    == Releases ==
    
    Livy has undergone two public releases, tagged here:
    
    * https://github.com/cloudera/livy/releases/tag/v0.2.0
    * https://github.com/cloudera/livy/releases/tag/v0.3.0
    
    Tarballs and zip files were created for each release and hosted on GitHub.
    Upon joining the incubator, we will adopt a more typical ASF release
    process.
    
    == Source ==
    
    Livy’s source is currently hosted on GitHub at:
    
    https://github.com/cloudera/livy
    
    This repository will be transitioned to Apache’s git hosting during
    incubation.
    
    == Code review ==
    
    Livy’s code reviews are currently public and hosted on GitHub as pull
    request reviews at: https://github.com/cloudera/livy/pulls
    
    The Livy developer community so far is happy with GitHub pull request
    reviews and hopes to continue this after being admitted to the ASF.
    
    == Issue Tracking ==
    
    Livy’s bug and feature tracking is hosted on JIRA at:
    https://issues.cloudera.org/projects/LIVY/summary
    
    This JIRA instance contains bugs and development discussion dating back one
    year and will provide an initial seed for the ASF JIRA.
    
    == Community Discussion ==
    
    Livy has several public discussion forums:
    
    * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
    * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
    
    == Development Practices ==
    
    The Livy project follows a review before commit philosophy. Every commit
    automatically runs through the unit tests and generates coverage reports
    presented as a pull request comment. Our experience with this process leads
    us to believe that it helps ease new contributors into the project. They get
    feedback quickly on common mistakes, lowering the burden on reviewers. Those
    same reviewers get to lead by example, showing the new contributors that we
    value feedback within our community even when changes are done by more
    experienced folks.
    
    == Meritocracy ==
    
    We believe strongly in meritocracy when electing committers and PMC members.
    In the past few months, the project has added two new committers from two
    different organisations, in recognition of their significant contributions
    to the project. We will encourage contributions and participation of all
    types, and ensure that contributors are appropriately recognized.
    
    == Community ==
    
    Though Livy is relatively new as a standalone open source project, it has
    already seen promising growth in its community across several organizations:
    
     * Cloudera is the original development sponsor for Livy.
     * Microsoft pushed the development of the interpreter, fixing high
       availability issues and adding further features.
     * Hortonworks has contributed the security features to Livy, allowing
       Kerberos and impersonation to work with Spark.
     * IBM is starting to make contributions to the Livy project.
     * A number of other patches have been contributed by community members.
    
    Livy currently relies on Google Groups for mailing lists. These lists have
    been active since the end of 2015/start of 2016. Currently, Livy’s user
    mailing list has 173 subscribers and has hosted a total of 227 topic
    threads. Livy’s developer list has 49 subscribers and has hosted 79 topic
    threads.
    
    == Core Developers ==
    
    The early contributions to Livy were made by Cloudera engineers. In 2016,
    engineers from Microsoft and Hortonworks joined the core developer
    community. 
    
    == Alignment ==
    
    Livy is built upon Apache Spark, and other Apache projects like Apache
    Hadoop YARN. It’s used as a building block by Apache Zeppelin.  These
    community connections combined with our focus on development practices that
    emphasize community engagement with a path to meritocratic recognition
    naturally align us with the ASF.
    
    = Known Risks =
    == Orphaned Products ==
    
    The risk of Livy being abandoned is low because it is supported by three
    major big-data software vendors.  Moreover, Livy is already used to power
    multiple releases of services and products used in production.
    
    == Inexperience with Open Source ==
    
    Several of the initial committers are experienced open source developers,
    several being committers and/or PMC members on other ASF projects (Spark,
    YARN). 
    
    == Homogeneous Developers ==
    
    The project already has a diverse developer base. It has contributions from
    3 major organisations (Cloudera, Microsoft and Hortonworks), and is used in
    diverse applications, in diverse settings (On-Prem and Cloud).
    
    == Reliance on salaried Developers ==
    
    The existing contributions to the Livy project have been made by salaried
    engineers from Cloudera, Microsoft and Hortonworks. Since there are three
    major organisations involved, the risk of reliance on a single group of
    salaried developers is mitigated. The Livy user base is diverse, with users
    from across the globe, including users from academic settings. We aim to
    further diversify the Livy user and contributor base.
    
    == Relationships with other Apache projects ==
    
    Livy is closely tied to the Apache Spark project and currently addresses the
    scenarios for a REST based batch and interactive gateway for Spark jobs on
    YARN. Given the growing number of integrations with Livy, keeping it outside
    of Apache Spark aligns with the desire of the Apache Spark community to
    reduce the number of external dependencies in the Spark project.
    Specifically, the Apache Spark community has previously expressed a desire
    to keep job servers independent from the project.<<FootNote(See, for
    example, discussion of the Ooyala Spark Job Server in SPARK-818)>>
    Furthermore, while Livy’s common usage is closely tied to Spark deployments
    right now, its core building blocks can be reused elsewhere.  Livy’s Remote
    REPL could be used as a library for interactive scenarios in non-Spark
    projects. In the future, integrations with cluster managers like Apache
    Mesos and others could also be added.
    
    The features provided by Livy have already been integrated with existing
    projects like Jupyter and Apache Zeppelin for their interactive Spark use
    cases. This validates the need for a project like Livy and provides an
    active downstream user base that the Livy community can interact with to
    seed future interest in the project.
    
    Livy serves a similar purpose to Apache Toree (incubating) but differs in
    making session management, security and impersonation a focal design point.
    
    == An Excessive Fascination with the Apache Brand ==
    
    The primary motivation for submitting Livy to the ASF is to grow a diverse
    and strong community. We wish to encourage diverse organisations, including
    ISVs, to adopt Livy and contribute to Livy without any concerns about
    ownership or licensing.
    
    = Documentation =
    
     * Documentation can be found on the Livy website: http://livy.io/
     * The Livy website is version controlled on the ‘gh-pages’ branch of the
       above repository.
     * Additional documentation is provided on the GitHub wiki:
       https://github.com/cloudera/livy/wiki
     * APIs are documented within the source code as JavaDoc-style
       documentation comments.
    
    = Initial Source =
    
    The initial source code for Livy is hosted at
    
    https://github.com/cloudera/livy 
    
    = Source and Intellectual Property submission plan =
    
    The Livy codebase and website are currently hosted on GitHub and will be
    transitioned to the ASF repositories during incubation. Livy is already
    licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
    CCLAs from all committers. There are, however, some recent contributions
    from authors who have not signed the CCLA and ICLA. If necessary for a
    successful SGA, we’ll seek the necessary documentation or replace the
    contributions.
    
    The “Livy” name is not a registered trademark. We will need to do a
    trademark search and make sure it is available for the Apache Foundation
    prior to graduation.
    
    Cloudera currently owns the domain name: http://livy.io/ which will be
    transferred to the ASF and redirected to the official page during
    incubation.
    
    = External Dependencies =
    
    The list below covers the non-Apache dependencies of the project and their
    licenses.
    
     * Jetty: Apache 2.0
     * Dropwizard Metrics: Apache 2.0
     * FasterXML Jackson: Apache 2.0
     * Netty: Apache 2.0
     * Scala: BSD
     * Py4J: BSD
     * Scalatra: BSD
    
    Build/test-only dependencies:
    
     * Mockito: MIT
     * JUnit: Eclipse
    
    = Required Resources =
    == Mailing Lists ==
    
     * private@livy.incubator.apache.org (PPMC)
     * dev@livy.incubator.apache.org (dev mailing list)
     * user@livy.incubator.apache.org (User questions)
     * commits@livy.incubator.apache.org (subscribers shouldn’t be able to post)
     * issues@livy.incubator.apache.org (subscribers shouldn’t be able to post)
    
    == Git Repository ==
    
    git://git.apache.org/livy
    
    == Issue Tracking ==
    
    We would like to import our current JIRA project into the ASF JIRA, so
    that our historical commit messages and code comments continue to reference
    the appropriate bug numbers.
    
    = Initial Committers =
    
     * Marcelo Vanzin (vanzin@cloudera.com)
     * Alex Man (alex@alexman.space)
     * Jeff Zhang (zjffdu@gmail.com)
     * Saisai Shao (sshao@hortonworks.com)
     * Kostas Sakellis (kostas@cloudera.com)
    
    = Affiliations =
    
    The initial set of committers includes people employed by Cloudera and
    Hortonworks as well as one person currently unaffiliated with an employer.
    
    = Additional Interested Contributors =
    
    Those interested in getting involved with the project as we enter incubation
    are encouraged to list themselves here.
    
     * < add here >
    
    = Sponsors =
    == Champion ==
    
     * Sean Busbey (busbey@apache.org)
    
    == Nominated Mentors ==
    
     * Bikas Saha (bikas@apache.org)
     * Brock Noland (brock@phdata.io)
    
    == Sponsoring Entity ==
    
    We ask that the Incubator PMC sponsor this proposal.
    
    
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
    For additional commands, e-mail: general-help@incubator.apache.org
    
    

