incubator-general mailing list archives

From Willem Jiang <willem.ji...@gmail.com>
Subject Re: [VOTE] Accept Crail into the Apache Incubator
Date Mon, 30 Oct 2017 02:55:39 GMT
+1 (binding)


Willem Jiang

Blog: http://willemjiang.blogspot.com (English)
          http://jnn.iteye.com  (Chinese)
Twitter: willemjiang
Weibo: 姜宁willem

On Sat, Oct 28, 2017 at 2:12 AM, Pierre Smits <pierre.smits@gmail.com>
wrote:

> +1
>
> Best regards
>
> Pierre
>
> On Fri, 27 Oct 2017 at 13:57 Raphael Bircher <rbircherapache@gmail.com>
> wrote:
>
> > +1 (binding)
> >
> > On .10.2017, at 18:01, Luciano Resende <luckbr1975@gmail.com> wrote:
> >
> > > Of course, my +1
> > >
> > > On Thu, Oct 26, 2017 at 12:31 PM, Luciano Resende <luckbr1975@gmail.com>
> > > wrote:
> > >
> > > >> Now that the discussion thread on the Crail proposal has ended, please
> > > >> vote on accepting Crail into the Apache Incubator.
> > >>
> > >> The ASF voting rules are described at:
> > >>    http://www.apache.org/foundation/voting.html
> > >>
> > >> A vote for accepting a new Apache Incubator podling is a majority vote
> > >> for which only Incubator PMC member votes are binding.
> > >>
> > > >> Votes from other people are also welcome as an indication of people's
> > > >> enthusiasm (or lack thereof).
> > >>
> > >> Please do not use this VOTE thread for discussions.
> > >> If needed, start a new thread instead.
> > >>
> > > >> This vote will run for at least 72 hours. Please VOTE as follows:
> > >> [] +1 Accept Crail into the Apache Incubator
> > >> [] +0 Abstain.
> > >> [] -1 Do not accept Crail into the Apache Incubator because ...
> > >>
> > >> The proposal below is also on the wiki:
> > >> https://wiki.apache.org/incubator/CrailProposal
> > >>
> > >> ===
> > >>
> > >> Abstract
> > >>
> > > >> Crail is a storage platform for sharing performance-critical data in
> > > >> distributed data processing jobs at very high speed. Crail is built
> > > >> entirely upon principles of user-level I/O and specifically targets
> > > >> data center deployments with fast network and storage hardware (e.g.,
> > > >> 100 Gbps RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes
> > > >> of operation such as resource disaggregation or serverless computing.
> > > >> Crail is written in Java and integrates seamlessly with the Apache data
> > > >> processing ecosystem. It can be used as a backbone to accelerate
> > > >> high-level data operations such as shuffle or broadcast, as a cache to
> > > >> store hot data that is queried repeatedly, or as a storage platform for
> > > >> sharing inter-job data in complex multi-job pipelines.
> > >>
> > >> Proposal
> > >>
> > > >> Crail enables Apache data processing frameworks to run efficiently in
> > > >> next-generation data centers using fast storage and network hardware in
> > > >> combination with resource (e.g., DRAM, Flash) disaggregation.
> > >>
> > >> Background
> > >>
> > > >> Crail started as a research project at the IBM Zurich Research
> > > >> Laboratory around 2014, aiming to integrate high-speed I/O hardware
> > > >> effectively into large-scale data processing systems.
> > >>
> > > >> Rationale
> > >>
> > > >> During the last decade, I/O hardware has undergone rapid performance
> > > >> improvements, typically by orders of magnitude. Modern networking and
> > > >> storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with access
> > > >> latencies of a few microseconds. However, despite such progress in raw
> > > >> I/O performance, effectively leveraging modern hardware in data
> > > >> processing frameworks remains challenging. In most cases, upgrading to
> > > >> high-end networking or storage hardware has very little effect on the
> > > >> performance of analytics workloads. The problem comes from heavily
> > > >> layered software imposing overheads such as deep call stacks,
> > > >> unnecessary data copies, thread contention, etc. These problems have
> > > >> already been addressed at the operating system level with new I/O APIs
> > > >> such as RDMA verbs, NVMe, etc., allowing applications to bypass software
> > > >> layers during I/O operations. Distributed data processing frameworks, on
> > > >> the other hand, are typically implemented on legacy I/O interfaces such
> > > >> as sockets or block storage. These interfaces have been shown to be
> > > >> insufficient to deliver the full hardware performance. Yet, to the best
> > > >> of our knowledge, there are no active and systematic efforts to
> > > >> integrate these new user-level I/O APIs into Apache software frameworks.
> > > >> This problem affects all end users and organizations that use Apache
> > > >> software. We expect them to see unsatisfactorily small performance gains
> > > >> when upgrading their networking and storage hardware.
> > >>
> > > >> Crail solves this problem by providing an efficient storage platform
> > > >> built upon user-level I/O, thus bypassing layers such as the JVM and OS
> > > >> during I/O operations. Moreover, Crail directly leverages the specific
> > > >> hardware features of RDMA and NVMe to provide a better integration with
> > > >> high-level data operations in Apache compute frameworks. As a
> > > >> consequence, Crail enables users to run larger, more complex queries
> > > >> against ever-increasing amounts of data at a speed largely determined by
> > > >> the deployed hardware. Crail is a generic solution that integrates well
> > > >> with the Apache ecosystem, including frameworks like Spark, Hadoop,
> > > >> Hive, etc.
> > >>
> > >> Initial Goals
> > >>
> > > >> The initial goals of moving Crail to the Apache Incubator are to broaden
> > > >> the community and to foster contributions from developers to leverage
> > > >> Crail in various data processing frameworks and workloads. Ultimately,
> > > >> the goal for Crail is to become the de facto standard platform for
> > > >> storing temporary, performance-critical data in distributed data
> > > >> processing systems.
> > >>
> > >> Current Status
> > >>
> > > >> The initial code has been developed at the IBM Zurich Research Center
> > > >> and has recently been made available on GitHub under the Apache
> > > >> License 2.0. The project currently has explicit support for Spark and
> > > >> Hadoop. Project documentation is available on the website www.crail.io.
> > > >> There is also a public forum for discussions related to Crail, available
> > > >> at https://groups.google.com/forum/#!forum/zrlio-users.
> > >>
> > > >> Meritocracy
> > >>
> > > >> The current developers are familiar with the meritocratic open source
> > > >> development process at Apache. Over the last year, the project has
> > > >> gathered interest on GitHub, and several companies have already
> > > >> expressed interest in the project. We plan to invest in supporting a
> > > >> meritocracy by inviting additional developers to participate.
> > >>
> > >> Community
> > >>
> > > >> The need for a generic solution to integrate high-performance I/O
> > > >> hardware in open source is tremendous, so there is potential for a very
> > > >> large community. We believe that Crail’s extensible architecture and its
> > > >> alignment with the Apache ecosystem will further encourage community
> > > >> participation. We expect that over time Crail will attract a large
> > > >> community.
> > >>
> > >> Alignment
> > >>
> > > >> Crail is written in Java and is built for the Apache data processing
> > > >> ecosystem. The basic storage services of Crail can be used seamlessly
> > > >> from Spark, Hadoop, and Storm. The enhanced storage services require
> > > >> dedicated bindings specific to each data processing framework, which are
> > > >> currently available only for Spark. We think that moving Crail to the
> > > >> Apache Incubator will help to extend Crail’s support for different data
> > > >> processing frameworks.
> > >>
> > >> Known Risks
> > >>
> > > >> To date, development has been sponsored by IBM and coordinated mostly by
> > > >> the core team of researchers at the IBM Zurich Research Center. For
> > > >> Crail to fully transition to an "Apache Way" governance model, it needs
> > > >> to start embracing the meritocracy-centric way of growing the community
> > > >> of contributors.
> > >>
> > >> Orphaned Products
> > >>
> > > >> The Crail developers have a long-term interest in the use and
> > > >> maintenance of the code, and there is also hope that growing a diverse
> > > >> community around the project will become a guarantee against the project
> > > >> becoming orphaned. We feel that it is also important to put formal
> > > >> governance in place both for the project and the contributors as the
> > > >> project expands. We feel the ASF is the best location for this.
> > >>
> > >> Inexperience with Open Source
> > >>
> > > >> Several of the initial committers are experienced open source
> > > >> developers (Linux Kernel, DPDK, etc.).
> > >>
> > >> Relationships with Other Apache Products
> > >>
> > > >> As of now, Crail has been tested with Spark, Hadoop and Hive, but it is
> > > >> designed to integrate with any of the Apache data processing frameworks.
> > >>
> > >> Homogeneous Developers
> > >>
> > > >> The project already has a diverse developer base including
> > > >> contributions from organizations and public developers.
> > >>
> > >> An Excessive Fascination with the Apache Brand
> > >>
> > > >> Crail solves a real need for a generic approach to leverage modern
> > > >> network and storage hardware effectively in the Apache Hadoop and Spark
> > > >> ecosystems. Our rationale for developing Crail as an Apache project is
> > > >> detailed in the Rationale section. We believe that the Apache brand and
> > > >> community process will help us to engage a larger community and
> > > >> facilitate closer ties with various Apache data processing projects.
> > >>
> > >> Documentation
> > >>
> > >> Documentation regarding Crail is available at www.crail.io
> > >>
> > >> Initial Source
> > >>
> > >> Initial source is available on GitHub under the Apache License 2.0:
> > >>
> > > >> https://github.com/zrlio/crail
> > > >>
> > > >> External Dependencies
> > >>
> > > >> Crail is written in Java and currently supports Apache Hadoop MapReduce
> > > >> and Apache Spark runtimes. To the best of our knowledge, all
> > > >> dependencies of Crail are distributed under Apache-compatible licenses.
> > >>
> > > >> Required Resources
> > >>
> > >> Mailing lists
> > >>
> > >> private@crail.incubator.apache.org
> > >> dev@crail.incubator.apache.org
> > > >> commits@crail.incubator.apache.org
> > > >>
> > > >> Git repository
> > >>
> > > >> https://git-wip-us.apache.org/repos/asf/incubator-crail.git
> > > >>
> > > >> Issue Tracking
> > >>
> > > >> JIRA (Crail)
> > > >>
> > > >> Initial Committers
> > >>
> > >> Patrick Stuedi <stu AT ibm DOT zurich DOT com>
> > >> Animesh Trivedi <atr AT ibm DOT zurich DOT com>
> > >> Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
> > >> Bernard Metzler <bmt AT ibm DOT zurich DOT com>
> > >> Michael Kaufmann <kau AT ibm DOT zurich DOT com>
> > >> Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
> > >> Patrick McArthur <patrick AT patrickmcarthur DOT net>
> > >> Ana Klimovic <anakli AT stanford DOT edu>
> > >> Yuval Degani <yuvaldeg AT mellanox DOT com>
> > > >> Vu Pham <vuhuong AT mellanox DOT com>
> > > >>
> > > >> Affiliations
> > >>
> > > >> IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
> > > >> Michael Kaufmann, Adrian Schuepbach)
> > >> University of New Hampshire (Patrick McArthur)
> > >> Stanford University (Ana Klimovic)
> > > >> Mellanox (Yuval Degani, Vu Pham)
> > > >>
> > > >> Sponsors
> > >>
> > >> Champion
> > >>
> > >> Luciano Resende <lresende AT apache DOT org>
> > >>
> > >> Nominated Mentors
> > >>
> > >> Luciano Resende <lresende AT apache DOT org>
> > >>
> > >> Raphael Bircher <rbircher AT apache DOT org>
> > >>
> > >> Julian Hyde <jhyde AT apache DOT org>
> > >>
> > >> Sponsoring Entity
> > >>
> > > >> We would like the Apache Incubator to sponsor this project.
> > >>
> > >>
> > >> --
> > >> Luciano Resende
> > >> http://twitter.com/lresende1975
> > >> http://lresende.blogspot.com/
> > >>
> > >
> > >
> > >
> >
> >
> > --
> > My introduction https://youtu.be/Ln4vly5sxYU
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
> > --
> Pierre Smits
>
> ORRTIZ.COM <http://www.orrtiz.com>
> OFBiz based solutions & services
>
> OFBiz Extensions Marketplace
> http://oem.ofbizci.net/oci-2/
>
