incubator-general mailing list archives

From Lars George <larsgeo...@apache.org>
Subject Re: [VOTE] Accept Wayang into the Apache Incubator
Date Sat, 12 Dec 2020 12:18:39 GMT
+1 binding

On Sat, Dec 12, 2020 at 2:24 AM Sheng Wu <wu.sheng.841108@gmail.com> wrote:

> +1 binding
>
> Sheng Wu 吴晟
> Twitter, wusheng1108
>
>
> On Sat, Dec 12, 2020 at 5:59 AM Byung-Gon Chun <bgchun@gmail.com> wrote:
>
> > +1 (binding)
> >
> > -Gon
> >
> > On Sat, Dec 12, 2020 at 2:35 AM Furkan KAMACI <furkankamaci@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > +1 (binding)
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> > > On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis <widdis@gmail.com> wrote:
> > >
> > > > > +1 (non-binding).  I'm interested in getting involved in this project!
> > > >
> > > > > On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz <christofer.dutz@c-ware.de> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > > following up on the [DISCUSS] thread on Wayang
> > > > > > (https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E),
> > > > > > I would like to call a VOTE to accept Wayang (aka Rheem) into the
> > > > > > Apache Incubator.
> > > > >
> > > > > Please cast your vote:
> > > > >
> > > > >   [ ] +1, bring Wayang into the Incubator
> > > > >   [ ] +0, I don't care either way
> > > > >   [ ] -1, do not bring Wayang into the Incubator, because...
> > > > >
> > > > > > The vote will be open for at least 72 hours. Only votes from the
> > > > > > Incubator PMC are binding, but votes from everyone are welcome.
> > > > >
> > > > > Chris
> > > > >
> > > > > -----
> > > > >
> > > > > > Wayang Proposal
> > > > > > (https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> > > > >
> > > > > == Abstract ==
> > > > >
> > > > > > Wayang is a cross-platform data processing system that aims at
> > > > > > decoupling the business logic of data analytics applications from
> > > > > > concrete data processing platforms, such as Apache Flink or Apache
> > > > > > Spark. Hence, it tames the complexity that arises from the
> > > > > > "Cambrian explosion" of novel data processing platforms that we
> > > > > > currently witness.
> > > > >
> > > > > > Note that the Wayang project is the Rheem project; we have renamed
> > > > > > it because of trademark issues.
> > > > >
> > > > > > You can find the project web page at:
> > > > > > https://rheem-ecosystem.github.io/
> > > > >
> > > > > = Proposal =
> > > > >
> > > > > > Wayang is a cross-platform system that provides an abstraction over
> > > > > > data processing platforms to free users from the burdens of (i)
> > > > > > performing tedious and costly data migration and integration tasks
> > > > > > to run their applications, and (ii) choosing the right data
> > > > > > processing platforms for their applications. To achieve this,
> > > > > > Wayang: (1) provides an abstraction on top of existing data
> > > > > > processing platforms that allows users to specify their data
> > > > > > analytics tasks in the form of a DAG of operators; (2) comes with a
> > > > > > cross-platform optimizer for automating the selection of
> > > > > > suitable/efficient platforms; and (3) takes care of executing the
> > > > > > optimized plan, including communication across platforms. In
> > > > > > summary, Wayang has the following salient features:
> > > > >
> > > > > > - Flexible Data Model - It considers a flexible and simple data
> > > > > > model based on data quanta. A data quantum is an atomic processing
> > > > > > unit in the system that can represent a large spectrum of data
> > > > > > formats, such as data points for a machine learning application,
> > > > > > tuples for a database application, or RDF triples. Hence, Wayang is
> > > > > > able to express a wide range of data analytics tasks.
> > > > > > - Platform independence - It provides a simple interface (currently
> > > > > > Java and Scala) that is inspired by established programming models,
> > > > > > such as that of Apache Spark and Apache Flink. Users represent
> > > > > > their data analytic tasks as a DAG (Wayang plan), where vertices
> > > > > > correspond to Wayang operators and edges represent data flows (data
> > > > > > quanta flowing) among these operators. A Wayang operator defines a
> > > > > > particular kind of data transformation over an input data quantum,
> > > > > > ranging from basic functionality (e.g., transformations, filters,
> > > > > > joins) to complex, extensible tasks (e.g., PageRank).
> > > > > > - Cross-platform execution - Besides running a data analytic task
> > > > > > on any data processing platform, it also comes with an optimizer
> > > > > > that can decide to execute a single data analytic task using
> > > > > > multiple data processing platforms. This allows for exploiting the
> > > > > > capabilities of different data processing platforms to perform
> > > > > > complex data analytic tasks more efficiently.
> > > > > > - Self-tuning UDF-based cost model - Its optimizer uses a cost
> > > > > > model fully based on UDFs. This not only enables Wayang to learn
> > > > > > the cost functions of newly added data processing platforms, but
> > > > > > also allows developers to tune the optimizer at will.
> > > > > > - Extensibility - It treats data processing platforms as plugins to
> > > > > > allow users (developers) to easily incorporate new data processing
> > > > > > platforms into the system. This is achieved by exposing the
> > > > > > functionalities of data processing platforms as operators
> > > > > > (execution operators). The same approach is followed at the Wayang
> > > > > > interface, where users can also extend Wayang capabilities, i.e.,
> > > > > > the operators, easily.
> > > > >
> > > > > > We plan to work on the stability of all these features as well as
> > > > > > on extending Wayang with more advanced features. Furthermore,
> > > > > > Wayang currently supports Apache Spark, Standalone Java, GraphChi,
> > > > > > and relational databases (via JDBC). We plan to incorporate more
> > > > > > data processing platforms, such as Apache Flink and Apache Hive.
> > > > >
> > > > > === Background ===
> > > > >
> > > > > > Many organizations and companies collect or produce a large variety
> > > > > > of data to apply data analytics over them. This is because insights
> > > > > > from data rapidly allow them to make better decisions. Thus, the
> > > > > > pursuit of efficient and scalable data analytics as well as the
> > > > > > one-size-does-not-fit-all philosophy has given rise to a plethora
> > > > > > of data processing platforms. Examples of these specialized
> > > > > > processing platforms range from DBMSs to MapReduce-like platforms.
> > > > >
> > > > > > However, today's data analytics are moving beyond the limits of a
> > > > > > single data processing platform. More and more applications need to
> > > > > > perform complex data analytics over several data processing
> > > > > > platforms. For example, (i) IBM reported that North York hospital
> > > > > > needs to process 50 diverse datasets, which are on a dozen
> > > > > > different internal systems, (ii) oil & gas companies stated they
> > > > > > need to process large amounts of data they produce every day, e.g.,
> > > > > > a single oil company can produce more than 1.5TB of diverse
> > > > > > (structured and unstructured) data per day, (iii) Fortune magazine
> > > > > > stated that airlines need to analyze large datasets, which are
> > > > > > produced by different departments, are of different data formats,
> > > > > > and reside on multiple data sources, to produce global reports for
> > > > > > decision makers, and (iv) Hewlett Packard has claimed that,
> > > > > > according to its customer portfolio, business intelligence
> > > > > > typically requires a single analytics pipeline using different
> > > > > > processing platforms at different parts of the pipeline. These are
> > > > > > just a few examples of emerging applications that require a
> > > > > > diversity of data processing platforms.
> > > > >
> > > > > > Today, developers have to deal with this myriad of data processing
> > > > > > platforms. That is, they have to choose the right data processing
> > > > > > platform for their applications (or data analytic tasks) and to
> > > > > > familiarize themselves with the intricacies of the different
> > > > > > platforms to achieve high efficiency and scalability. Several
> > > > > > systems have also appeared with the goal of helping users to easily
> > > > > > glue several platforms together, such as Apache Drill, PrestoDB,
> > > > > > and Luigi. Nevertheless, all these systems still require
> > > > > > considerable expertise from users to decide which data processing
> > > > > > platforms to use for the data analytic task at hand. In
> > > > > > consequence, great engineering effort is required to unify the data
> > > > > > from various sources, to combine the processing capabilities of
> > > > > > different platforms, and to maintain those applications, so as to
> > > > > > unleash the full potential of the data. In the worst case, such
> > > > > > applications are not built in the first place, as doing so seems
> > > > > > too daunting an endeavor.
> > > > >
> > > > > === Rationale ===
> > > > >
> > > > > > It is evident that there is an urgent need to release developers
> > > > > > from the burden of knowing all the intricacies of choosing and
> > > > > > gluing together data processing platforms for supporting their
> > > > > > applications (data analytic tasks). Developers should be able to
> > > > > > focus only on the logic of their applications. Surprisingly, there
> > > > > > is no open source system trying to satisfy this urgent need. Wayang
> > > > > > aims at filling this gap. It copes with this urgent need by
> > > > > > providing both a common interface over data processing platforms
> > > > > > and an optimizer to execute data analytic tasks on the right data
> > > > > > processing platform(s) seamlessly. As Apache is the place where
> > > > > > most of the important big data systems are, we consider Apache the
> > > > > > right place for Wayang.
> > > > >
> > > > > === Current Status ===
> > > > >
> > > > > > The current version of Wayang (v0.5.0) was initially co-developed
> > > > > > by staff, students, and interns at the Qatar Computing Research
> > > > > > Institute (QCRI) and the Hasso-Plattner Institute (HPI). The
> > > > > > project was initiated at and sponsored by QCRI in 2015 with the
> > > > > > goal of freeing data scientists and developers from the intricacies
> > > > > > of data processing platforms to support their analytic tasks. The
> > > > > > first open source release of Wayang was made only a year and a half
> > > > > > later, on June 13th, 2016, under the Apache Software License 2.0.
> > > > > > Since then, we have made several releases; the latest release was
> > > > > > made on January 23rd, 2019.
> > > > >
> > > > > ** Meritocracy **
> > > > >
> > > > > > All current Wayang developers are familiar with the development
> > > > > > process at Apache and are currently trying to follow its
> > > > > > meritocracy process as much as possible. For example, Wayang
> > > > > > already follows a committer principle where any pull request is
> > > > > > analyzed by at least one Wayang core developer. This was one of the
> > > > > > reasons for choosing Apache for Wayang, as we all want to encourage
> > > > > > and keep this style of development for Wayang.
> > > > >
> > > > > ** Community **
> > > > >
> > > > > > Wayang started as a pure research project, but it quickly started
> > > > > > developing into a community. People from HPI quickly joined our
> > > > > > efforts almost from the very beginning to make this project a
> > > > > > reality. Recently, the Berlin Institute of Technology (TU Berlin)
> > > > > > and the Pontifical Catholic University of Valparaiso (PUCV) in
> > > > > > Chile have also joined our efforts for developing Wayang. A
> > > > > > company, called Scalytics, has been created around Wayang.
> > > > > > Currently, we are intensively seeking to further develop both
> > > > > > developer and user communities. To keep broadening the community,
> > > > > > we plan to also exploit our ongoing academic collaborations with
> > > > > > multiple universities in Berlin and companies that we collaborate
> > > > > > with. For instance, Wayang is already being utilized for accessing
> > > > > > multiple data sources in the context of a large data analytics
> > > > > > project led by TU Berlin and Huawei. We also believe that Wayang's
> > > > > > extensible architecture (i.e., adding new operators and platforms)
> > > > > > will further encourage community participation. During incubation
> > > > > > we plan to have Wayang adopted by at least one company and will
> > > > > > explicitly seek more industrial participation.
> > > > >
> > > > > ** Core Developers **
> > > > >
> > > > > > The initial developers of the project are diverse: they come from
> > > > > > four different institutions (TU Berlin, Scalytics, PUCV, and HBKU).
> > > > > > We will work aggressively to grow the community during the
> > > > > > incubation by recruiting more developers from other institutions.
> > > > >
> > > > > ** Alignment **
> > > > >
> > > > > > We believe Apache is the most natural home for taking Wayang to the
> > > > > > next level. Apache is currently hosting the most important big data
> > > > > > systems. Hadoop, Spark, Flink, HBase, Hive, Tez, Reef, Storm,
> > > > > > Drill, and Ignite are just some examples of these technologies.
> > > > > > Wayang fills a significant gap that exists in the big data open
> > > > > > source world: it provides a common abstraction for all these
> > > > > > platforms and decides on which platforms to run a single data
> > > > > > analytic task. Wayang is now being developed following the
> > > > > > Apache-style development model. Also, it is well-aligned with the
> > > > > > Apache principle of building a community to impact the big data
> > > > > > community.
> > > > >
> > > > > === Known Risks ===
> > > > >
> > > > > ** Orphaned Products **
> > > > >
> > > > > > Currently, Wayang is the core technology behind Scalytics Inc. As a
> > > > > > result, a team of two engineers is working on a full-time basis on
> > > > > > this project. Recently, three more developers have joined our
> > > > > > efforts in building Wayang. Thus, the risk of Wayang becoming
> > > > > > orphaned is very low. In addition, people outside Scalytics (from
> > > > > > TU Berlin and HBKU) have also joined the project, which makes the
> > > > > > risk of abandoning the project even lower. The PUCV in Chile is
> > > > > > also beginning to contribute to the code base and to develop a
> > > > > > declarative query language on top of Wayang. The project is
> > > > > > constantly being monitored by email and frequent Skype meetings as
> > > > > > well as by weekly meetings with Scalytics people. Additionally, at
> > > > > > the end of each year, we meet to discuss the status of the project
> > > > > > as well as to plan the most important aspects we should work on
> > > > > > during the following year.
> > > > >
> > > > > ** Inexperience with Open Source **
> > > > >
> > > > > > Wayang quickly started being developed in open source under the
> > > > > > Apache Software License 2.0. The source code is available on
> > > > > > GitHub. Also, a few of the initial committers have contributed to
> > > > > > other open source projects: Hadoop and Flume.
> > > > >
> > > > > ** Homogeneous Developers **
> > > > >
> > > > > > The initial committers are already geographically distributed among
> > > > > > Chile, Germany, and Qatar. During incubation, one of our main goals
> > > > > > is to increase the heterogeneity of the current community and we
> > > > > > will work hard to achieve it.
> > > > >
> > > > > ** Reliance on salaried developers **
> > > > >
> > > > > > Wayang is already being developed by a mix of full-time and
> > > > > > volunteer time. Only 2 of the initial committers are working full
> > > > > > time on this project (Scalytics). So, we are confident that the
> > > > > > project will not decrease its development pace. Furthermore, we are
> > > > > > committed to recruiting additional committers to significantly
> > > > > > increase the development pace of the project.
> > > > >
> > > > > ** Relationships with other Apache products **
> > > > >
> > > > > > Wayang is somewhat related to Apache Spark, as its development
> > > > > > interface is inspired by Spark. In contrast to Spark, Wayang is not
> > > > > > a data processing platform, but a mediator between user
> > > > > > applications and data processing platforms. In this sense, Wayang
> > > > > > is similar to the Apache Drill project and Apache Beam. However,
> > > > > > Wayang significantly differs from Apache Drill in two main aspects.
> > > > > > First, Apache Drill provides only a common interface to query
> > > > > > multiple data storages and hence users have to specify in their
> > > > > > query the data to fetch. Then, Apache Drill translates the query to
> > > > > > the processing platforms where the data is stored, e.g., into the
> > > > > > MongoDB query representation. In contrast, in Wayang, users only
> > > > > > specify the data path and Wayang decides which are the best
> > > > > > (performance-wise) data processing platforms to use to process such
> > > > > > data. Second, the query interface in Apache Drill is SQL. Wayang
> > > > > > uses an interface based on operators forming DAGs. On this latter
> > > > > > point, we are currently developing a Pig Latin-like query language
> > > > > > for Wayang. In addition, in contrast to Apache Beam, Wayang not
> > > > > > only allows users to use multiple data processing platforms at the
> > > > > > same time, but also provides an optimizer to choose the most
> > > > > > efficient platform for the task at hand. In Apache Beam, users have
> > > > > > to specify an appropriate runner (platform).
> > > > > >
> > > > > > Given these similarities with the two Apache projects mentioned
> > > > > > above, we are looking forward to collaborating with those
> > > > > > communities. We are also open to, and would love, collaborating
> > > > > > with other Apache communities.
> > > > > >
> > > > > ** An excessive fascination with the Apache Brand **
> > > > >
> > > > > > Wayang solves a real problem that users and developers currently
> > > > > > have to deal with at a high cost: monetary cost, high design and
> > > > > > development effort, and considerable time. Therefore, we believe
> > > > > > that Wayang can be successful in building a large community around
> > > > > > it. We are convinced that the Apache brand and community process
> > > > > > will significantly help us in building such a community and in
> > > > > > establishing the project in the long term. We simply believe that
> > > > > > the ASF is the right home for Wayang to achieve this.
> > > > >
> > > > > === Documentation ===
> > > > >
> > > > > > Further details, documentation, and publications related to Wayang
> > > > > > can be found at https://docs.rheem.io/rheem/
> > > > >
> > > > > === Initial Source ===
> > > > >
> > > > > The current source code of Wayang resides in Github:
> > > > > https://github.com/rheem-ecosystem/rheem
> > > > >
> > > > > === External Dependencies ===
> > > > >
> > > > > Wayang depends on the following Apache projects:
> > > > >
> > > > > * Maven
> > > > > * HDFS
> > > > > * Hadoop
> > > > > * Spark
> > > > >
> > > > > > Wayang depends on the following other open source projects
> > > > > > organized by license:
> > > > >
> > > > > org.json.json: Json (http://json.org/license.html)
> > > > > SnakeYAML: Apache 2.0
> > > > > Java Unified Expression Language API (Juel): Apache 2.0
> > > > > ProfileDB Instrumentation: Apache 2.0
> > > > > Gson: Apache 2.0
> > > > > Hadoop: Apache 2.0
> > > > > Scala: Apache 2.0
> > > > > Antlr 4: BSD
> > > > > Jackson: Apache 2.0
> > > > > Junit 5: EPL 2.0
> > > > > Mockito: MIT
> > > > > Assertj: Apache 2.0
> > > > > > logback-classic: EPL 1.0 / LGPL 2.1
> > > > > slf4j: MIT
> > > > > GNU Trove: LGPL 2.1
> > > > > graphchi: Apache 2.0
> > > > > SQLite JDBC: Apache 2.0
> > > > > PostgreSQL: BSD 2-clause
> > > > > jcommander: Apache 2.0
> > > > > Koloboke Collections API: Apache 2.0
> > > > > Snappy Java: Apache 2.0
> > > > > Apache Spark: Apache 2.0
> > > > > > HyperSQL Database: BSD Modified (http://hsqldb.org/web/hsqlLicense.html)
> > > > > Apache Giraph: Apache 2.0
> > > > > Apache Flink: Apache 2.0
> > > > > Apache Commons IO: Apache 2.0
> > > > > Apache Commons Lang: Apache 2.0
> > > > > Apache Maven: Apache 2.0
> > > > >
> > > > > === Cryptography ===
> > > > >
> > > > > (not applicable)
> > > > >
> > > > > === Required Resources ===
> > > > >
> > > > > ** Mailing Lists **
> > > > >
> > > > > * mailto:private@wayang.incubator.apache.org
> > > > > * mailto:dev@wayang.incubator.apache.org
> > > > > > * mailto:commits@wayang.incubator.apache.org
> > > > >
> > > > > ** Git repositories **
> > > > >
> > > > > git://git.apache.org/repos/asf/incubator/wayang
> > > > >
> > > > > ** Issue tracking **
> > > > >
> > > > > https://issues.apache.org/jira/browse/RHEEM
> > > > >
> > > > > === Initial Committers ===
> > > > >
> > > > > > The following list gives the planned initial committers (in
> > > > > > alphabetical order):
> > > > >
> > > > > > * Bertty Contreras-Rojas <bertty@scalytics.io>
> > > > > > * Rodrigo Pardo-Meza <rodrigo@scalytics.io>
> > > > > > * Alexander Alten-Lorenz <alo@scalytics.io>
> > > > > > * Zoi Kaoudi <zoi.kaoudi@tu-berlin.de>
> > > > > > * Haralampos Gavriilidis <gavriilidis@tu-berlin.de>
> > > > > > * Jorge-Arnulfo Quiane-Ruiz <jorge.quiane@tu-berlin.de>
> > > > > > * Anis Troudi <atroudi@hbku.edu.qa>
> > > > > > * Wenceslao Palma-Muñoz <wenceslao.palma@pucv.cl>
> > > > >
> > > > > ** Affiliations **
> > > > >
> > > > > * Scalytics Inc.
> > > > > ** Bertty Contreras-Rojas
> > > > > ** Rodrigo Pardo-Meza
> > > > > ** Alexander Alten-Lorenz
> > > > > * Berlin Institute of Technology (TU Berlin)
> > > > > ** Zoi Kaoudi
> > > > > ** Haralampos Gavriilidis
> > > > > ** Jorge-Arnulfo Quiane-Ruiz
> > > > > * Hamad Bin Khalifa University (HBKU)
> > > > > ** Anis Troudi
> > > > > * Pontifical Catholic University of Valparaiso, Chile (PUCV)
> > > > > ** Wenceslao Palma-Muñoz
> > > > >
> > > > > === Sponsors ===
> > > > >
> > > > > ** Champion **
> > > > >
> > > > > * Christofer Dutz (christofer.dutz at c-ware dot de)
> > > > >
> > > > > ** Mentors **
> > > > >
> > > > > . (cdutz) Christofer Dutz
> > > > > . (larsgeorge) Lars George
> > > > > . (berndf) Fondermann
> > > > > . (jbonofre) Jean-Baptiste Onofré
> > > > >
> > > > > ** Sponsoring Entity **
> > > > >
> > > > > The Apache Incubator
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > > > For additional commands, e-mail: general-help@incubator.apache.org
> > > > >
> > > > >
> > > >
> > > > --
> > > > Dan Widdis
> > > >
> > >
> >
> >
> > --
> > Byung-Gon Chun
> >
>
