incubator-general mailing list archives

From Byung-Gon Chun <>
Subject Re: [RESULT] [VOTE] Accept Coral into the Apache Incubator
Date Wed, 21 Feb 2018 04:13:20 GMT

There's a name conflict with a project hosted at the Mozilla Foundation.
The new project name is Nemo. It has been approved through a Podling Suitable Name Search.

Accordingly, I renamed the page CoralProposal to NemoProposal.

I'll continue the on-boarding process.


On Mon, Feb 5, 2018 at 6:10 AM, Byung-Gon Chun <> wrote:

> Hi,
> 72 hours have passed and the vote for accepting Coral into the Apache
> Incubator has passed with:
> 9 binding "+1" votes, 1 non-binding "+1" vote, and no "-1" votes.
> Binding votes:
> Kevin A. McGrail
> Davor Bonaci
> Dave Fisher
> Hyunsik Choi
> Leif Hedstrom
> Jean-Baptiste Onofré
> Romain Manni-Bucau
> Mark Struberg
> Byung-Gon Chun
> Non-binding votes:
> Clebert Suconic
> Thanks to everyone who voted.
> On Thu, Feb 1, 2018 at 11:07 PM, Byung-Gon Chun <> wrote:
>> Hi all,
>> I would like to start a VOTE to propose the Coral project as a podling
>> into the Apache Incubator.
>> The ASF voting rules are described at
>> voting.html
>> A vote for accepting a new Apache Incubator podling is a majority vote
>> for which only Incubator PMC member votes are binding.
>> This vote will run for at least 72 hours. Please VOTE as follows.
>> [] +1 Accept Coral into the Apache Incubator
>> [] +0 Abstain
>> [] -1 Do not accept Coral into the Apache Incubator because ...
>> The proposal is listed below, but you can also access it on the wiki:
>> = CoralProposal =
>> == Abstract ==
>> Coral is a data processing system for flexible employment with different execution
scenarios for various deployment characteristics on clusters.
>> == Proposal ==
>> Today, there is a wide variety of data processing systems with different designs
for better performance and datacenter efficiency. They include processing data on specific
resource environments and running jobs with specific attributes. Although each system successfully
solves the problems it targets, most systems are designed in the way that runtime behaviors
are built tightly inside the system core to hide the complexity of distributed computing.
This makes it hard for a single system to support different deployment characteristics with
different runtime behaviors without substantial effort.
>> Coral is a data processing system that aims to flexibly control the runtime behaviors
of a job to adapt to varying deployment characteristics. Moreover, it provides a means of
extending the system’s capabilities and incorporating the extensions to the flexible job execution.
>> In order to be able to easily modify runtime behaviors to adapt to varying deployment
characteristics, Coral exposes runtime behaviors to be flexibly configured and modified at
both compile-time and runtime through a set of high-level graph pass interfaces.
>> We hope to contribute to the big data processing community by enabling more flexibility
and extensibility in job executions. Furthermore, we can benefit more together as a community
when we work together as a community to mature the system with more use cases and understanding
of diverse deployment characteristics. The Apache Software Foundation is the perfect place
to achieve these aspirations.
>> == Background ==
>> Many data processing systems have distinctive runtime behaviors optimized and configured
for specific deployment characteristics like different resource environments and for handling
special job attributes.
>> For example, much research has been conducted to overcome the challenge of running
data processing jobs on cheap, unreliable transient resources. Likewise, techniques for disaggregating
different types of resources, like memory, CPU and GPU, are being actively developed to use
datacenter resources more efficiently. Many researchers are also working to run data processing
jobs in even more diverse environments, such as across distant datacenters. Similarly, for
special job attributes, many works take different approaches, such as runtime optimization,
to solve problems like data skew, and to optimize systems for data processing jobs with small-scale
input data.
>> Although each of the systems performs well with the jobs and in the environments
they target, they perform poorly with unconsidered cases, and do not consider supporting multiple
deployment characteristics on a single system in their designs.
>> Optimizing an application to perform well on a system with hard-wired runtime behaviors
requires the application writer to understand the system deeply, which often costs a lot of
time and effort. Moreover, modifying such system behaviors requires changing the system core,
which demands an even deeper understanding of the system itself.
>> With this background, Coral is designed to represent all of its jobs as an Intermediate
Representation (IR) DAG. In the Coral compiler, user applications from various programming
models (e.g., Apache Beam) are submitted, transformed to an IR DAG, and optimized/customized
for the deployment characteristics. In the IR DAG optimization phase, the DAG is modified
through a series of compiler “passes” which reshape or annotate the DAG with an expression
of the underlying runtime behaviors. The IR DAG is then submitted as an execution plan for
the Coral runtime. The runtime includes the unmodified parts of data processing in the backbone
which is transparently integrated with configurable components exposed for further extension.
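The pass mechanism described above can be illustrated with a minimal Python sketch. This is not Coral's actual API: the names `IRDag`, `insert_combiner_pass`, `annotate_placement_pass`, and `optimize`, as well as the `placement` property and its values, are hypothetical and chosen only to show how one pass can reshape a DAG while another annotates it with execution properties.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class IRDag:
    """Toy IR DAG: named vertices, directed edges, per-vertex properties."""
    vertices: List[str]
    edges: List[Tuple[str, str]]
    # Execution properties attached by annotating passes (hypothetical keys).
    properties: Dict[str, Dict[str, str]] = field(default_factory=dict)


# A pass takes a DAG and returns a new DAG; the input is left untouched.
Pass = Callable[[IRDag], IRDag]


def insert_combiner_pass(dag: IRDag) -> IRDag:
    """Reshaping pass: route every edge into 'reduce' through a new 'combine' vertex."""
    if "reduce" not in dag.vertices or "combine" in dag.vertices:
        return dag
    edges = [(s, "combine") if d == "reduce" else (s, d) for s, d in dag.edges]
    edges.append(("combine", "reduce"))
    idx = dag.vertices.index("reduce")
    vertices = dag.vertices[:idx] + ["combine"] + dag.vertices[idx:]
    props = {v: dict(p) for v, p in dag.properties.items()}
    return IRDag(vertices, edges, props)


def annotate_placement_pass(dag: IRDag) -> IRDag:
    """Annotating pass: source vertices go to transient resources, the rest to reserved ones."""
    has_incoming = {dst for _, dst in dag.edges}
    props = {v: dict(dag.properties.get(v, {})) for v in dag.vertices}
    for v in dag.vertices:
        props[v]["placement"] = "reserved" if v in has_incoming else "transient"
    return IRDag(list(dag.vertices), list(dag.edges), props)


def optimize(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order and return the resulting execution plan."""
    for p in passes:
        dag = p(dag)
    return dag


dag = IRDag(["read", "map", "reduce"], [("read", "map"), ("map", "reduce")])
plan = optimize(dag, [insert_combiner_pass, annotate_placement_pass])
```

Here `plan` contains the extra `combine` vertex and a `placement` property on every vertex, while the original `dag` is unchanged. Choosing a different list of passes yields a different plan for the same user program.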
>> == Rationale ==
>> Coral’s vision lies in providing means for flexibly supporting a wide variety of
job execution scenarios for users while facilitating system developers to extend the execution
framework with various functionalities at the same time. The capabilities of the system can
be extended as it grows to meet a wider variety of execution scenarios. We need input from
users and developers from diverse domains in order to make it a more thriving and useful project.
The Apache Software Foundation provides the best tools and community to support this vision.
>> == Initial Goals ==
>> Initial goals will be to move the existing codebase to Apache and integrate with
the Apache development process. We further plan to develop our system to support
more execution scenarios across a wider variety of deployment characteristics.
>> == Current Status ==
>> Coral codebase is currently hosted in a repository at The current version
has been developed by system developers at Seoul National University, Viva Republica, Samsung,
and LG.
>> == Meritocracy ==
>> We plan to strongly support meritocracy. We will discuss the requirements in an open
forum, and those that continuously contribute to Coral with the passion to strengthen the
system will be invited as committers. Contributors that enrich Coral by providing various
use cases, various implementations of the configurable components including ideas for optimization
techniques will be especially welcome. Committers with a deep understanding of the system’s
technical aspects as a whole and its philosophy will definitely be voted into the PMC. We will
monitor community participation so that privileges can be extended to those that contribute.
>> == Community ==
>> We hope to expand our contribution community by becoming an Apache incubator project.
The contributions will come from both users and system developers interested in flexibility
and extensibility of job executions that Coral can support. We expect users to mainly contribute
by diversifying the use cases and deployment characteristics, and developers to contribute by
implementing them.
>> == Alignment ==
>> Apache Spark is one of many popular data processing frameworks. The system is designed
towards optimizing jobs using RDDs in memory and many other optimizations built tightly within
the framework. In contrast to Spark, Coral aims to provide more flexibility for job execution
in an easy manner.
>> Apache Tez enables developers to build complex task DAGs with control over the control
plane of job execution. In Coral, a high-level programming layer (e.g., Apache Beam) is automatically
converted to a basic IR DAG and can be converted to any IR DAG through a series of easy,
user-writable passes that can both reshape and modify the annotation (of execution properties)
of the DAG. Moreover, Coral leaves more parts of the job execution configurable, such as the
scheduler and the data plane. As opposed to providing a set of properties for solid optimization,
Coral’s configurable parts can be easily extended and explored by implementing the pre-defined
interfaces. For example, an arbitrary intermediate data store can be added.
>> Coral currently supports Apache Beam programs and we are working on supporting Apache
Spark programs as well. Coral also utilizes Apache REEF for container management, which allows
Coral to run in Apache YARN and Apache Mesos clusters. If necessary, we plan to contribute
to and collaborate with these other Apache projects for the benefit of all. We plan to extend
such integrations to more Apache software. The Apache Software Foundation already hosts many
major big-data systems, and we expect to help further growth of the big-data community by
having Coral within the Apache foundation.
>> == Known Risks ==
>> === Orphaned Products ===
>> The risk of the Coral project being orphaned is minimal. There is already plenty
of work that arduously supports different deployment characteristics, and we propose a general
way to implement them with flexible and extensible configuration knobs. The domain of data
processing is already of high interest, and this domain is expected to evolve continuously
with various other purposes, such as resource disaggregation and using transient resources
for better datacenter resource utilization.
>> === Inexperience with Open Source ===
>> The initial committers include PMC members and committers of other Apache projects.
They have experience with open source projects, from incubation through to top-level status.
They have been involved in the open source development process, and are familiar with releasing
code under an open source license.
>> === Homogeneous Developers ===
>> The initial set of committers is from a limited set of organizations, but we expect
to attract new contributors from diverse organizations and will thus grow organically once
approved for incubation. Our prior experience with other open source projects will help various
contributors to actively participate in our project.
>> === Reliance on Salaried Developers ===
>> Many developers are from Seoul National University. This is not applicable.
>> === Relationships with Other Apache Products ===
>> Coral positions itself among multiple Apache products. It runs on Apache REEF for
container management. It also utilizes many useful development tools including Apache Maven,
Apache Log4J, and multiple Apache Commons components. Coral supports the Apache Beam programming
model for user applications. We are currently working on supporting the Apache Spark programming
APIs as well.
>> === An Excessive Fascination with the Apache Brand ===
>> We hope to make Coral a powerful system for data processing, meeting various needs
for different deployment characteristics, in a wider variety of environments. We see the
limitations of simply putting code on GitHub, and we believe the Apache community will help
the growth of Coral so that the project becomes an impactful and innovative piece of open source
software. We believe Coral is a great fit for the Apache Software Foundation due to the collaboration
it aims to achieve from the big data processing community.
>> == Documentation ==
>> The current documentation for Coral is at
>> == Initial Source ==
>> The Coral codebase is currently hosted at
>> == External Dependencies ==
>> To the best of our knowledge, all Coral dependencies are distributed under Apache
compatible licenses. Upon acceptance to the incubator, we would begin a thorough analysis
of all transitive dependencies to verify this fact and further introduce license checking
into the build and release process.
>> == Cryptography ==
>> Not applicable.
>> == Required Resources ==
>> === Mailing Lists ===
>> We will operate two mailing lists as follows:
>>    * Coral PMC discussions:
>>    * Coral developers:
>> === Git Repositories ===
>> Upon incubation:
>> After the incubation, we would like to move the existing repo
to the Apache infrastructure
>> === Issue Tracking ===
>> Coral currently tracks its issues using the GitHub issue tracker:
We plan to migrate to Apache JIRA.
>> == Initial Committers ==
>>   * Byung-Gon Chun
>>   * Jeongyoon Eo
>>   * Geon-Woo Kim
>>   * Joo Yeon Kim
>>   * Gyewon Lee
>>   * Jung-Gil Lee
>>   * Sanha Lee
>>   * Wooyeon Lee
>>   * Yunseong Lee
>>   * JangHo Seo
>>   * Won Wook Song
>>   * Taegeon Um
>>   * Youngseok Yang
>> == Affiliations ==
>>   * SNU (Seoul National University)
>>     * Byung-Gon Chun
>>     * Jeongyoon Eo
>>     * Geon-Woo Kim
>>     * Gyewon Lee
>>     * Sanha Lee
>>     * Wooyeon Lee
>>     * Yunseong Lee
>>     * JangHo Seo
>>     * Won Wook Song
>>     * Taegeon Um
>>     * Youngseok Yang
>>   * LG
>>     * Jung-Gil Lee
>>   * Samsung
>>     * Joo Yeon Kim
>>   * Viva Republica
>>     * Geon-Woo Kim
>> == Sponsors ==
>> === Champions ===
>> Byung-Gon Chun
>> === Mentors ===
>>   * Hyunsik Choi
>>   * Byung-Gon Chun
>>   * Jean-Baptiste Onofré
>>   * Markus Weimer
>>   * Reynold Xin
>> === Sponsoring Entity ===
>> The Apache Incubator
>> Thanks!
>> Byung-Gon Chun
> --
> Byung-Gon Chun

Byung-Gon Chun
