incubator-general mailing list archives

From Bertrand Delacretaz <bdelacre...@codeconsult.ch>
Subject Re: [VOTE] Accept YuniKorn into Apache Incubator
Date Mon, 13 Jan 2020 17:44:54 GMT
Hi,

On Fri, Jan 10, 2020 at 6:47 PM Vinod Kumar Vavilapalli
<vinodkv@apache.org> wrote:
> I'd like to call a vote on accepting YuniKorn into the Apache Incubator...

+1

I'm copying the proposal text below, we usually do that to get
complete mail archives.

-Bertrand


YuniKorn proposal

Abstract

YuniKorn is a standalone resource scheduler responsible for scheduling
batch jobs and long-running services on large scale distributed
systems running in on-premises environments as well as different
public clouds.
Proposal

YuniKorn ['ju:nikɔ:n] is a unified resource scheduler that aims to
achieve efficient, fine-grained resource sharing for various workloads
in large-scale, multi-tenant, and cloud-native environments. YuniKorn
brings a unified, cross-platform scheduling experience for mixed
workloads, with support for, but not limited to, Apache™ Hadoop® YARN
and Kubernetes. YuniKorn is a made-up word (credit to Vinod Kumar
Vavilapalli): Y stands for Apache™ Hadoop® YARN, K for K8s, and Uni
for Unified. It is pronounced the same as “Unicorn”.

Currently, YuniKorn is an open-source project under the Apache 2.0
license. The source code is hosted in git repositories under
github.com/cloudera. We would like to share it with the ASF and expand
the community to a wider range of users and contributors.
Background

Enterprise users run their workloads on different platforms such as
Apache™ Hadoop® YARN and Kubernetes. They need to work with different
resource schedulers in order to plan their workloads to run on these
platforms efficiently. The scheduler implementations are fragmented,
and not optimized to balance existing use-cases like batch workloads
along with new needs such as cloud-native architecture, autoscaling,
etc. We need a single resource planning/management framework to manage
resources on different platforms using the same semantics, in order to
address all the important resource management requirements.
Rationale

No existing solution provides a unified resource scheduling
experience across platforms. That makes it difficult to manage
workloads running in different environments, from on-premises to
cloud. YuniKorn aims to satisfy these needs.
YuniKorn is designed around the following principles:

1) Support different environments

As compute platforms evolve quickly, more and more challenges appear
in on-prem, cloud, and hybrid environments. YuniKorn aims to bring a
unified scheduling experience across multiple environments, with
enhanced scheduling capabilities.

2) Support a wide range of workload types

To improve the efficiency of the computing platform, a key idea is to
run different types of applications, like long-running services and
batch jobs, on shared resources. YuniKorn is an effort to address all
the scheduling features needed for such mixed workload environments.

3) Benefit both big-data and cloud-native communities

A resource scheduler needs to be capable of supporting mixed
workloads, both batch and long-running services. This is the key to
improving cluster utilization and reducing the complexity of dev-ops.
A common scheduler that is decoupled from the container platforms
underneath can benefit both the Apache™ Hadoop® YARN and the
Kubernetes communities.
Initial Goals

Initial goals are:

    Move the existing codebase and documentation to an Apache-hosted repo
    Set up mailing lists, web-site, and CI/CD pipeline under Apache infrastructure
    Set up JIRA for issue tracking
    Incremental development and releases according to Apache guidelines
    Expand the community and bring more diversified contributors/users to the community

Current Status
Meritocracy

Many of the initial developers of YuniKorn are already Apache
committers and PMC members from other Apache projects, such as Apache
Hadoop and Apache Submarine. Many of us have worked in the Apache
Hadoop community for years and know the Apache way well. We believe
strongly in meritocracy in electing committers and PMC members. We
believe that contributions can come in forms other than just code: for
example, one of our initial proposed committers has contributed solely
in the area of project documentation. We will encourage contributions
and participation of all types, and ensure that contributors are
appropriately recognized.
Community

YuniKorn is a relatively new open source project, and Cloudera is the
original development sponsor. From the beginning, we clearly aimed to
run this as an open source project, so we started to build the
community from the very early stages. We received a lot of feedback
and valuable suggestions from other community members while the
project was hosted as an open source project on GitHub. This feedback
has greatly influenced some of our designs. For example, developers
from Alibaba were involved at a very early stage of development and
contributed much of the effort on performance/throughput enhancement.
Many other organizations showed interest in joining the community once
we started talking about it at meetups, conferences, etc.
Core developers

The project was initiated at Cloudera, so most of the core developers
are from this organization. Tao Yang from Alibaba joined the
development at a very early stage. The core developers of YuniKorn are
(listed in alphabetical order):

    Akhil PB (Cloudera)
    Sunil Govindan (Cloudera)
    Tao Yang (Alibaba)
    Vinod Vavilapalli (Cloudera)
    Wangda Tan (Cloudera)
    Weiwei Yang (Cloudera)
    Wilfred Spiegelenburg (Cloudera)


Given the origin history, the core development team so far has not
been very diverse, but we’ve been attempting to grow that diversity.
We have every hope to continue building a diverse and sustainable
community if the project gets accepted into Apache.
Alignment

The motivation of the YuniKorn project is to resolve common resource
scheduling problems for various workloads on large-scale distributed
systems. Apache is home to one of these systems in the form of Apache
Hadoop YARN. Many of the workloads that we expect to leverage
YuniKorn are computing engines like Apache Spark and Apache Flink,
whether they run on top of YARN or on Kubernetes.
Known Risks
Project Name

We searched for the name "YuniKorn" on GitHub, and at the time of the
search we found nothing related to resource schedulers or distributed
systems. We also searched for the name YuniKorn as a trademark and
there seem to be none. A generic web search also didn't return any
relevant projects. Since the name seems to be unique, easy to
remember and pronounce, and relevant to the project, we believe it is
a suitable name at the ASF.

Cloudera does NOT have a trademark on the name YuniKorn, so there is
no trademark assignment needed. Cloudera will commit to using Apache
YuniKorn as the project name when/if it graduates and becomes an
Apache project.
Orphaned products

The core developers of the YuniKorn project, who come from different
companies, plan to work full time on this project. The initial team
intends to continue investing in YuniKorn, and it will be integrated
into the solutions offered to customers. Several other organizations
(like Alibaba) have also started to evaluate the project and plan to
adopt it in their production environments. We anticipate that adoption
will improve further once it becomes an Apache project.

We also have support from core-platform developers and Apache
committers at different companies, such as Microsoft, Nvidia, and
Tencent, who are interested in contributing to the YuniKorn project.
We expect to see more contributions from these committers and usage by
their internal platforms. So overall, the risk of YuniKorn becoming an
orphaned project is low.
Inexperience with Open source

Most of the core developers in the YuniKorn project are experienced
open source veterans, and several are Apache committers and PMC
members of other projects, such as Apache™ Hadoop®. The development
style is already very much like the Apache way:

    We have open community meetings to discuss designs, problems and roadmaps
    We publish all patches and issue related discussions on github
    We enforce the code review and log all comments in github issues

Length of Incubation

We started the work 10 months ago. So far the groundwork for YuniKorn
is done and the initial version can work with K8s seamlessly. Based on
the initial contributors’ experience in ASF projects, we don’t expect
huge gaps with regard to ASF’s policies on software and releases
before YuniKorn can graduate. The goal is to grow the community
quickly and increase the user base within a few months while making
releases that adhere to the ASF standards. When the project reaches a
reasonable level of adoption and has a strong community with a good
number of committers/PMC members, we can propose graduation. We expect
the length of incubation to be approximately 12 to 18 months.
Homogenous Development

The initial proposed list of committers and contributors includes
developers from several institutions and industry participants. The
developers are also from different regions, such as the U.S.,
Australia, and India. The development team uses Slack, a community
mailing list, and weekly community calls to collaborate efficiently.
Reliance on Salaried Developers

Clearly, Cloudera has contributed most of the initial development
through salaried developers. But since the very beginning, YuniKorn
has been built as a community effort. People from other organizations
are already collaborating with us on GitHub, both at the source code
level and by participating in designs and providing feedback through
community calls. We expect our reliance on salaried developers to
decrease drastically during the incubation process itself.
Relationship to Other Apache Products


YuniKorn is very closely related to other Big-Data projects in Apache,
such as Hadoop YARN, Spark, Hive, Flink, etc.

YuniKorn’s core idea is to support both long-running and batch
workloads like Spark, Hive, Flink, etc., and provide a consistent,
unified way to manage and schedule resources for Big Data workloads
across resource managers like Apache™ Hadoop® YARN and Kubernetes, in
both on-premises and cloud environments.

Many of the core ideas for YuniKorn come from the experience of the
initial team building Apache Hadoop YARN’s schedulers - Capacity
Scheduler and Fair Scheduler.
An Excessive Fascination with the Apache Brand

Many of the initial developers in the YuniKorn project are already
experienced Apache committers and PMC members. We understand the value
of the Apache way, and how to operate the project development on a
day-to-day basis. The reason for proposing YuniKorn as an Apache
project is to build a healthy community and to increase adoption and
the size of the community and end-user base, because we believe the
only way to build highly valuable infrastructure-layer software is to
have wide adoption and cater to common use cases.
Documentation

Project summary:

    https://github.com/cloudera/yunikorn-core/blob/master/README.md

User guides

    https://github.com/cloudera/yunikorn-core/blob/master/docs/user-guide.md

Developer guides

    https://github.com/cloudera/yunikorn-core/blob/master/docs/developer-guide.md

Roadmap:

    https://github.com/cloudera/yunikorn-core/blob/master/docs/roadmap.md

Initial Source

YuniKorn is written in Golang, and the source code is currently
hosted in several GitHub repositories:

    Scheduler interface: https://github.com/cloudera/yunikorn-scheduler-interface
    Scheduler core: https://github.com/cloudera/yunikorn-core
    K8s Shim: https://github.com/cloudera/yunikorn-k8shim
    Scheduler Web UI: https://github.com/cloudera/yunikorn-web
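
As a rough illustration of how a scheduler core can stay decoupled
from the platform underneath through a narrow interface and a shim,
here is a minimal Go sketch. It is not taken from the YuniKorn
repositories: the names Resource, Request, Shim, Core and logShim, and
the first-fit placement policy, are all assumptions made up for this
example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Resource is a hypothetical platform-neutral resource vector,
// keyed by resource name (for example "vcore" or "memory").
type Resource map[string]int64

// fits reports whether every quantity in ask is available in r.
func (r Resource) fits(ask Resource) bool {
	for name, qty := range ask {
		if r[name] < qty {
			return false
		}
	}
	return true
}

// Request is a hypothetical allocation ask from one application.
type Request struct {
	App string
	Ask Resource
}

// Shim is the hypothetical contract a platform adapter implements.
// The core only talks to this interface, never to a platform API.
type Shim interface {
	Bind(app, node string) error
}

// Core is a toy first-fit scheduler with no platform-specific knowledge.
type Core struct {
	free map[string]Resource // node name -> free resources
	shim Shim
}

// NewCore returns an empty Core that binds through the given shim.
func NewCore(shim Shim) *Core {
	return &Core{free: make(map[string]Resource), shim: shim}
}

// AddNode registers a node, copying its capacity so callers cannot alias it.
func (c *Core) AddNode(name string, capacity Resource) {
	free := make(Resource, len(capacity))
	for k, v := range capacity {
		free[k] = v
	}
	c.free[name] = free
}

// Schedule places req on the first node (in name order) with enough free
// resources, asks the shim to bind it, and only then deducts the resources.
func (c *Core) Schedule(req Request) (string, error) {
	names := make([]string, 0, len(c.free))
	for name := range c.free {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for determinism

	for _, name := range names {
		if !c.free[name].fits(req.Ask) {
			continue
		}
		if err := c.shim.Bind(req.App, name); err != nil {
			return "", err
		}
		for k, v := range req.Ask {
			c.free[name][k] -= v
		}
		return name, nil
	}
	return "", errors.New("no node fits request for " + req.App)
}

// logShim stands in for a platform adapter; it only records bindings.
type logShim struct{ bound []string }

func (s *logShim) Bind(app, node string) error {
	s.bound = append(s.bound, app+"@"+node)
	return nil
}

func main() {
	shim := &logShim{}
	core := NewCore(shim)
	core.AddNode("node-a", Resource{"vcore": 2, "memory": 4096})
	core.AddNode("node-b", Resource{"vcore": 8, "memory": 16384})

	node, err := core.Schedule(Request{App: "spark-1", Ask: Resource{"vcore": 4, "memory": 8192}})
	fmt.Println(node, err, shim.bound)
}
```

In this sketch, supporting another platform would only mean writing
another Shim implementation; the Core type would not change.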

Source and Intellectual Property Submission Plan
External Dependencies

External dependencies are listed in the table below:

Library                              Type                            License
-----------------------------------  ------------------------------  --------------------
k8s.io/api                           K8s API                         Apache License 2.0
k8s.io/apimachinery                  K8s API                         Apache License 2.0
k8s.io/client-go                     K8s client library              Apache License 2.0
github.com/looplab/fsm               Go state machine library        MIT License
github.com/satori/go.uuid            Go UUID library                 MIT License
github.com/uber-go/zap               Go logging library              MIT License
github.com/golang/protobuf           Go protobuf library             BSD 3-Clause License
github.com/gorilla/mux               Go network library              BSD 3-Clause License
google.golang.org/grpc               Go RPC library                  Apache License 2.0
gopkg.in/yaml.v2                     Go YAML library                 Apache License 2.0
github.com/prometheus/client_golang  Prometheus Client Library       Apache License 2.0
Angular v6.1.x                       Angular UI Framework Libraries  MIT License
TypeScript                           TypeScript Language Compiler    Apache License 2.0
Chart.js                             JavaScript Charting Library     MIT License
Moment.js                            JavaScript Date & Time Library  MIT License


Build and test only:

Library                              Type                            License
-----------------------------------  ------------------------------  --------------------
gotest.tools                         Test library                    Apache License 2.0
github.com/stretchr/testify          Test library                    MIT License
Karma                                Unit test library               MIT License
Protractor                           End2End test library            MIT License
Json-server                          Test server                     MIT License
Yarn                                 Dependency manager              BSD 2-Clause License


Cryptography

YuniKorn does not currently include any cryptography-related code.
Required Resources
Mailing lists:

    private@yunikorn.incubator.apache.org (PMC list)
    commits@yunikorn.incubator.apache.org (git push emails)
    issues@yunikorn.incubator.apache.org (JIRA issue feed)
    dev@yunikorn.incubator.apache.org (Dev discussion)
    user@yunikorn.incubator.apache.org (User questions)

Git Repositories

Git is the preferred source control system

    git://git.apache.org/yunikorn-* (We have multiple git repositories)

Issue Tracking

JIRA YuniKorn (YUNIKORN-)
Other Resources

None
Initial Committers and Affinities

    Akhil PB (apb@cloudera.com) (Cloudera)
    Sunil Govindan (sunilg@apache.org) (Cloudera)
    Vinod Kumar Vavilapalli (vinodkv@apache.org) (Cloudera)
    Wangda Tan (wangda@apache.org) (Cloudera)
    Weiwei Yang (wwei@apache.org) (Cloudera)
    Wilfred Spiegelenburg (wspiegelenburg@cloudera.com) (Cloudera)
    Carlo Curino (curino@apache.org) (Microsoft)
    Subramaniam Krishnan (subru@apache.org) (Microsoft)
    Arun Suresh (asuresh@apache.org) (Microsoft)
    Konstantinos Karanasos (kkaranasos@apache.org) (Microsoft)
    Jonathan Hung (jhung@apache.org) (LinkedIn)
    DB Tsai (dbtsai@apache.org) (Apple)
    Junping Du (junping_du@apache.org) (Tencent)
    Tao Yang (taoyang@apache.org) (Alibaba)
    Jason Lowe (jlowe@apache.org) (Nvidia)

Sponsors
Champion

Vinod Kumar Vavilapalli (vinodkv@apache.org)
Nominated Mentors

Junping Du (Tencent), (junping_du@apache.org)

Felix Cheung (Uber), (felixcheung@apache.org)

Jason Lowe (Nvidia), (jlowe@apache.org)

Holden Karau (Apple), (holden@apache.org)
Sponsoring Entity

The Apache Incubator


