incubator-general mailing list archives

From "Shuai Di" <shuaidi1...@163.com>
Subject [DISCUSS] Linkis Proposal (a Computation Middleware project)
Date Mon, 12 Jul 2021 02:21:47 GMT
Greetings!


We would like to start an open discussion on bringing Linkis (https://github.com/WeBankFinTech/Linkis),
a computation middleware project, to the Apache Incubator.


The proposal can be found below and is also listed in the Incubator wiki: https://cwiki.apache.org/confluence/display/INCUBATOR/LinkisProposal,
thanks @Junping Du for creating the page! We would appreciate any guidance, and anyone willing
to support us as an additional mentor.


======
Linkis Proposal




=Abstract=

Linkis builds a computation middleware layer that decouples upper-layer applications from the underlying
data engines. It provides standardized interfaces (REST, JDBC, WebSocket, etc.) to easily connect
to various underlying engines (Spark, Presto, Flink, etc.), while enabling cross-engine context
sharing and unified job and engine governance and orchestration.

Linkis codebase: https://github.com/WeBankFinTech/Linkis




=Proposal=

Linkis is designed to solve computation governance problems in complex distributed environments
(typically in a big data platform), where you have to deal with different types, versions,
or clusters of underlying data engines and hundreds of diversified engine clients at the upper
application layer as well.

Linkis acts as a proxy between the upper application layer and the underlying engine layer.
By abstracting and implementing the 3 common phases of a job/request (submit, prepare and
execute), Linkis provides connectivity, governance and orchestration capabilities
for different kinds of engines such as OLAP, OLTP (under development) and Streaming, and handles all these
"computation governance" affairs in a standardized, reusable way.
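
As a rough illustration of the three phases described above, the following minimal sketch walks a job through submit, prepare and execute behind a single proxy object. It is not the Linkis API: the names `Job`, `ComputationMiddleware`, `register_engine` and `run` are hypothetical, and each engine is stood in for by a plain Python callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    """A request from an upper-layer application (hypothetical shape)."""
    engine_type: str  # e.g. "spark" or "presto"
    code: str         # the statement to run
    history: List[str] = field(default_factory=list)  # phases passed so far


class ComputationMiddleware:
    """Toy proxy that walks every job through submit, prepare and execute."""

    def __init__(self) -> None:
        # Maps an engine type to a callable that runs code on that engine.
        self._engines: Dict[str, Callable[[str], str]] = {}

    def register_engine(self, engine_type: str, runner: Callable[[str], str]) -> None:
        self._engines[engine_type] = runner

    def submit(self, job: Job) -> Job:
        # Submit: accept the request and reject unknown engine types early.
        if job.engine_type not in self._engines:
            raise ValueError(f"unsupported engine type: {job.engine_type}")
        job.history.append("submit")
        return job

    def prepare(self, job: Job) -> Callable[[str], str]:
        # Prepare: select the engine that will run the job.
        job.history.append("prepare")
        return self._engines[job.engine_type]

    def execute(self, job: Job, runner: Callable[[str], str]) -> str:
        # Execute: hand the code to the selected engine and return its output.
        job.history.append("execute")
        return runner(job.code)

    def run(self, job: Job) -> str:
        runner = self.prepare(self.submit(job))
        return self.execute(job, runner)
```

In this sketch the application only ever talks to `ComputationMiddleware.run`, so adding a new engine means registering one more callable rather than changing every client.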

We are actively operating the Linkis community and look forward to continuously increasing
community activity.

We propose to contribute the Linkis codebase to the Apache Software Foundation. We believe
that bringing Linkis into the Apache Software Foundation and following the community-led development
of the "Apache Way" will continuously improve project quality and community vitality.




=Background=

In today's complex and distributed environments, mature solutions have been developed for the
communication, coordination and governance of application services, from SOA to microservices, along
with many practices, from ESB to Service Mesh, for decoupling different services.

However, things are different when an application service needs to communicate with the underlying
engines. Engines are isolated from each other, and the tightly coupled client-server pattern
is everywhere. Each upper application has to connect to and access various
underlying engines directly, in a tightly coupled way, and solve the "computation governance" problems
on its own, including maintaining different client environments, submitting jobs, monitoring
job status, fetching output, handling large numbers of concurrent client instances, watching
for bad jobs, adapting to engine version changes, etc.

What is missing is a common layer of "computation middleware" between the numerous upper-layer applications
and the countless underlying engines to handle all these "computation governance" affairs
in a standardized, reusable way. That is why we started the Linkis project.

Firstly, Linkis reduces the complexity of connectivity. Instead of maintaining a variety
of engine client environments, users now only need to install the Linkis client, or just
an HTTP client when using the REST interface. Routing a query to the desired cluster can be done
by simply providing a tag.

Secondly, Linkis provides governance capabilities such as multi-tenancy, concurrency control,
resource management, query validation, privilege enhancement and auditing.

Meanwhile, Linkis enables orchestration strategies such as routing, load balancing, active-active
and hybrid computation across engines (some are still under development).
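
The tag-based routing mentioned above can be pictured with a small lookup. This is only a sketch under stated assumptions: the function `route_by_tag`, the tag names and the example URLs are invented for illustration and do not reflect how Linkis actually resolves tags.

```python
from typing import Dict, Optional

# Hypothetical tag-to-cluster table; tags and URLs are invented for illustration.
CLUSTERS: Dict[str, str] = {
    "prod": "http://spark-prod.example.com",
    "test": "http://spark-test.example.com",
}


def route_by_tag(tag: Optional[str], clusters: Dict[str, str], default_tag: str = "prod") -> str:
    """Return the endpoint of the cluster selected by the tag."""
    # Fall back to the default tag when the caller does not provide one.
    chosen = tag if tag is not None else default_tag
    if chosen not in clusters:
        raise LookupError(f"no cluster registered for tag {chosen!r}")
    return clusters[chosen]
```

The point of the sketch is that the client supplies only a tag, and the mapping from tag to cluster lives in one place on the middleware side.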




=Rationale=

Linkis is built on a distributed microservice architecture with great scalability and extensibility.
Enhancements for high concurrency and fault tolerance make it more stable and reliable.
It has already supported many production environments with large numbers of daily jobs over
a long period.

Linkis's microservices are divided into 3 groups: Computation Governance Services, Public
Enhancement Services, and Microservice Governance Services.

The Computation Governance Services (CGS) group is responsible for the core process of job/request
submission, preparation and execution, lifecycle management, resource management, validation
and orchestration.

The Public Enhancement Services (PES) group provides basic public functions including job context
sharing, material management and data source management, to serve other Linkis services and
upper application systems.

The Microservice Governance Services (MGS) group includes a customized Spring Cloud Gateway, Eureka
and OpenFeign, to provide basic functions such as routing, service registration and discovery,
and an RPC framework.

By providing multi-tenancy, high concurrency, job dispatching/management policies,
unified resource control and orchestration, Linkis makes the submission, preparation and execution
of computation jobs more flexible, reliable and controllable, and ensures that results are returned
successfully. It can greatly reduce the overall development, operation and maintenance costs,
as well as the architectural complexity.

With Linkis as the computation middleware, new upper-layer applications can be quickly
developed by reusing the Linkis computation governance functions, as has been done in the
open source big data platform suite “WeDataSphere” (https://github.com/WeBankFinTech/WeDataSphere).

Linkis currently mainly supports OLAP and Streaming engines, and we are planning to support
OLTP engines better. Containerization is also one of the important development directions
of Linkis.




=Initial Goal=

- Migrate the existing codebase, website, and documentation to Apache-hosted infrastructure.

- Work with the infrastructure team to implement and approve our code review, build, and testing
workflows in the context of the ASF.

- Incremental development and release under Apache guidelines.

- Grow and diversify the Linkis community in the Apache Way.




=Current Status=

==Meritocracy==

The Linkis project was started at WeBank and has been an open-source project on GitHub since July
2019. Linkis has been quickly adopted by many organizations. More than 500 organizations have
tested Linkis according to our sandbox application records, and dozens of them have introduced Linkis
into production according to spontaneous user feedback. These organizations are spread across various
industries including banking, telecommunications, insurance, manufacturing, education, internet, etc.

Linkis already has contributors and users from different companies. We’ve set up the committer
team and we’re constantly seeking potential new committers. New contributors are always
warmly welcomed and guided by existing committers. Users can get timely support from community
IM groups and GitHub.




==Community==

Linkis now has 15 committers from 6 companies including WeBank, China Telecom, Kanzhun Ltd.,
iQIYI Inc., HONOR Mobile Phone, and Samoyed Digital. We have a developer IM group of more
than 100 people from different organizations, and 9 user IM groups with more than 4,500 people in total.




==Core Developers==

The core developers of Linkis work in the big data teams of different companies, mainly
at WeBank since the project was initiated there.

- Shuai Di (WeBank)

- Qiang Yin (WeBank)

- Heping Wang (WeBank)

- Yongkun Yang (WeBank)

- Zhiyue Yang (WeBank)

- You Liu (WeBank)

- Xiaogang Wang (China Telecom)

- Hui Zhu (Kanzhun)

- Zhen Wang (iQiyi)

- Rong Zhang (Honor)




==Releases==

Linkis has released multiple versions as listed here: https://github.com/WeBankFinTech/Linkis/releases

We will follow the ASF guidelines more closely, and adopt the ASF source release process upon
joining the incubator.




==Code Reviews==

Linkis’s code reviews are currently public on GitHub: https://github.com/WeBankFinTech/Linkis/pulls




==Alignment==

As Linkis was built to address connectivity and other computation governance issues with various
underlying engines, it depends on multiple ASF projects such as Spark, Flink, Hive and Hadoop.
Linkis’s Engine Connector Manager service starts different Engine Connectors to connect
to different underlying engines, providing computation governance abilities that benefit
the usage and maintenance of these engines. Linkis will continue to expand the types of engines
it supports among ASF projects, such as HBase, Kylin, and more.




=Known Risks=

==Orphaned Products==

The risk of Linkis becoming an orphaned product is very low, because it is already a
core infrastructure component in the production environments of dozens of companies' big data
platforms, including large companies like WeBank, China Telecom, Ping An Insurance Company,
Hikvision, etc. Hundreds of thousands of computation jobs are performed through Linkis in
these companies every day. Developers from these companies are increasingly joining the Linkis
community as contributors.

Linkis has had 12 major releases so far and has received 355 PRs from contributors, which indicates
the activity and vitality of the Linkis community. Linkis is also the core component of the
open source big data platform suite “WeDataSphere”, and even more users and developers are
already active in this larger community.

We look forward to further expanding and diversifying the community by joining Apache. We
are also further improving our adherence to the community-led development pattern, and the
standardization and transparency of community governance.




==Inexperience with Open Source==

Linkis’s core developers have been running Linkis as a community-oriented open source project
for some time, and some of them already have experience working with other open source
communities. The current Linkis user groups, with more than 4,500 people in total, are also evidence
of our commitment and passion for operating the open source community.

Meanwhile, we’ve begun to refine our community governance efforts under the guidance of
Apache mentors, and we’ll learn more about how to operate the open source community effectively
and properly by following the Apache way in our incubator journey.




==Homogenous Developers==

Most of the current core developers work at WeBank, where the Linkis project started. Developers
from China Telecom, Kanzhun, iQiyi and HONOR Mobile Phone have also been elected to the committer
group and have already led the release of several versions of Linkis. The most recently nominated
committer is from Samoyed Digital, in recognition of solid contributions to the Linkis data source
management module.

Though the Linkis community may not be diverse enough yet, we are constantly looking for new contributors
and potential committers to enhance the diversity of the community and the vitality of the
project.




==An Excessive Fascination with the Apache Brand==

We acknowledge that the Apache brand would add a lot of value and reputation to Linkis, and
would benefit cooperation and promotion at a global scale. However, our primary purpose
in submitting Linkis to Apache is to build a more diverse and viable community and to gain
stability for long-term development. We will also strictly follow the ASF's rules and policies
under the guidance of the Incubator PMC.




=Documentation=

Documentation about Linkis can be found at https://github.com/WeBankFinTech/Linkis-Doc . The following
links provide more information:

- Codebase at GitHub: https://github.com/WeBankFinTech/Linkis

- Issue Tracking: https://github.com/WeBankFinTech/Linkis/issues

- Releases: https://github.com/WeBankFinTech/Linkis/releases







=Initial Source=

https://github.com/WeBankFinTech/Linkis




=External Dependencies=




Back-end:

| Dependencies | License | Comment |
| caffeine | Apache 2.0 | |
| cglib | Apache 2.0 | |
| commons-beanutils | Apache 2.0 | |
| commons-codec | Apache 2.0 | |
| commons-collections | Apache 2.0 | |
| commons-dbcp | Apache 2.0 | |
| commons-exec | Apache 2.0 | |
| commons-io | Apache 2.0 | |
| commons-lang3 | Apache 2.0 | |
| commons-math3 | Apache 2.0 | |
| commons-net | Apache 2.0 | |
| commons-text | Apache 2.0 | |
| dozer-core | Apache 2.0 | |
| druid | Apache 2.0 | |
| fastjson | Apache 2.0 | |
| gson | Apache 2.0 | |
| guava | Apache 2.0 | |
| hadoop-auth | Apache 2.0 | |
| hadoop-client | Apache 2.0 | |
| hadoop-common | Apache 2.0 | |
| hadoop-hdfs | Apache 2.0 | |
| hadoop-yarn-client | Apache 2.0 | |
| hive-common | Apache 2.0 | |
| hive-exec | Apache 2.0 | |
| hive-jdbc | Apache 2.0 | |
| httpclient | Apache 2.0 | |
| httpmime | Apache 2.0 | |
| jackson-annotations | Apache 2.0 | |
| jackson-databind | Apache 2.0 | |
| jackson-module-scala | Apache 2.0 | |
| javacsv | LGPL | |
| jaxrs-ri | CDDL, GPL 1.1 | will remove |
| jersey-container-servlet | CDDL, GPL 1.1 | will remove |
| jersey-container-servlet-core | CDDL, GPL 1.1 | will remove |
| jersey-entity-filtering | CDDL, GPL 1.1 | will remove |
| jersey-json | CDDL, GPL 1.1 | will remove |
| jersey-media-json-jackson | CDDL, GPL 1.1 | will remove |
| jersey-media-multipart | CDDL, GPL 1.1 | will remove |
| jersey-server | CDDL, GPL 1.1 | will remove |
| jersey-servlet | CDDL, GPL 1.1 | will remove |
| jersey-spring3 | CDDL, GPL 1.1 | will remove |
| jetty-server | Apache 2.0, EPL 1.0 | |
| jetty-webapp | Apache 2.0, EPL 1.0 | |
| json4s-jackson | Apache 2.0 | |
| jsp-api | CDDL, GPL 2.0 | will remove |
| junit | EPL 1.0 | |
| libthrift | Apache 2.0 | |
| log4j-1.2-api | Apache 2.0 | |
| log4j-api | Apache 2.0 | |
| log4j-core | Apache 2.0 | |
| log4j-slf4j-impl | Apache 2.0 | |
| mockito-all | MIT | |
| mybatis-plus-boot-starter | Apache 2.0 | |
| mysql-connector-java | GPL 2.0 | will remove |
| netty-all | Apache 2.0 | |
| pagehelper | MIT | |
| poi-ooxml | Apache 2.0 | |
| protostuff-api | Apache 2.0 | |
| protostuff-core | Apache 2.0 | |
| protostuff-runtime | Apache 2.0 | |
| py4j | BSD 2-clause | |
| reactor-netty | Apache 2.0 | |
| reflections | BSD 2-clause | |
| scalacheck | BSD 3-clause | |
| scalacheck-shapeless | Apache 2.0 | |
| scala-compiler | Apache 2.0 | |
| scala-library | Apache 2.0 | |
| scalamock-scalatest-support | MIT | |
| scalap | Apache 2.0 | |
| scala-reflect | Apache 2.0 | |
| scalatest | Apache 2.0 | |
| slf4j-api | MIT | |
| spark-core | Apache 2.0 | |
| spark-hive | Apache 2.0 | |
| spark-repl | Apache 2.0 | |
| spark-sql | Apache 2.0 | |
| spark-testing-base | Apache 2.0 | |
| spoiwo | MIT | |
| spring-boot | Apache 2.0 | |
| spring-boot-actuator-autoconfigure | Apache 2.0 | |
| spring-boot-starter | Apache 2.0 | |
| spring-boot-starter-actuator | Apache 2.0 | |
| spring-boot-starter-aop | Apache 2.0 | |
| spring-boot-starter-cache | Apache 2.0 | |
| spring-boot-starter-jetty | Apache 2.0 | |
| spring-boot-starter-log4j2 | Apache 2.0 | |
| spring-boot-starter-quartz | Apache 2.0 | |
| spring-boot-starter-reactor-netty | Apache 2.0 | |
| spring-boot-starter-web | Apache 2.0 | |
| spring-cloud-commons | Apache 2.0 | |
| spring-cloud-config-client | Apache 2.0 | |
| spring-cloud-context | Apache 2.0 | |
| spring-cloud-gateway-core | Apache 2.0 | |
| spring-cloud-starter | Apache 2.0 | |
| spring-cloud-starter-config | Apache 2.0 | |
| spring-cloud-starter-gateway | Apache 2.0 | |
| spring-cloud-starter-netflix-eureka-client | Apache 2.0 | |
| spring-cloud-starter-netflix-eureka-server | Apache 2.0 | |
| spring-cloud-starter-openfeign | Apache 2.0 | |
| spring-core | Apache 2.0 | |
| spring-jdbc | Apache 2.0 | |
| spring-security-crypto | Apache 2.0 | |
| spring-test | Apache 2.0 | |
| spring-tx | Apache 2.0 | |
| spring-web | Apache 2.0 | |
| websocket-client | Apache 2.0, EPL 1.0 | |
| websocket-server | Apache 2.0, EPL 1.0 | |
| xlsx-streamer | Apache 2.0 | |
| xstream | BSD 3-clause | |




Front-end:

| Dependencies | License | Comment |
| axios | MIT | |
| highlight.js | BSD-3-Clause | |
| iview | MIT | |
| lodash | MIT | |
| moment | MIT | |
| monaco-editor | MIT | |
| sql-formatter | MIT | |
| svgo | MIT | |
| vue | MIT | |
| vue-i18n | MIT | |
| vue-router | MIT | |
| vuedraggable | MIT | |
| vuescroll | MIT | |




=Required Resources=




==Mailing List==

Currently Linkis has no mailing list. The usual mailing lists are expected to be set up when
entering incubation:

- private@linkis.incubator.apache.org for PPMC discussions;

- dev@linkis.incubator.apache.org for development discussions;

- notification@linkis.incubator.apache.org for user notifications, and notifications from
GitHub.




==Git Repositories==

Upon entering incubation, we request to move the existing repository from https://github.com/WeBankFinTech/Linkis
to Apache infrastructure like https://github.com/apache/Incubator-Linkis.




==Issue Tracking==

The Linkis community would like to continue using GitHub Issues if possible.




==Other Resources==

Apache Jenkins




=Source and Intellectual Property Submission Plan=

Most of the current code is Apache 2.0 licensed and the copyright is assigned to WeBank. If
the project enters the incubator, WeBank will transfer the source code and trademark ownership
to the ASF via a Software Grant Agreement.




=Initial Committers=

- Shuai Di (shuaidi1024@163.com)

- Qiang Yin (690574002@qq.com)

- Heping Wang (374126165@qq.com)

- Yongkun Yang (wimkunkun@gmail.com)

- Zhiyue Yang (904666286@qq.com)

- You Liu (405240259@qq.com)

- Deyi Hua (david_hua1996@hotmail.com)

- Le Bai (120190695@qq.com)

- Xiaogang Wang (913546481@qq.com)

- Hui Zhu (46580583@qq.com)

- Zhen Wang (643348094@qq.com)

- Rong Zhang (693404752@qq.com)

- Xiaohua Yi (405078363@qq.com)

- Ke Zhou (zhouke309@vip.qq.com)

- Jian Xie (xj2jx@163.com)




=Affiliations=

Shuai Di, Qiang Yin, Heping Wang, Yongkun Yang, Zhiyue Yang, You Liu, Deyi Hua, Le Bai, Ke
Zhou and Jian Xie of the initial committers are employees of WeBank.

Xiaogang Wang of the initial committers is an employee of China Telecom.

Hui Zhu of the initial committers is an employee of Kanzhun.

Zhen Wang of the initial committers is an employee of iQiyi.

Rong Zhang of the initial committers is an employee of HONOR Mobile Phone.

Xiaohua Yi of the initial committers is an employee of Samoyed Digital.




=Sponsors=

==Champion==

Junping Du (ASF Member, IPMC Member), junping_du@apache.org




==Nominated Mentors==

Shao Feng Shi (ASF Member, IPMC Member), shaofengshi@apache.org

Duo Zhang (ASF Member, IPMC Member), zhangduo@apache.org

Jerry Shao (ASF Member, IPMC Member), jshao@apache.org

Lidong Dai (IPMC Member), lidongdai@apache.org




=Sponsoring Entity=

We request the Apache Incubator to sponsor this project.

======

Best Regards,

Shuai Di 