incubator-general mailing list archives

From "Li,De(BDG)" <l...@baidu.com>
Subject Re: Looking for Champion
Date Sat, 09 Jun 2018 03:37:16 GMT
Thank you, Willem, and a warm welcome.

On 2018/6/8 at 11:03 PM, "Willem Jiang" <willem.jiang@gmail.com> wrote:

>Hi,
>
>I'm willing to be a Mentor.
>Please count me in.
>
>
>
>Willem Jiang
>
>Twitter: willemjiang
>Weibo: 姜宁willem
>
>On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>
>> Hi -
>>
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>>
>>
>> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <lide@baidu.com> wrote:
>> >
>> > Hi all,
>> >
>> > I am Reed, a developer on the team behind Palo (an MPP-based
>> > interactive SQL data warehouse).
>> > https://github.com/baidu/palo/wiki/Palo-Overview
>> >
>> > We propose to contribute Palo as an Apache Incubator project, and
>> > we are still looking for a Champion, if anyone would like to
>> > volunteer. Thanks a lot.
>> >
>> > Best Regards,
>> > Reed
>> >
>> > ===================
>> > The draft of the proposal is below:
>> >
>> > #Apache Palo
>> >
>> > ##Abstract
>> >
>> > Palo is an MPP-based interactive SQL data warehouse for reporting and
>> > analysis.
>> >
>> > ##Proposal
>> >
>> > We propose to contribute the Palo codebase and associated artifacts
>> > (e.g. documentation, web-site content etc.) to the Apache Software
>> > Foundation with the intent of forming a productive, meritocratic and
>> > open community around Palo’s continued development, according to the
>> > ‘Apache Way’.
>> >
>> > Baidu owns several trademarks regarding Palo, and proposes to transfer
>> > ownership of those trademarks in full to the ASF.
>> >
>> > ###Overview of Palo
>> >
>> > Palo’s implementation consists of two daemons: Frontend (FE) and
>> > Backend (BE).
>> >
>> > **Frontend daemon** consists of a query coordinator and a catalog
>> > manager. The query coordinator is responsible for receiving users’ SQL
>> > queries, compiling them, and managing their execution. The catalog
>> > manager is responsible for managing metadata such as databases, tables,
>> > partitions, replicas, etc. Several frontend daemons can be deployed to
>> > provide fault tolerance and load balancing.
>> >
>> > **Backend daemon** stores the data and executes the query fragments.
>> > Many backend daemons can also be deployed to provide scalability and
>> > fault tolerance.
>> >
>> > A typical Palo cluster is generally composed of several frontend
>> > daemons and dozens to hundreds of backend daemons.
>> >
>> > Users can use MySQL client tools to connect to any frontend daemon and
>> > submit SQL queries. The frontend receives a query and compiles it into
>> > query plans executable by the backends, then sends the query plan
>> > fragments to the backends. Each backend builds a query execution DAG,
>> > and data is fetched and pipelined into the DAG. The final result is
>> > sent to the client via the frontend. Query fragment execution is
>> > distributed with the main goals of minimizing data movement and
>> > maximizing scan locality.
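As a rough illustration of the query flow described above, the following Python sketch simulates a frontend that hash-distributes rows across backends, sends every backend a scan fragment, lets each backend pre-aggregate only its local rows, and merges the partial results before answering. Everything here (the `Frontend` and `Backend` classes, the fragment dictionary, the CRC32-based placement) is a hypothetical model for explanation only, not Palo's actual planner, RPC layer, or API.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical backend daemon: stores local rows and runs fragments."""
    name: str
    rows: list = field(default_factory=list)  # (group_key, value) pairs

    def execute_fragment(self, fragment):
        # Scan only the rows stored on this node and pre-aggregate them,
        # so only small partial results leave the backend (scan locality).
        partial = defaultdict(int)
        for key, value in self.rows:
            if fragment["predicate"](key, value):
                partial[key] += value
        return dict(partial)


class Frontend:
    """Hypothetical frontend daemon: places data, plans, and merges."""

    def __init__(self, backends):
        self.backends = backends

    def load(self, rows):
        # Assumption: deterministic hash placement of each row on one backend.
        for key, value in rows:
            idx = zlib.crc32(key.encode("utf-8")) % len(self.backends)
            self.backends[idx].rows.append((key, value))

    def sum_by_key(self, predicate=lambda key, value: True):
        # "Compile" the query into one fragment that is sent to every backend.
        fragment = {"predicate": predicate}
        partials = [be.execute_fragment(fragment) for be in self.backends]
        # Final merge on the frontend before replying to the client.
        result = defaultdict(int)
        for partial in partials:
            for key, value in partial.items():
                result[key] += value
        return dict(result)


fe = Frontend([Backend("be1"), Backend("be2"), Backend("be3")])
fe.load([("beijing", 10), ("shanghai", 5), ("beijing", 7), ("shenzhen", 3)])
print(sorted(fe.sum_by_key().items()))
# prints [('beijing', 17), ('shanghai', 5), ('shenzhen', 3)]
```

Because each backend returns one partial sum per key instead of its raw rows, the data that moves between daemons is bounded by the number of distinct keys. A real MPP engine also has to shuffle data for joins and high-cardinality aggregations, which this sketch leaves out.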
>> >
>> > ##Background
>> >
>> > At Baidu, prior to Palo, different tools were deployed to meet diverse
>> > requirements in many ways. When a use case required the simultaneous
>> > availability of capabilities that no single tool could provide, users
>> > were forced to build hybrid architectures that stitched multiple tools
>> > together. We believe they shouldn’t need to accept such inherent
>> > complexity. A storage system built to provide great performance across
>> > a broad range of workloads is a more elegant solution to the problems
>> > that hybrid architectures aim to solve. Palo is that solution.
>> >
>> > Palo is designed to be a simple, single, tightly coupled system that
>> > does not depend on other systems. Palo provides high-concurrency,
>> > low-latency point queries as well as high-throughput ad-hoc analytical
>> > queries. It supports bulk batch data loading as well as near real-time
>> > mini-batch data loading. Palo also provides high availability,
>> > reliability, fault tolerance, and scalability.
>> >
>> > ##Rationale
>> >
>> > Palo mainly combines technology from Google Mesa and Apache Impala.
>> >
>> > Mesa is a highly scalable analytic data storage system that stores
>> > critical measurement data related to Google's Internet advertising
>> > business. Mesa is designed to satisfy a complex and challenging set of
>> > user and system requirements, including near real-time data ingestion
>> > and queryability, as well as high availability, reliability, fault
>> > tolerance, and scalability for large data and query volumes.
>> >
>> > Impala is a modern, open-source MPP SQL engine architected from the
>> > ground up for the Hadoop data processing environment. By virtue of its
>> > superior performance and rich functionality, Impala is comparable to
>> > many commercial MPP database query engines. Mesa can satisfy many of
>> > our storage requirements, but Mesa itself does not provide a SQL query
>> > engine; Impala is a very good MPP SQL query engine, but lacks an ideal
>> > distributed storage engine. So in the end we chose to combine these two
>> > technologies.
>> >
>> > Drawing on Mesa’s data model, we developed a distributed storage
>> > engine. Unlike Mesa, this storage engine does not rely on any
>> > distributed file system. We then deeply integrated this storage engine
>> > with the Impala query engine. Query compilation, query execution
>> > coordination, and catalog management of the storage engine are
>> > integrated into the frontend daemon; query execution and data storage
>> > are integrated into the backend daemon. With this integration, we
>> > implemented a single, full-featured, high-performance, state-of-the-art
>> > MPP database while maintaining simplicity.
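For readers unfamiliar with Mesa's data model, here is a minimal Python sketch of the idea as described in the Mesa paper: each table has key columns and value columns, every value column has an associative aggregation function (SUM, MAX, ...), updates arrive as versioned batches ("deltas"), a query at version v aggregates all deltas up to v, and compaction pre-merges deltas. The `AggregateTable` class, its methods, and the sample columns are hypothetical; this is not Palo's storage engine code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]


class AggregateTable:
    """Hypothetical Mesa-style table: rows sharing a key are combined
    column by column using each value column's aggregation function."""

    def __init__(self, aggregators: Dict[str, Callable[[int, int], int]]):
        self.aggregators = aggregators
        self.version = 0
        self.deltas: List[Tuple[int, Dict[tuple, Row]]] = []

    def _merge(self, old: Optional[Row], new: Row) -> Row:
        if old is None:
            return dict(new)
        return {col: agg(old[col], new[col]) for col, agg in self.aggregators.items()}

    def load_batch(self, rows: List[Tuple[tuple, Row]]) -> int:
        # Each loaded batch becomes one delta tagged with a new version.
        self.version += 1
        delta: Dict[tuple, Row] = {}
        for key, values in rows:
            delta[key] = self._merge(delta.get(key), values)
        self.deltas.append((self.version, delta))
        return self.version

    def query(self, version: Optional[int] = None) -> Dict[tuple, Row]:
        # Aggregate every delta up to the requested version (default: latest).
        result: Dict[tuple, Row] = {}
        for v, delta in self.deltas:
            if version is not None and v > version:
                break
            for key, values in delta.items():
                result[key] = self._merge(result.get(key), values)
        return result

    def compact(self) -> None:
        # Collapse all deltas into one; older versions are no longer queryable.
        if self.deltas:
            self.deltas = [(self.version, self.query())]


table = AggregateTable({"clicks": lambda a, b: a + b, "max_latency": max})
v1 = table.load_batch([
    (("2018-06-08", "ad1"), {"clicks": 3, "max_latency": 20}),
    (("2018-06-08", "ad1"), {"clicks": 2, "max_latency": 35}),
])
v2 = table.load_batch([
    (("2018-06-08", "ad1"), {"clicks": 4, "max_latency": 10}),
    (("2018-06-08", "ad2"), {"clicks": 1, "max_latency": 5}),
])
print(table.query(v1))
print(table.query())
```

Because the aggregation functions are associative, deltas can be merged per batch, during compaction, or at query time and still produce the same answer, which is what makes the versioned-delta approach workable.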
>> >
>> > ##Current Status
>> >
>> > Palo has been an open source project on GitHub
>> > (https://github.com/baidu/palo).
>> >
>> > ###Meritocracy
>> >
>> > Palo has been deployed in production at Baidu and serves more than
>> > 200 lines of business. It has demonstrated great performance benefits
>> > and has proved to be a better approach to big data reporting and
>> > analysis. Still, we look forward to growing a rich user and developer
>> > community.
>> >
>> > ###Community
>> >
>> > Palo seeks to develop developer and user communities during incubation.
>> >
>> > ###Core Developers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >
>> > ###Alignment
>> >
>> > Palo is related to several other Apache projects:
>> >
>> > * Palo can also read data stored in Apache Hadoop clusters powered by
>> >   the HDFS filesystem.
>> > * Palo is closely integrated with Impala, which is also being proposed
>> >   to the Incubator.
>>
>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>
>> > * Palo uses Apache Thrift as its RPC and serialization framework of
>> >   choice.
>> >
>> > ##Known Risks
>> >
>> > ###Orphaned Products
>> >
>> > The core developers of the Palo team plan to work full time on this
>> > project. There is very little risk of Palo being orphaned, since at
>> > least one large company (Baidu) uses it extensively in production; for
>> > example, there are currently more than 200 use cases running Palo in
>> > production. Furthermore, since Palo was open sourced at the beginning
>> > of October 2017, it has received more than 660 GitHub stars and been
>> > forked nearly 170 times. We plan to extend and diversify this community
>> > further through Apache.
>> >
>> > ###Inexperience with Open Source
>> >
>> > The core developers are all active users and followers of open source.
>> > They are already committers and contributors to the Palo GitHub
>> > project. All have been involved with source code released under an
>> > open source license, and several of them also have experience
>> > developing code in an open source environment. Though the core
>> > developers do not have Apache open source experience, there are plans
>> > to onboard individuals with Apache open source experience to the
>> > project.
>> >
>> > ###Homogenous Developers
>> >
>> > Most of the core developers are from Baidu, but since Palo was open
>> > sourced, it has received many bug fixes and enhancements from
>> > developers outside Baidu.
>> >
>> > ###Reliance on Salaried Developers
>> >
>> > Baidu has invested in Palo as its OLAP solution, and some of its key
>> > engineers work full time on the project. In addition, since there is
>> > a growing big data need for scalable OLAP solutions, we look forward
>> > to other Apache developers and researchers contributing to the
>> > project. Also key to addressing the risk of relying on salaried
>> > developers from a single entity is increasing the diversity of the
>> > contributors and actively lobbying domain experts in the BI space to
>> > contribute. Apache Palo intends to do this.
>> >
>> > ###An Excessive Fascination with the Apache Brand
>> >
>> > Palo is proposing to enter incubation at Apache in order to help
>> > efforts to diversify the committer base, not so much to capitalize on
>> > the Apache brand. The Palo project is already in production use inside
>> > Baidu, but is not expected to be a Baidu product for external
>> > customers. As such, the Palo project is not seeking to use the Apache
>> > brand as a marketing tool.
>> >
>> > ##Documentation
>> >
>> > Information about Palo can be found at https://github.com/baidu/palo.
>> > The following links provide more information about Palo in open source:
>> >
>> > * Palo wiki site: https://github.com/baidu/palo/wiki
>> > * Codebase at Github: https://github.com/baidu/palo
>> > * Issue Tracking: https://github.com/baidu/palo/issues
>> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> >
>> > ##Initial Source
>> >
>> > Palo has been under development since 2017 by a team of engineers at
>> > Baidu Inc. It is currently hosted on GitHub under an Apache license at
>> > https://github.com/baidu/palo.
>> >
>> > ##External Dependencies
>> >
>> > Palo has the following external dependencies.
>> >
>> > * Google gflags (BSD)
>> > * Google glog (BSD)
>> > * Apache Thrift (Apache Software License v2.0)
>> > * Apache Commons (Apache Software License v2.0)
>> > * Boost (Boost Software License)
>> > * OpenLDAP (OpenLDAP Software License)
>> > * rapidjson (Tencent, MIT License)
>> > * Google RE2 (BSD-style)
>> > * lz4 (BSD)
>> > * snappy (BSD)
>> > * cyrus-sasl (CMU License)
>> > * Twitter Bootstrap (Apache Software License v2.0)
>> > * d3 (BSD)
>> > * LLVM (BSD-like)
>> >
>> > Build and test dependencies:
>> >
>> > * ant (Apache Software License v2.0)
>> > * Apache Maven (Apache Software License v2.0)
>> > * cmake (BSD)
>> > * clang (BSD)
>> > * Google gtest (BSD)
>> >
>> > ##Required Resources
>> >
>> > ###Mailing List
>> >
>> > There are currently no mailing lists. The usual mailing lists are
>> > expected to be set up when entering incubation:
>> >
>> > private@palo.incubator.apache.org
>> > dev@palo.incubator.apache.org
>> > commits@palo.incubator.apache.org
>> >
>> > ###Subversion Directory
>> >
>> > Upon entering incubation: https://github.com/baidu/palo.
>> > After incubation, we want to move the existing repo from
>> > https://github.com/baidu/palo to Apache infrastructure.
>> >
>> > ###Issue Tracking
>> >
>> > Palo currently uses GitHub to track issues. We would like to continue
>> > doing so while we discuss migration possibilities with the ASF Infra
>> > team.
>> >
>> > ###Other Resources
>> >
>> > The existing code already has unit tests, so we will make use of the
>> > existing Apache continuous testing infrastructure. The resulting load
>> > should not be very large.
>> >
>> > ##Initial Committers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >
>> > ##Affiliations
>> >
>> > The initial committers are employees of Baidu Inc. The nominated
>> > mentors are employees of TODO.
>> >
>> > ##Sponsors
>> >
>> > ###Champion
>> >
>> > TODO
>> >
>> > ###Nominated Mentors
>> >
>> > * Sijie Guo, guosijie@gmail.com
>> > * Luke Han, lukehan@apache.org
>> > * Zheng Shao, zshao@apache.org
>>
>> Mentors must be members of the IPMC and almost always Members of the ASF.
>>
>> At this moment only Luke Han is qualified.
>>
>> Regards,
>> Dave
>>
>> >
>> > ###Sponsoring Entity
>> >
>> > We are requesting the Incubator to sponsor this project.
>>
>>
