incubator-general mailing list archives

From "Tan,Zhongyi" <tanzhon...@baidu.com>
Subject Re: Looking for Champion
Date Fri, 08 Jun 2018 11:24:57 GMT
Hi guys,

Palo is a good project.

Is there anyone who would volunteer to be its Champion and
help us go through the process of becoming an Apache project?

Thanks

>
>On 2018/06/08 04:45:32, "Li,De(BDG)" <lide@baidu.com> wrote:
>> Hi all,
>> 
>> I am Reed, a developer on the Palo team (Palo is an MPP-based
>>interactive SQL data warehouse).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>> 
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for a Champion, if anyone would like to
>>volunteer. Thanks a lot.
>> 
>> Best Regards,
>> Reed
>> 
>> ===================
>> The draft of the proposal is below:
>> 
>> #Apache Palo
>> 
>> ##Abstract
>> 
>> Palo is an MPP-based interactive SQL data warehouse for reporting and
>>analysis.
>> 
>> ##Proposal
>> 
>> We propose to contribute the Palo codebase and associated artifacts
>>(e.g. documentation, web-site content etc.) to the Apache Software
>>Foundation with the intent of forming a productive, meritocratic and
>>open community around Palo’s continued development, according to the
>>‘Apache Way’.
>> 
>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>>ownership of those trademarks in full to the ASF.
>> 
>> ###Overview of Palo
>> 
>> Palo’s implementation consists of two daemons: Frontend (FE) and
>>Backend (BE).
>> 
>> **Frontend daemon** consists of a query coordinator and a catalog manager.
>>The query coordinator is responsible for receiving users’ SQL queries,
>>compiling them, and managing query execution. The catalog manager is
>>responsible for managing metadata such as databases, tables, partitions,
>>and replicas. Several frontend daemons can be deployed to provide
>>fault tolerance and load balancing.
>> 
>> **Backend daemon** stores the data and executes the query fragments.
>>Many backend daemons can be deployed to provide scalability and
>>fault tolerance.
>> 
>> A typical Palo cluster is generally composed of several frontend daemons
>>and dozens to hundreds of backend daemons.
>> 
>> Users can use MySQL client tools to connect to any frontend daemon and
>>submit SQL queries. The Frontend receives a query and compiles it into
>>query plans executable by the Backend, then sends the query plan
>>fragments to the Backends. Each Backend builds a query execution DAG;
>>data is fetched and pipelined into the DAG. The final result is returned
>>to the client via the Frontend. Query fragments are distributed with the
>>main goals of minimizing data movement and maximizing scan locality.
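
To make the query flow above concrete, here is a toy sketch of locality-aware fragment assignment. It is only an illustration under stated assumptions, not Palo's actual code: the names Tablet, assign_fragments and run_query are hypothetical, and the greedy "least-loaded local replica" policy is an assumption, since the proposal only states the goals (minimal data movement, maximal scan locality) and not the algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tablet:
    """Hypothetical unit of stored data plus the backends holding a replica."""
    tablet_id: int
    replicas: tuple  # names of backends that store a copy


def assign_fragments(tablets, backends):
    """Assign each tablet scan to a backend that already stores the tablet.

    Among the backends with a local replica, pick the least-loaded one
    (ties broken by name). A tablet with no replica on a live backend
    raises an error in this sketch.
    """
    load = {b: 0 for b in backends}
    plan = defaultdict(list)
    for t in sorted(tablets, key=lambda t: t.tablet_id):
        local = [b for b in t.replicas if b in load]
        if not local:
            raise RuntimeError(f"no live replica for tablet {t.tablet_id}")
        chosen = min(local, key=lambda b: (load[b], b))
        load[chosen] += 1
        plan[chosen].append(t.tablet_id)
    return dict(plan)


def run_query(tablets, backends, data):
    """Frontend role: plan, fan out to backends, merge partial results."""
    plan = assign_fragments(tablets, backends)
    # Backend role: each backend sums only the tablets it scans locally.
    partials = {b: sum(sum(data[t]) for t in ids) for b, ids in plan.items()}
    # Frontend role: merge the partial sums into the final answer.
    return plan, sum(partials.values())


# Made-up cluster: four tablets, each replicated on two of three backends.
tablets = [
    Tablet(1, ("be1", "be2")),
    Tablet(2, ("be2", "be3")),
    Tablet(3, ("be1", "be3")),
    Tablet(4, ("be1", "be2")),
]
data = {1: [1, 2], 2: [3], 3: [4, 5], 4: [6]}
plan, total = run_query(tablets, ["be1", "be2", "be3"], data)
print(plan, total)
```

Every scan in this sketch runs on a backend that already stores the tablet, so only the small partial sums travel back to the frontend for merging.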
>> 
>> ##Background
>> 
>> At Baidu, prior to Palo, different tools were deployed to meet diverse
>>requirements. When a use case required capabilities that no single tool
>>could provide at the same time, users were forced to build hybrid
>>architectures that stitched multiple tools together. We believe users
>>should not have to accept such inherent complexity. A storage system
>>built to provide great performance across a broad range of workloads
>>is a more elegant solution to the problems that hybrid architectures
>>aim to solve. Palo is that solution.
>> 
>> Palo is designed to be a simple, single, tightly coupled system that
>>does not depend on other systems. Palo provides high-concurrency,
>>low-latency point queries as well as high-throughput ad-hoc analytical
>>queries. It supports bulk batch data loading as well as near real-time
>>mini-batch data loading. Palo also provides high availability,
>>reliability, fault tolerance, and scalability.
>> 
>> ##Rationale
>> 
>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> 
>> Mesa is a highly scalable analytic data storage system that stores
>>critical measurement data related to Google's Internet advertising
>>business. Mesa is designed to satisfy a complex and challenging set of
>>users’ and systems’ requirements, including near real-time data
>>ingestion and query ability, as well as high availability, reliability,
>>fault tolerance, and scalability for large data and query volumes.
>> 
>> Impala is a modern, open-source MPP SQL engine architected from the
>>ground up for the Hadoop data processing environment. By virtue of its
>>superior performance and rich functionality, Impala is now comparable
>>to many commercial MPP database query engines. Mesa can satisfy many of
>>our storage requirements, but Mesa itself does not provide a SQL query
>>engine; Impala is a very good MPP SQL query engine, but it lacks an
>>ideal distributed storage engine. So in the end we chose to combine
>>these two technologies.
>> 
>> Learning from Mesa’s data model, we developed a distributed storage
>>engine. Unlike Mesa, this storage engine does not rely on any
>>distributed file system. We then deeply integrated this storage engine
>>with the Impala query engine. Query compilation, query execution
>>coordination, and catalog management of the storage engine are
>>integrated into the frontend daemon; query execution and data storage
>>are integrated into the backend daemon. With this integration, we
>>implemented a single, full-featured, high-performance, state-of-the-art
>>MPP database while maintaining simplicity.
>> 
>> ##Current Status
>> 
>> Palo is already an open source project on GitHub
>>(https://github.com/baidu/palo).
>> 
>> ###Meritocracy
>> 
>> Palo has been deployed in production at Baidu and serves more than
>>200 lines of business. It has demonstrated great performance benefits
>>and has proved to be a better way to do reporting and analysis on big
>>data. Still, we look forward to growing a rich user and developer
>>community.
>> 
>> ###Community
>> 
>> Palo seeks to develop developer and user communities during incubation.
>> 
>> ###Core Developers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> 
>> ###Alignment
>> 
>> Palo is related to several other Apache projects:
>> 
>> * Palo can also read data stored in Apache Hadoop clusters powered by
>>the HDFS filesystem.
>> * Palo is closely integrated with Impala, which is also an Apache
>>project.
>> * Palo uses Apache Thrift as its RPC and serialization framework of
>>choice.
>> 
>> ##Known Risks
>> 
>> ###Orphaned Products
>> 
>> The core developers of the Palo team plan to work full time on this
>>project. There is very little risk of Palo getting orphaned, since at
>>least one large company (Baidu) uses it extensively in production:
>>currently there are more than 200 use cases running on Palo in
>>production. Furthermore, since Palo was open sourced at the beginning
>>of October 2017, it has received more than 660 stars and been forked
>>nearly 170 times. We plan to extend and diversify this community
>>further through Apache.
>> 
>> ###Inexperience with Open Source
>> 
>> The core developers are all active users and followers of open source.
>>They are already committers and contributors to the Palo GitHub project.
>>All have worked on source code that has been released under an open
>>source license, and several of them also have experience developing
>>code in an open source environment. Though the core developers do not
>>have Apache open source experience, there are plans to onboard
>>individuals with Apache open source experience onto the project.
>> 
>> ###Homogenous Developers
>> 
>> Most of the core developers are from Baidu, but since Palo was open
>>sourced, it has received many bug fixes and enhancements from
>>developers not working at Baidu.
>> 
>> ###Reliance on Salaried Developers
>> 
>> Baidu has invested in Palo as its OLAP solution, and some of its key
>>engineers are working full time on the project. In addition, since there
>>is a growing Big Data need for scalable OLAP solutions, we look forward
>>to other Apache developers and researchers contributing to the project.
>>Key to addressing the risk of relying on salaried developers from a
>>single entity is increasing the diversity of contributors and actively
>>lobbying domain experts in the BI space to contribute. Apache Palo
>>intends to do this.
>> 
>> ###An Excessive Fascination with the Apache Brand
>> 
>> Palo is proposing to enter incubation at Apache in order to help
>>efforts to diversify the committer-base, not so much to capitalize on
>>the Apache brand. The Palo project is in production use already inside
>>Baidu, but is not expected to be a Baidu product for external
>>customers. As such, the Palo project is not seeking to use the Apache
>>brand as a marketing tool.
>> 
>> ##Documentation
>> 
>> Information about Palo can be found at https://github.com/baidu/palo.
>>The following links provide more information about Palo in open source:
>> 
>> * Palo wiki site: https://github.com/baidu/palo/wiki
>> * Codebase at GitHub: https://github.com/baidu/palo
>> * Issue Tracking: https://github.com/baidu/palo/issues
>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> 
>> ##Initial Source
>> 
>> Palo has been under development since 2017 by a team of engineers at
>>Baidu Inc. It is currently hosted on GitHub under an Apache license
>>at https://github.com/baidu/palo.
>> 
>> ##External Dependencies
>> 
>> Palo has the following external dependencies.
>> 
>> * Google gflags (BSD)
>> * Google glog (BSD)
>> * Apache Thrift (Apache Software License v2.0)
>> * Apache Commons (Apache Software License v2.0)
>> * Boost (Boost Software License)
>> * OpenLDAP (OpenLDAP Software License)
>> * rapidjson (MIT License; developed by Tencent)
>> * Google RE2 (BSD-style)
>> * lz4 (BSD)
>> * snappy (BSD)
>> * cyrus-sasl (CMU License)
>> * Twitter Bootstrap (Apache Software License v2.0)
>> * d3 (BSD)
>> * LLVM (BSD-like)
>> 
>> Build and test dependencies:
>> 
>> * ant (Apache Software License v2.0)
>> * Apache Maven (Apache Software License v2.0)
>> * cmake (BSD)
>> * clang (BSD)
>> * Google gtest (BSD)
>> 
>> ##Required Resources
>> 
>> ###Mailing List
>> 
>> There are currently no mailing lists. The usual mailing lists are
>>expected to be set up when entering incubation:
>> 
>> * private@palo.incubator.apache.org
>> * dev@palo.incubator.apache.org
>> * commits@palo.incubator.apache.org
>> 
>> ###Subversion Directory
>> 
>> Upon entering incubation: https://github.com/baidu/palo.
>> After incubation, we want to move the existing repo from
>>https://github.com/baidu/palo to Apache infrastructure.
>> 
>> ###Issue Tracking
>> 
>> Palo currently uses GitHub to track issues. We would like to continue
>>to do so while we discuss migration possibilities with the ASF Infra
>>committee.
>> 
>> ###Other Resources
>> 
>> The existing code already has unit tests so we will make use of
>>existing Apache continuous testing infrastructure. The resulting load
>>should not be very large.
>> 
>> ##Initial Committers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> 
>> ##Affiliations
>> 
>> The initial committers are employees of Baidu Inc. The nominated
>>mentors are employees of TODO.
>> 
>> ##Sponsors
>> 
>> ###Champion
>> 
>> TODO
>> 
>> ###Nominated Mentors
>> 
>> * Sijie Guo, guosijie@gmail.com
>> * Luke Han, lukehan@apache.org
>> * Zheng Shao, zshao@apache.org
>> 
>> ###Sponsoring Entity
>> 
>> We are requesting the Incubator to sponsor this project.
>> 
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>For additional commands, e-mail: general-help@incubator.apache.org
>
