incubator-general mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: Looking for Champion
Date Tue, 12 Jun 2018 20:20:24 GMT
Note that there is an existing database product called Palo, an open source OLAP engine by
the German company Jedox[1]. There is a high likelihood that Palo would have to change its
name during incubation, if accepted.

Julian

[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)



> On Jun 10, 2018, at 3:49 AM, Han Luke <luke.hq@gmail.com> wrote:
> 
> Cool, Dave, it’s great to have you as the Champion.
> 
> 
> ________________________________
> From: Tan,Zhongyi <tanzhongyi@baidu.com>
> Sent: Saturday, June 9, 2018 8:16:28 AM
> To: general@incubator.apache.org
> Subject: Re: Looking for Champion
> 
> Thanks, Willem.
> 
> We very much appreciate it.
> 
>> On Jun 8, 2018, at 23:03, Willem Jiang <willem.jiang@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I'm willing to be the Mentor.
>> Please count me in.
>> 
>> 
>> 
>> Willem Jiang
>> 
>> Twitter: willemjiang
>> Weibo: 姜宁willem
>> 
>>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>>> 
>>> Hi -
>>> 
>>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>>> I’ll look at dependency licenses later today. It’s early for me.
>>> 
>>> 
>>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <lide@baidu.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I am Reed, a developer working with the team on Palo (an MPP-based
>>> interactive SQL data warehouse).
>>>> https://github.com/baidu/palo/wiki/Palo-Overview
>>>> 
>>>> We propose to contribute Palo as an Apache Incubator project, and
>>>> we are still looking for a Champion, if anyone would like to
>>> volunteer. Thanks a lot.
>>>> 
>>>> Best Regards,
>>>> Reed
>>>> 
>>>> ===================
>>>> The draft of the proposal is below:
>>>> 
>>>> #Apache Palo
>>>> 
>>>> ##Abstract
>>>> 
>>>> Palo is an MPP-based interactive SQL data warehouse for reporting and
>>> analysis.
>>>> 
>>>> ##Proposal
>>>> 
>>>> We propose to contribute the Palo codebase and associated artifacts
>>> (e.g. documentation, web-site content etc.) to the Apache Software
>>> Foundation with the intent of forming a productive, meritocratic and open
>>> community around Palo’s continued development, according to the ‘Apache
>>> Way’.
>>>> 
>>>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>>> ownership of those trademarks in full to the ASF.
>>>> 
>>>> ###Overview of Palo
>>>> 
>>>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend
>>> (BE).
>>>> 
>>>> **Frontend daemon** consists of a query coordinator and a catalog manager.
>>> The query coordinator is responsible for receiving users’ SQL queries,
>>> compiling queries and managing query execution. The catalog manager is
>>> responsible for managing metadata such as databases, tables, partitions,
>>> replicas, etc. Several frontend daemons can be deployed to provide
>>> fault tolerance and load balancing.
>>>> 
>>>> **Backend daemon** stores the data and executes the query fragments.
>>> Many backend daemons can also be deployed to provide scalability and
>>> fault tolerance.
>>>> 
>>>> A typical Palo cluster consists of several frontend daemons
>>> and dozens to hundreds of backend daemons.
>>>> 
>>>> Users can use MySQL client tools to connect to any frontend daemon and
>>> submit SQL queries. The Frontend receives a query and compiles it into query
>>> plans executable by the Backend. The Frontend then sends the query plan
>>> fragments to the Backends. Each Backend builds a query execution DAG. Data is
>>> fetched and pipelined into the DAG. The final result is sent to the
>>> client via the Frontend. Query fragments are distributed with the main
>>> goals of minimizing data movement and maximizing scan locality.
>>>> 
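The query path described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the names `Fragment`, `Backend`, `Frontend`, `plan` and `run` are hypothetical and are not taken from Palo's code or API, the "query" is reduced to a filtered sum over integers, and the DAG is reduced to a single scan, filter, aggregate chain. It shows only the shape of the flow: the frontend plans one fragment per backend that holds data, each backend pipelines its own local rows, and the frontend merges the partial results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fragment:
    """Hypothetical plan fragment: sum the local values >= threshold."""
    backend: str
    threshold: int


class Backend:
    """Toy stand-in for a backend daemon: stores rows, executes fragments."""

    def __init__(self, name: str, rows: List[int]) -> None:
        self.name = name
        self.rows = rows

    def execute(self, fragment: Fragment) -> int:
        # Lazy pipeline built from generators: scan -> filter -> aggregate.
        scanned = iter(self.rows)
        filtered = (v for v in scanned if v >= fragment.threshold)
        return sum(filtered)


class Frontend:
    """Toy stand-in for a frontend daemon: plans, dispatches, merges."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self.backends = backends

    def plan(self, threshold: int) -> List[Fragment]:
        # One fragment per backend that holds data, so every scan is local
        # and only partial sums travel back to the frontend.
        return [Fragment(name, threshold)
                for name, be in self.backends.items() if be.rows]

    def run(self, threshold: int) -> int:
        partials = [self.backends[f.backend].execute(f)
                    for f in self.plan(threshold)]
        return sum(partials)


cluster = {
    "be1": Backend("be1", [1, 5, 9]),
    "be2": Backend("be2", [4, 7]),
    "be3": Backend("be3", []),  # holds no data, so it receives no fragment
}
frontend = Frontend(cluster)
total = frontend.run(5)  # roughly: SELECT SUM(v) FROM t WHERE v >= 5
print(total)
```

Planning fragments only for backends that hold data is the toy equivalent of the stated goal of maximizing scan locality, and returning only partial sums stands in for minimizing data movement.
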
>>>> ##Background
>>>> 
>>>> At Baidu, prior to Palo, different tools were deployed to meet diverse
>>> requirements. When a use case required the simultaneous
>>> availability of capabilities that could not all be provided by a single tool,
>>> users were forced to build hybrid architectures that stitched multiple tools
>>> together. We believe that they shouldn’t need to accept such inherent
>>> complexity. A storage system built to provide great performance across a
>>> broad range of workloads provides a more elegant solution to the problems
>>> that hybrid architectures aim to solve. Palo is the solution.
>>>> 
>>>> Palo is designed to be a simple, single, tightly coupled system that does
>>> not depend on other systems. Palo provides high-concurrency, low-latency point
>>> queries as well as high-throughput queries for ad-hoc
>>> analysis. Palo provides bulk batch data loading as well as near
>>> real-time mini-batch data loading. Palo also provides high availability,
>>> reliability, fault tolerance, and scalability.
>>>> 
>>>> ##Rationale
>>>> 
>>>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>>>> 
>>>> Mesa is a highly scalable analytic data storage system that stores
>>> critical measurement data related to Google's Internet advertising
>>> business. Mesa is designed to satisfy a complex and challenging set of users’
>>> and systems’ requirements, including near real-time data ingestion and
>>> query ability, as well as high availability, reliability, fault tolerance,
>>> and scalability for large data and query volumes.
>>>> 
>>>> Impala is a modern, open-source MPP SQL engine architected from the
>>> ground up for the Hadoop data processing environment. At present, by virtue
>>> of its superior performance and rich functionality, Impala is
>>> comparable to many commercial MPP database query engines. Mesa can satisfy
>>> many of our storage requirements; however, Mesa itself does not
>>> provide a SQL query engine. Impala is a very good MPP SQL query engine, but
>>> it lacks a perfect distributed storage engine. So in the end we chose
>>> the combination of these two technologies.
>>>> 
>>>> Learning from Mesa’s data model, we developed a distributed storage
>>> engine. Unlike Mesa, this storage engine does not rely on any distributed
>>> file system. We then deeply integrated this storage engine with the Impala query
>>> engine. Query compiling, query execution coordination and catalog
>>> management of the storage engine are integrated into the frontend daemon; query
>>> execution and data storage are integrated into the backend daemon. With this
>>> integration, we implemented a single, full-featured, high-performance,
>>> state-of-the-art MPP database while maintaining simplicity.
>>>> 
>>>> ##Current Status
>>>> 
>>>> Palo is an open source project on GitHub (
>>> https://github.com/baidu/palo).
>>>> 
>>>> ###Meritocracy
>>>> 
>>>> Palo has been deployed in production at Baidu and is used by more than
>>> 200 lines of business. It has demonstrated great performance benefits and
>>> has proved to be a better way to do reporting and analysis on big data.
>>> Still, we look forward to growing a rich user and developer community.
>>>> 
>>>> ###Community
>>>> 
>>>> Palo seeks to develop developer and user communities during incubation.
>>>> 
>>>> ###Core Developers
>>>> 
>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>>>> 
>>>> ###Alignment
>>>> 
>>>> Palo is related to several other Apache projects:
>>>> 
>>>> * Palo can also read data stored in Apache Hadoop clusters powered by
>>> the HDFS filesystem.
>>>> * Palo is closely integrated with Impala, which is also being proposed
>>> to the Incubator.
>>> 
>>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>> 
>>>> * Palo uses Apache Thrift as its RPC and serialization framework of
>>> choice.
>>>> 
>>>> ##Known Risks
>>>> 
>>>> ###Orphaned Products
>>>> 
>>>> The core developers of the Palo team plan to work full time on this project.
>>> There is very little risk of Palo getting orphaned since at least one large
>>> company (Baidu) is extensively using it in production. For example,
>>> currently there are more than 200 use cases using Palo in production.
>>> Furthermore, since Palo was open sourced at the beginning of October 2017,
>>> it has received more than 660 stars and been forked nearly 170 times. We
>>> plan to extend and diversify this community further through Apache.
>>>> 
>>>> ###Inexperience with Open Source
>>>> 
>>>> The core developers are all active users and followers of open source.
>>> They are already committers and contributors to the Palo GitHub project.
>>> All have worked on source code that has been released under an
>>> open source license, and several of them also have experience developing
>>> code in an open source environment. Though the core developers do
>>> not have Apache open source experience, there are plans to onboard
>>> individuals with Apache open source experience on to the project.
>>>> 
>>>> ###Homogenous Developers
>>>> 
>>>> Most of the core developers are from Baidu, but after Palo was open
>>> sourced, it received a lot of bug fixes and enhancements from other
>>> developers not working at Baidu.
>>>> 
>>>> ###Reliance on Salaried Developers
>>>> 
>>>> Baidu invested in Palo as its OLAP solution, and some of its key
>>> engineers are working full time on the project. In addition, since there is
>>> a growing Big Data need for scalable OLAP solutions, we look forward to
>>> other Apache developers and researchers contributing to the project. The
>>> key to addressing the risk of relying on salaried developers
>>> from a single entity is to increase the diversity of the contributors and
>>> actively lobby for domain experts in the BI space to contribute. Apache
>>> Palo intends to do this.
>>>> 
>>>> ###An Excessive Fascination with the Apache Brand
>>>> 
>>>> Palo is proposing to enter incubation at Apache in order to help efforts
>>> to diversify the committer base, not so much to capitalize on the Apache
>>> brand. The Palo project is in production use already inside Baidu, but is
>>> not expected to be a Baidu product for external customers. As such, the
>>> Palo project is not seeking to use the Apache brand as a marketing tool.
>>>> 
>>>> ##Documentation
>>>> 
>>>> Information about Palo can be found at https://github.com/baidu/palo.
>>> The following links provide more information about Palo in open source:
>>>> 
>>>> * Palo wiki site: https://github.com/baidu/palo/wiki
>>>> * Codebase at Github: https://github.com/baidu/palo
>>>> * Issue Tracking: https://github.com/baidu/palo/issues
>>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>>>> 
>>>> ##Initial Source
>>>> 
>>>> Palo has been under development since 2017 by a team of engineers at
>>> Baidu Inc. It is currently hosted on Github.com under an Apache license at
>>> https://github.com/baidu/palo.
>>>> 
>>>> ##External Dependencies
>>>> 
>>>> Palo has the following external dependencies.
>>>> 
>>>> * Google gflags (BSD)
>>>> * Google glog (BSD)
>>>> * Apache Thrift (Apache Software License v2.0)
>>>> * Apache Commons (Apache Software License v2.0)
>>>> * Boost (Boost Software License)
>>>> * OpenLdap (OpenLDAP Software License)
>>>> * rapidjson (Tencent)
>>>> * Google RE2 (BSD-style)
>>>> * lz4 (BSD)
>>>> * snappy (BSD)
>>>> * cyrus-sasl (CMU License)
>>>> * Twitter Bootstrap (Apache Software License v2.0)
>>>> * d3 (BSD)
>>>> * LLVM (BSD-like)
>>>> 
>>>> Build and test dependencies:
>>>> 
>>>> * ant (Apache Software License v2.0)
>>>> * Apache Maven (Apache Software License v2.0)
>>>> * cmake (BSD)
>>>> * clang (BSD)
>>>> * Google gtest (Apache Software License v2.0)
>>>> 
>>>> ##Required Resources
>>>> 
>>>> ###Mailing List
>>>> 
>>>> There are currently no mailing lists. The usual mailing lists are
>>> expected to be set up when entering incubation:
>>>> 
>>>> private@palo.incubator.apache.org
>>>> dev@palo.incubator.apache.org
>>>> commits@palo.incubator.apache.org
>>>> 
>>>> ###Subversion Directory
>>>> 
>>>> Upon entering incubation: https://github.com/baidu/palo.
>>>> After incubation, we want to move the existing repo from
>>> https://github.com/baidu/palo to Apache infrastructure.
>>>> 
>>>> ###Issue Tracking
>>>> 
>>>> Palo currently uses GitHub to track issues. We would like to continue to do
>>> so while we discuss migration possibilities with the ASF Infra committee.
>>>> 
>>>> ###Other Resources
>>>> 
>>>> The existing code already has unit tests so we will make use of existing
>>> Apache continuous testing infrastructure. The resulting load should not be
>>> very large.
>>>> 
>>>> ##Initial Committers
>>>> 
>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>>>> 
>>>> ##Affiliations
>>>> 
>>>> The initial committers are employees of Baidu Inc. The nominated
>>> mentors are employees of TODO.
>>>> 
>>>> ##Sponsors
>>>> 
>>>> ###Champion
>>>> 
>>>> TODO
>>>> 
>>>> ###Nominated Mentors
>>>> 
>>>> * Sijie Guo, guosijie@gmail.com
>>>> * Luke Han, lukehan@apache.org
>>>> * Zheng Shao, zshao@apache.org
>>> 
>>> Mentors must be members of the IPMC and almost always Members of the ASF.
>>> 
>>> At this moment only Luke Han is qualified.
>>> 
>>> Regards,
>>> Dave
>>> 
>>>> 
>>>> ###Sponsoring Entity
>>>> 
>>>> We are requesting the Incubator to sponsor this project.
>>> 
>>> 