incubator-general mailing list archives

From "Li,De(BDG)" <l...@baidu.com>
Subject Re: Looking for Champion
Date Wed, 13 Jun 2018 00:54:39 GMT
Hi Julian,

Thank you.

It looks like we will have to find another name.
If anyone has a good name, please feel free to let me know.

Best Regards,
Reed

On 2018/6/13 at 4:20 AM, "Julian Hyde" <jhyde@apache.org> wrote:

>Note that there is an existing database product called Palo - an open
>source OLAP engine by German company Jedox[1]. There is a high
>likelihood that Palo would have to change its name during incubation, if
>accepted.
>
>Julian
>
>[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>
>
>
>> On Jun 10, 2018, at 3:49 AM, Han Luke <luke.hq@gmail.com> wrote:
>> 
>> Cool Dave, it’s great to have you as the Champion.
>> 
>> 
>> ________________________________
>> From: Tan,Zhongyi <tanzhongyi@baidu.com>
>> Sent: Saturday, June 9, 2018 8:16:28 AM
>> To: general@incubator.apache.org
>> Subject: Re: Looking for Champion
>> 
>> Thanks, Willem.
>> 
>> We really appreciate it.
>> 
>>> On June 8, 2018, at 23:03, Willem Jiang <willem.jiang@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm willing to be the Mentor.
>>> Please count me in.
>>> 
>>> 
>>> 
>>> Willem Jiang
>>> 
>>> Twitter: willemjiang
>>> Weibo: 姜宁willem
>>> 
>>>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>>>> 
>>>> Hi -
>>>> 
>>>> I’m willing to Champion and Mentor. I have a couple of comments
>>>> inline.
>>>> I’ll look at dependency licenses later today. It’s early for me.
>>>> 
>>>> 
>>>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <lide@baidu.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I am Reed, a developer working with the team on Palo (an MPP-based
>>>>> interactive SQL data warehouse).
>>>>> https://github.com/baidu/palo/wiki/Palo-Overview
>>>>> 
>>>>> We propose to contribute Palo as an Apache Incubator project, and
>>>>> we are still looking for a Champion, if anyone would like to
>>>>> volunteer. Thanks a lot.
>>>>> 
>>>>> Best Regards,
>>>>> Reed
>>>>> 
>>>>> ===================
>>>>> The draft of the proposal is below:
>>>>> 
>>>>> #Apache Palo
>>>>> 
>>>>> ##Abstract
>>>>> 
>>>>> Palo is an MPP-based interactive SQL data warehouse for reporting
>>>>> and analysis.
>>>>> 
>>>>> ##Proposal
>>>>> 
>>>>> We propose to contribute the Palo codebase and associated artifacts
>>>>> (e.g. documentation, web-site content etc.) to the Apache Software
>>>>> Foundation with the intent of forming a productive, meritocratic and
>>>>> open community around Palo’s continued development, according to the
>>>>> ‘Apache Way’.
>>>>> 
>>>>> Baidu owns several trademarks regarding Palo, and proposes to
>>>>> transfer ownership of those trademarks in full to the ASF.
>>>>> 
>>>>> ###Overview of Palo
>>>>> 
>>>>> Palo’s implementation consists of two daemons: Frontend (FE) and
>>>>> Backend (BE).
>>>>> 
>>>>> **Frontend daemon** consists of a query coordinator and a catalog
>>>>> manager. The query coordinator is responsible for receiving users’
>>>>> SQL queries, compiling queries and managing query execution. The
>>>>> catalog manager is responsible for managing metadata such as
>>>>> databases, tables, partitions and replicas. Several frontend
>>>>> daemons can be deployed to provide fault tolerance and load
>>>>> balancing.
>>>>> 
>>>>> **Backend daemon** stores the data and executes the query fragments.
>>>>> Many backend daemons can also be deployed to provide scalability and
>>>>> fault tolerance.
>>>>> 
>>>>> A typical Palo cluster consists of several frontend daemons and
>>>>> dozens to hundreds of backend daemons.
>>>>> 
>>>>> Users can use MySQL client tools to connect to any frontend daemon
>>>>> and submit SQL queries. The Frontend receives a query and compiles
>>>>> it into query plans executable by the Backend, then sends the query
>>>>> plan fragments to the Backend. The Backend builds a query execution
>>>>> DAG, and data is fetched and pipelined into the DAG. The final
>>>>> result is sent to the client via the Frontend. The distribution of
>>>>> query fragment execution aims primarily to minimize data movement
>>>>> and maximize scan locality.
>>>>> 
>>>>> ##Background
>>>>> 
>>>>> At Baidu, prior to Palo, different tools were deployed to meet
>>>>> diverse requirements in many ways. When a use case required the
>>>>> simultaneous availability of capabilities that could not all be
>>>>> provided by a single tool, users were forced to build hybrid
>>>>> architectures that stitched multiple tools together, but we
>>>>> believe that they shouldn’t need to accept such inherent
>>>>> complexity. A storage system built to provide great performance
>>>>> across a broad range of workloads provides a more elegant solution
>>>>> to the problems that hybrid architectures aim to solve. Palo is
>>>>> that solution.
>>>>> 
>>>>> Palo is designed to be a simple, single, tightly coupled system
>>>>> that does not depend on other systems. Palo provides
>>>>> high-concurrency, low-latency point query performance as well as
>>>>> high-throughput queries for ad-hoc analysis. Palo provides bulk
>>>>> batch data loading as well as near real-time mini-batch data
>>>>> loading. Palo also provides high availability, reliability, fault
>>>>> tolerance, and scalability.
>>>>> 
>>>>> ##Rationale
>>>>> 
>>>>> Palo mainly integrates the technology of Google Mesa and Apache
>>>>> Impala.
>>>>> 
>>>>> Mesa is a highly scalable analytic data storage system that stores
>>>>> critical measurement data related to Google's Internet advertising
>>>>> business. Mesa is designed to satisfy a complex and challenging set
>>>>> of user and system requirements, including near real-time data
>>>>> ingestion and query ability, as well as high availability,
>>>>> reliability, fault tolerance, and scalability for large data and
>>>>> query volumes.
>>>>> 
>>>>> Impala is a modern, open-source MPP SQL engine architected from the
>>>>> ground up for the Hadoop data processing environment. By virtue of
>>>>> its superior performance and rich functionality, Impala is now
>>>>> comparable to many commercial MPP database query engines. Mesa can
>>>>> satisfy many of our storage requirements, but it does not provide a
>>>>> SQL query engine; Impala is a very good MPP SQL query engine, but it
>>>>> lacks a perfect distributed storage engine. So in the end we chose
>>>>> to combine these two technologies.
>>>>> 
>>>>> Learning from Mesa’s data model, we developed a distributed storage
>>>>> engine. Unlike Mesa, this storage engine does not rely on any
>>>>> distributed file system. We then deeply integrated this storage
>>>>> engine with the Impala query engine. Query compiling, query
>>>>> execution coordination and catalog management of the storage engine
>>>>> are integrated into the frontend daemon; query execution and data
>>>>> storage are integrated into the backend daemon. With this
>>>>> integration, we implemented a single, full-featured,
>>>>> high-performance, state-of-the-art MPP database while maintaining
>>>>> simplicity.
>>>>> 
>>>>> ##Current Status
>>>>> 
>>>>> Palo is an open source project on GitHub
>>>>> (https://github.com/baidu/palo).
>>>>> 
>>>>> ###Meritocracy
>>>>> 
>>>>> Palo has been deployed in production at Baidu and is applied in more
>>>>> than 200 lines of business. It has demonstrated great performance
>>>>> benefits and has proved to be a better way to do reporting and
>>>>> analysis on big data. Still, we look forward to growing a rich user
>>>>> and developer community.
>>>>> 
>>>>> ###Community
>>>>> 
>>>>> Palo seeks to develop developer and user communities during
>>>>> incubation.
>>>>> 
>>>>> ###Core Developers
>>>>> 
>>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>>>>> 
>>>>> ###Alignment
>>>>> 
>>>>> Palo is related to several other Apache projects:
>>>>> 
>>>>> * Palo can also read data stored in Apache Hadoop clusters powered by
>>>>>   the HDFS filesystem.
>>>>> * Palo is closely integrated with Impala, which is also being
>>>>>   proposed to the Incubator.
>>>> 
>>>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>>> 
>>>>> * Palo uses Apache Thrift as its RPC and serialization framework of
>>>>>   choice.
>>>>> 
>>>>> ##Known Risks
>>>>> 
>>>>> ###Orphaned Products
>>>>> 
>>>>> The core developers of the Palo team plan to work full time on this
>>>>> project. There is very little risk of Palo getting orphaned since
>>>>> at least one large company (Baidu) is using it extensively in
>>>>> production. For example, there are currently more than 200 use
>>>>> cases using Palo in production. Furthermore, since Palo was open
>>>>> sourced at the beginning of October 2017, it has received more than
>>>>> 660 stars and been forked nearly 170 times. We plan to extend and
>>>>> diversify this community further through Apache.
>>>>> 
>>>>> ###Inexperience with Open Source
>>>>> 
>>>>> The core developers are all active users and followers of open
>>>>> source. They are already committers and contributors to the Palo
>>>>> GitHub project. All have been involved with source code that has
>>>>> been released under an open source license, and several of them
>>>>> also have experience developing code in an open source environment.
>>>>> Though the core developers do not have Apache open source
>>>>> experience, there are plans to onboard individuals with Apache open
>>>>> source experience onto the project.
>>>>> 
>>>>> ###Homogenous Developers
>>>>> 
>>>>> Most of the core developers are from Baidu, but after Palo was open
>>>>> sourced, it received a lot of bug fixes and enhancements from
>>>>> developers not working at Baidu.
>>>>> 
>>>>> ###Reliance on Salaried Developers
>>>>> 
>>>>> Baidu invested in Palo as its OLAP solution, and some of its key
>>>>> engineers are working full time on the project. In addition, since
>>>>> there is a growing Big Data need for scalable OLAP solutions, we
>>>>> look forward to other Apache developers and researchers
>>>>> contributing to the project. Key to addressing the risk of relying
>>>>> on salaried developers from a single entity is increasing the
>>>>> diversity of the contributors and actively lobbying for domain
>>>>> experts in the BI space to contribute. Apache Palo intends to do
>>>>> this.
>>>>> 
>>>>> ###An Excessive Fascination with the Apache Brand
>>>>> 
>>>>> Palo is proposing to enter incubation at Apache in order to help
>>>>> efforts to diversify the committer base, not so much to capitalize
>>>>> on the Apache brand. The Palo project is already in production use
>>>>> inside Baidu, but is not expected to be a Baidu product for
>>>>> external customers. As such, the Palo project is not seeking to use
>>>>> the Apache brand as a marketing tool.
>>>>> 
>>>>> ##Documentation
>>>>> 
>>>>> Information about Palo can be found at https://github.com/baidu/palo.
>>>>> The following links provide more information about Palo in open
>>>>> source:
>>>>> 
>>>>> * Palo wiki site: https://github.com/baidu/palo/wiki
>>>>> * Codebase at Github: https://github.com/baidu/palo
>>>>> * Issue Tracking: https://github.com/baidu/palo/issues
>>>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>>>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>>>>> 
>>>>> ##Initial Source
>>>>> 
>>>>> Palo has been under development since 2017 by a team of engineers at
>>>>> Baidu Inc. It is currently hosted on GitHub under an Apache
>>>>> license at https://github.com/baidu/palo.
>>>>> 
>>>>> ##External Dependencies
>>>>> 
>>>>> Palo has the following external dependencies.
>>>>> 
>>>>> * Google gflags (BSD)
>>>>> * Google glog (BSD)
>>>>> * Apache Thrift (Apache Software License v2.0)
>>>>> * Apache Commons (Apache Software License v2.0)
>>>>> * Boost (Boost Software License)
>>>>> * OpenLdap (OpenLDAP Software License)
>>>>> * rapidjson (Tencent)
>>>>> * Google RE2 (BSD-style)
>>>>> * lz4 (BSD)
>>>>> * snappy (BSD)
>>>>> * cyrus-sasl (CMU License)
>>>>> * Twitter Bootstrap (Apache Software License v2.0)
>>>>> * d3 (BSD)
>>>>> * LLVM (BSD-like)
>>>>> 
>>>>> Build and test dependencies:
>>>>> 
>>>>> * ant (Apache Software License v2.0)
>>>>> * Apache Maven (Apache Software License v2.0)
>>>>> * cmake (BSD)
>>>>> * clang (BSD)
>>>>> * Google gtest (Apache Software License v2.0)
>>>>> 
>>>>> ##Required Resources
>>>>> 
>>>>> ###Mailing List
>>>>> 
>>>>> There are currently no mailing lists. The usual mailing lists are
>>>>> expected to be set up when entering incubation:
>>>>> 
>>>>> private@palo.incubator.apache.org
>>>>> dev@palo.incubator.apache.org
>>>>> commits@palo.incubator.apache.org
>>>>> 
>>>>> ###Subversion Directory
>>>>> 
>>>>> Upon entering incubation: https://github.com/baidu/palo.
>>>>> After incubation, we want to move the existing repo from
>>>>> https://github.com/baidu/palo to Apache infrastructure.
>>>>> 
>>>>> ###Issue Tracking
>>>>> 
>>>>> Palo currently uses GitHub to track issues. We would like to
>>>>> continue to do so while we discuss migration possibilities with the
>>>>> ASF Infra committee.
>>>>> 
>>>>> ###Other Resources
>>>>> 
>>>>> The existing code already has unit tests, so we will make use of
>>>>> existing Apache continuous testing infrastructure. The resulting
>>>>> load should not be very large.
>>>>> 
>>>>> ##Initial Committers
>>>>> 
>>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>>>>> 
>>>>> ##Affiliations
>>>>> 
>>>>> The initial committers are employees of Baidu Inc. The nominated
>>>>> mentors are employees of TODO.
>>>>> 
>>>>> ##Sponsors
>>>>> 
>>>>> ###Champion
>>>>> 
>>>>> TODO
>>>>> 
>>>>> ###Nominated Mentors
>>>>> 
>>>>> * Sijie Guo, guosijie@gmail.com
>>>>> * Luke Han, lukehan@apache.org
>>>>> * Zheng Shao, zshao@apache.org
>>>> 
>>>> Mentors must be members of the IPMC and almost always Members of the
>>>> ASF.
>>>> 
>>>> At this moment only Luke Han is qualified.
>>>> 
>>>> Regards,
>>>> Dave
>>>> 
>>>>> 
>>>>> ###Sponsoring Entity
>>>>> 
>>>>> We are requesting the Incubator to sponsor this project.
>>>> 
>>>> 
>
