incubator-general mailing list archives

From Jim Apple <jbap...@cloudera.com.INVALID>
Subject Re: Looking for Champion
Date Mon, 18 Jun 2018 18:51:36 GMT
I'm not a binding vote on incubator entry, but I think it would be
great to have roadmaps as soon as feasible on addressing Tim's concern
(which is deeply related to #2, "Licensing") and on addressing the
code and toil duplication.

On Mon, Jun 18, 2018 at 11:08 AM, Dave Fisher <dave2wave@comcast.net> wrote:
> Hi Li,De -
>
> Since I agreed to champion this project, I think we need a summary of what
> the Incubator PMC cares about in order to accept a podling, and of what the
> prospective project needs to address. We also need to be clear about what
> should happen during Incubation and when. I think that many of the
> questions that came up in this thread had to do with assessing how much
> effort it will take to Incubate Palo (or whatever the name will be).
>
> (1) The name Palo. Since there seems to be an issue with that name we should
> have a new name. It is not unknown for a podling to change its name, but
> that does generate extra work for Infrastructure to change the name after
> podling start up. It would be our preference for Palo to find a new name
> prior to VOTING on the proposal. Please do this elsewhere and come back to
> me with the new name so that I can help with the updated proposal.
>
> (2) Licensing of the software. Several bits came up as questionable.
> Regardless of cleanup that has already occurred we have identified that we
> will need to be very careful. It will be important to discuss and carefully
> handle the Software Grant Agreement to make sure that the source listed is
> correct. I think that the SGA must come early during incubation.
>
> (3) Relationship with Impala. Palo has apparently forked portions of Impala.
> This means that some are concerned that there is a missed synergy with the
> Apache Impala project. Is there a clean interface that can be built between
> the projects? It would help if the Palo developers would explore this with
> Impala at dev@impala.apache.org.
>
> That said, part of the Incubation process is to learn the Apache Way. IMHO
> it is ok for the relationship between Impala PMC and a podling PPMC to be a
> work in progress.
>
> (4) Currently, Willem, Luke Han and Dave Fisher are qualified to officially
> mentor. I suggest that Sijie Guo and Zheng Shao be included as Initial
> Committers in order to help from within the PPMC.
>
> On Jun 14, 2018, at 11:03 AM, Jim Apple <jbapple@cloudera.com.INVALID>
> wrote:
>
> I don't want to be a stickler, but I don't think "For issues mentioned by
> Jim, Todd and Tim, I have replied on last Saturday."
>
> To my email about Palo being an ASF project as a storage system without a
> query engine, you replied only, "We will seriously consider this proposal."
>
> I see no response to Tim's concern that "The code isn't owned by any
> individual, I contributed it to Apache and it's
> free for anyone to do what they want to do with it, but pulling in
> improvements from other projects without any attempt to attribute it or
> contribute improvements back seems contrary to the Apache way.”
>
>
> Jim - do you need answers to these concerns prior to agreeing to accept this
> project into the Incubator?
>
> Regards,
> Dave
>
>
> On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <lide@baidu.com> wrote:
>
> Hi all,
>
> Regarding Palo, we have fixed the following issues.
>
> 1. Relationship with Impala
> For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.
>
> 2. License issues
> For the issues mentioned by Todd and Ted:
> 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
> Fixed: removed the AES-related code.
> https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
> https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
>
> 2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
> Fixed: removed the mysql_dtoa-related code.
> https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
>
> 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
> Fixed: restored the original license; we are looking for another HTTP server
> to replace it.
> https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
>
> 4) be/rpc/*
> Fixed: we have replaced it with brpc, and we will remove Hypertable in a
> few weeks, once users have upgraded to brpc.
> https://github.com/baidu/palo/tree/master/be/src/rpc
>
> 3. Dependency licenses
> For the issue mentioned by Dave: it looks like Palo does not depend on
> OpenLdap and cyrus-sasl directly,
> but some third-party libraries, libcurl and gperftools for instance, need
> them to compile.
> For rapidjson, we are looking for an alternative.
>
> 4. About the name of Palo
> For the issue mentioned by Julian:
> we are working on finding a better one.
>
> Best Regards,
> Reed
>
>
>
> On 2018/6/13 at 8:54 AM, "Li,De(BDG)" <lide@baidu.com> wrote:
>
> Hi Julian,
>
> Thank you.
>
> It looks like we have to find another one.
> If anyone has a good name, please feel free to let me know.
>
> Best Regards,
> Reed
>
> On 2018/6/13 at 4:20 AM, "Julian Hyde" <jhyde@apache.org> wrote:
>
> Note that there is an existing database product called Palo, an open
> source OLAP engine by the German company Jedox[1]. There is a high
> likelihood that Palo would have to change its name during incubation, if
> accepted.
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>
>
>
> On Jun 10, 2018, at 3:49 AM, Han Luke <luke.hq@gmail.com> wrote:
>
> Cool Dave, it’s great to have you as the Champion.
>
>
> ________________________________
> From: Tan,Zhongyi <tanzhongyi@baidu.com>
> Sent: Saturday, June 9, 2018 8:16:28 AM
> To: general@incubator.apache.org
> Subject: Re: Looking for Champion
>
> Thanks, Willem.
>
> We really appreciate it.
>
> On June 8, 2018, at 23:03, Willem Jiang <willem.jiang@gmail.com> wrote:
>
> Hi,
>
> I'm willing to be the Mentor.
> Please count me in.
>
>
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2wave@comcast.net>
> wrote:
>
> Hi -
>
> I’m willing to Champion and Mentor. I have a couple of comments inline.
> I’ll look at dependency licenses later today. It’s early for me.
>
>
> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <lide@baidu.com> wrote:
>
> Hi all,
>
> I am Reed, a developer on the team behind Palo (an MPP-based
> interactive SQL data warehouse).
>
> https://github.com/baidu/palo/wiki/Palo-Overview
>
> We propose to contribute Palo as an Apache Incubator project, and
> we are still looking for a possible Champion if anyone would like to
> volunteer. Thanks a lot.
>
>
> Best Regards,
> Reed
>
> ===================
> The draft of the proposal as below:
>
> #Apache Palo
>
> ##Abstract
>
> Palo is an MPP-based interactive SQL data warehouse for reporting and
> analysis.
>
>
> ##Proposal
>
> We propose to contribute the Palo codebase and associated artifacts
> (e.g. documentation, web-site content etc.) to the Apache Software
> Foundation with the intent of forming a productive, meritocratic and open
> community around Palo’s continued development, according to the ‘Apache
> Way’.
>
>
> Baidu owns several trademarks regarding Palo, and proposes to transfer
> ownership of those trademarks in full to the ASF.
>
>
> ###Overview of Palo
>
> Palo’s implementation consists of two daemons: Frontend (FE) and Backend
> (BE).
>
>
> **Frontend daemon** consists of a query coordinator and a catalog manager.
> The query coordinator is responsible for receiving users’ SQL queries,
> compiling queries and managing query execution. The catalog manager is
> responsible for managing metadata such as databases, tables, partitions
> and replicas. Several frontend daemons can be deployed to provide
> fault tolerance and load balancing.
>
>
> **Backend daemon** stores the data and executes the query fragments.
> Many backend daemons can also be deployed to provide scalability and
> fault tolerance.
>
>
> A typical Palo cluster is generally composed of several frontend daemons
> and dozens to hundreds of backend daemons.
>
>
> Users can use MySQL client tools to connect to any frontend daemon and
> submit SQL queries. The Frontend receives the query and compiles it into
> query plans executable by the Backend. The Frontend then sends the query
> plan fragments to the Backend, which builds a query execution DAG. Data is
> fetched and pipelined into the DAG. The final result is sent to the
> client via the Frontend. The distribution of query fragment execution
> takes minimizing data movement and maximizing scan locality as its main
> goal.
>
>
> ##Background
>
> At Baidu, prior to Palo, different tools were deployed to meet diverse
> requirements in many ways. When a use case required the simultaneous
> availability of capabilities that could not all be provided by a single
> tool, users were forced to build hybrid architectures that stitched
> multiple tools together. We believe that they shouldn’t need to accept
> such inherent complexity. A storage system built to provide great
> performance across a broad range of workloads provides a more elegant
> solution to the problems that hybrid architectures aim to solve. Palo is
> that solution.
>
>
> Palo is designed to be a simple, single, tightly coupled system that does
> not depend on other systems. Palo provides high-concurrency, low-latency
> point query performance as well as high-throughput ad-hoc analytical
> queries. Palo provides bulk batch data loading as well as near real-time
> mini-batch data loading. Palo also provides high availability,
> reliability, fault tolerance, and scalability.
>
>
> ##Rationale
>
> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>
> Mesa is a highly scalable analytic data storage system that stores
> critical measurement data related to Google's Internet advertising
> business. Mesa is designed to satisfy a complex and challenging set of
> user and system requirements, including near real-time data ingestion and
> query ability, as well as high availability, reliability, fault tolerance,
> and scalability for large data and query volumes.
>
>
> Impala is a modern, open-source MPP SQL engine architected from the
> ground up for the Hadoop data processing environment. At present, by
> virtue of its superior performance and rich functionality, Impala is
> comparable to many commercial MPP database query engines. Mesa can
> satisfy many of our storage requirements, but Mesa itself does not
> provide a SQL query engine; Impala is a very good MPP SQL query engine,
> but it lacks a perfect distributed storage engine. So in the end we chose
> the combination of these two technologies.
>
>
> Learning from Mesa’s data model, we developed a distributed storage
> engine. Unlike Mesa, this storage engine does not rely on any distributed
> file system. We then deeply integrated this storage engine with the Impala
> query engine. Query compiling, query execution coordination and catalog
> management of the storage engine are integrated into the frontend daemon;
> query execution and data storage are integrated into the backend daemon.
> With this integration, we implemented a single, full-featured,
> high-performance, state-of-the-art MPP database while maintaining
> simplicity.
>
>
> ##Current Status
>
> Palo has been an open source project on GitHub
> (https://github.com/baidu/palo).
>
>
> ###Meritocracy
>
> Palo has been deployed in production at Baidu and is used by more than
> 200 lines of business. It has demonstrated great performance benefits and
> has proved to be a better way to do reporting and analysis on big data.
> Still, we look forward to growing a rich user and developer community.
>
>
> ###Community
>
> Palo seeks to develop developer and user communities during incubation.
>
> ###Core Developers
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>
>
> ###Alignment
>
> Palo is related to several other Apache projects:
>
> * Palo can also read data stored in Apache Hadoop clusters powered by
> the HDFS filesystem.
>
> * Palo is closely integrated with Impala, which is also being proposed
> to the Incubator.
>
> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>
> * Palo uses Apache Thrift as its RPC and serialization framework of
> choice.
>
>
> ##Known Risks
>
> ###Orphaned Products
>
> The core developers of the Palo team plan to work full time on this
> project. There is very little risk of Palo getting orphaned since at
> least one large company (Baidu) is using it extensively in production.
> For example, there are currently more than 200 use cases using Palo in
> production. Furthermore, since Palo was open sourced at the beginning of
> October 2017, it has received more than 660 stars and been forked nearly
> 170 times. We plan to extend and diversify this community further through
> Apache.
>
>
> ###Inexperience with Open Source
>
> The core developers are all active users and followers of open source.
> They are already committers and contributors to the Palo GitHub project.
> All have been involved with source code that has been released under an
> open source license, and several of them also have experience developing
> code in an open source environment. Though the core set of developers do
> not have Apache open source experience, there are plans to onboard
> individuals with Apache open source experience on to the project.
>
>
> ###Homogenous Developers
>
> Most of the core developers are from Baidu, but after Palo was open
> sourced, it received a lot of bug fixes and enhancements from
> developers not working at Baidu.
>
>
> ###Reliance on Salaried Developers
>
> Baidu invested in Palo as its OLAP solution and some of its key
> engineers are working full time on the project. In addition, since there
> is a growing Big Data need for scalable OLAP solutions, we look forward to
> other Apache developers and researchers contributing to the project. Also
> key to addressing the risk associated with relying on salaried developers
> from a single entity is to increase the diversity of the contributors and
> actively lobby for domain experts in the BI space to contribute. Apache
> Palo intends to do this.
>
>
> ###An Excessive Fascination with the Apache Brand
>
> Palo is proposing to enter incubation at Apache in order to help efforts
> to diversify the committer base, not so much to capitalize on the Apache
> brand. The Palo project is already in production use inside Baidu, but is
> not expected to be a Baidu product for external customers. As such, the
> Palo project is not seeking to use the Apache brand as a marketing tool.
>
>
> ##Documentation
>
> Information about Palo can be found at https://github.com/baidu/palo.
> The following links provide more information about Palo in open source:
>
>
> * Palo wiki site: https://github.com/baidu/palo/wiki
> * Codebase at Github: https://github.com/baidu/palo
> * Issue Tracking: https://github.com/baidu/palo/issues
> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>
> ##Initial Source
>
> Palo has been under development since 2017 by a team of engineers at
> Baidu Inc. It is currently hosted on GitHub under an Apache license at
> https://github.com/baidu/palo.
>
>
> ##External Dependencies
>
> Palo has the following external dependencies.
>
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Boost (Boost Software License)
> * OpenLdap (OpenLDAP Software License)
> * rapidjson (Tencent)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * cyrus-sasl (CMU License)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
>
> Build and test dependencies:
>
> * ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (Apache Software License v2.0)
>
> ##Required Resources
>
> ###Mailing List
>
> There are currently no mailing lists. The usual mailing lists are
> expected to be set up when entering incubation:
>
>
> private@palo.incubator.apache.org
> dev@palo.incubator.apache.org
> commits@palo.incubator.apache.org
>
>
> ###Subversion Directory
>
> Upon entering incubation: https://github.com/baidu/palo.
> After incubation, we want to move the existing repo from
> https://github.com/baidu/palo to Apache infrastructure.
>
>
> ###Issue Tracking
>
> Palo currently uses GitHub to track issues. We would like to continue to
> do so while we discuss migration possibilities with the ASF Infra
> committee.
>
>
> ###Other Resources
>
> The existing code already has unit tests, so we will make use of the
> existing Apache continuous testing infrastructure. The resulting load
> should not be very large.
>
>
> ##Initial Committers
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>
>
> ##Affiliations
>
> The initial committers are employees of Baidu Inc. The nominated
> mentors are employees of TODO.
>
>
> ##Sponsors
>
> ###Champion
>
> TODO
>
> ###Nominated Mentors
>
> * Sijie Guo, guosijie@gmail.com
> * Luke Han, lukehan@apache.org
> * Zheng Shao, zshao@apache.org
>
>
> Mentors must be members of the IPMC and almost always Members of the ASF.
> At this moment only Luke Han is qualified.
>
> Regards,
> Dave
>
>
> ###Sponsoring Entity
>
> We are requesting the Incubator to sponsor this project.
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

