From: kishore g
Date: Tue, 13 Feb 2018 17:01:19 -0800
Subject: Re: [DISCUSS] Apache Pinot Incubator Proposal
To: general@incubator.apache.org
Cc: Dave Fisher

Kevin,

Increasing the adoption of Pinot is one thing that can help build a good,
diverse community. A few things that come to mind:
- Improve documentation
- Better integration with cloud providers
- Meetups and blog posts

We would also love to get additional mentors from the ASF to help us build
the community around Pinot.

On Tue, Feb 13, 2018 at 4:29 PM, Timothy Chen wrote:

> Love to see this in the incubator as well. +1
>
> Tim
>
> On Tue, Feb 13, 2018 at 4:22 PM, Kevin A. McGrail wrote:
> > Agreed. It could use more mentors from ASF which I'm too overloaded to
> > help with but I'd be inclined to +1 this. Do you have some thoughts on
> > getting more community people outside of LI and Uber to help?
> >
> > On 2/13/2018 7:07 PM, Dave Fisher wrote:
> >>
> >> Noir or Blanc? Gris or Grigio? What’s the vintage?
> >>
> >> All kidding aside this looks interesting.
> >>
> >> Regards,
> >> Dave
> >>
> >> Sent from my iPhone
> >>
> >>> On Feb 13, 2018, at 12:10 AM, kishore g wrote:
> >>>
> >>> Hello,
> >>>
> >>> I would like to propose Pinot as an Apache Incubator project. The
> >>> proposal is available as a draft at
> >>> https://wiki.apache.org/incubator/PinotProposal. I have also included
> >>> the text of the proposal below.
> >>>
> >>> Any feedback from the community is much appreciated.
> >>>
> >>> Regards,
> >>> Kishore G
> >>>
> >>> = Pinot Proposal =
> >>>
> >>> == Abstract ==
> >>>
> >>> Pinot is a distributed columnar storage engine that can ingest data in
> >>> real-time and serve analytical queries at low latency.
> >>> There are two modes of data ingestion: batch and/or real-time. Batch
> >>> mode allows users to generate Pinot segments externally using systems
> >>> such as Hadoop. These segments can be uploaded into Pinot via simple
> >>> curl calls. In real-time mode, Pinot can ingest data in near real-time
> >>> from streaming sources such as Kafka. Data ingested into Pinot is
> >>> stored in a columnar format. Pinot provides a SQL-like interface (PQL)
> >>> that supports filters, aggregations, and group by operations. It does
> >>> not support joins by design, in order to guarantee predictable
> >>> latency. It leverages other Apache projects such as ZooKeeper, Kafka,
> >>> and Helix, along with many libraries from the ASF.
> >>>
> >>> == Proposal ==
> >>>
> >>> Pinot was open sourced by LinkedIn and is hosted on GitHub. The
> >>> majority of the development happens at LinkedIn, with other
> >>> contributions from Uber and Slack. We believe that being a part of the
> >>> Apache Software Foundation will improve the diversity and help form a
> >>> strong community around the project.
> >>>
> >>> LinkedIn submits this proposal to donate the code base to the Apache
> >>> Software Foundation. The code is already under Apache License 2.0.
> >>> Code and the documentation are hosted on GitHub.
> >>>  * Code: http://github.com/linkedin/pinot
> >>>  * Documentation: https://github.com/linkedin/pinot/wiki
> >>>
> >>>
> >>> == Background ==
> >>>
> >>> LinkedIn, similar to other companies, has many applications that
> >>> provide rich real-time insights to members and customers (internal and
> >>> external). The workload characteristics for these applications vary a
> >>> lot. Some internal applications simply need ad-hoc query capabilities
> >>> with sub-second to multiple seconds latency. But external site-facing
> >>> applications require strong SLAs even under very high workloads.
> >>> Prior to Pinot, LinkedIn had multiple solutions depending on the
> >>> workload generated by the application, and this was inefficient. Pinot
> >>> was developed to be the one single platform that addresses all classes
> >>> of applications. Today at LinkedIn, Pinot powers more than 50
> >>> site-facing products with workloads ranging from a few queries per
> >>> second to thousands of queries per second, while maintaining a 99th
> >>> percentile latency that can be as low as a few milliseconds. All
> >>> internal dashboards at LinkedIn are powered by Pinot.
> >>>
> >>> == Rationale ==
> >>>
> >>> We believe that the requirement to develop rich real-time analytic
> >>> applications applies to other organizations as well. Both Pinot and
> >>> the interested communities would benefit from this work being openly
> >>> available.
> >>>
> >>> == Current Status ==
> >>>
> >>> Pinot is currently open sourced under the Apache License Version 2.0
> >>> and available at github.com/linkedin/pinot. All the development is
> >>> done using GitHub Pull Requests. We cut releases on a weekly basis and
> >>> deploy them at LinkedIn. mp-0.1.468 is the latest release tag that is
> >>> deployed in production.
> >>>
> >>> == Meritocracy ==
> >>>
> >>> Following the Apache meritocracy model, we intend to build an open and
> >>> diverse community around Pinot. We will encourage the community to
> >>> contribute to discussions and the codebase.
> >>>
> >>> == Community ==
> >>>
> >>> Pinot is currently used extensively at LinkedIn and Uber. Several
> >>> companies have expressed interest in the project. We hope to extend
> >>> the contributor base significantly by bringing Pinot into Apache.
> >>>
> >>> == Core Developers ==
> >>>
> >>> Pinot was started by engineers at LinkedIn, and now has committers
> >>> from Uber.
> >>>
> >>> == Alignment ==
> >>>
> >>> Apache is the most natural home for taking Pinot forward. Pinot
> >>> leverages several existing Apache projects such as Kafka, Helix,
> >>> ZooKeeper, and Avro. As Pinot gains adoption, we plan to add support
> >>> for the ORC and Parquet formats, as well as integration with YARN and
> >>> Mesos.
> >>>
> >>> == Known Risks ==
> >>>
> >>> === Orphaned Products ===
> >>>
> >>> The risk of the Pinot project being abandoned is minimal. The teams at
> >>> LinkedIn and Uber are highly incentivized to continue development of
> >>> Pinot as it is a critical part of their infrastructure.
> >>>
> >>> === Inexperience with Open Source ===
> >>>
> >>> Since being open sourced, Pinot has been developed entirely on GitHub.
> >>> All the current developers on Pinot are well aware of the open source
> >>> development process. However, most of the developers are new to the
> >>> Apache process. Kishore Gopalakrishna, one of the lead developers in
> >>> Pinot, is VP and committer of the Apache Helix project.
> >>>
> >>> === Homogenous Developers ===
> >>>
> >>> The current core developers are all from LinkedIn and Uber. However,
> >>> we hope to establish a developer community that includes contributors
> >>> from several corporations, and we are actively encouraging new
> >>> contributors via the mailing lists and public presentations of Pinot.
> >>>
> >>> === Reliance on Salaried Developers ===
> >>>
> >>> It is expected that Pinot development will occur on both salaried time
> >>> and on volunteer time, after hours. The majority of initial committers
> >>> are paid by their employer to contribute to this project. However,
> >>> they are all passionate about the project, and we are confident that
> >>> the project will continue even if no salaried developers contribute to
> >>> the project.
> >>> We are committed to recruiting additional committers including
> >>> non-salaried developers.
> >>>
> >>> === Relationships with Other Apache Products ===
> >>>
> >>> As mentioned earlier, Pinot uses several Apache projects: Kafka to
> >>> ingest data in real-time, and ZooKeeper and Helix for cluster
> >>> management. Pinot also uses Maven for build and release. We foresee
> >>> adding support for the Parquet and ORC formats. Adding the ability to
> >>> deploy on YARN and Mesos clusters is another interesting project we
> >>> might pursue.
> >>>
> >>> === An Excessive Fascination with the Apache Brand ===
> >>>
> >>> While we respect the reputation of the Apache brand and have no doubts
> >>> that it will attract contributors and users, we believe ASF is the
> >>> right home for Pinot to foster a great community that will lead to a
> >>> better outcome in the long term.
> >>>
> >>> == Documentation ==
> >>>
> >>>  * Code: https://github.com/linkedin/pinot/
> >>>  * Documentation: https://github.com/linkedin/pinot/wiki
> >>>  * User group: https://groups.google.com/forum/#!forum/pinot_users
> >>>
> >>> == Initial Source ==
> >>>
> >>> The current Pinot codebase is hosted on GitHub and licensed under the
> >>> Apache License V2. The source tree is self-contained and relies on
> >>> Maven as its build and dependency resolution mechanism.
> >>>
> >>> == External Dependencies ==
> >>>
> >>> All dependencies in Pinot have licenses that are compatible with
> >>> Apache License V2, except for the org.json library, which will be
> >>> removed prior to Apache incubation. The list below summarizes the
> >>> external dependencies of Pinot grouped by license and ASF license
> >>> category.
> >>>
> >>> === Dependencies from the ASF Category A ===
> >>> ==== Apache License 2.0 ====
> >>>  * com.101tec:zkclient:0.7
> >>>  * com.alibaba:fastjson:1.1.24
> >>>  * com.clearspring.analytics:stream:2.7.0
> >>>  * com.fasterxml.jackson.core:jackson-annotations:2.8.0
> >>>  * com.fasterxml.jackson.core:jackson-core:2.8.0
> >>>  * com.fasterxml.jackson.core:jackson-databind:2.8.0
> >>>  * com.google.code.findbugs:jsr305:3.0.0
> >>>  * com.google.guava:guava:19
> >>>  * com.ning:async-http-client:1.9.21
> >>>  * com.yammer.metrics:metrics-core:2.2.0
> >>>  * commons-beanutils:commons-beanutils:1.8.3
> >>>  * commons-cli:commons-cli:1.2
> >>>  * commons-codec:commons-codec:1.6
> >>>  * commons-configuration:commons-configuration:1.6
> >>>  * commons-fileupload:commons-fileupload:1.2.2
> >>>  * commons-httpclient:commons-httpclient:3.1
> >>>  * commons-io:commons-io:2.1
> >>>  * commons-validator:commons-validator:1.4.0
> >>>  * io.netty:netty-all:4.1.4.Final
> >>>  * io.swagger:swagger-jaxrs:1.5.10
> >>>  * io.swagger:swagger-jersey2-jaxrs:1.5.10
> >>>  * it.unimi.dsi:fastutil:6.5.16
> >>>  * joda-time:joda-time:2
> >>>  * log4j:log4j:1.2.17
> >>>  * me.lemire.integercompression:JavaFastPFOR:0.0.13
> >>>  * nl.jqno.equalsverifier:equalsverifier:1.7.2
> >>>  * org.apache.avro:avro:1.7.6
> >>>  * org.apache.commons:commons-compress:1.9
> >>>  * org.apache.commons:commons-lang3:3.5
> >>>  * org.apache.commons:commons-math:2.1
> >>>  * org.apache.hadoop:hadoop-client:2.7.0
> >>>  * org.apache.hadoop:hadoop-common:2.7.0
> >>>  * org.apache.helix:helix-core:0.6.8
> >>>  * org.apache.httpcomponents:httpclient:4.1.3
> >>>  * org.apache.httpcomponents:httpclient:4.2.5
> >>>  * org.apache.httpcomponents:httpcore:4.2.5
> >>>  * org.apache.httpcomponents:httpmime:4.2.5
> >>>  * org.apache.kafka:kafka_2.10:0.9.0.1
> >>>  * org.apache.thrift:libthrift:0.9.1
> >>>  * org.apache.zookeeper:zookeeper:3.4.9
> >>>  * org.codehaus.jackson:jackson-core-asl:1.9.6
> >>>  * org.codehaus.jackson:jackson-mapper-asl:1.9.6
> >>>  * org.json:json:20080701
> >>>  * org.roaringbitmap:RoaringBitmap:0.5.10
> >>>  * org.testng:testng:6.0.1
> >>>  * org.twitter4j:twitter4j-core:4.0.3
> >>>  * org.webjars:swagger-ui:2.2.2
> >>>  * org.xerial.larray:larray:0.2.1
> >>>  * org.yaml:snakeyaml:1.16
> >>>  * xml-apis:xml-apis:1.0.b2
> >>> ==== Dual license (Apache License 2.0 + LGPL 2.1), using under the Apache License ====
> >>>  * org.codehaus.jackson:jackson-jaxrs:1.9.6
> >>>  * org.codehaus.jackson:jackson-xc:1.9.6
> >>> ==== BSD ====
> >>>  * com.jcabi:jcabi-log:0.17.1
> >>>  * org.antlr:antlr4-annotations:4.3
> >>>  * org.antlr:antlr4-runtime:4.3
> >>> ==== MIT ====
> >>>  * com.github.nkzawa:socket.io-client:0.5.1
> >>>  * org.mockito:mockito-core:2.10.0
> >>>  * org.slf4j:slf4j-api:1.7.7
> >>>  * org.slf4j:slf4j-log4j12:1.7.7
> >>>
> >>> === Dependencies from the ASF Category B ===
> >>> ==== Dual license (CDDL 1.1 + GPL 2 w/ CPE), using under the CDDL ====
> >>>  * com.sun.jersey:jersey-client:1.19.2
> >>>  * javax.servlet:javax.servlet-api:3.0.1
> >>>  * org.glassfish.jersey.containers:jersey-container-grizzly2-http:2.23
> >>>  * org.glassfish.jersey.core:jersey-common:2.23
> >>>  * org.glassfish.jersey.core:jersey-server:2.23
> >>>  * org.glassfish.jersey.media:jersey-media-json-jackson:2.24
> >>>  * org.glassfish.jersey.media:jersey-media-multipart:2.23
> >>>
> >>> === Dependencies from the ASF Category X ===
> >>> ==== JSON License ====
> >>>  * org.json:json:20080701 (to be removed before Apache incubation)
> >>>
> >>>
> >>> == Cryptography ==
> >>>
> >>> None
> >>>
> >>> == Required Resources ==
> >>>
> >>> === Mailing lists ===
> >>>
> >>>  * pinot-private (with moderated subscriptions)
> >>>  * pinot-user
> >>>  * pinot-dev
> >>>  * pinot-commits
> >>>
> >>> === Git repository ===
> >>>
> >>>  * git://git.apache.org/pinot
> >>>  * https://git-wip-us.apache.org/repos/asf/incubator-pinot.git
> >>>
> >>> === Issue Tracking ===
> >>>
> >>> A JIRA issue tracker (PINOT)
> >>>
> >>> === Other Resources ===
> >>>
> >>> The existing code already has unit and integration tests, and we use
> >>> Travis to test each patch before committing it to master. We would
> >>> like to have an instance of Jenkins to achieve similar functionality.
> >>>
> >>> == Initial Committers ==
> >>>
> >>>  * Kishore Gopalakrishna
> >>>  * Ravi Aringunram
> >>>  * Jean-François Im
> >>>  * Mayank Shrivastava
> >>>  * Subbu Subramaniam
> >>>  * Adwait Tumbde
> >>>  * Xiaotian Jiang
> >>>  * Jennifer Dai
> >>>  * Seunghyun Lee
> >>>  * Xiang Fu
> >>>  * Dhaval Patel
> >>>  * Neha Pawar
> >>>  * Alex Pucher
> >>>  * Yen-Jung Chang
> >>>
> >>>
> >>>
> >>> == Affiliations ==
> >>>
> >>>  * Kishore Gopalakrishna (LinkedIn)
> >>>  * Ravi Aringunram (LinkedIn)
> >>>  * Jean-François Im (LinkedIn)
> >>>  * Mayank Shrivastava (LinkedIn)
> >>>  * Subbu Subramaniam (LinkedIn)
> >>>  * Adwait Tumbde (LinkedIn)
> >>>  * Xiaotian Jiang (LinkedIn)
> >>>  * Jennifer Dai (LinkedIn)
> >>>  * Seunghyun Lee (LinkedIn)
> >>>  * Xiang Fu (Uber)
> >>>  * Dhaval Patel (Uber)
> >>>  * Neha Pawar (LinkedIn)
> >>>  * Alex Pucher (LinkedIn)
> >>>  * Yen-Jung Chang (LinkedIn)
> >>>
> >>> == Sponsors ==
> >>>
> >>> === Champion ===
> >>>
> >>>  * Olivier Lamy <olamy at apache dot org>
> >>>
> >>> === Nominated Mentors ===
> >>>
> >>>  * Olivier Lamy
> >>>
> >>> === Sponsoring Entity ===
> >>>
> >>> The Apache Incubator
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>