From: Julian Hyde
Date: Thu, 22 Feb 2018 10:23:15 -0800
Subject: Re: [DISCUSS] Druid incubation proposal
To: general@incubator.apache.org

It seems that we have consensus (and indeed, an ectopic vote is happening
in this discuss thread). I will start a formal vote.

All of you who replied '+1' on this thread, thanks for your support, and
please cast your vote on the formal thread.

Julian

On Thu, Feb 22, 2018 at 9:36 AM, Pramod Immaneni wrote:
> +1
>
> On Fri, Feb 16, 2018 at 12:15 PM, Gian Merlino wrote:
>
>> Hi all,
>>
>> I would like to open up a discussion about incubating Druid at Apache. I've
>> included a proposal in this mail and have also posted a draft at
>> https://wiki.apache.org/incubator/DruidProposal. More information about
>> Druid is also available on our project web site at: http://druid.io/
>>
>> Thanks for your consideration!
>>
>> Gian
>>
>> = Druid Proposal =
>>
>> == Abstract ==
>>
>> Druid is a high-performance, column-oriented, distributed data store.
>>
>> == Proposal ==
>>
>> Druid is an open source data store designed for real-time exploratory
>> analytics on large data sets. Druid's key features are a column-oriented
>> storage layout, a distributed shared-nothing architecture, and ability to
>> generate and leverage indexing and caching structures. Druid is typically
>> deployed in clusters of tens to hundreds of nodes, and has the ability to
>> load data from Apache Kafka and Apache Hadoop, among other data sources.
>> Druid offers two query languages: a SQL dialect (powered by Apache Calcite)
>> and a JSON-over-HTTP API.
>>
>> Druid was originally developed to power a slice-and-dice analytical UI
>> built on top of large event streams. The original use case for Druid
>> targeted ingest rates of millions of records/sec, retention of over a year
>> of data, and query latencies of sub-second to a few seconds.
>> Many people can benefit from such capability, and many already have (see
>> http://druid.io/druid-powered.html). In addition, new use cases have
>> emerged since Druid's original development, such as OLAP acceleration of
>> data warehouse tables and more highly concurrent applications operating
>> with relatively narrower queries.
>>
>> == Background ==
>>
>> Druid is a data store designed for fast analytics. It would typically be
>> used in lieu of more general purpose query systems like Hadoop !MapReduce
>> or Spark when query latency is of the utmost importance. Druid is often
>> used as a data store for powering GUI analytical applications.
>>
>> The buzzwordy description of Druid is a high-performance, column-oriented,
>> distributed data store. What we mean by this is:
>>
>> * "high performance": Druid aims to provide low query latency and high
>> ingest rates.
>> * "column-oriented": Druid stores data in a column-oriented format, like
>> most other systems designed for analytics. It can also store indexes along
>> with the columns.
>> * "distributed": Druid is deployed in clusters, typically of tens to
>> hundreds of nodes.
>> * "data store": Druid loads your data and stores a copy of it on the
>> cluster's local disks (and may cache it in memory). It doesn't query your
>> data from some other storage system.
>>
>> == Rationale ==
>>
>> Druid is a mature, active project with a large number of production
>> installations, dozens of contributors to each release, and multiple vendors
>> offering professional support. Given Druid's strong community, its close
>> integration with many other Apache projects (such as Kafka, Hadoop, and
>> Calcite), and its pre-existing Apache-inspired governance structure, we
>> feel that Apache is the best home for the project on a long-term basis.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>> Since Druid was first open sourced, the original developers have solicited
>> contributions from others, including through our blog, the project mailing
>> lists, and through accepting !GitHub pull requests. We have an
>> Apache-inspired governance structure with a PMC and committers, and our
>> committer ranks include a good number of people from outside the original
>> development team.
>>
>> === Community ===
>>
>> The Druid core developers have sought to nurture a community throughout the
>> life of the project. We use !GitHub as the focal point for bug reports and
>> code contributions, and the mailing lists for most other discussion. To try
>> to make people feel welcome, we've also spelled this out on a "CONTRIBUTE"
>> link from the project page: http://druid.io/community/. Today we have an
>> active contributor base (a typical release has ~40 contributors) and
>> mailing list.
>>
>> === Core Developers ===
>>
>> Druid enjoys good diversity of committer affiliation. The most active
>> developers over the past year are affiliated with four different companies:
>> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also
>> committers on other ASF projects, including Apache Airflow, Apache
>> Curator, and Apache Calcite. The original developers of Druid remain
>> involved in the project.
>>
>> === Alignment ===
>>
>> Druid's current governance structure is Apache-inspired with a PMC and
>> committers chosen by a meritocratic process. Additionally, Druid integrates
>> with a number of other Apache projects, including Kafka, Hadoop, Hive,
>> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
>>
>> == Known Risks ==
>>
>> === Orphaned products ===
>>
>> The risk of Druid becoming orphaned is low, due to a diverse committer base
>> that is invested in the future of the project.
>>
>> === Inexperience with Open Source ===
>>
>> Druid's core developers have been running it as a community-oriented open
>> source project for some time now, and many of them are committers on other
>> open source projects as well, including Apache Airflow, Apache Curator, and
>> Apache Calcite.
>>
>> === Homogenous Developers ===
>>
>> Druid's current diversity of committer affiliation means that we have
>> become accustomed to working collaboratively and in the open. We hope that
>> a transition to the ASF helps Druid's contributor base become even more
>> diverse.
>>
>> === Reliance on Salaried Developers ===
>>
>> Druid's user base and contributor base skew heavily towards salaried
>> developers. We believe this is natural since Druid is a technology designed
>> to be deployed on large clusters, and due to this, tends to be deployed by
>> organizations rather than by individuals. Nevertheless, many current Druid
>> developers have continued working on the project even through job changes,
>> which we take to be a good sign of developer commitment and personal
>> interest.
>>
>> === Relationships with Other Apache Products ===
>>
>> Druid integrates with a number of other Apache projects. Druid internally
>> uses Calcite for SQL planning, and Curator and !ZooKeeper for coordination.
>> Druid can read data in Avro or Parquet format. Druid can load data from
>> streams in Kafka or from files in Hadoop. Druid integrates with Hive as an
>> option for SQL query acceleration. Druid data can be visualized by Superset
>> (incubating).
>>
>> === An Excessive Fascination with the Apache Brand ===
>>
>> Druid is a successful project with a diverse community.
>> The main reason for pursuing incubation is to find a stable, long term
>> home for the project with a well known governance philosophy.
>>
>> == Required Resources ==
>>
>> === Mailing lists ===
>>
>> We would like to migrate the existing Druid mailing lists from Google
>> Groups to Apache.
>>
>> * druid-user@googlegroups -> users@druid.incubator.apache.org
>> * druid-development@googlegroups -> dev@druid.incubator.apache.org
>>
>> === Source control ===
>>
>> Druid development currently takes place on !GitHub. We would like to
>> continue using !GitHub, if possible, in order to preserve the workflows the
>> community has developed around !GitHub pull requests.
>>
>> === Issue tracking ===
>> Druid currently uses !GitHub issues for issue tracking. We would like to
>> migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
>>
>> == Documentation ==
>>
>> Druid's documentation can be found at http://druid.io/docs/latest/.
>>
>> == Initial Source ==
>>
>> Druid was initially open-sourced by Metamarkets in 2012 and has been run in
>> a community-governed fashion since then. The code is currently hosted at
>> https://github.com/druid-io/ and includes the following repositories:
>>
>> * druid (primary repository)
>> * druid-console (web console for Druid)
>> * druid-io.github.io (source for Druid's website at http://druid.io/)
>> * tranquility (realtime stream push client for Druid)
>> * docker-druid (Docker image for Druid)
>> * pydruid (Python library)
>> * RDruid (R library)
>> * oss-parent (Maven POM files)
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> A complete set of the open source code needs to be licensed from the owning
>> organization to the Foundation. Commercial legal counsel for the owning
>> organization will review the standard Foundation licensing paperwork and
>> propose any updates as needed.
>> This license will enable Apache to incubate and manage the Druid project
>> moving forward.
>>
>> Other Druid paraphernalia to be transferred to Apache consists of:
>>
>> * !GitHub organization at https://github.com/druid-io/
>> * Twitter account at https://twitter.com/druidio
>> * "druid.io" domain name
>> * "Druid" trademark assignment per Foundation standard paper. The
>> trademark assignment paperwork shall be reviewed by the owning
>> organization's commercial and IP counsel
>> * CLAs - all rights in the code licensed above should encompass the CLAs
>> that existed between developers and owning organization
>>
>> A copyright license to the code, trademark assignment of Druid, and
>> transfer of other paraphernalia to Apache should be sufficient to cover all
>> rights required by Apache to operate the project.
>>
>> == External Dependencies ==
>> External dependencies distributed with Druid currently all have one of the
>> following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with one
>> exception: the optional Druid MySQL metadata store extension depends on
>> MySQL Connector/J, which is GPL licensed. Druid currently packages this as
>> a separate download; see our current presentation on:
>> http://druid.io/downloads.html. As part of incubation we intend to
>> determine the best strategy for handling the MySQL extension.
>>
>> == Cryptography ==
>> Not applicable.
>>
>> == Initial Committers ==
>>
>> The initial committers for incubation are the current set of committers on
>> Druid who have expressed interest in being involved in Apache incubation.
>> Affiliations are listed where relevant. We may seek to add other committers
>> during incubation; for example, we would want to add any current Druid
>> committers who express an interest after incubation begins.
>>
>> * Charles Allen (charles@allen-net.com) (Snap)
>> * David Lim (david.clarence.lim@gmail.com) (Imply)
>> * Eric Tschetter (cheddar@apache.org) (Splunk)
>> * Fangjin Yang (fj@imply.io) (Imply)
>> * Gian Merlino (gian@apache.org) (Imply)
>> * Himanshu Gupta (g.himanshu@gmail.com) (Oath)
>> * Jihoon Son (jihoonson@apache.org) (Imply)
>> * Jonathan Wei (jon.wei@imply.io) (Imply)
>> * Maxime Beauchemin (maximebeauchemin@gmail.com) (Lyft)
>> * Mohamed Slim Bouguerra (slim.bouguerra@gmail.com) (Hortonworks)
>> * Nishant Bangarwa (nishant@apache.org) (Hortonworks)
>> * Parag Jain (paragjain16@gmail.com) (Oath)
>> * Roman Leventov (leventov.ru@gmail.com) (Metamarkets)
>> * Xavier Léauté (xavier@leaute.com) (Confluent)
>>
>> == Sponsors ==
>>
>> * Champion: Julian Hyde
>> * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
>> * Sponsoring entity: Apache Incubator
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org