incubator-general mailing list archives

From Selvamohan Neethiraj <sneet...@apache.org>
Subject Re: [DISCUSS] Druid incubation proposal
Date Thu, 22 Feb 2018 03:08:05 GMT
+1 for adding Druid to ASF

Thanks,
Selva-

> On Feb 21, 2018, at 10:03 PM, Jitendra Pandey <jitendra@hortonworks.com> wrote:
> 
> +1
> Druid will be a great addition to ASF.
> 
> On 2/21/18, 5:06 PM, "Ashutosh Chauhan" <hashutosh@apache.org> wrote:
> 
>    +1 for Druid in ASF.
>    I have been involved with Hive Druid integration. If you are looking for
>    mentors, happy to help.
> 
>    Thanks,
>    Ashutosh
> 
>    On Fri, Feb 16, 2018 at 2:20 PM, Tom Barber <tom@spicule.co.uk> wrote:
> 
>> I can second most of that from the peanut gallery, my high level
>> interactions with a few Druid folk and keeping a watchful eye on a very
>> exciting project over the last few years.
>> 
>> I think the Druid project would make an excellent addition to the ASF
>> portfolio.
>> 
>> Tom
>> 
>> 
>> On 16/02/18 22:17, Julian Hyde wrote:
>> 
>>> As Champion for this proposal, let me say that the Druid project will be
>>> an excellent addition to the ASF. I have been an observer of the project
>>> for a couple of years, and in many respects it is already operating in the
>>> Apache Way. Druid had paid developers from a number of companies, some of
>>> whom were in competition, and its governance was strong enough to navigate
>>> the choppy waters that that can create.
>>> 
>>> A number of Druid committers subsequently started to work on Apache
>>> projects (Gian on Calcite, and Slim and Nishant on Hive) and so already
>>> know what to expect.
>>> 
>>> You can get a sense of the project dynamic by reading the archives of
>>> their dev list: https://groups.google.com/forum/#!forum/druid-development
>>> 
>>> Julian
>>> 
>>> 
>>> On Feb 16, 2018, at 12:15 PM, Gian Merlino <gian@apache.org> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I would like to open up a discussion about incubating Druid at Apache. I've
>>>> included a proposal in this mail and have also posted a draft at
>>>> https://wiki.apache.org/incubator/DruidProposal. More information about
>>>> Druid is also available on our project web site at: http://druid.io/
>>>> 
>>>> Thanks for your consideration!
>>>> 
>>>> Gian
>>>> 
>>>> = Druid Proposal =
>>>> 
>>>> == Abstract ==
>>>> 
>>>> Druid is a high-performance, column-oriented, distributed data store.
>>>> 
>>>> == Proposal ==
>>>> 
>>>> Druid is an open source data store designed for real-time exploratory
>>>> analytics on large data sets. Druid's key features are a column-oriented
>>>> storage layout, a distributed shared-nothing architecture, and the ability to
>>>> generate and leverage indexing and caching structures. Druid is typically
>>>> deployed in clusters of tens to hundreds of nodes, and has the ability to
>>>> load data from Apache Kafka and Apache Hadoop, among other data sources.
>>>> Druid offers two query languages: a SQL dialect (powered by Apache Calcite)
>>>> and a JSON-over-HTTP API.
>>>> 
>>>> Druid was originally developed to power a slice-and-dice analytical UI
>>>> built on top of large event streams. The original use case for Druid
>>>> targeted ingest rates of millions of records/sec, retention of over a year
>>>> of data, and query latencies of sub-second to a few seconds. Many people
>>>> can benefit from such capability, and many already have (see
>>>> http://druid.io/druid-powered.html). In addition, new use cases have
>>>> emerged since Druid's original development, such as OLAP acceleration of
>>>> data warehouse tables and more highly concurrent applications operating
>>>> with relatively narrower queries.
>>>> 
>>>> == Background ==
>>>> 
>>>> Druid is a data store designed for fast analytics. It would typically be
>>>> used in lieu of more general purpose query systems like Hadoop !MapReduce
>>>> or Spark when query latency is of the utmost importance. Druid is often
>>>> used as a data store for powering GUI analytical applications.
>>>> 
>>>> The buzzwordy description of Druid is a high-performance, column-oriented,
>>>> distributed data store. What we mean by this is:
>>>> 
>>>> * "high performance": Druid aims to provide low query latency and high
>>>> ingest rates.
>>>> * "column-oriented": Druid stores data in a column-oriented format, like
>>>> most other systems designed for analytics. It can also store indexes along
>>>> with the columns.
>>>> * "distributed": Druid is deployed in clusters, typically of tens to
>>>> hundreds of nodes.
>>>> * "data store": Druid loads your data and stores a copy of it on the
>>>> cluster's local disks (and may cache it in memory). It doesn't query your
>>>> data from some other storage system.
>>>> 
>>>> == Rationale ==
>>>> 
>>>> Druid is a mature, active project with a large number of production
>>>> installations, dozens of contributors to each release, and multiple
>>>> vendors
>>>> offering professional support. Given Druid's strong community, its close
>>>> integration with many other Apache projects (such as Kafka, Hadoop, and
>>>> Calcite), and its pre-existing Apache-inspired governance structure, we
>>>> feel that Apache is the best home for the project on a long-term basis.
>>>> 
>>>> == Current Status ==
>>>> 
>>>> === Meritocracy ===
>>>> Since Druid was first open sourced the original developers have solicited
>>>> contributions from others, including through our blog, the project
>>>> mailing
>>>> lists, and through accepting !GitHub pull requests. We have an
>>>> Apache-inspired governance structure with a PMC and committers, and our
>>>> committer ranks include a good number of people from outside the original
>>>> development team.
>>>> 
>>>> === Community ===
>>>> 
>>>> The Druid core developers have sought to nurture a community throughout
>>>> the
>>>> life of the project. We use !GitHub as the focal point for bug reports
>>>> and
>>>> code contributions, and the mailing lists for most other discussion. To
>>>> try
>>>> to make people feel welcome, we've also spelled this out on a
>>>> "CONTRIBUTE"
>>>> link from the project page: http://druid.io/community/. Today we have an
>>>> active contributor base (a typical release has ~40 contributors) and
>>>> mailing list.
>>>> 
>>>> === Core Developers ===
>>>> 
>>>> Druid enjoys good diversity of committer affiliation. The most active
>>>> developers over the past year are affiliated with four different
>>>> companies:
>>>> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are
>>>> also committers on other ASF projects, including Apache Airflow,
>>>> Apache
>>>> Curator, and Apache Calcite. The original developers of Druid remain
>>>> involved in the project.
>>>> 
>>>> === Alignment ===
>>>> 
>>>> Druid's current governance structure is Apache-inspired with a PMC and
>>>> committers chosen by a meritocratic process. Additionally, Druid
>>>> integrates
>>>> with a number of other Apache projects, including Kafka, Hadoop, Hive,
>>>> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
>>>> 
>>>> == Known Risks ==
>>>> 
>>>> === Orphaned products ===
>>>> 
>>>> The risk of Druid becoming orphaned is low, due to a diverse committer
>>>> base
>>>> that is invested in the future of the project.
>>>> 
>>>> === Inexperience with Open Source ===
>>>> 
>>>> Druid's core developers have been running it as a community-oriented open
>>>> source project for some time now, and many of them are committers on
>>>> other
>>>> open source projects as well, including Apache Airflow, Apache Curator,
>>>> and
>>>> Apache Calcite.
>>>> 
>>>> === Homogeneous Developers ===
>>>> 
>>>> Druid's current diversity of committer affiliation means that we have
>>>> become accustomed to working collaboratively and in the open. We hope
>>>> that
>>>> a transition to the ASF helps Druid's contributor base become even more
>>>> diverse.
>>>> 
>>>> === Reliance on Salaried Developers ===
>>>> 
>>>> Druid's user base and contributor base skew heavily towards salaried
>>>> developers. We believe this is natural since Druid is a technology
>>>> designed
>>>> to be deployed on large clusters, and due to this, tends to be deployed
>>>> by
>>>> organizations rather than by individuals. Nevertheless, many current
>>>> Druid
>>>> developers have continued working on the project even through job
>>>> changes,
>>>> which we take to be a good sign of developer commitment and personal
>>>> interest.
>>>> 
>>>> === Relationships with Other Apache Products ===
>>>> 
>>>> Druid integrates with a number of other Apache projects. Druid internally
>>>> uses Calcite for SQL planning, and Curator and !ZooKeeper for
>>>> coordination.
>>>> Druid can read data in Avro or Parquet format. Druid can load data from
>>>> streams in Kafka or from files in Hadoop. Druid integrates with Hive as
>>>> an
>>>> option for SQL query acceleration. Druid data can be visualized by
>>>> Superset
>>>> (incubating).
>>>> 
>>>> === An Excessive Fascination with the Apache Brand ===
>>>> 
>>>> Druid is a successful project with a diverse community. The main reason
>>>> for
>>>> pursuing incubation is to find a stable, long-term home for the project
>>>> with a well-known governance philosophy.
>>>> 
>>>> == Required Resources ==
>>>> 
>>>> === Mailing lists ===
>>>> 
>>>> We would like to migrate the existing Druid mailing lists from Google
>>>> Groups to Apache.
>>>> 
>>>> * druid-user@googlegroups -> users@druid.incubator.apache.org
>>>> * druid-development@googlegroups -> dev@druid.incubator.apache.org
>>>> 
>>>> === Source control ===
>>>> 
>>>> Druid development currently takes place on !GitHub. We would like to
>>>> continue using !GitHub, if possible, in order to preserve the workflows
>>>> the
>>>> community has developed around !GitHub pull requests.
>>>> 
>>>> === Issue tracking ===
>>>> Druid currently uses !GitHub issues for issue tracking. We would like to
>>>> migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
>>>> 
>>>> == Documentation ==
>>>> 
>>>> Druid's documentation can be found at http://druid.io/docs/latest/.
>>>> 
>>>> == Initial Source ==
>>>> 
>>>> Druid was initially open-sourced by Metamarkets in 2012 and has been run
>>>> in
>>>> a community-governed fashion since then. The code is currently hosted at
>>>> https://github.com/druid-io/ and includes the following repositories:
>>>> 
>>>> * druid (primary repository)
>>>> * druid-console (web console for Druid)
>>>> * druid-io.github.io (source for Druid's website at http://druid.io/)
>>>> * tranquility (realtime stream push client for Druid)
>>>> * docker-druid (Docker image for Druid)
>>>> * pydruid (Python library)
>>>> * RDruid (R library)
>>>> * oss-parent (Maven POM files)
>>>> 
>>>> == Source and Intellectual Property Submission Plan ==
>>>> 
>>>> A complete set of the open source code needs to be licensed from the
>>>> owning
>>>> organization to the Foundation. Commercial legal counsel for the owning
>>>> organization will review the standard Foundation licensing paperwork and
>>>> propose any updates as needed. This license will enable Apache to
>>>> incubate
>>>> and manage the Druid project moving forward.
>>>> 
>>>> Other Druid paraphernalia to be transferred to Apache consists of:
>>>> 
>>>> * !GitHub organization at https://github.com/druid-io/
>>>> * Twitter account at https://twitter.com/druidio
>>>> * "druid.io" domain name
>>>> * "Druid" trademark assignment per Foundation standard paper.  The
>>>> trademark assignment paperwork shall be reviewed by the owning
>>>> organization's commercial and IP counsel
>>>> * CLAs - all rights in the code licensed above should encompass the CLAs
>>>> that existed between developers and owning organization
>>>> 
>>>> A copyright license to the code, trademark assignment of Druid, and
>>>> transfer of other paraphernalia to Apache should be sufficient to cover
>>>> all
>>>> rights required by Apache to operate the project.
>>>> 
>>>> == External Dependencies ==
>>>> External dependencies distributed with Druid currently all have one of
>>>> the
>>>> following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with
>>>> one
>>>> exception: the optional Druid MySQL metadata store extension depends on
>>>> MySQL Connector/J, which is GPL licensed. Druid currently packages this
>>>> as
>>>> a separate download; see our current presentation on:
>>>> http://druid.io/downloads.html. As part of incubation we intend to
>>>> determine the best strategy for handling the MySQL extension.
>>>> 
>>>> == Cryptography ==
>>>> Not applicable.
>>>> 
>>>> == Initial Committers ==
>>>> 
>>>> The initial committers for incubation are the current set of committers
>>>> on
>>>> Druid who have expressed interest in being involved in Apache incubation.
>>>> Affiliations are listed where relevant. We may seek to add other
>>>> committers
>>>> during incubation; for example, we would want to add any current Druid
>>>> committers who express an interest after incubation begins.
>>>> 
>>>> * Charles Allen (charles@allen-net.com) (Snap)
>>>> * David Lim (david.clarence.lim@gmail.com) (Imply)
>>>> * Eric Tschetter (cheddar@apache.org) (Splunk)
>>>> * Fangjin Yang (fj@imply.io) (Imply)
>>>> * Gian Merlino (gian@apache.org) (Imply)
>>>> * Himanshu Gupta (g.himanshu@gmail.com) (Oath)
>>>> * Jihoon Son (jihoonson@apache.org) (Imply)
>>>> * Jonathan Wei (jon.wei@imply.io) (Imply)
>>>> * Maxime Beauchemin (maximebeauchemin@gmail.com) (Lyft)
>>>> * Mohamed Slim Bouguerra (slim.bouguerra@gmail.com) (Hortonworks)
>>>> * Nishant Bangarwa (nishant@apache.org) (Hortonworks)
>>>> * Parag Jain (paragjain16@gmail.com) (Oath)
>>>> * Roman Leventov (leventov.ru@gmail.com) (Metamarkets)
>>>> * Xavier Léauté (xavier@leaute.com) (Confluent)
>>>> 
>>>> == Sponsors ==
>>>> 
>>>> * Champion: Julian Hyde
>>>> * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
>>>> * Sponsoring entity: Apache Incubator
>>>> 
>>> 
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
>> 

