incubator-general mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [PROPOSAL] Grill as new Incubator project
Date Mon, 22 Sep 2014 02:46:26 GMT
Thank you Sharad!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Sharad Agarwal <sharad@apache.org>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>,
"sharad@apache.org" <sharad@apache.org>
Date: Friday, September 19, 2014 8:59 PM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: Re: [PROPOSAL] Grill as new Incubator project

>Chris,
>Multi-dimensional here is in the sense of an OLAP cube:
>http://en.wikipedia.org/wiki/OLAP_cube
>The Grill data model consists of a set of measures that can be analysed
>along different dimensions.
>For remote sensing, the data can be modelled as a cube: the measurements,
>taken over various sets of attributes (dimensions), form the Facts, and
>time and space can be thought of as dimensions.
>Yes, it supports numerical data.
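As an illustration of the remote-sensing case described above, here is a minimal Python sketch (not Grill's API) of treating measurements as fact rows keyed by a time dimension and a space dimension, and rolling the numerical measure up along one dimension. The `Observation` record, its fields and the `roll_up` helper are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical fact row: one numerical measurement keyed by two dimensions.
@dataclass(frozen=True)
class Observation:
    day: str          # time dimension, e.g. "2014-09-18"
    region: str       # space dimension, e.g. a grid-cell label
    radiance: float   # the measure

def roll_up(facts, dimension):
    """Average the measure per value of `dimension`, aggregating over the rest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fact in facts:
        key = getattr(fact, dimension)
        totals[key] += fact.radiance
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

facts = [
    Observation("2014-09-18", "cell-A", 1.0),
    Observation("2014-09-18", "cell-B", 3.0),
    Observation("2014-09-19", "cell-A", 2.0),
    Observation("2014-09-19", "cell-B", 4.0),
]
print(roll_up(facts, "day"))     # {'2014-09-18': 2.0, '2014-09-19': 3.0}
print(roll_up(facts, "region"))  # {'cell-A': 1.5, 'cell-B': 3.5}
```

The same fact rows answer both the per-day and the per-region question; only the dimension being kept changes.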
>
>
>Ted,
>Both are in the same general area, but I think there is very little chance
>of confusion, as their propositions are clearly completely different. Both
>words are also simple, widely used nouns.
>We liked the name Grill because it is simple to spell and pronounce, and
>in some way it conveys the project's meaning: to question intensely.
>
>Thanks,
>Sharad
>
>On Sat, Sep 20, 2014 at 12:11 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> There is a strong phonetic similarity to Apache Drill, a project in the
>> same general domain.
>>
>> Is the Grill name already baked in (pun intended)?
>>
>>
>>
>> On Fri, Sep 19, 2014 at 7:24 AM, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>> > Thank you, Sharad. So could I use this system for remote sensing
>> > data, like three-dimensional (time, space and measurement) cubes?
>> > Does it support numerical data well?
>> >
>> > Sorry for so many questions, just excited :)
>> >
>> > -----Original Message-----
>> > From: Sharad Agarwal <sharad@apache.org>
>> > Reply-To: "sharad@apache.org" <sharad@apache.org>
>> > Date: Friday, September 19, 2014 4:06 AM
>> > To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
>> > Cc: "general@incubator.apache.org" <general@incubator.apache.org>
>> > Subject: Re: [PROPOSAL] Grill as new Incubator project
>> >
>> > >Chris, thanks for your comments.
>> > >
>> > >The differences that I see are:
>> > >- SciDB exposes an array data model and the Array Query Language (AQL).
>> > >  The Grill data model is based on OLAP facts and dimensions. Grill
>> > >  exposes an SQL-like language (a subset of HiveQL) that works on
>> > >  *logical* entities (facts, dimensions).
>> > >
>> > >- The goal of Grill is not to build a new query execution database,
>> > >  but to unify existing ones by having a central metadata catalog and
>> > >  providing a Cube abstraction layer on top of it.
>> > >
>> > >Thanks,
>> > >Sharad
>> > >
>> > >
>> > >On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
>> > >
>> > >This sounds super cool!
>> > >
>> > >How does this relate to SciDB? Is it trying to do a similar thing?
>> > >
>> > >Cheers,
>> > >Chris
>> > >
>> > >-----Original Message-----
>> > >From: Sharad Agarwal <sharad@apache.org>
>> > >Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>,
>> > >"sharad@apache.org" <sharad@apache.org>
>> > >Date: Thursday, September 18, 2014 8:54 PM
>> > >To: "general@incubator.apache.org" <general@incubator.apache.org>
>> > >Subject: [PROPOSAL] Grill as new Incubator project
>> > >
>> > >>Grill Proposal
>> > >>==========
>> > >>
>> > >># Abstract
>> > >>
>> > >>Grill is a platform that enables multi-dimensional queries in a unified
>> > >>way over datasets stored in multiple warehouses. Grill integrates Apache
>> > >>Hive with other data warehouses by tiering them together to form logical
>> > >>data cubes.
>> > >>
>> > >>
>> > >># Proposal
>> > >>
>> > >>Grill provides a unified Cube abstraction for data stored in different
>> > >>stores. Grill tiers multiple data warehouses for unified representation
>> > >>and efficient access. It provides an SQL-like Cube query language to
>> > >>query and describe data sets organized in data cubes. It enables users
>> > >>to run queries against Facts and Dimensions that can span multiple
>> > >>physical tables stored in different stores.
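One way to picture a logical Fact that spans multiple physical tables in different stores: each physical table exposes a subset of the fact's columns, and a query can only be routed to tables that contain every column it references. The sketch below is a hypothetical, simplified illustration; the `PhysicalTable` type, the table names and the column-subset rule are assumptions for this example, not Grill's actual resolution logic.

```python
from dataclasses import dataclass

# Hypothetical model: a physical table in some store holds a subset of a
# logical fact's columns. Names and storage labels are illustrative only.
@dataclass(frozen=True)
class PhysicalTable:
    name: str
    storage: str            # e.g. "hive" or "columnar"
    columns: frozenset      # columns this table can serve

def candidate_tables(tables, required_columns):
    """Return the tables that contain every column the query references."""
    needed = frozenset(required_columns)
    return [t for t in tables if needed <= t.columns]

# Two physical tables backing one logical "impressions" fact.
tables = [
    PhysicalTable("impressions_full", "hive",
                  frozenset({"country", "device", "impressions", "clicks"})),
    PhysicalTable("impressions_summary", "columnar",
                  frozenset({"country", "impressions"})),
]

# Both tables can answer a country-level impressions query...
print([t.name for t in candidate_tables(tables, {"country", "impressions"})])
# ['impressions_full', 'impressions_summary']
# ...but only the Hive table has the device and clicks columns.
print([t.name for t in candidate_tables(tables, {"device", "clicks"})])
# ['impressions_full']
```

A real resolver would presumably weigh more than column coverage (for example, partitions or time ranges); the subset test only shows why one logical fact can map to several physical tables.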
>> > >>
>> > >>The primary use cases that Grill aims to address:
>> > >>- Facilitate analytical queries by providing an OLAP-like Cube abstraction
>> > >>- Data discovery, by providing a single metadata layer for data stored in different stores
>> > >>- Unified access to data, by integrating Hive with other traditional data warehouses
>> > >>
>> > >>
>> > >># Background
>> > >>
>> > >>Apache Hive is a data warehouse that facilitates querying and managing
>> > >>large datasets stored in distributed storage systems like HDFS. It
>> > >>provides an SQL-like language called HiveQL (HQL). Apache Hive is widely
>> > >>used in many organizations for ad hoc analytical queries.
>> > >>In a typical data warehouse scenario, the data is multi-dimensional and
>> > >>organized into Facts and Dimensions to form Data Cubes. Grill provides
>> > >>this logical layer to enable querying and managing data as Cubes.
>> > >>The Grill project is actively being developed at InMobi to provide a
>> > >>higher level of analytical abstraction for seamlessly querying data
>> > >>stored in different storage systems, including Hive and beyond.
>> > >>
>> > >>
>> > >># Rationale
>> > >>
>> > >>The Grill project aims to make analytical querying easier and to break
>> > >>down data silos by providing a single view of data across multiple data
>> > >>stores.
>> > >>Conceiving data as a cube with hierarchical dimensions leads to
>> > >>conceptually straightforward operations that facilitate analysis.
>> > >>Integrating Apache Hive with other traditional warehouses provides the
>> > >>opportunity to optimize query execution cost by tiering the data across
>> > >>multiple warehouses. Grill provides:
>> > >>- Access to data Cubes via a Cube Query Language similar to HiveQL.
>> > >>- A driver-based architecture that allows plugging in systems like Hive and other warehouses, such as columnar RDBMSs.
>> > >>- Cost-based engine selection, which makes optimal use of resources by selecting the best execution engine for a given query.
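The driver-based architecture and cost-based engine selection listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Grill's interface: `Driver`, `FixedCostDriver`, `estimate_cost` and `select_driver` are hypothetical names, and the flat per-driver cost and substring table check stand in for whatever real cost model and query analysis an engine would use.

```python
from abc import ABC, abstractmethod

INF = float("inf")

class Driver(ABC):
    """Hypothetical pluggable execution driver (e.g. Hive or a columnar RDBMS)."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def estimate_cost(self, query):
        """Return an estimated cost for running `query`, or INF if unsupported."""

class FixedCostDriver(Driver):
    """Toy cost model: a flat per-query cost for queries on tables it holds."""

    def __init__(self, name, cost_per_query, tables):
        super().__init__(name)
        self.cost_per_query = cost_per_query
        self.tables = tables

    def estimate_cost(self, query):
        # Crude substring check stands in for real query analysis.
        if any(table in query for table in self.tables):
            return self.cost_per_query
        return INF

def select_driver(drivers, query):
    """Choose the driver with the lowest estimated cost for this query."""
    costs = {d.name: d.estimate_cost(query) for d in drivers}
    best = min(drivers, key=lambda d: costs[d.name])
    if costs[best.name] == INF:
        raise ValueError("no driver can execute this query")
    return best

drivers = [
    FixedCostDriver("hive", 100.0, {"impressions_full", "impressions_summary"}),
    FixedCostDriver("columnar", 5.0, {"impressions_summary"}),
]

q1 = "SELECT country, SUM(impressions) FROM impressions_summary GROUP BY country"
q2 = "SELECT device, SUM(clicks) FROM impressions_full GROUP BY device"
print(select_driver(drivers, q1).name)  # columnar (cheaper, and it holds the table)
print(select_driver(drivers, q2).name)  # hive (only driver holding the table)
```

Keeping the cost estimate behind the driver interface means a new warehouse can be plugged in without changing the selection logic; only its own `estimate_cost` has to be written.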
>> > >>
>> > >>In a typical data warehouse, data is organized in Cubes with multiple
>> > >>dimensions and measures. This facilitates analysis by letting users
>> > >>think of the data in terms of Facts and Dimensions instead of physical
>> > >>tables. Grill aims to provide this logical Cube abstraction on top of
>> > >>data warehouses like Hive and other traditional warehouses.
>> > >>
>> > >>
>> > >># Initial Goals
>> > >>
>> > >>- Donate the Grill source code and documentation to the Apache Software Foundation
>> > >>- Build a user and developer community
>> > >>- Support Hive and other Columnar data warehouses
>> > >>- Support full query life cycle management
>> > >>- Add authentication for querying cubes
>> > >>- Provide detailed query statistics
>> > >>
>> > >>
>> > >># Long Term Goals
>> > >>
>> > >>Here are some longer-term capabilities that would be added to Grill:
>> > >>- Add authorization for managing and querying Cubes
>> > >>- Provide REST and CLI for full Admin controls
>> > >>- Capability to schedule queries
>> > >>- Query caching
>> > >>- Integrate with Apache Spark: create Spark RDDs from Grill queries
>> > >>- Integrate with Apache Optiq
>> > >>
>> > >>
>> > >># Current Status
>> > >>
>> > >>The project is actively developed at InMobi. The first version was
>> > >>deployed at InMobi four months ago. This version allows querying
>> > >>dimension and fact data stored in Hive through a CLI. The source code
>> > >>and documentation are hosted on GitHub.
>> > >>
>> > >>## Meritocracy
>> > >>
>> > >>We intend to build a diverse developer and user community for the
>> > >>project following the Apache meritocracy model. We want to encourage
>> > >>contributors from multiple organizations, provide plenty of support to
>> > >>new developers and welcome them as committers.
>> > >>
>> > >>## Community
>> > >>
>> > >>Currently the project is being developed at InMobi. We hope to extend
>> > >>our contributor and user base significantly in the future and build a
>> > >>solid open source community around Grill.
>> > >>
>> > >>## Core Developers
>> > >>
>> > >>Grill is currently being developed by Amareshwari Sriramadasu, Sharad
>> > >>Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan, who is
>> > >>currently employed by SoftwareAG. Raghavendra Singh from InMobi has
>> > >>built the QA automation for Grill.
>> > >>
>> > >>## Alignment
>> > >>
>> > >>The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache
>> > >>Hive, Apache Spark and other emerging projects in the Big Data space.
>> > >>We believe that in any enterprise multiple data warehouses will
>> > >>co-exist, as not all workloads are cost-effective to run on a single
>> > >>one. Apache Hive is one of the crucial data warehouses in the Hadoop
>> > >>ecosystem, along with upcoming projects like Apache Spark. Grill will
>> > >>benefit from working in close proximity with these projects.
>> > >>Traditional columnar data warehouses complement Apache Hive, as certain
>> > >>workloads continue to be cost-effective to run on them. Having multiple
>> > >>data warehouses leads to data silos within the enterprise; Grill aims
>> > >>to break down these silos and provide holistic, unified access to data.
>> > >>
>> > >>
>> > >># Known Risks
>> > >>
>> > >>## Orphaned products & Reliance on Salaried Developers
>> > >>
>> > >>There is little risk of Grill getting orphaned, as Grill is a key part
>> > >>of the Data Platform stack at InMobi. The core Grill developers plan to
>> > >>work on it full-time. We think Grill will bring value to the Big Data
>> > >>space, and we plan to grow the community of users and contributors.
>> > >>
>> > >>## Inexperience with Open Source
>> > >>
>> > >>All the core developers have long and significant experience with
>> > >>Apache projects and the Hadoop ecosystem. Amareshwari Sriramadasu has
>> > >>long-standing contributions to Apache Hadoop MapReduce and Apache Hive;
>> > >>she is a PMC member of Hadoop and a committer on Hive. Sharad Agarwal is
>> > >>a PMC member of Hadoop and has contributed to Hadoop YARN and Hadoop
>> > >>MapReduce. Srikanth Sundarrajan is a PMC member of Apache Falcon.
>> > >>Sreekanth Ramakrishnan is a committer on Apache Hadoop. Jaideep Dhok has
>> > >>contributed patches to Apache Hive. Gunther Hagleitner is a PMC member
>> > >>of Apache Hive. Vikram Dixit is a committer on Apache Hive.
>> > >>
>> > >>## Homogeneous Developers
>> > >>
>> > >>The initial developers are employed by Hortonworks, InMobi and
>> > >>SoftwareAG. We are committed to recruiting additional committers from
>> > >>other companies based on their contribution to the project.
>> > >>
>> > >>## Reliance on Salaried Developers
>> > >>
>> > >>The majority of the initial committers are paid by their employers to
>> > >>contribute to the project, and a few contribute in their spare time.
>> > >>Once the project has built a community, we are committed to recruiting
>> > >>committers and developers from outside the current core developers.
>> > >>
>> > >>## Relationships with Other Apache Products
>> > >>
>> > >>Grill is deeply integrated with other Apache projects. Grill uses and
>> > >>extends Apache Hive HCatalog to store and manage the Data cubes. It uses
>> > >>HDFS and Hive session management libraries. Grill has a driver-based
>> > >>architecture that allows adding multiple execution drivers. Apart from
>> > >>Apache Hive, it can be integrated with Apache Spark (via Spark SQL or
>> > >>Shark), Apache Drill, Apache Tajo and Apache Phoenix.
>> > >>In the future we want to use Apache Optiq in Grill for query
>> > >>optimization and cost-based driver selection.
>> > >>
>> > >>## An Excessive Fascination with the Apache Brand
>> > >>
>> > >>The project was conceived from the beginning to be in line with the
>> > >>Apache philosophy. As the core developers have good experience with
>> > >>Apache, the source code organization, build, review and commit
>> > >>processes are highly influenced by Apache. We believe that Apache will
>> > >>be a solid home for Grill to grow and build the open source community.
>> > >>We have also described the reasons in the Rationale and Alignment
>> > >>sections.
>> > >>
>> > >>
>> > >># Documentation
>> > >>
>> > >>http://inmobi.github.io/grill/
>> > >>
>> > >>
>> > >># Initial Source
>> > >>
>> > >>The source is currently in a GitHub repository at:
>> > >>https://github.com/inmobi/grill
>> > >>
>> > >>
>> > >># Source and Intellectual Property Submission Plan
>> > >>
>> > >>The complete Grill code is already licensed under the Apache License, Version 2.0.
>> > >>
>> > >>
>> > >># External Dependencies
>> > >>
>> > >>The dependencies all have Apache-compatible licenses. These include
>> > >>Apache 2.0, BSD, MIT, EPL and CDDL licensed dependencies.
>> > >>
>> > >>
>> > >># Cryptography
>> > >>
>> > >>None
>> > >>
>> > >>
>> > >># Required Resources
>> > >>
>> > >>## Mailing lists
>> > >>
>> > >>grill-dev AT incubator DOT apache DOT org
>> > >>grill-commits AT incubator DOT apache DOT org
>> > >>grill-private AT incubator DOT apache DOT org
>> > >>
>> > >>## Subversion Directory
>> > >>
>> > >>Git is the preferred source control system:
>> > >>git://git.apache.org/incubator-grill
>> > >>
>> > >>## Issue Tracking
>> > >>
>> > >>JIRA Grill (GRILL)
>> > >>
>> > >>
>> > >># Initial Committers
>> > >>
>> > >>Amareshwari Sriramadasu (amareshwari AT apache DOT org)
>> > >>Gunther Hagleitner (gunther AT apache DOT org)
>> > >>Jaideep Dhok (jaideep.dhok AT Inmobi DOT com)
>> > >>Raghavendra Singh (raghavendra.singh AT Inmobi DOT com)
>> > >>Sharad Agarwal (sharad AT apache DOT org)
>> > >>Sreekanth Ramakrishnan (sreekanth AT apache DOT org)
>> > >>Srikanth Sundarrajan (sriksun AT apache DOT org)
>> > >>Suma Shivaprasad (suma.shivaprasad AT Inmobi DOT com)
>> > >>Vikram Dixit (vikram AT apache DOT org)
>> > >>
>> > >>
>> > >># Affiliations
>> > >>
>> > >>Amareshwari Sriramadasu (InMobi)
>> > >>Gunther Hagleitner (Hortonworks)
>> > >>Jaideep Dhok (InMobi)
>> > >>Raghavendra Singh (InMobi)
>> > >>Sharad Agarwal (InMobi)
>> > >>Sreekanth Ramakrishnan (SoftwareAG)
>> > >>Srikanth Sundarrajan (InMobi)
>> > >>Suma Shivaprasad (InMobi)
>> > >>Vikram Dixit (Hortonworks)
>> > >>
>> > >>
>> > >># Sponsors
>> > >>
>> > >>## Champion
>> > >>
>> > >>Vinod K <vinodkv AT apache DOT org> (Apache Member)
>> > >>
>> > >>## Nominated Mentors
>> > >>
>> > >>Chris Douglas (Microsoft)
>> > >>Jacob Homan (Microsoft)
>> > >>Jean Baptiste Onofre (Talend)
>> > >>Vinod K (Hortonworks)
>> > >>
>> > >>## Sponsoring Entity
>> > >>
>> > >>Incubator PMC
>> > >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> > For additional commands, e-mail: general-help@incubator.apache.org
>> >
>> >
>>
