incubator-general mailing list archives

From Sharad Agarwal <>
Subject [PROPOSAL] Grill as new Incubator project
Date Fri, 19 Sep 2014 03:54:09 GMT
Grill Proposal

# Abstract

Grill is a platform that enables multi-dimensional queries in a unified way
over datasets stored in multiple warehouses. Grill integrates Apache Hive
with other data warehouses by tiering them together to form logical data

# Proposal

Grill provides a unified Cube abstraction for data stored in different
stores. Grill tiers multiple data warehouses for unified representation and
efficient access. It provides a SQL-like Cube query language to query and
describe data sets organized in data cubes. It enables users to run queries
against Facts and Dimensions that can span multiple physical tables stored
in different stores.
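
As a rough illustration of the idea above, the following Python sketch models a logical cube whose columns are backed by physical tables in different stores. All class, field, store and table names here are hypothetical and chosen for illustration only; this is not Grill's actual metadata model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalTable:
    """A concrete table in one store (hypothetical model, not Grill's API)."""
    store: str
    name: str
    columns: frozenset


@dataclass
class Cube:
    """A logical cube: measures and dimensions backed by physical tables."""
    name: str
    measures: set
    dimensions: set
    tables: list

    def candidate_tables(self, requested):
        """Return the physical tables that contain every requested column."""
        requested = set(requested)
        unknown = requested - (self.measures | self.dimensions)
        if unknown:
            raise ValueError(
                f"unknown columns for cube {self.name}: {sorted(unknown)}"
            )
        return [t for t in self.tables if requested <= t.columns]


# Example data (assumed): one raw table in Hive, one aggregate in a columnar store.
sales = Cube(
    name="sales",
    measures={"revenue", "units"},
    dimensions={"city", "product", "day"},
    tables=[
        PhysicalTable("hive", "sales_raw",
                      frozenset({"revenue", "units", "city", "product", "day"})),
        PhysicalTable("columnar_db", "sales_city_daily",
                      frozenset({"revenue", "city", "day"})),
    ],
)

candidates = sales.candidate_tables(["revenue", "city"])
print([(t.store, t.name) for t in candidates])
```

In this sketch a query on `revenue` by `city` can be answered from either store, while a query that also needs `product` can only be answered from the raw Hive table. The user addresses the cube by name and never names a physical table.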

The primary use cases that Grill aims to solve:
- Facilitate analytical queries by providing an OLAP-like Cube abstraction
- Data discovery by providing a single metadata layer for data stored in
different stores
- Unified access to data by integrating Hive with other traditional data

# Background

Apache Hive is a data warehouse that facilitates querying and managing
large datasets stored in distributed storage systems like HDFS. It provides
a SQL-like language called HiveQL (HQL). Apache Hive is widely used in
various organizations for ad hoc analytical queries.

In a typical data warehouse scenario, the data is multi-dimensional and
organized into Facts and Dimensions to form Data Cubes. Grill provides this
logical layer to enable querying and managing data as Cubes.

The Grill project is actively being developed at InMobi to provide a
higher level of analytical abstraction for seamlessly querying data stored
in different storage systems, including Hive and beyond.

# Rationale

The Grill project aims to ease analytical querying and break down
data silos by providing a single view of data across multiple data
Conceiving data as a cube with hierarchical dimensions leads to
conceptually straightforward operations to facilitate analysis. Integrating
Apache Hive with other traditional warehouses provides the opportunity to
optimize query execution cost by tiering the data across multiple
warehouses. Grill provides:
- Access to data Cubes via a Cube query language similar to HiveQL.
- A driver-based architecture that allows plugging in systems like Hive and
other warehouses such as columnar RDBMSs.
- Cost-based engine selection that makes optimal use of resources by
selecting the best execution engine for a given query.
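
The driver-based architecture and cost-based engine selection listed above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the `Driver` interface, the two driver classes, the query dictionary and the cost formulas are all made up for the example and do not describe Grill's real interfaces or cost model.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical execution-driver interface (not Grill's actual API)."""

    name: str

    @abstractmethod
    def estimate_cost(self, query):
        """Return an estimated cost, or None if this driver cannot run the query."""


class HiveDriver(Driver):
    name = "hive"

    def estimate_cost(self, query):
        # Assumed model: high fixed startup cost, cheap per-row scan.
        return 100.0 + 0.001 * query["rows"]


class ColumnarDriver(Driver):
    name = "columnar"

    def __init__(self, tables):
        self.tables = set(tables)

    def estimate_cost(self, query):
        # This driver can only serve tables that are loaded into its store.
        if query["table"] not in self.tables:
            return None
        # Assumed model: low startup cost, more expensive per-row scan.
        return 1.0 + 0.01 * query["rows"]


def select_driver(drivers, query):
    """Pick the driver with the lowest estimated cost among those that can run the query."""
    viable = []
    for driver in drivers:
        cost = driver.estimate_cost(query)
        if cost is not None:
            viable.append((cost, driver))
    if not viable:
        raise RuntimeError("no driver can execute the query")
    return min(viable, key=lambda pair: pair[0])[1]


drivers = [HiveDriver(), ColumnarDriver(tables=["sales_city_daily"])]
small = {"table": "sales_city_daily", "rows": 1_000}
large = {"table": "sales_city_daily", "rows": 1_000_000}
print(select_driver(drivers, small).name, select_driver(drivers, large).name)
```

With these assumed cost formulas, the small query is routed to the columnar driver and the large one to Hive. A new engine is added by implementing the same interface and appending it to the driver list, without changing the selection logic.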

In a typical Data warehouse, data is organized in Cubes with multiple
dimensions and measures. This facilitates the analysis by conceiving the
data in terms of Facts and Dimensions instead of physical tables. Grill
aims to provide this logical Cube abstraction on Data warehouses like Hive
and other traditional warehouses.

# Initial Goals

- Donate the Grill source code and documentation to the Apache Software
Foundation
- Build a user and developer community
- Support Hive and other Columnar data warehouses
- Support full query life cycle management
- Add authentication for querying cubes
- Provide detailed query statistics

# Long Term Goals

Here are some longer-term capabilities that would be added to Grill:
- Add authorization for managing and querying Cubes
- Provide REST and CLI for full Admin controls
- Capability to schedule queries
- Query caching
- Integrate with Apache Spark, creating Spark RDDs from Grill queries
- Integrate with Apache Optiq

# Current Status

The project is actively developed at InMobi. The first version was deployed
at InMobi four months ago. This version allows querying dimension and fact
data stored in Hive over the CLI. The source code and documentation are
hosted at GitHub.

## Meritocracy

We intend to build a diverse developer and user community for the project
following the Apache meritocracy model. We want to encourage contributors
from multiple organizations, provide plenty of support to new developers
and welcome them to be committers.

## Community

Currently the project is being developed at InMobi. We hope to extend our
contributor and user base significantly in the future and build a solid
open source community around Grill.

## Core Developers

Grill is currently being developed by Amareshwari Sriramadasu, Sharad
Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan who is
currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
the QA automation for Grill.

## Alignment

The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache Hive,
Apache Spark and other emerging projects in the Big Data space.

We believe that in any enterprise, multiple data warehouses will co-exist, as
not all workloads are cost effective to run on a single one. Apache Hive is
one of the crucial data warehouses, along with upcoming projects like Apache
Spark in the Hadoop ecosystem. Grill will benefit from working in close
proximity with these projects.

The traditional Columnar data warehouses complement Apache Hive as certain
workloads continue to be cost effective to run in traditional columnar data
warehouses. Having multiple data warehouses leads to data silos that Grill
aims to cut within the enterprise and provide a holistic unified access to

# Known Risks

## Orphaned products & Reliance on Salaried Developers

There is little risk of Grill getting orphaned, as Grill is a key part of the
Data Platform stack at InMobi. The core Grill developers plan to work on it
full-time. We think Grill will bring value in the Big Data space and we
plan to grow the community of users and contributors.

## Inexperience with Open Source

All the core developers have long and significant experience in Apache
projects and the Hadoop ecosystem. Amareshwari Sriramadasu has long-standing
contributions to Apache Hadoop MapReduce and Apache Hive, and is a PMC
member of Hadoop and a committer of Hive. Sharad Agarwal is a PMC member of
Hadoop and has contributed to Hadoop YARN and Hadoop MapReduce. Srikanth
Sundarrajan is a PMC member of Apache Falcon. Sreekanth Ramakrishnan is
a committer of Apache Hadoop. Jaideep Dhok has contributed patches to Apache
Hive. Gunther Hagleitner is a PMC member of Apache Hive. Vikram Dixit is a
committer of Apache Hive.

## Homogeneous Developers

The initial developers are employed by Hortonworks, InMobi and SoftwareAG.
We are committed to recruiting additional committers from other companies
based on their contribution to the project.

## Reliance on Salaried Developers

The majority of initial committers are paid by their employer to contribute
to the project, and a few are contributing in their spare time. Once the
project has built a community, we are committed to recruiting committers and
developers from outside the current core developers.

## Relationships with Other Apache Products

Grill is deeply integrated with other Apache projects. Grill uses and
extends Apache Hive HCatalog to store and manage the Data cubes. It uses
HDFS and Hive session management libraries. Grill has a driver-based
architecture that allows for adding multiple execution drivers. Apart from
integrating Apache Hive, it can be integrated with Apache Spark over Spark
SQL or Shark, Apache Drill, Apache Tajo and Apache Phoenix.

In the future we want to use Apache Optiq in Grill for query optimization
and cost-based driver selection.

## An Excessive Fascination with the Apache Brand

The project was conceived from the beginning to be in line with the Apache
philosophy. As the core developers have good experience with Apache, the
source code organization, build, review and commit process are highly
influenced by Apache. We believe that Apache will be a solid home for Grill
to grow and build the open source community. We have also described the
reasons in the Rationale and Alignment sections.

# Documentation

# Initial Source

The source is currently in a GitHub repository at:

# Source and Intellectual Property Submission Plan

The complete Grill code is already under the Apache License, Version 2.0.

# External Dependencies

The dependencies all have Apache compatible licenses. These include Apache
2.0, BSD, MIT, EPL and CDDL licensed dependencies.

# Cryptography


# Required Resources

## Mailing lists

grill-dev AT incubator DOT apache DOT org
grill-commits AT incubator DOT apache DOT org
grill-private AT incubator DOT apache DOT org

## Subversion Directory

Git is the preferred source control system: git://

## Issue Tracking


# Initial Committers

Amareshwari Sriramadasu (amareshwari AT apache DOT org)
Gunther Hagleitner (gunther AT apache DOT org)
Jaideep Dhok (jaideep.dhok AT Inmobi DOT com)
Raghavendra Singh (raghavendra.singh AT Inmobi DOT com)
Sharad Agarwal (sharad AT apache DOT org)
Sreekanth Ramakrishnan (sreekanth AT apache DOT org)
Srikanth Sundarrajan (sriksun AT apache DOT org)
Suma Shivaprasad (suma.shivaprasad AT Inmobi DOT com)
Vikram Dixit (vikram AT apache DOT org)

# Affiliations

Amareshwari Sriramadasu (InMobi)
Gunther Hagleitner (Hortonworks)
Jaideep Dhok (InMobi)
Raghavendra Singh (InMobi)
Sharad Agarwal (InMobi)
Sreekanth Ramakrishnan (SoftwareAG)
Srikanth Sundarrajan (InMobi)
Suma Shivaprasad (InMobi)
Vikram Dixit (Hortonworks)

# Sponsors

## Champion

Vinod K <vinodkv AT apache DOT org> (Apache Member)

## Nominated Mentors

Chris Douglas (Microsoft)
Jacob Homan (Microsoft)
Jean Baptiste Onofre (Talend)
Vinod K (Hortonworks)

## Sponsoring Entity

Incubator PMC
