incubator-general mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: [VOTE] Accept Apex into the Apache Incubator
Date Thu, 13 Aug 2015 21:32:22 GMT
+1.

Alan.

> Chris Nauroth <mailto:cnauroth@hortonworks.com>
> August 13, 2015 at 9:59
> +1 (binding)
>
> I believe the current proposal covers everything required. Thank you
> to Amol for incorporating the community's feedback.
>
> --Chris Nauroth
>
> From: "P. Taylor Goetz" <ptgoetz@apache.org>
> Reply-To: <general@incubator.apache.org>
> Date: Thursday, August 13, 2015 at 7:48 AM
> To: Incubator <general@incubator.apache.org>
> Subject: [VOTE] Accept Apex into the Apache Incubator
>
> Following the discussion thread [1], I would like to call a VOTE for
> Accepting Apex as a new Apache Incubator project.
>
> The proposal is available on the wiki [2] and is also attached below.
>
> The VOTE will be open for at least 72 hours.
>
> [ ] +1 Accept Apex into the Incubator
> [ ] ±0 No opinion
> [ ] -1 Do not accept Apex into the Incubator because…
>
> Thanks,
>
> -Taylor
>
> [1] http://s.apache.org/apex_discuss
> [2] https://wiki.apache.org/incubator/ApexProposal
>
>
> == Abstract ==
> Apex is an enterprise-grade, native YARN big data-in-motion platform
> that unifies stream and batch processing. Apex processes big data
> in motion in a highly scalable, highly performant, fault-tolerant,
> stateful, secure, distributed, and easily operable way. It provides a
> simple API that enables users to write or re-use generic Java code,
> thereby lowering the expertise needed to write big data applications.
>
> Functional and operational specifications are separated. Apex is
> designed to let users write their own code (aka user-defined
> functions) as is and leave all operability to the platform. The API
> is very simple and is designed to allow users to drop in their code
> as is. The platform mainly deals with operability and treats
> functional code as a black box. Operability includes fault tolerance,
> scalability, security, ease of use, the metrics API, web services,
> etc. In other words, there is no separate category of UDF
> (user-defined functions), as all functional code is UDF. This frees
> users to focus on functional development and lets the platform
> provide operability support. The same code runs as is with different
> operability attributes. The data-in-motion architecture of Apex
> unifies stream and batch processing in a single platform. Since Apex
> is a native YARN application, it leverages all the components of YARN
> without duplication. Apex was developed with YARN in mind and has no
> components or functionality that overlap with YARN.
>
> The Apex platform is supplemented by project Malhar, which is a
> library of operators that implement common business logic functions
> needed by customers who want to quickly develop applications. These
> operators provide access to HDFS, S3, NFS, FTP, and other file
> systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems;
> MySQL, Cassandra, MongoDB, Redis, HBase, CouchDB, and other databases
> along with JDBC connectors. The Malhar library also includes a host of
> other common business logic patterns that help users to significantly
> reduce the time it takes to go into production. Ease of integration
> with all other big data technologies is one of the primary missions of
> Malhar.
>
> == Proposal ==
> The goal of this proposal is to establish the core engine of the
> DataTorrent RTS product as an Apache Software Foundation (ASF) project
> in order to build a vibrant, diverse, and self-governed open source
> community around the technology. DataTorrent will continue to sell
> management tools, application building tools, easy-to-use big data
> applications, and custom high-end business logic operators. This
> proposal covers the Apex source code (written in Java), Apex
> documentation and other materials currently available on
> https://github.com/DataTorrent/Apex. This proposal also covers the
> Malhar source code (written in Java), Malhar documentation, and other
> materials currently available on
> https://github.com/DataTorrent/Malhar. We have done a trademark check
> on the name Apex, and have concluded that the Apex name is likely to
> be a suitable project name.
>
> == Background ==
> DataTorrent RTS is a mature and robust product developed as a native
> YARN application. RTS 1.0 was launched in the summer of 2014; RTS 2.0
> was launched in January 2015. Both were well received by customers.
> RTS 3.0 was launched at the end of July 2015. RTS is among the first
> enterprise-grade platforms developed from the ground up as a native
> YARN application. DataTorrent RTS is currently maintained by engineers
> as a closed source project. Even though the engineers behind RTS are
> experienced software engineers and are knowledge leaders in
> data-in-motion platforms, they have had little exposure to the open
> source governance process. Customers are currently running
> applications based on DataTorrent RTS in production.
>
> == Rationale ==
> Big data applications written for non-Hadoop platforms typically
> require major rewrites to get them to work with Hadoop. This rewriting
> creates a significant bottleneck in terms of resources (expertise),
> which in turn jeopardizes the viability of such an endeavour. It is
> hard enough to acquire big data expertise; demanding additional
> expertise to do a major code conversion makes it very hard for
> projects to successfully migrate to Hadoop. Also, due to the batch
> processing nature of Hadoop’s MapReduce paradigm, users often have to
> wait tens of minutes to see results and act on them due to various
> delays in data flow. DataTorrent’s RTS data-in-motion architecture is
> designed to address this problem. It enables even developers without
> big data experience to write code and operate it in a scalable,
> fault-tolerant manner. The big data-in-motion architecture of
> DataTorrent’s RTS enables ease of integration into current enterprise
> infrastructure. This goal was achieved by keeping the API simple and
> empowering users to put in the connector code as is (or with minimal
> changes).
>
> Malhar is a manifestation of this reality: we or our customers'
> engineers were able to create these connectors within a day or so,
> and otherwise within a week. Connectors include those that integrate
> with message buses, file systems, databases, and other protocols, and
> more continue to be added. Over time we expect users to simply pick a
> connector that already exists in Malhar and quickly begin integrating
> with their current enterprise infrastructure. Within the
> data-in-motion architecture, a stream application is one with
> connector(s) to, say, Kafka, JMS, or Flume, while a batch application
> is one with connector(s) to HDFS, HBase, FTP, NFS, S3n, etc. This
> allows the platform to be used for both stream and batch processing
> with the same business logic. Complete separation of user-written
> application code from all operational aspects of the system, as well
> as support code for YARN, significantly expands the potential use
> cases that can migrate to Hadoop.
>
> Apex will enable the Hadoop ecosystem to migrate many more use cases.
> It will enable the Hadoop ecosystem to deliver on a promise to rapidly
> transform current IT infrastructure. Apex will help significantly
> increase productization of big data projects. One of the main
> barometers of success in the Hadoop ecosystem is a significant
> reduction in time to market for big data applications migrating to
> Hadoop. We believe that Apex will be one of the platforms that will
> enable users to extract value from big data by reducing time to
> market. This rapid innovation can be optimally achieved through a
> vibrant, diverse, self-governed community collectively innovating
> around Apex and the Malhar library, while at the same time
> cross-pollinating with various other big data platforms. The ASF is
> an ideal place to meet this goal.
>
> == Initial Goals ==
> Our initial goals are to bring the Apex and Malhar repositories into
> the ASF, adapt internal engineering processes to open development, and
> foster a collaborative development model in accordance with the
> "Apache Way." DataTorrent plans to develop new functionality in an
> open, community-driven way. To get there, the existing internal build,
> test, and release processes will be refactored to support open
> development. We already have an active user community on Google Groups
> that we intend to migrate to Apache.
>
> == Current Status ==
> Currently, the project Apex code base is available under the Apache
> 2.0 license (https://github.com/DataTorrent/Apex). The project Malhar
> code base is available under the Apache 2.0 license
> (https://github.com/DataTorrent/Malhar). Project Malhar was open
> sourced two years ago, which should make it easy for the project
> Malhar team to adapt to an open, collaborative, and meritocratic
> environment. Contributors to Malhar are employees of DataTorrent or
> have agreed to the shift to Apache. Project Apex, in contrast, was
> developed as a proprietary, closed-source product, but the internal
> engineering practices adopted by the development team were common to
> Malhar and should lend themselves well to an open environment.
> DataTorrent plans to execute a software grant agreement as part of the
> launch of the incubation of Apex as an Apache project.
>
> The DataTorrent team has always focused on building a robust end user
> community of paying and non-paying customers. We think that the
> existing community centered around the existing Google Groups mailing
> list should be relatively easy to transform into an Apache-style
> community including both users and developers.
>
> === Meritocracy ===
> Our proposed list of initial committers includes the current RTS R&D
> team and our existing customers. This group will form a base for the
> broader community we will invite to collaborate on the codebase. We
> intend to radically expand the initial developer and user community by
> running the project in accordance with the "Apache Way". Users and new
> contributors will be treated with respect and welcomed. By
> participating in the community and providing quality patches/support
> that move the project forward, they will earn merit. They also will be
> encouraged to provide non-code contributions (documentation, events,
> presentations, community management, etc.) and will gain merit for
> doing so. Those with a proven support and quality track record will be
> encouraged to become committers.
>
> === Community ===
> If Apex is accepted for incubation, the primary initial goal will be
> transitioning the core community towards embracing the Apache Way of
> project governance. We will solicit major existing contributors to
> become committers on the project from the start. It should be noted
> that the existing community is already more diverse in many ways than
> some top-level Apache projects. We expect that we can encourage even
> more diversity.
>
> === Core Developers ===
> While a few core developers are skilled in working in openly governed
> Apache communities, most of the core developers are currently NOT
> affiliated with the ASF and would require new ICLAs before committing
> to the project. There would also be a learning curve associated with
> this on-boarding. Changing current development practices to be more
> open will be an important step.
>
> === Alignment ===
> The following existing ASF projects provide functionality related to
> that provided by Apex and should be considered when reviewing the Apex
> proposal:
>
> Apache Hadoop is a distributed storage and processing framework for
> very large datasets focusing primarily on batch processing for
> analytic purposes. Apex is a native YARN application. The Apex and
> Malhar roadmap includes plans to continue to leverage YARN and help
> the YARN community develop the ability to support long-running
> applications. Apex uses the DFS interface for its core
> checkpoint/commit. Malhar has a large number of operators that
> leverage HDFS and other Apache projects. Our roadmap includes plans to
> continue to deepen the currently close integration with HDFS.
>
> Apache HBase offers tabular data stored in Hadoop based on the Google
> Bigtable model. Malhar has HBase connectors to ease integration with
> HBase. The Malhar roadmap includes plans to continue to enhance
> integration with Apache HBase.
>
> Apache Kafka offers distributed and durable publish-subscribe
> messaging. Malhar integrates Kafka with Hadoop through feature-rich
> connectors and supports ingest as well as analytical functions on
> incoming data. Raw data can be ingested from Kafka and results can be
> written to Kafka. The Malhar roadmap includes plans to continue to
> enhance integration with Apache Kafka.
>
> Apache Flume is a distributed, reliable, and available service for
> efficiently collecting, aggregating, and moving large amounts of log
> data. Malhar has Flume connectors to ease integration with Flume.
> These connectors ensure that ingestion with Flume is fault tolerant
> and thus can be done in real time with the same SLA as Flume’s HDFS
> connectors. The Malhar roadmap includes plans to continue to enhance
> integration with Apache Flume.
>
> Apache Cassandra is a highly scalable, distributed key-value store
> that focuses on eventual consistency. Malhar has connectors to ease
> integration with Cassandra. The Malhar roadmap includes plans to
> continue to enhance integration with Apache Cassandra.
>
> Apache Accumulo is a distributed key-value store based on Google’s
> BigTable design. Malhar has connectors to ease integration with
> Accumulo. The Malhar roadmap includes plans to continue to enhance
> integration with Apache Accumulo.
>
> Apache Tez is aimed at building an application framework which allows
> for a complex DAG of tasks for processing data. The Apex and Malhar
> roadmaps include plans to integrate with Apache Tez but this is not
> currently supported.
>
> Apache ActiveMQ and its subproject Apache Apollo offer a powerful
> message queue framework. Malhar has ActiveMQ connectors that ease
> integration with ActiveMQ.
>
> Apache Spark is an engine for processing large datasets, typically in
> a Hadoop cluster. The Malhar project makes it easy for users to
> integrate with Spark. The Malhar roadmap includes plans to continue to
> enhance integration with Apache Spark.
>
> Apache Flink is an engine for scalable batch and stream data
> processing. The Malhar project makes it easy for users to integrate
> with Flink. There is overlap in how Flink leverages a data-in-motion
> architecture for both stream and batch processing, and it shares our
> view that data-in-motion can handle both stream and batch, whereas a
> batch-only engine will find it harder to manage streams. We differ in
> how we handle operability, user-defined code, metrics, web services,
> etc. Apex is very operationally oriented, while Flink has much more
> focus on functional elements. Malhar and the rapid availability of
> common business logic is another differentiator. We believe both
> approaches are valid and that the community and innovation will gain
> through cross-pollination. We plan to integrate with Apache Flink via
> HDFS for now.
>
> Apache Hive software facilitates querying and managing large datasets
> residing in distributed storage. The Malhar project makes it easy for
> users to integrate with Apache Hive. The Malhar roadmap includes plans
> to continue to enhance integration with Apache Hive.
>
> Apache Pig is a platform for analyzing large data sets. Pig consists
> of a high-level language for expressing data analysis programs,
> coupled with infrastructure for evaluating these programs. The Apex
> and Malhar roadmaps include plans to integrate with Apache Pig.
>
> Apache Storm is a distributed realtime computation system. Malhar
> makes it easy for users to integrate with Apache Storm. We plan to
> integrate with Apache Storm via HDFS for now. The Malhar roadmap
> includes plans to continue to support mechanisms for integration with
> Apache Storm.
>
> Apache Samza is a distributed stream processing framework. Malhar
> makes it easy for users to integrate with Apache Samza. We plan to
> integrate with Apache Samza via HDFS or Apache Kafka for now. The
> Malhar roadmap includes plans to continue to support mechanisms for
> integration with Apache Samza.
>
> Apache Slider is a YARN application to deploy existing distributed
> applications on YARN, monitor them, and make them larger or smaller as
> desired even when the application is running. Once Slider matures, we
> will take a look at close integration of Apex with Slider.
>
> Project Malhar and Apex are aligned with many more Apache projects and
> other open source projects, as ease of integration with other
> technologies is one of the primary goals of this project. These
> include Apache Solr, ElasticSearch, MongoDB, Aerospike, ZeroMQ,
> CouchDB, CouchBase, MemCache, Redis, RabbitMQ, and Apache Derby.
>
> == Known Risks ==
> Development has been sponsored mostly by a single company
> (DataTorrent, Inc.) thus far and coordinated mainly by the core
> DataTorrent RTS and Malhar team, with active participation from our
> current customers.
>
> For the project to fully transition to the Apache Way governance
> model, development must shift towards the merit-centric model of
> growing a community of contributors balanced with the needs for
> extreme stability and core implementation coherency.
>
> The tools and development practices in place for the DataTorrent RTS
> and Malhar products are compatible with the ASF infrastructure and
> thus we do not anticipate any on-boarding pains. Migration from the
> current GitHub repository is also expected to be straightforward.
>
> === Orphaned products ===
> DataTorrent is fully committed to DataTorrent Apex and Malhar and the
> product will continue to be based on the Apex project. Moreover,
> DataTorrent has a vested interest in making Apex succeed by driving
> its close integration with sister ASF projects. We expect this to
> further reduce the risk of orphaning the product.
>
> === Inexperience with Open Source ===
> DataTorrent has embraced open source software by open sourcing the
> Malhar project under the Apache 2.0 license. The DataTorrent team
> includes veterans from the Yahoo! Hadoop team. Although some of the
> initial committers have not been developers on an entirely open
> source, community-driven project, we expect to bring the open
> development practices of Malhar to bear on the Apex project.
> Additionally, several ASF veterans have agreed to mentor the project
> and are listed in this proposal. The project will rely on their
> guidance and collective
> wisdom to quickly transition the entire team of initial committers
> towards practicing the Apache Way. DataTorrent is also driving the
> Kafka on YARN (KOYA) initiative.
>
> === Homogeneous Developers ===
> While most of the initial committers are employed by DataTorrent, we
> have already seen a healthy level of interest from our existing
> customers and partners. We intend to convert that interest directly
> into participation and will be investing in activities to recruit
> additional committers from other companies.
>
> === Reliance on Salaried Developers ===
> Most of the contributors are paid to work in the Big Data space. While
> they might wander from their current employers, they are unlikely to
> venture far from their core expertise and thus will continue to be
> engaged with the project regardless of their current employers.
>
> === Relationships with Other Apache Products ===
> As mentioned in the Alignment section, Apex may consider various
> degrees of integration and code exchange with Apache Hadoop (YARN and
> HDFS), Apache Kafka, Apache HBase, Apache Flume, Apache Cassandra,
> Apache Accumulo, Apache Tez, Apache Hive, Apache Pig, Apache Storm,
> Apache Samza, Apache Spark, Apache Slider. Given the success that the
> DataTorrent RTS product enjoyed, we expect integration points to be
> inside and outside the project. We look forward to collaborating with
> these communities as well as other communities under the Apache umbrella.
>
> === An Excessive Fascination with the Apache Brand ===
> While we intend to leverage the Apache ‘branding’ when talking to
> other projects as a testament to our project’s ‘neutrality’, we have
> no plans to use the Apache brand in press releases or to post
> billboards advertising the acceptance of Apex into the Apache
> Incubator.
>
>
> == Documentation ==
> The current state of the project documentation is available as part
> of the GitHub repositories -
> https://github.com/DataTorrent/Apex;
> https://github.com/DataTorrent/Malhar. In addition, a list of demos
> that serve as a how-to guide is available at
> https://github.com/DataTorrent/Malhar/tree/master/demos
>
> == Initial Source ==
> DataTorrent has released the source code for Apex under the Apache 2.0
> license at https://github.com/DataTorrent/Apex, and that of Malhar
> under the Apache 2.0 license at https://github.com/DataTorrent/Malhar.
> We encourage ASF community members interested in this proposal to
> download the source code, review it and try out the software.
>
> == Source and Intellectual Property Submission Plan ==
> As soon as Apex is approved to join the Apache Incubator, DataTorrent
> will execute a Software Grant Agreement and the source code will be
> transitioned onto ASF infrastructure. The code is already licensed
> under the Apache License, Version 2.0. We know of no legal
> encumbrances that would inhibit the transfer of source code to the ASF.
>
> == External Dependencies ==
> All dependencies fall under the permissive license categories, or
> weak copyleft (http://www.apache.org/legal/resolved.html#category-b).
> We intend to remove the dependencies on GPL-licensed technologies on
> which Apex or Malhar depend. These technologies are optional and have
> been marked as such.
>
> Embedded dependencies (relocated):
> * None
>
> Runtime dependencies:
> * activemq-client
> * ant
> * async-http-client
> * bval-jsr303
> * commons-beanutils
> * commons-codec
> * commons-lang3
> * commons-compiler
> * embassador
> * fastutil
> * guava
> * hadoop-common
> * hadoop-common-tests
> * hadoop-yarn-client
> * httpclient
> * jackson-core-asl
> * jackson-mapper-asl
> * javax.mail
> * jersey-apache-client4
> * jersey-client
> * jetty-servlet
> * jetty-websocket
> * jline
> * kryo
> * named-regexp
> * netlet
> * rhino (GPL 2.0, optional)
> * slf4j-api
> * slf4j-log4j12
> * validation-api
> * xbean-asm5-shaded
> * zip4j
>
> Module or optional dependencies:
> * accumulo-core
> * aerospike-client
> * amqp-client
> * aws-java-sdk-kinesis
> * cassandra-driver-core
> * couchbase-client
> * CouchbaseMock
> * elasticsearch
> * geoip-api (LGPL, optional)
> * hbase
> * hbase-client
> * hbase-server
> * hive-exec
> * hive-service
> * hiveunit
> * javax.mail-api
> * jedis
> * jms-api
> * jri (GPL, optional)
> * jriengine (LGPL, optional)
> * jruby (LGPL, optional)
> * jython (PSF License, optional)
> * jzmq (LGPL, optional)
> * kafka_2.10
> * lettuce (GPL, optional)
> * libthrift
> * Memcached-Java-Client
> * mongo-java-driver
> * mqtt-client
> * mysql-connector-java (GPL2, optional)
> * org.ektorp
> * rengine (LGPL, optional)
> * rome
> * solr-core
> * solr-solrj
> * spymemcached
> * sqlite4java
> * super-csv
> * twitter4j-core
> * twitter4j-stream
> * uadetector-resources
> * org.apache.servicemix.bundles.splunk
>
> Build only dependencies:
> * None
>
> Test only dependencies:
> * activemq-broker
> * activemq-kahadb-store
> * greenmail
> * hadoop-yarn-server-tests
> * hsqldb
> * janino
> * junit
> * MockFtpServer
> * mockito-all
> * testng
>
> Cryptography N/A
>
> == Required Resources ==
> === Mailing lists ===
> * private@apex.incubator.apache.org (moderated subscriptions)
> * commits@apex.incubator.apache.org
> * dev@apex.incubator.apache.org
>
> === Git Repository ===
> * https://git-wip-us.apache.org/repos/asf/incubator-apex-core.git
> * https://git-wip-us.apache.org/repos/asf/incubator-apex-malhar.git
>
> === Issue Tracking ===
> * JIRA Project Apex (APEX_CORE) // If '_' is not allowed, use APEXCORE
> * JIRA Project Malhar (APEX_MALHAR) // If '_' is not allowed, use APEXMALHAR
>
> === Other Resources ===
> * Means of setting up regular builds for apex-core on builds.apache.org
> * Means of setting up regular builds for apex-malhar on builds.apache.org
>
> === Rationale for Malhar and Apex having separate git and jira ===
> We managed Malhar and Apex as two repos and two JIRAs on purpose. Both
> code bases are released under Apache 2.0 and are proposed for
> incubation. In terms of our vision to enable innovation around a
> native YARN data-in-motion platform that unifies stream and batch
> processing, Malhar and Apex go hand in hand. Apex has a base API that
> consists of a Java API (functional) and attributes (operability).
> Malhar is a manifestation of this API, but from the user's
> perspective, Malhar is itself an API to leverage business logic. Over
> the past three years we have found that the cadence of releases and
> API changes in Malhar is much more rapid than in Apex, and it was
> operationally much easier to separate them into their own repos. Two
> repos will reflect a clear separation of engine (Apex) and
> operators/business logic (Malhar). It will allow for independent
> release cycles (operators change independently of the engine due to
> the stable API). We do not, however, believe in two levels of
> committers. We believe there should be one community that works
> across both and innovates on the premise that Malhar and Apex
> combined provide the value proposition. We are proposing that the
> Apache incubation process help us foster the development of one
> community (mailing list, committers) while still being OK with two
> repos. We are proposing that this be taken up during incubation. The
> community will learn if this works. The decision on whether to split
> them into two projects should be taken after the learning curve during
> incubation.
>
> == Initial Committers ==
> * Roma Ahuja (rahuja at directv dot com)
> * Isha Arkatkar (isha at datatorrent dot com)
> * Raja Ali (raji at silverspringnet dot com)
> * Sunaina Chaudhary (SChaudhary at directv dot com)
> * Bhupesh Chawda (bhupesh at datatorrent dot com)
> * Chaitanya Chelobu (chaitanya at datatorrent dot com)
> * Bright Chen (bright at datatorrent dot com)
> * Pradeep Dalvi (pradeep dot dalvi at datatorrent dot com)
> * Sandeep Deshmukh (sandeep at datatorrent dot com)
> * Yogi Devendra (yogi at datatorrent dot com)
> * Cem Ezberci (hasan dot ezberci at ge dot com)
> * Timothy Farkas (tim at datatorrent dot com)
> * Ilya Ganelin (ilya dot ganelin at capitalone dot com)
> * Vitthal Gogate (vitthal_gogate at yahoo dot com)
> * Parag Goradia (parag dot goradia at ge dot com)
> * Tushar Gosavi (tushar at datatorrent dot com)
> * Priyanka Gugale (priyanka at datatorrent dot com)
> * Gaurav Gupta (gaurav at datatorrent dot com)
> * Sandesh Hegde (sandesh at datatorrent dot com)
> * Siyuan Hua (siyuan at datatorrent dot com)
> * Ajith Joseph (ajoseph at silverspring dot com)
> * Amol Kekre (amol at datatorrent dot com)
> * Chinmay Kolhatkar (chinmay at datatorrent dot com)
> * Pramod Immaneni (pramod at datatorrent dot com)
> * Anuj Lal (anuj dot lal at ge dot com)
> * Dongsu Lee (dlee3 at directv dot com)
> * Vitaly Li (blossom dot valley at gmail dot com)
> * Dean Lockgaard (dean at datatorrent dot com)
> * Rohan Mehta (rohan_mehta at apple dot com)
> * Adi Mishra (apmishra at directv dot com, adi dot mishra at gmail dot
> com)
> * Chetan Narsude (chetan at datatorrent dot com)
> * Darin Nee (dnee at silverspring dot com)
> * Alexander Parfenov (sasha at datatorrent dot com)
> * Andrew Perlitch (andy at datatorrent dot com)
> * Shubham Phatak (shubham at datatorrent dot com)
> * Ashwin Putta (ashwin at datatorrent dot com)
> * Rikin Shah (shah_rikin at yahoo dot com)
> * Luis Ramos (l dot ramos at ge dot com)
> * Munagala Ramanath (ram at datatorrent dot com)
> * Vlad Rozov (vlad dot rozov at datatorrent dot com)
> * Atri Sharma (atri dot jiit at gmail dot com)
> * Chandni Singh (chandni at datatorrent dot com)
> * Venkatesh Sivasubramanian (venkateshs at ge dot com)
> * Aniruddha Thombare (aniruddha at datatorrent dot com)
> * Jessica Wang (jessica at datatorrent dot com)
> * Thomas Weise (thomas at datatorrent dot com)
> * David Yan (david at datatorrent dot com)
> * Kevin Yang (yang dot k at ge dot com)
> * Brennon York (brennon dot york at capitalone dot com)
>
> == Affiliations ==
> * Apple: Vitaly Li, Rohan Mehta
> * Barclays: Atri Sharma
> * Class Software: Justin Mclean
> * CapitalOne: Ilya Ganelin, Brennon York
> * DataTorrent: everyone else on this proposal
> * Datachief: Rikin Shah
> * DirecTV: Roma Ahuja, Sunaina Chaudhary, Dongsu Lee, Adi Mishra
> * E8security: Vitthal Gogate
> * General Electric: Cem Ezberci, Parag Goradia, Anuj Lal, Luis Ramos,
> Venkatesh Sivasubramanian, Kevin Yang
> * Hortonworks: Alan Gates, Taylor Goetz, Chris Nauroth, Hitesh Shah
> * MapR: Ted Dunning
> * SilverSpring Networks: Raja Ali, Ajith Joseph, Darin Nee
>
> == Sponsors ==
>
> === Champion ===
> Ted Dunning
>
> === Nominated Mentors ===
>
> The initial mentors are listed below:
> * Ted Dunning - Apache Member, MapR
> * Alan Gates - Apache Member, Hortonworks
> * Taylor Goetz - Apache Member, Hortonworks
> * Justin Mclean - Apache Member, Class Software
> * Chris Nauroth - Apache Member, Hortonworks
> * Hitesh Shah - Apache Member, Hortonworks
>
> === Sponsoring Entity ===
>
> We would like to propose that the Apache Incubator sponsor this project.
>
> P. Taylor Goetz <mailto:ptgoetz@apache.org>
> August 13, 2015 at 7:48
> Following the discussion thread [1], I would like to call a VOTE for
> Accepting Apex as a new Apache Incubator project.
>
> The proposal is available on the wiki [2] and is also attached below.
>
> The VOTE will be open for at least 72 hours.
>
> [ ] +1 Accept Apex into the Incubator
> [ ] ±0 No opinion
> [ ] -1 Do not accept Apex into the Incubator because…
>
> Thanks,
>
> -Taylor
>
> [1] http://s.apache.org/apex_discuss
> [2] https://wiki.apache.org/incubator/ApexProposal
>
>
> == Abstract ==
> Apex is an enterprise grade native YARN big data-in-motion platform
> that unifies stream processing as well as batch processing. Apex
> processes big data in-motion in a highly scalable, highly performant,
> fault tolerant, stateful, secure, distributed, and an easily operable
> way. It provides a simple API that enables users to write or re-use
> generic Java code, thereby lowering the expertise needed to write big
> data applications.
>
> Functional and operational specifications are separated. Apex is
> designed in a way to enable users to write their own code (aka user
> defined functions) as is and leave all operability to the platform.
> The API is very simple and is designed to allow users to drop in their
> code as is. The platform mainly deals with operability and treats
> functional code as a black box. Operability includes fault tolerance,
> scalability, security, ease of use, metrics api, webservices, etc. In
> other words there is no separation of UDF (user defined functions), as
> all functional code is UDF. This frees users to focus on functional
> development, and lets platform provide operability support. The same
> code runs as is with different operability attributes. The
> data-in-motion architecture of Apex unifies stream as well as batch
> processing in a single platform. Since Apex is a native YARN
> application, it leverages all the components of YARN without
> duplication. Apex was developed with YARN in mind and has no
> overlapping components/functionality with YARN.
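
A minimal sketch of the separation described in the paragraph above, written as plain Java. Every name here (OperabilitySketch, Attributes, run) is hypothetical and is not taken from the Apex API; the "platform" is just a loop that treats the user function as a black box and applies a partition count supplied separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: none of these names come from Apex.
final class OperabilitySketch {

    // Operability settings, specified apart from the functional code.
    static final class Attributes {
        final int partitions;

        Attributes(int partitions) {
            if (partitions < 1) {
                throw new IllegalArgumentException("partitions must be >= 1");
            }
            this.partitions = partitions;
        }
    }

    // Stand-in for the platform: applies the user code to each tuple without
    // inspecting it, spreading tuples round-robin over the requested partitions.
    static <I, O> List<List<O>> run(Function<I, O> userCode, List<I> input, Attributes attrs) {
        List<List<O>> perPartition = new ArrayList<>();
        for (int p = 0; p < attrs.partitions; p++) {
            perPartition.add(new ArrayList<>());
        }
        for (int i = 0; i < input.size(); i++) {
            perPartition.get(i % attrs.partitions).add(userCode.apply(input.get(i)));
        }
        return perPartition;
    }

    public static void main(String[] args) {
        // Functional code, written once and never changed below.
        Function<String, Integer> wordLength = String::length;
        List<String> words = Arrays.asList("a", "bb", "ccc");

        // Same code, different operability attributes.
        System.out.println(run(wordLength, words, new Attributes(1))); // [[1, 2, 3]]
        System.out.println(run(wordLength, words, new Attributes(2))); // [[1, 3], [2]]
    }
}
```

The user function is identical in both calls; only the attribute object differs, which is the sense in which the same code runs as is with different operability attributes.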
>
> The Apex platform is supplemented by project Malhar, which is a
> library of operators that implement common business logic functions
> needed by customers who want to quickly develop applications. These
> operators provide access to HDFS, S3, NFS, FTP, and other file
> systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems;
> MySql, Cassandra, MongoDB, Redis, HBase, CouchDB and other databases
> along with JDBC connectors. The Malhar library also includes a host of
> other common business logic patterns that help users to significantly
> reduce the time it takes to go into production. Ease of integration
> with all other big data technologies is one of the primary missions of
> Malhar.
>
> == Proposal ==
> The goal of this proposal is to establish the core engine of the
> DataTorrent RTS product as an Apache Software Foundation (ASF) project
> in order to build a vibrant, diverse, and self-governed open source
> community around the technology. DataTorrent will continue to sell
> management tools, application building tools, easy to use big data
> applications, and custom high end business logic operators. This
> proposal covers the Apex source code (written in Java), Apex
> documentation and other materials currently available on
> https://github.com/DataTorrent/Apex. This proposal also covers the
> Malhar source code (written in Java), Malhar documentation, and other
> materials currently available on
> https://github.com/DataTorrent/Malhar. We have done a trademark check
> on the name Apex, and have concluded that the Apex name is likely to
> be a suitable project name.
>
> == Background ==
> DataTorrent RTS is a mature and robust product developed as a native
> YARN application. RTS 1.0 was launched in the summer of 2014; RTS 2.0
> was launched in January 2015. Both were well received by customers. RTS
> 3.0 was launched at the end of July 2015. RTS is among the first
> enterprise grade platforms developed from the ground up as a native YARN
> application. DataTorrent RTS is currently maintained by engineers as a
> closed source project. Even though the engineers behind RTS are
> experienced software engineers and are knowledge leaders in
> data-in-motion platforms, they have had little exposure to the open
> source governance process. Customers are currently running
> applications based on DataTorrent RTS in production.
>
> == Rationale ==
> Big data applications written for non-Hadoop platforms typically
> require major rewrites to get them to work with Hadoop. This rewriting
> creates a significant bottleneck in terms of resources (expertise)
> which in turn jeopardizes the viability of such an endeavour. It is
> hard enough to acquire big data expertise; demanding additional
> expertise to do a major code conversion makes it very hard for
> projects to successfully migrate to Hadoop. Also, due to the batch
> processing nature of Hadoop’s MapReduce paradigm, users often have to
> wait tens of minutes to see results and act on them due to various
> delays in data flow. DataTorrent’s RTS data-in-motion architecture is
> designed to address this problem. It enables even the non big data
> developer to write code and operate it in a scalable, fault tolerant
> manner. The big data-in-motion architecture of DataTorrent’s RTS
> enables ease of integration into current enterprise infrastructure.
> This goal was achieved by keeping the API simple and empowering users
> to put in the connector code as is (or with minimal changes).
>
> Malhar is a manifestation of this reality: we or the customers'
> engineers were able to create these connectors within a day or so, or
> within a week at most. Connectors include those to integrate with
> message bus(es), file systems, databases, and other protocols, and more
> continue to be added. Over time we expect users to simply pick a
> connector that already exists in Malhar and quickly begin integrating
> with their current enterprise infrastructure. Within the
> data-in-motion architecture, a stream application is one with
> connector(s) to, say, Kafka, JMS, or Flume, while a batch application is
> one with connector(s) to HDFS, HBase, FTP, NFS, S3n, etc. This allows
> usage of the platform for both stream and batch processing with the
> same business logic. Complete separation of user written application
> code from all operational aspects of the system, as well as support
> code for YARN, significantly expands the potential use cases that can
> migrate to use Hadoop.
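
A minimal sketch of the stream/batch point made above: the business logic is written once and only the connector feeding it changes. The names (ConnectorSketch, NORMALIZE, process) are hypothetical and are not part of Apex or Malhar; an in-memory list stands in for file lines and an in-memory queue stands in for a message bus.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical illustration only: not the Apex or Malhar API.
final class ConnectorSketch {

    // Business logic, unaware of where its tuples come from.
    static final Function<String, String> NORMALIZE =
            s -> s.trim().toLowerCase(Locale.ROOT);

    // Drains whatever the connector currently offers and applies the logic.
    static List<String> process(Iterator<String> connector, Function<String, String> logic) {
        List<String> out = new ArrayList<>();
        while (connector.hasNext()) {
            out.add(logic.apply(connector.next()));
        }
        return out;
    }

    public static void main(String[] args) {
        // "Batch" connector: a bounded list standing in for lines of a file.
        List<String> fileLines = Arrays.asList(" Alpha", "BETA ");
        System.out.println(process(fileLines.iterator(), NORMALIZE)); // [alpha, beta]

        // "Stream" connector: a queue standing in for a message bus.
        Queue<String> bus = new ArrayDeque<>(Arrays.asList("Gamma ", " DELTA"));
        System.out.println(process(bus.iterator(), NORMALIZE)); // [gamma, delta]
    }
}
```

A real stream source is unbounded, so this only shows the shape of the idea: swapping the connector does not require touching NORMALIZE.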
>
> Apex will enable the Hadoop eco-system to migrate a lot more use cases. It
> will enable the Hadoop eco-system to deliver on a promise to rapidly
> transform current IT infrastructure. Apex will help in significantly
> increasing productization of big data projects. One of the main
> barometers of success in the Hadoop eco-system is significant
> reduction of time to market for big data applications migrating to
> Hadoop. We believe that Apex will be one of the platforms that will
> enable users to extract value from big data, by reducing time to
> market. This rapid innovation can be optimally achieved through a
> vibrant, diverse, self-governed community collectively innovating
> around Apex and the Malhar library, while at the same time
> cross-pollinating with various other big data platforms. ASF is an
> ideal place to meet this goal.
>
> == Initial Goals ==
> Our initial goals are to bring Apex and Malhar repositories into the
> ASF, adapt internal engineering processes to open development, and
> foster a collaborative development model in accordance with the
> "Apache Way." DataTorrent plans to develop new functionality in an
> open, community-driven way. To get there, the existing internal build,
> test and release processes will be refactored to support open
> development. We already have an active user community on Google Groups
> that we intend to migrate to Apache.
>
> == Current Status ==
> Currently, the project Apex code base is available under the Apache 2.0
> license (https://github.com/DataTorrent/Apex). The project Malhar code
> base is available under the Apache 2.0 license
> (https://github.com/DataTorrent/Malhar). Project Malhar was open
> sourced 2 years ago, which should make it easy for the project Malhar
> team to adapt to an open, collaborative, and meritocratic environment.
> Contributors of Malhar are employees of DataTorrent or have agreed to
> the shift to Apache. Project Apex, in contrast, was developed as a
> proprietary, closed-source product, but the internal engineering
> practices adopted by the development team were common to Malhar, and
> should lend themselves well to an open environment. DataTorrent plans
> to execute a software grant agreement as part of the launch of the
> incubation of Apex as an Apache project.
>
> The DataTorrent team has always focused on building a robust end user
> community of paying and non-paying customers. We think that the
> existing community centered around the existing Google Groups mailing
> list should be relatively easy to transform into an Apache-style
> community including both users and developers.
>
> === Meritocracy ===
> Our proposed list of initial committers includes the current RTS R&D
> team and our existing customers. This group will form a base for the
> broader community we will invite to collaborate on the codebase. We
> intend to radically expand the initial developer and user community by
> running the project in accordance with the "Apache Way". Users and new
> contributors will be treated with respect and welcomed. By
> participating in the community and providing quality patches/support
> that move the project forward, they will earn merit. They also will be
> encouraged to provide non-code contributions (documentation, events,
> presentations, community management, etc.) and will gain merit for
> doing so. Those with a proven support and quality track record will be
> encouraged to become committers.
>
> === Community ===
> If Apex is accepted for incubation, the primary initial goal will be
> transitioning the core community towards embracing the Apache Way of
> project governance. We will solicit major existing contributors to
> become committers on the project from the start. It should be noted
> that the existing community is already more diverse in many ways than
> some top-level Apache projects. We expect that we can encourage even
> more diversity.
>
> === Core Developers ===
> While a few core developers are skilled in working in openly governed
> Apache communities, most of the core developers are currently NOT
> affiliated with the ASF and would require new ICLAs before committing
> to the project. There would also be a learning curve associated with
> this on-boarding. Changing current development practices to be more
> open will be an important step.
>
> === Alignment ===
> The following existing ASF projects provide functionality related to
> that provided by Apex and should be considered when reviewing the Apex
> proposal:
>
> Apache Hadoop(R) is a distributed storage and processing framework for
> very large datasets focusing primarily on batch processing for
> analytic purposes. Apex is a native YARN application. The Apex and
> Malhar roadmap includes plans to continue to leverage YARN, and help
> the YARN community develop the ability to support long running
> applications. Apex uses the DFS interface for its core checkpoint/commit.
> Malhar has a large number of operators that leverage HDFS and other
> Apache projects. Our roadmap includes plans to continue to deepen the
> currently close integration with HDFS.
>
> Apache HBase offers tabular data stored in Hadoop based on the Google
> Bigtable model. Malhar has HBase connectors to ease integration with
> HBase. The Malhar roadmap includes plans to continue to enhance
> integration with Apache HBase.
>
> Apache Kafka offers distributed and durable publish-subscribe
> messaging. Malhar integrates Kafka with Hadoop through feature rich
> connectors and supports ingest as well as analytical functions on
> incoming data. Raw data can be ingested from Kafka and results can be
> written to Kafka. The Malhar roadmap includes plans to continue to enhance
> integration with Apache Kafka.
>
> Apache Flume is a distributed, reliable, and available service for
> efficiently collecting, aggregating, and moving large amounts of log
> data. Malhar has Flume connectors to ease integration with Flume.
> These connectors ensure that ingestion with Flume is fault tolerant
> and thus can be done in real-time with the same SLA as Flume’s HDFS
> connectors. The Malhar roadmap includes plans to continue to enhance
> integration with Apache Flume.
>
> Apache Cassandra is a highly scalable, distributed key-value store
> that focuses on eventual consistency. Malhar has connectors to ease
> integration with Cassandra. The Malhar roadmap includes plans to continue
> to enhance integration with Apache Cassandra.
>
> Apache Accumulo is a distributed key-value store based on Google’s
> BigTable design. Malhar has connectors to ease integration with
> Accumulo. The Malhar roadmap includes plans to continue to enhance
> integration with Apache Accumulo.
>
> Apache Tez is aimed at building an application framework which allows
> for a complex DAG of tasks for processing data. The Apex and Malhar
> roadmaps include plans to integrate with Apache Tez but this is not
> currently supported.
>
> Apache ActiveMQ and its sub project Apache Apollo offer a powerful
> message queue framework. Malhar has ActiveMQ connectors that ease
> integration with ActiveMQ.
>
> Apache Spark is an engine for processing large datasets, typically in
> a Hadoop cluster. The Malhar project makes it easy for users to integrate
> with Spark. The Malhar roadmap includes plans to continue to enhance
> integration with Apache Spark.
>
> Apache Flink is an engine for scalable batch and stream data
> processing. The Malhar project makes it easy for users to integrate with
> Flink. There is overlap in how Flink leverages data-in-motion
> architecture for both stream and batch processing, and it does
> subscribe to our thought process that data-in-motion can handle both
> stream and batch, whereas a batch-only engine will find it harder to
> manage streams. We differ in terms of how we handle operability, user
> defined code, metrics, webservices, etc. Apex is very operationally
> oriented, while Flink has much more focus on functional elements.
> Malhar and rapid availability of common business logic is another
> differentiator. We believe both these approaches are valid and the
> community and innovation will gain through cross-pollination. We
> plan to integrate with Apache Flink via HDFS for now.
>
> Apache Hive software facilitates querying and managing large datasets
> residing in distributed storage. The Malhar project makes it easy for
> users to integrate with Apache Hive. The Malhar roadmap includes plans
> to continue to enhance integration with Apache Hive.
>
> Apache Pig is a platform for analyzing large data sets. Pig consists
> of a high-level language for expressing data analysis programs,
> coupled with infrastructure for evaluating these programs. The Apex
> and Malhar roadmaps include plans to integrate with Apache Pig.
>
> Apache Storm is a distributed realtime computation system. Malhar
> makes it easy for users to integrate with Apache Storm. We plan to
> integrate with Apache Storm via HDFS for now. The Malhar roadmap includes
> plans to continue to support mechanisms for integration with Apache Storm.
>
> Apache Samza is a distributed stream processing framework. Malhar
> makes it easy for users to integrate with Apache Samza. We plan to
> integrate with Apache Samza via HDFS or Apache Kafka for now. The Malhar
> roadmap includes plans to continue to support mechanisms for
> integration with Apache Samza.
>
> Apache Slider is a YARN application to deploy existing distributed
> applications on YARN, monitor them, and make them larger or smaller as
> desired even when the application is running. Once Slider matures, we
> will take a look at close integration of Apex with Slider.
>
> Project Malhar and Apex are aligned with many more Apache projects and
> other open source projects, as ease of integration with other
> technologies is one of the primary goals of this project. These
> include Apache Solr, ElasticSearch, MongoDB, Aerospike, ZeroMQ,
> CouchDB, CouchBase, MemCache, Redis, RabbitMQ, and Apache Derby.
>
> == Known Risks ==
> Development has been sponsored mostly by a single company
> (DataTorrent, Inc.) thus far and coordinated mainly by the core
> DataTorrent RTS and Malhar team, with active participation from our
> current customers.
>
> For the project to fully transition to the Apache Way governance
> model, development must shift towards the merit-centric model of
> growing a community of contributors balanced with the needs for
> extreme stability and core implementation coherency.
>
> The tools and development practices in place for the DataTorrent RTS
> and Malhar products are compatible with the ASF infrastructure and
> thus we do not anticipate any on-boarding pains. Migration from the
> current GitHub repository is also expected to be straightforward.
>
> === Orphaned products ===
> DataTorrent is fully committed to DataTorrent Apex and Malhar and the
> product will continue to be based on the Apex project. Moreover,
> DataTorrent has a vested interest in making Apex succeed by driving
> its close integration with sister ASF projects. We expect this to
> further reduce the risk of orphaning the product.
>
> === Inexperience with Open Source ===
> DataTorrent has embraced open source software by open sourcing the Malhar
> project under the Apache 2.0 license. The DataTorrent team includes
> veterans from the Yahoo! Hadoop team. Although some of the initial
> committers have not been developers on an entirely open source,
> community-driven project, we expect to bring to bear the open
> development practices of Malhar to the Apex project. Additionally,
> several ASF veterans agreed to mentor the project and are listed in
> this proposal. The project will rely on their guidance and collective
> wisdom to quickly transition the entire team of initial committers
> towards practicing the Apache Way. DataTorrent is also driving the
> Kafka on YARN (KOYA) initiative.
>
> === Homogeneous Developers ===
> While most of the initial committers are employed by DataTorrent, we
> have already seen a healthy level of interest from our existing
> customers and partners. We intend to convert that interest directly
> into participation and will be investing in activities to recruit
> additional committers from other companies.
>
> === Reliance on Salaried Developers ===
> Most of the contributors are paid to work in the Big Data space. While
> they might wander from their current employers, they are unlikely to
> venture far from their core expertise and thus will continue to be
> engaged with the project regardless of their current employers.
>
> === Relationships with Other Apache Products ===
> As mentioned in the Alignment section, Apex may consider various
> degrees of integration and code exchange with Apache Hadoop (YARN and
> HDFS), Apache Kafka, Apache HBase, Apache Flume, Apache Cassandra,
> Apache Accumulo, Apache Tez, Apache Hive, Apache Pig, Apache Storm,
> Apache Samza, Apache Spark, and Apache Slider. Given the success that the
> DataTorrent RTS product has enjoyed, we expect integration points to be
> inside and outside the project. We look forward to collaborating with
> these communities as well as other communities under the Apache umbrella.
>
> === An Excessive Fascination with the Apache Brand ===
> While we intend to leverage the Apache ‘branding’ when talking to
> other projects as a testament to our project’s ‘neutrality’, we have no
> plans for making use of the Apache brand in press releases or for posting
> billboards advertising acceptance of Apex into the Apache Incubator.
>
>
> == Documentation ==
> The current state of the project documentation is available as part
> of the GitHub repositories -
> https://github.com/DataTorrent/Apex;
> https://github.com/DataTorrent/Malhar. In addition, a list of demos
> that serve as a how-to guide is available at
> https://github.com/DataTorrent/Malhar/tree/master/demos
>
> == Initial Source ==
> DataTorrent has released the source code for Apex under the Apache 2.0
> license at https://github.com/DataTorrent/Apex, and that of Malhar
> under the Apache 2.0 license at https://github.com/DataTorrent/Malhar. We
> encourage ASF community members interested in this proposal to
> download the source code, review it and try out the software.
>
> == Source and Intellectual Property Submission Plan ==
> As soon as Apex is approved to join the Apache Incubator, DataTorrent will
> execute a Software Grant Agreement and the source code will be
> transitioned onto ASF infrastructure. The code is already licensed
> under the Apache License, Version 2.0. We know of no legal
> encumbrances that would inhibit the transfer of source code to the ASF.
>
> == External Dependencies ==
> All dependencies fall under the permissive license categories, or
> weak copyleft (http://www.apache.org/legal/resolved.html#category-b).
> We intend to remove the dependencies on GPL licensed technologies on
> which Apex or Malhar depend. These technologies are optional and have
> been marked as such.
>
> Embedded dependencies (relocated):
> * None
>
> Runtime dependencies:
> * activemq-client
> * ant
> * async-http-client
> * bval-jsr303
> * commons-beanutils
> * commons-codec
> * commons-lang3
> * commons-compiler
> * embassador
> * fastutil
> * guava
> * hadoop-common
> * hadoop-common-tests
> * hadoop-yarn-client
> * httpclient
> * jackson-core-asl
> * jackson-mapper-asl
> * javax.mail
> * jersey-apache-client4
> * jersey-client
> * jetty-servlet
> * jetty-websocket
> * jline
> * kryo
> * named-regexp
> * netlet
> * rhino (GPL 2.0, optional)
> * slf4j-api
> * slf4j-log4j12
> * validation-api
> * xbean-asm5-shaded
> * zip4j
>
> Module or optional dependencies:
> * accumulo-core
> * aerospike-client
> * amqp-client
> * aws-java-sdk-kinesis
> * cassandra-driver-core
> * couchbase-client
> * CouchbaseMock
> * elasticsearch
> * geoip-api (LGPL, optional)
> * hbase
> * hbase-client
> * hbase-server
> * hive-exec
> * hive-service
> * hiveunit
> * javax.mail-api
> * jedis
> * jms-api
> * jri (GPL, optional)
> * jriengine (LGPL, optional)
> * jruby (LGPL, optional)
> * jython (PSF License, optional)
> * jzmq (LGPL, optional)
> * kafka_2.10
> * lettuce (GPL, optional)
> * libthrift
> * Memcached-Java-Client
> * mongo-java-driver
> * mqtt-client
> * mysql-connector-java (GPL2, optional)
> * org.ektorp
> * rengine (LGPL, optional)
> * rome
> * solr-core
> * solr-solrj
> * spymemcached
> * sqlite4java
> * super-csv
> * twitter4j-core
> * twitter4j-stream
> * uadetector-resources
> * org.apache.servicemix.bundles.splunk
>
> Build only dependencies:
> * None
>
> Test only dependencies:
> * activemq-broker
> * activemq-kahadb-store
> * greenmail
> * hadoop-yarn-server-tests
> * hsqldb
> * janino
> * junit
> * MockFtpServer
> * mockito-all
> * testng
>
> Cryptography: N/A
>
> == Required Resources ==
> === Mailing lists ===
> * private@apex.incubator.apache.org
> <mailto:private@apex.incubator.apache.org> (moderated subscriptions)
> * commits@apex.incubator.apache.org
> <mailto:commits@apex.incubator.apache.org>
> * dev@apex.incubator.apache.org <mailto:dev@apex.incubator.apache.org>
>
> === Git Repository ===
> * https://git-wip-us.apache.org/repos/asf/incubator-apex-core.git
> * https://git-wip-us.apache.org/repos/asf/incubator-apex-malhar.git
>
> === Issue Tracking ===
> * JIRA Project Apex (APEX_CORE) // If '_' is not allowed, use APEXCORE
> * JIRA Project Malhar (APEX_MALHAR) // If '_' is not allowed, use
> APEXMALHAR
>
> === Other Resources ===
> * Means of setting up regular builds for apex-core on
> builds.apache.org <http://builds.apache.org>
> * Means of setting up regular builds for apex-malhar on
> builds.apache.org <http://builds.apache.org>
>
> === Rationale for Malhar and Apex having separate git and jira ===
> We managed Malhar and Apex as two repos and two jiras on purpose. Both
> code bases are released under Apache 2.0 and are proposed for
> incubation. In terms of our vision to enable innovation around a
> native YARN data-in-motion platform that unifies stream and batch
> processing, Malhar and Apex go hand in hand. Apex has a base API
> that consists of a Java API (functional) and attributes (operability).
> Malhar is a manifestation of this API, but from the user's perspective,
> Malhar is itself an API to leverage business logic. Over the past three
> years we have found that the cadence of releases and API changes in
> Malhar is much more rapid than in Apex, and it was operationally much
> easier to separate them into their own repos. Two repos will reflect a
> clear separation of engine (Apex) and operators/business logic (Malhar).
> It will allow for independent release cycles (operators change
> independently of the engine due to the stable API). However, we do not
> believe in two levels of committers. We believe there should be one
> community that works across both and innovates with the idea that Malhar
> and Apex combined provide the value proposition. We are proposing that
> the Apache incubation process help us to foster development of one
> community (mailing list, committers), and yet be OK with two repos. We
> are proposing that this be taken up during incubation. The community
> will learn if this works. The decision on whether to split them into
> two projects should be taken after the learning curve during incubation.
>
> == Initial Committers ==
> * Roma Ahuja (rahuja at directv dot com)
> * Isha Arkatkar (isha at datatorrent dot com)
> * Raja Ali (raji at silverspringnet dot com)
> * Sunaina Chaudhary (SChaudhary at directv dot com)
> * Bhupesh Chawda (bhupesh at datatorrent dot com)
> * Chaitanya Chelobu (chaitanya at datatorrent dot com)
> * Bright Chen (bright at datatorrent dot com)
> * Pradeep Dalvi (pradeep dot dalvi at datatorrent dot com)
> * Sandeep Deshmukh (sandeep at datatorrent dot com)
> * Yogi Devendra (yogi at datatorrent dot com)
> * Cem Ezberci (hasan dot ezberci at ge dot com)
> * Timothy Farkas (tim at datatorrent dot com)
> * Ilya Ganelin (ilya dot ganelin at capitalone dot com)
> * Vitthal Gogate (vitthal_gogate at yahoo dot com)
> * Parag Goradia (parag dot goradia at ge dot com)
> * Tushar Gosavi (tushar at datatorrent dot com)
> * Priyanka Gugale (priyanka at datatorrent dot com)
> * Gaurav Gupta (gaurav at datatorrent dot com)
> * Sandesh Hegde (sandesh at datatorrent dot com)
> * Siyuan Hua (siyuan at datatorrent dot com)
> * Ajith Joseph (ajoseph at silverspring dot com)
> * Amol Kekre (amol at datatorrent dot com)
> * Chinmay Kolhatkar (chinmay at datatorrent dot com)
> * Pramod Immaneni (pramod at datatorrent dot com)
> * Anuj Lal (anuj dot lal at ge dot com)
> * Dongsu Lee (dlee3 at directv dot com)
> * Vitaly Li (blossom dot valley at gmail dot com)
> * Dean Lockgaard (dean at datatorrent dot com)
> * Rohan Mehta (rohan_mehta at apple dot com)
> * Adi Mishra (apmishra at directv dot com, adi dot mishra at gmail dot
> com)
> * Chetan Narsude (chetan at datatorrent dot com)
> * Darin Nee (dnee at silverspring dot com)
> * Alexander Parfenov (sasha at datatorrent dot com)
> * Andrew Perlitch (andy at datatorrent dot com)
> * Shubham Phatak (shubham at datatorrent dot com)
> * Ashwin Putta (ashwin at datatorrent dot com)
> * Rikin Shah (shah_rikin at yahoo dot com)
> * Luis Ramos (l dot ramos at ge dot com)
> * Munagala Ramanath (ram at datatorrent dot com)
> * Vlad Rozov (vlad dot rozov at datatorrent dot com)
> * Atri Sharma (atri dot jiit at gmail dot com)
> * Chandni Singh (chandni at datatorrent dot com)
> * Venkatesh Sivasubramanian (venkateshs at ge dot com)
> * Aniruddha Thombare (aniruddha at datatorrent dot com)
> * Jessica Wang (jessica at datatorrent dot com)
> * Thomas Weise (thomas at datatorrent dot com)
> * David Yan (david at datatorrent dot com)
> * Kevin Yang (yang dot k at ge dot com)
> * Brennon York (brennon dot york at capitalone dot com)
>
> == Affiliations ==
> * Apple: Vitaly Li, Rohan Mehta
> * Barclays: Atri Sharma
> * Class Software: Justin Mclean
> * CapitalOne: Ilya Ganelin, Brennon York
> * DataTorrent: everyone else on this proposal
> * Datachief: Rikin Shah
> * DirecTV: Roma Ahuja, Sunaina Chaudhary, Dongsu Lee, Adi Mishra
> * E8security: Vitthal Gogate
> * General Electric: Cem Ezberci, Parag Goradia, Anuj Lal, Luis Ramos,
> Venkatesh Sivasubramanian, Kevin Yang
> * Hortonworks: Alan Gates, Taylor Goetz, Chris Nauroth, Hitesh Shah
> * MapR: Ted Dunning
> * SilverSpring Networks: Raja Ali, Ajith Joseph, Darin Nee
>
> == Sponsors ==
>
> === Champion ===
> Ted Dunning
>
> === Nominated Mentors ===
>
> The initial mentors are listed below:
> * Ted Dunning - Apache Member, MapR
> * Alan Gates - Apache Member, Hortonworks
> * Taylor Goetz - Apache Member, Hortonworks
> * Justin Mclean - Apache Member, Class Software
> * Chris Nauroth - Apache Member, Hortonworks
> * Hitesh Shah - Apache Member, Hortonworks
>
> === Sponsoring Entity ===
>
> We would like to propose the Apache Incubator as the sponsor of this project.
>
