incubator-general mailing list archives

From jan i <j...@apache.org>
Subject Re: [DISCUSS] Apex Incubation Proposal
Date Mon, 10 Aug 2015 14:16:40 GMT
On Monday, August 10, 2015, Amol Kekre <amol@datatorrent.com> wrote:

> Roman,
> It is a single proposal. Ted and I talked at length on this. The main
> difference is the release frequency and need for Malhar to work off a
> stable release of Apex. The proposal is for a single community not two, and
> to share all the community aspects (emails, committers, etc.)


This makes perfect sense to me, and having one community delivering 2
different releases is not a problem, at most some logistical issues.

Look forward to the vote.

rgds
jan i


ps. I too would recommend to start with a smaller community, it simplifies
the start phase, but it is totally perfect as it is.

>
> Thks,
> Amol
>
>
>
> On Sun, Aug 9, 2015 at 8:18 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > I had some long talks with Amol on exactly this point when he was getting
> > started with this proposal.
> >
> > At the current time, it appears that the developer communities for these
> > two systems are indistinguishable.  Moreover, neither project is actually
> > much use without the other.  Malhar requires Apex to run and Apex
> provides
> > low-level capabilities that are not particularly useful without Malhar.
> > These two are even more strongly linked than, say Lucene and Solr.  Or
> HDFS
> > and Yarn.
> >
> > The primary difference between these is that Malhar is expected to be
> > released much more frequently than Apex as new functions are added.
> Also,
> > as a platform, there is much more burden on Apex to be stable, thus
> > decreasing the desirable release frequency.  Over time, it is likely that
> > newcomers are more likely to find it easy to contribute on the Malhar
> side
> > initially, but the existing community is fine with keeping the committer
> > pool uniform.
> >
> >
> >
> > On Sun, Aug 9, 2015 at 8:09 PM, Roman Shaposhnik <roman@shaposhnik.org> wrote:
> >
> > > I'm confused about whether this is one proposal or two
> > > proposals rolled into one. On one hand it seems like
> > > Apex and Malhar are independent. On the other hand
> > > the proposal covers Apex in great details but not so
> > > much Malhar.
> > >
> > > As usual with ASF, the real question here is the community
> > > for the two. If we're talking about the same initial community
> > > for both -- I think it makes sense to treat them as a single
> > > project, not two.
> > >
> > > Thanks,
> > > Roman.
> > >
> > > On Wed, Aug 5, 2015 at 5:23 PM, Amol Kekre <amol@datatorrent.com> wrote:
> > > > I would like to start a discussion on DataTorrent's core engine and
> its
> > > > operators joining the ASF as an incubating project under the name
> Apex.
> > > >
> > > > The proposal is available on the wiki here:
> > > > https://wiki.apache.org/incubator/ApexProposal
> > > >
> > > > The text of the proposal is also available at the end of this email
> > > >
> > > > Apex is an enterprise grade native YARN big data-in-motion platform
> > that
> > > > unifies batch and stream processing. Apex is a highly distributed,
> > > > performant, fault tolerant, stateful and easily operable platform.
> > > >
> > > > Thanks in advance for your time and help.
> > > >
> > > > Thks,
> > > > Amol
> > > >
> > > >
> > > > --------------------------------------------------------------------------------------------
> > > >
> > > > == Abstract ==
> > > > Apex is an enterprise grade native YARN big data-in-motion platform
> > that
> > > > unifies stream processing as well as batch processing. Apex processes
> > big
> > > > data in-motion in a highly scalable, highly performant, fault
> tolerant,
> > > > stateful, secure, distributed, and easily operable way. It
> provides
> > a
> > > > simple API that enables users to write or re-use generic Java code,
> > > thereby
> > > > lowering the expertise needed to write big data applications.
> > > >
> > > > Functional and operational specifications are separated. Apex is
> > designed
> > > > in a way to enable users to write their own code (aka user defined
> > > > functions) as is and leave all operability to the platform. The API
> is
> > > very
> > > > simple and is designed to allow users to drop in their code as is.
> The
> > > > platform mainly deals with operability and treats functional code as
> a
> > > > black box. Operability includes fault tolerance, scalability,
> security,
> > > > ease of use, metrics api, webservices, etc. In other words there is
> no
> > > > separation of UDF (user defined functions), as all functional code is
> > > UDF.
> > > > This frees users to focus on functional development, and lets
> platform
> > > > provide operability support. The same code runs as is with different
> > > > operability attributes. The data-in-motion architecture of Apex
> unifies
> > > > stream as well as batch processing in a single platform. Since Apex
> is
> > a
> > > > native YARN application, it leverages all the components of YARN
> > without
> > > > duplication. Apex was developed with YARN in mind and has no
> > overlapping
> > > > components/functionality with YARN.
> > > >
> > > > The Apex platform is supplemented by project Malhar, which is a
> library
> > > of
> > > > operators that implement common business logic functions needed by
> > > > customers who want to quickly develop applications. These operators
> > > provide
> > > > access to HDFS, S3, NFS, FTP, and other file systems;  Kafka,
> ActiveMQ,
> > > > RabbitMQ, JMS, and other message systems; MySql, Cassandra, MongoDB,
> > > Redis,
> > > > HBase, CouchDB and other databases along with JDBC connectors. The
> > Malhar
> > > > library also includes a host of other common business logic patterns
> > that
> > > > help users to significantly reduce the time it takes to go into
> > > production.
> > > > Ease of integration with all other big data technologies is one of
> the
> > > > primary missions of Malhar.
> > > >
> > > > == Proposal ==
> > > > The goal of this proposal is to establish the core engine of
> > DataTorrent
> > > > RTS product as an Apache Software Foundation (ASF) project in order
> to
> > > > build a vibrant, diverse, and self-governed open source community
> > around
> > > > the technology. DataTorrent will continue to sell management tools,
> > > > application building tools, easy to use big data applications, and
> > custom
> > > > high end business logic operators. This proposal covers the Apex
> source
> > > > code (written in Java), Apex documentation and other materials
> > currently
> > > > available on https://github.com/DataTorrent/Apex. This proposal also
> > > covers
> > > > the Malhar source code (written in Java), Malhar documentation, and
> > other
> > > > materials currently available on https://github.com/DataTorrent/Malhar. We
> > > > have done a trademark check on the name Apex, and have concluded that
> > the
> > > > Apex name is likely to be a suitable project name.
> > > >
> > > > == Background ==
> > > > DataTorrent RTS is a mature and robust product developed as a native
> > YARN
> > > > application. RTS 1.0 was launched in summer of 2014; RTS 2.0 was
> > launched
> > > > in Jan 2015. Both were well received by customers. RTS 3.0 was
> launched
> > > at
> > > > end of July 2015. RTS is among the first enterprise grade platforms that
> > > > were developed from the ground up as a native YARN application. DataTorrent
> > RTS
> > > is
> > > > currently maintained by engineers as a closed source project. Even
> > though
> > > > the engineers behind RTS are experienced software engineers and are
> > > > knowledge leaders in data-in-motion platforms, they have had little
> > > > exposure to the open source governance process. Customers are
> currently
> > > > running applications based on DataTorrent RTS in production.
> > > >
> > > > == Rationale ==
> > > > Big data applications written for non-Hadoop platforms typically
> > require
> > > > major rewrites  to get them to work with Hadoop. This rewriting
> > creates a
> > > > significant bottleneck in terms of resources (expertise) which in
> turn
> > > > jeopardizes the viability of such an endeavour. It is hard enough to
> > > > acquire big data expertise; demanding additional expertise to do a
> > major
> > > > code conversion makes it a very hard problem for projects to
> > successfully
> > > > migrate to Hadoop. Also, due to the batch processing nature of
> Hadoop’s
> > > > MapReduce paradigm, users often have to wait tens of minutes to see
> > > results
> > > > and act on them due to various delays in data flow. DataTorrent’s RTS
> > > > data-in-motion architecture is designed to address this problem. It
> > > enables
> > > > even the non big data developer to write code and operate it in a
> > > scalable,
> > > > fault tolerant manner. The big data-in-motion architecture of
> > > DataTorrent’s
> > > > RTS enables ease of integration into current enterprise
> infrastructure.
> > > > This goal was achieved by keeping the API simple and empowering users
> > to
> > > > put in the connector code as is (or with minimal changes).
> > > >
> > > > Malhar is a manifestation of this reality, and we or the customer
> > > engineers
> > > > were able to create these connectors within a day or so if not
> within a
> > > > week. Connectors include those to integrate with message bus(es),
> file
> > > > systems, databases, other protocols, and more continue to be added.
> > Over
> > > a
> > > > period of time we expect users to simply pick a connector that
> already
> > > > exists in Malhar and quickly begin integrating with their current
> > > > enterprise infrastructure. Within the data-in-motion architecture a
> > > stream
> > > > application is one with connector(s) to say Kafka, JMS, or Flume;
> > while a
> > > > batch application is one with connector(s) to HDFS, HBase, FTP, NFS,
> > S3n
> > > > etc. This allows usage of the platform for both stream as well as
> batch
> > > > processing with same business logic. Complete separation of user
> > written
> > > > application code from all operational aspects of the system, as well
> as
> > > > support code for YARN, significantly expands the potential use cases
> > that
> > > > can migrate to use Hadoop.
> > > >
> > > > Apex will enable Hadoop eco-system to migrate a lot more use cases.
> It
> > > will
> > > > enable the Hadoop eco-system to deliver on a promise to rapidly
> > transform
> > > > current IT infrastructure. Apex will help in significantly increasing
> > > > productization of big data projects. One of the main barometers of
> > > success
> > > > in the Hadoop eco-system is significant reduction of time to market
> for
> > > big
> > > > data applications migrating to Hadoop. We believe that Apex will be
> one
> > > of
> > > > the platforms that will enable users to extract value from big data,
> by
> > > > reducing time to market. This rapid innovation can be optimally
> > achieved
> > > > through a vibrant, diverse, self-governed community collectively
> > > innovating
> > > > around Apex and the Malhar library, while at the same time
> > > > cross-pollinating with various other big data platforms. ASF is an
> > ideal
> > > > place to meet this goal.
> > > >
> > > > == Initial Goals ==
> > > > Our initial goals are to bring Apex and Malhar repositories into the
> > ASF,
> > > > adapt internal engineering processes to open development, and foster
> a
> > > > collaborative development model in accordance with the "Apache Way."
> > > > DataTorrent plans to develop new functionality in an open,
> > > community-driven
> > > > way. To get there, the existing internal build, test and release
> > > processes
> > > > will be refactored to support open development. We already have an
> > active
> > > > user community on google groups that we intend to migrate to Apache.
> > > >
> > > > == Current Status ==
> > > > Currently, the project Apex code base is available under Apache 2.0
> > > license
> > > > (https://github.com/DataTorrent/Apex). Project Malhar code base is
> > > > available under Apache 2.0 license (https://github.com/DataTorrent/Malhar).
> > > > Project Malhar was open sourced 2 years ago which should make it easy
> > for
> > > > the project Malhar team to adapt to an  open, collaborative, and
> > > > meritocratic environment. Contributors of Malhar are employees of
> > > > DataTorrent or have agreed to the shift to Apache. Project Apex, in
> > > > contrast, was developed as a proprietary, closed-source product, but
> > the
> > > > internal engineering practices adopted by the development team were
> > > common
> > > > to Malhar, and should lend themselves well to an open  environment.
> > > > DataTorrent plans to execute a software grant agreement as part of
> the
> > > > launch of the incubation of Apex as an Apache project.
> > > >
> > > > The DataTorrent team has always focused on building a robust end user
> > > > community of paying and non-paying customers. We think that the
> > existing
> > > > community centered around the existing google groups mailing list
> > should
> > > be
> > > > relatively easy to transform into an Apache-style community including
> > > both
> > > > users and developers.
> > > >
> > > > === Meritocracy ===
> > > > Our proposed list of initial committers includes the current RTS R&D
> > team,
> > > > and our existing customers. This group will form a base for the
> broader
> > > > community we will invite to collaborate on the codebase. We intend to
> > > > radically expand the initial developer and user community by running
> > the
> > > > project in accordance with the "Apache Way". Users and new
> contributors
> > > > will be treated with respect and welcomed. By participating in the
> > > > community and providing quality patches/support that move the project
> > > > forward, they will earn merit. They also will be encouraged to
> provide
> > > > non-code contributions (documentation, events, presentations,
> community
> > > > management, etc.) and will gain merit for doing so. Those with a
> proven
> > > > support and quality track record will be encouraged to become
> > committers.
> > > >
> > > > === Community ===
> > > > If Apex is accepted for incubation, the primary initial goal will be
> > > > transitioning the core community towards embracing the Apache Way of
> > > > project governance. We will solicit major existing contributors to
> > become
> > > > committers on the project from the start. It should be noted that the
> > > > existing community is already more diverse in many ways than some
> > > top-level
> > > > Apache projects. We expect that we can encourage even more diversity.
> > > >
> > > > === Core Developers ===
> > > > While a few core developers are skilled in working in openly governed
> > > > Apache communities, most of the core developers are currently NOT
> > > > affiliated with the ASF and would require new ICLAs before committing
> > to
> > > > the project. There would also be a learning curve associated with
> this
> > > > on-boarding. Changing current development practices to be more open
> > will
> > > be
> > > > an important step.
> > > >
> > > > === Alignment ===
> > > > The following existing ASF projects provide functionality related to that
> > > > provided by Apex and should be considered when reviewing the Apex proposal:
> > > >
> > > > Apache HadoopⓇ is a distributed storage and processing framework for
> > very
> > > > large datasets focusing primarily on batch processing for analytic
> > > > purposes. Apex is a native YARN application. The Apex and Malhar
> > roadmap
> > > > includes plans to continue to leverage YARN, and help the YARN
> > community
> > > > develop the ability to support long running applications. Apex uses the DFS
> > > > interface for its core checkpoint/commit. Malhar has a large number of
> > > > operators that leverage HDFS and other Apache projects. Our roadmap
> > > > includes plans to continue to deepen the currently close integration
> > with
> > > > HDFS.
> > > >
> > > > Apache HBase offers tabular data stored in Hadoop based on the Google
> > > > Bigtable model. Malhar has HBase connectors to ease integration with
> > > HBase.
> > > > Malhar roadmap includes plans to continue to enhance integration with
> > > > Apache HBase.
> > > >
> > > > Apache Kafka offers distributed and durable publish-subscribe
> > messaging.
> > > > Malhar integrates Kafka with Hadoop through feature rich connectors
> and
> > > > supports ingest as well as analytical functions to incoming data. Raw
> > > data
> > > > can be ingested from Kafka and results can be written to Kafka.
> Malhar
> > > > roadmap includes plans to continue to enhance integration with Apache
> > > Kafka.
> > > >
> > > > Apache Flume is a distributed, reliable, and available service for
> > > > efficiently collecting, aggregating, and moving large amounts of log
> > > data.
> > > > Malhar has Flume connectors to ease integration with Flume. These
> > > > connectors ensure that ingestion with Flume is fault tolerant and
> thus
> > > can
> > > > be done in real-time with the same SLA as Flume’s HDFS connectors.
> > Malhar
> > > > roadmap includes plans to continue to enhance integration with Apache
> > > Flume.
> > > >
> > > > Apache Cassandra is a highly scalable, distributed key-value store
> that
> > > > focuses on eventual consistency. Malhar has connectors to ease
> > > integration
> > > > with Cassandra. Malhar roadmap includes plans to continue to enhance
> > > > integration with Apache Cassandra.
> > > >
> > > > Apache Accumulo is a distributed key-value store based on Google’s
> > > BigTable
> > > > design. Malhar has connectors to ease integration with Accumulo. The
> > > Malhar
> > > > roadmap includes plans to continue to enhance integration with Apache
> > > > Accumulo.
> > > >
> > > > Apache Tez is aimed at building an application framework which allows
> > > for a
> > > > complex DAG of tasks for processing data. The Apex and Malhar roadmaps
> > > include
> > > > plans to integrate with Apache Tez but this is not currently
> supported.
> > > >
> > > > Apache ActiveMQ and its sub project Apache Apollo offer a powerful
> > > message
> > > > queue framework. Malhar has ActiveMQ connectors that ease integration
> > > with
> > > > ActiveMQ.
> > > >
> > > > Apache Spark is an engine for processing large datasets, typically
> in a
> > > > Hadoop cluster. Malhar project makes it easy for users to integrate
> > with
> > > > Spark. The Malhar roadmap includes plans to continue to enhance
> > > integration
> > > > with Apache Spark.
> > > >
> > > > Apache Flink is an engine for scalable batch and stream data
> > processing.
> > > > Malhar project makes it easy for users to integrate with Flink. There
> > is
> > > > overlap in how Flink leverages data-in-motion architecture for both
> > > stream
> > > > and batch processing, and it does subscribe to our thought process
> that
> > > > data-in-motion can handle both stream and batch, meanwhile a batch
> only
> > > > engine will find it harder to manage streams. We differ in terms of
> how
> > > we
> > > > handle operability, user defined code, metrics, webservices etc. Apex
> > is
> > > > very operational oriented, while Flink has much more focus on
> > functional
> > > > elements. Malhar and rapid availability of common business logic is
> > > another
> > > > differentiator. We believe both these approaches are valid and the
> > > > community and innovation will gain through cross pollination. We
> > plan
> > > to
> > > > integrate with Apache Flink via HDFS for now.
> > > >
> > > > Apache Hive software facilitates querying and managing large datasets
> > > > residing in distributed storage. Malhar project makes it easy for
> users
> > > to
> > > > integrate with Apache Hive. The Malhar roadmap includes plans to
> > continue
> > > > to enhance integration with Apache Hive.
> > > >
> > > > Apache Pig is a platform for analyzing large data sets.  Pig consists
> > of
> > > a
> > > > high-level language for expressing data analysis programs, coupled
> with
> > > > infrastructure for evaluating these programs. The Apex and Malhar
> > > roadmaps
> > > > include plans to integrate with Apache Pig.
> > > >
> > > > Apache Storm is a distributed realtime computation system. Malhar
> makes
> > > it
> > > > easy for users to integrate with Apache Storm. We plan to integrate
> > with
> > > > Apache Storm via HDFS for now. Malhar roadmaps include plans to
> > continue
> > > to
> > > > support mechanisms for integration with Apache Storm.
> > > >
> > > > Apache Samza is a distributed stream processing framework. Malhar
> makes
> > > it
> > > > easy for users to integrate with Apache Samza. We plan to integrate
> > with
> > > > Apache Samza via HDFS or Apache Kafka for now. Malhar roadmaps
> include
> > > > plans to continue to support mechanisms for integration with Apache
> > Samza.
> > > >
> > > > Apache Slider is a YARN application to deploy existing distributed
> > > > applications on YARN, monitor them, and make them larger or smaller
> as
> > > > desired even when the application is running. Once Slider matures, we
> > > will
> > > > take a look at close integration of Apex with Slider.
> > > >
> > > > Project Malhar and Apex are aligned to many more Apache projects and
> > > other
> > > > open source projects as ease of integration with other technologies
> is
> > > one
> > > > of the primary goals of this project. These include Apache Solr,
> > > > ElasticSearch, MongoDB, Aerospike, ZeroMQ, CouchDB, CouchBase,
> > MemCache,
> > > > Redis, RabbitMQ, Apache Derby.
> > > >
> > > > == Known Risks ==
> > > > Development has been sponsored mostly by a single company
> (DataTorrent,
> > > > Inc.) thus far and coordinated mainly by the core DataTorrent RTS and
> > > > Malhar team, with active participation from our current customers.
> > > >
> > > > For the project to fully transition to the Apache Way governance
> model,
> > > > development must shift towards the merit-centric model of growing a
> > > > community of contributors balanced with the needs for extreme
> stability
> > > and
> > > > core implementation coherency.
> > > >
> > > > The tools and development practices in place for the DataTorrent RTS
> > and
> > > > Malhar products are compatible with the ASF infrastructure and thus
> we
> > do
> > > > not anticipate any on-boarding pains. Migration from the current
> GitHub
> > > > repository is also expected to be straightforward.
> > > >
> > > > === Orphaned products ===
> > > > DataTorrent is fully committed to DataTorrent Apex and Malhar and the
> > > > product will continue to be based on the Apex project. Moreover,
> > > > DataTorrent has a vested interest in making Apex succeed by driving
> its
> > > > close integration with sister ASF projects. We expect this to further
> > > > reduce the risk of orphaning the product.
> > > >
> > > > === Inexperience with Open Source ===
> > > > DataTorrent has embraced open source software by open sourcing Malhar
> > > > project under Apache 2.0 license. The DataTorrent team includes
> > veterans
> > > > from the Yahoo! Hadoop team. Although some of the initial committers
> > have
> > > > not been developers on an entirely open source, community-driven
> > project,
> > > > we expect to bring to bear the open development practices of Malhar
> to
> > > the
> > > > Apex project. Additionally, several ASF veterans agreed to mentor the
> > > > project and are listed in this proposal. The project will rely on
> their
> > > > guidance and collective wisdom to quickly transition the entire team
> of
> > > > initial committers towards practicing the Apache Way. DataTorrent is
> > also
> > > > driving the Kafka on YARN (KOYA) initiative.
> > > >
> > > > === Homogeneous Developers ===
> > > > While most of the initial committers are employed by DataTorrent, we
> > have
> > > > already seen a healthy level of interest from our existing customers
> > and
> > > > partners. We intend to convert that interest directly into
> > participation
> > > > and will be investing in activities to recruit additional committers
> > from
> > > > other companies.
> > > >
> > > > === Reliance on Salaried Developers ===
> > > > Most of the contributors are paid to work in the Big Data space.
> While
> > > they
> > > > might wander from their current employers, they are unlikely to
> venture
> > > far
> > > > from their core expertise and thus will continue to be engaged with
> > the
> > > > project regardless of their current employers.
> > > >
> > > > === Relationships with Other Apache Products ===
> > > > As mentioned in the Alignment section, Apex may consider various
> > degrees
> > > of
> > > > integration and code exchange with Apache Hadoop (YARN and HDFS),
> > Apache
> > > > Kafka, Apache HBase, Apache Flume, Apache Cassandra, Apache Accumulo,
> > > > Apache Tez, Apache Hive, Apache Pig, Apache Storm, Apache Samza,
> Apache
> > > > Spark, Apache Slider. Given the success that the DataTorrent RTS
> > product
> > > > enjoyed, we expect integration points to be inside and outside the
> > > project.
> > > > We look forward to collaborating with these communities as well as
> > other
> > > > communities under the Apache umbrella.
> > > >
> > > > === An Excessive Fascination with the Apache Brand ===
> > > > While we intend to leverage the Apache ‘branding’ when talking to
> other
> > > > projects as testament of our project’s ‘neutrality’, we have no plans for
> > > > making use of Apache brand in press releases nor posting billboards
> > > > advertising acceptance of Apex into Apache Incubator.
> > > >
> > > >
> > > > == Documentation ==
> > > > See documentation for the current state of the project documentation
> > > > available as part of the GitHub repositories -
> > > > https://github.com/DataTorrent/Apex;
> > > https://github.com/DataTorrent/Malhar.
> > > > In addition, a list of demos that serve as a how-to guide is available at
> > > > https://github.com/DataTorrent/Malhar/tree/master/demos
> > > >
> > > > == Initial Source ==
> > > > DataTorrent has released the source code for Apex under Apache 2.0
> > > License
> > > > at https://github.com/DataTorrent/Apex, and that of Malhar under
> > Apache
> > > 2.0
> > > > licence at https://github.com/DataTorrent/Malhar. We encourage ASF
> > > > community members interested in this proposal to download the source
> > > code,
> > > > review it and try out the software.
> > > >
> > > > == Source and Intellectual Property Submission Plan ==
> > > > As soon as Apex is approved to join Apache Incubator, DataTorrent
> will
> > > > execute a Software Grant Agreement and the source code will be
> > > transitioned
> > > > onto ASF infrastructure. The code is already licensed under the
> Apache
> > > > Software License, version 2.0. We know of no legal encumbrances that
> > > would
> > > > inhibit the transfer of source code to the ASF.
> > > >
> > > > == External Dependencies ==
> > > > All dependencies fall under the permissive licenses categories, or
> weak
> > > > copy left (http://www.apache.org/legal/resolved.html#category-b). We
> > > intend
> > > > to remove the dependencies on GPL licensed technologies on which Apex
> > or
> > > > Malhar depend. These technologies are optional and have been marked
> as
> > > such.
> > > >
> > > > Embedded dependencies (relocated):
> > > >    * None
> > > >
> > > > Runtime dependencies:
> > > >    * activemq-client
> > > >    * ant
> > > >    * async-http-client
> > > >    * bval-jsr303
> > > >    * commons-beanutils
> > > >    * commons-codec
> > > >    * commons-lang3
> > > >    * commons-compiler
> > > >    * embassador
> > > >    * fastutil
> > > >    * guava
> > > >    * hadoop-common
> > > >    * hadoop-common-tests
> > > >    * hadoop-yarn-client
> > > >    * httpclient
> > > >    * jackson-core-asl
> > > >    * jackson-mapper-asl
> > > >    * javax.mail
> > > >    * jersey-apache-client4
> > > >    * jersey-client
> > > >    * jetty-servlet
> > > >    * jetty-websocket
> > > >    * jline
> > > >    * kryo
> > > >    * named-regexp
> > > >    * netlet
> > > >    * rhino (GPL 2.0, optional)
> > > >    * slf4j-api
> > > >    * slf4j-log4j12
> > > >    * validation-api
> > > >    * xbean-asm5-shaded
> > > >    * zip4j
> > > >
> > > > Module or optional dependencies
> > > >    * accumulo-core
> > > >    * aerospike-client
> > > >    * amqp-client
> > > >    * aws-java-sdk-kinesis
> > > >    * cassandra-driver-core
> > > >    * couchbase-client
> > > >    * CouchbaseMock
> > > >    * elasticsearch
> > > >    * geoip-api (LGPL, optional)
> > > >    * hbase
> > > >    * hbase-client
> > > >    * hbase-server
> > > >    * hive-exec
> > > >    * hive-service
> > > >    * hiveunit
> > > >    * javax.mail-api
> > > >    * jedis
> > > >    * jms-api
> > > >    * jri (GPL, optional)
> > > >    * jriengine (LGPL, optional)
> > > >    * jruby (LGPL, optional)
> > > >    * jython (PSF License, optional)
> > > >    * jzmq (LGPL, optional)
> > > >    * kafka_2.10
> > > >    * lettuce (GPL, optional)
> > > >    * libthrift
> > > >    * Memcached-Java-Client
> > > >    * mongo-java-driver
> > > >    * mqtt-client
> > > >    * mysql-connector-java (GPL2, optional)
> > > >    * org.ektorp
> > > >    * rengine (LGPL, optional)
> > > >    * rome
> > > >    * solr-core
> > > >    * solr-solrj
> > > >    * spymemcached
> > > >    * sqlite4java
> > > >    * super-csv
> > > >    * twitter4j-core
> > > >    * twitter4j-stream
> > > >    * uadetector-resources
> > > >    * org.apache.servicemix.bundles.splunk
> > > >
> > > > Build only dependencies:
> > > >    * None
> > > >
> > > > Test only dependencies:
> > > >    * activemq-broker
> > > >    * activemq-kahadb-store
> > > >    * greenmail
> > > >    * hadoop-yarn-server-tests
> > > >    * hsqldb
> > > >    * janino
> > > >    * junit
> > > >    * MockFtpServer
> > > >    * mockito-all
> > > >    * testng
> > > >
> > > > Cryptography N/A
> > > >
> > > > == Required Resources ==
> > > > === Mailing lists ===
> > > >    * private@apex.incubator.apache.org (moderated subscriptions)
> > > >    * commits@apex.incubator.apache.org
> > > >    * dev@apex.incubator.apache.org
> > > >    * issues@apex.incubator.apache.org
> > > >    * user@apex.incubator.apache.org
> > > >
> > > > === Git Repository ===
> > > >    * https://git-wip-us.apache.org/repos/asf/incubator-apex.git
> > > >    * https://git-wip-us.apache.org/repos/asf/incubator-malhar.git
> > > >
> > > > === Issue Tracking ===
> > > >    * JIRA Project Apex (APEX)
> > > >    * JIRA Project Malhar (MALHAR)
> > > >
> > > > === Other Resources ===
> > > >    * Means of setting up regular builds for Apex on builds.apache.org
> > > >    * Means of setting up regular builds for Malhar on builds.apache.org
> > > >
> > > > === Rationale for Malhar and Apex having separate git and jira ===
> > > > We managed Malhar and Apex as two repos and two jiras on purpose.
> Both
> > > code
> > > > bases are released under Apache 2.0 and are proposed for incubation.
> In
> > > > terms of our vision to enable innovation around a native YARN
> > > > data-in-motion that unifies stream processing as well as batch
> > processing
> > > > Malhar and Apex go hand in hand. Apex has base API that consists of
> > java
> > > > api (functional), and attributes (operability). Malhar is a
> > manifestation
> > > > of this api, but from user perspective, Malhar is itself an API to
> > > leverage
> > > > business logic. Over past three years we have found that the cadence
> of
> > > > release and api changes in Malhar is much more rapid than in Apex and it was
> > > > operationally much easier to separate them into their own repos. Two
> > > repos
> > > > will reflect clear separation of engine (Apex) and operators/business
> > > logic
> > > > (Malhar), and reflect different developer roles. It will allow or
> > > > independent release cycles (operator change independent of engine due
> > to
> > > > stable API). We however do not believe in two levels of committers.
> We
> > > > believe there should be one community that works across both and
> > > innovates
> > > > with ideas that Malhar and Apex combined provide the value
> proposition.
> > > We
> > > > are proposing that Apache incubation process help us to foster
> > > development
> > > > of one community (mailing list, committers), and a yet be ok with two
> > > > repos. We are proposing that this be taken up during incubation.
> > > Community
> > > > will learn if this works. The decision on whether to split them into
> > two
> > > > projects be taken after the learning curve during incubation.
> > > >
> > > > == Initial Committers ==
> > > >    * Roma Ahuja (rahuja at directv dot com)
> > > >    * Isha Arkatkar (isha at datatorrent dot com)
> > > >    * Raja Ali (raji at silverspringnet dot com)
> > > >    * Sunaina Chaudhary (SChaudhary at directv dot com)
> > > >    * Bhupesh Chawda (bhupesh at datatorrent dot com)
> > > >    * Chaitanya Chelobu (chaitanya at datatorrent dot com)
> > > >    * Bright Chen (bright at datatorrent dot com)
> > > >    * Pradeep Dalvi (pradeep dot dalvi at datatorrent dot com)
> > > >    * Sandeep Deshmukh (sandeep at datatorrent dot com)
> > > >    * Yogi Devendra (yogi at datatorrent dot com)
> > > >    * Cem Ezberci (hasan dot ezberci at ge dot com)
> > > >    * Timothy Farkas (tim at datatorrent dot com)
> > > >    * Ilya Ganelin (ilya dot ganelin at capitalone dot com)
> > > >    * Parag Goradia (parag dot goradia at ge dot com)
> > > >    * Tushar Gosavi (tushar at datatorrent dot com)
> > > >    * Priyanka Gugale (priyanka at datatorrent dot com)
> > > >    * Gaurav Gupta (gaurav at datatorrent dot com)
> > > >    * Sandesh Hegde (sandesh at datatorrent dot com)
> > > >    * Siyuan Hua (siyuan at datatorrent dot com)
> > > >    * Ajith Joseph (ajoseph at silverspring dot com)
> > > >    * Amol Kekre (amol at datatorrent dot com)
> > > >    * Chinmay Kolhatkar (chinmay at datatorrent dot com)
> > > >    * Pramod Immaneni (pramod at datatorrent dot com)
> > > >    * Anuj Lal (anuj dot lal at ge dot com)
> > > >    * Dongsu Lee (dlee3 at directv dot com)
> > > >    * Vitaly Li (blossom dot valley at gmail dot com)
> > > >    * Dean Lockgaard (dean at datatorrent dot com)
> > > >    * Rohan Mehta (rohan_mehta at apple dot com)
> > > >    * Adi Mishra (apmishra at directv dot com, adi dot mishra at gmail dot com)
> > > >    * Chetan Narsude (chetan at datatorrent dot com)
> > > >    * Darin Nee (dnee at silverspring dot com)
> > > >    * Alexander Parfenov (sasha at datatorrent dot com)
> > > >    * Andrew Perlitch (andy at datatorrent dot com)
> > > >    * Shubham Phatak (shubham at datatorrent dot com)
> > > >    * Ashwin Putta (ashwin at datatorrent dot com)
> > > >    * Rikin Shah (rikin dot shah at capitalone dot com)
> > > >    * Luis Ramos (l dot ramos at ge dot com)
> > > >    * Munagala Ramanath (ram at datatorrent dot com)
> > > >    * Vlad Rozov (vlad dot rozov at datatorrent dot com)
> > > >    * Atri Sharma (atri dot jiit at gmail dot com)
> > > >    * Chandni Singh (chandni at datatorrent dot com)
> > > >    * Venkatesh Sivasubramanian (venkateshs at ge dot com)
> > > >    * Aniruddha Thombare (aniruddha at datatorrent dot com)
> > > >    * Jessica Wang (jessica at datatorrent dot com)
> > > >    * Thomas Weise (thomas at datatorrent dot com)
> > > >    * David Yan (david at datatorrent dot com)
> > > >    * Kevin Yang (yang dot k at ge dot com)
> > > >    * Brennon York (brennon dot york at capitalone dot com)
> > > >
> > > > == Affiliations ==
> > > >    * Apple: Vitaly Li, Rohan Mehta
> > > >    * Barclays: Atri Sharma
> > > >    * Class Software: Justin Mclean
> > > >    * CapitalOne: Ilya Ganelin, Rikin Shah, Brennon York
> > > >    * DataTorrent: everyone else on this proposal
> > > >    * DirecTV: Roma Ahuja, Sunaina Chaudhary, Dongsu Lee, Adi Mishra
> > > >    * General Electric: Cem Ezberci, Parag Goradia, Anuj Lal, Luis Ramos, Venkatesh Sivasubramanian, Kevin Yang
> > > >    * Hortonworks: Alan Gates, Taylor Goetz, Chris Nauroth, Hitesh Shah
> > > >    * MapR: Ted Dunning
> > > >    * SilverSpring Networks: Raja Ali, Ajith Joseph, Darin Nee
> > > >
> > > > == Sponsors ==
> > > >
> > > > === Champion ===
> > > > Ted Dunning
> > > >
> > > > === Nominated Mentors ===
> > > >
> > > > The initial mentors are listed below:
> > > >    * Ted Dunning - Apache Member, MapR
> > > >    * Alan Gates - Apache Member, Hortonworks
> > > >    * Taylor Goetz - Apache Member, Hortonworks
> > > >    * Justin Mclean - Apache Member, Class Software
> > > >    * Chris Nauroth - Apache Member, Hortonworks
> > > >    * Hitesh Shah - Apache Member, Hortonworks
> > > >
> > > > === Sponsoring Entity ===
> > > >
> > > > We would like to propose the Apache Incubator as the sponsor of this project.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: general-help@incubator.apache.org
> > >
> > >
> >
>


-- 
Sent from My iPad, sorry for any misspellings.
