incubator-general mailing list archives

From Doug Cutting
Subject [VOTE] Accept Spot into the Apache Incubator
Date Tue, 20 Sep 2016 18:15:34 GMT
Following the discussion thread, I would like to call a vote on
accepting Spot into the Apache Incubator.

[] +1 Accept Spot into the Apache Incubator
[] +0 Abstain.
[] -1 Do not accept Spot into the Apache Incubator because ...

This vote will run for the usual 72 hours.

The proposal is attached, but you can also access it on the wiki:



= SpotProposal =

== Abstract ==

Spot is an open source platform for network telemetry (packet, flow,
and proxy at the moment) built on an open data model and Apache Hadoop.

== Proposal ==

Spot (formerly Open Network Insight, or ONI) is an open source
solution for network telemetry (packet, flow, and proxy at the moment)
built on an open data model and Apache Hadoop. It provides ingestion
and transformation of binary data, scalable machine learning, and
interactive visualization for identifying threats in network flows and
DNS packets.

Spot has a pluggable architecture that can accommodate multiple open
data models. Although cybersecurity/network-intrusion analysis is the
initial use case for Spot, we are actively encouraging the
contribution of new models that will enable other adjacent
applications, such as fraud detection or IT-operational analytics such
as performance and health monitoring. Because these models are open,
users maintain control of their own data.

More information on Spot can be found at the existing project website

== Background ==

It almost goes without saying that cybersecurity is an acute and
paramount concern globally, for organizations of all types and
sizes. Fortunately, thanks to the availability of massively scalable
(in the PBs) data infrastructure, security professionals can now make
authentically data-driven decisions about how they protect their
assets. For example, records of network traffic, captured as network
flows, are often stored and analyzed for use in network management,
and this same information can provide valuable insights into network

Cybersecurity is just one example, however: There are other examples
of adjacent use cases, such as user fraud detection or IT-operations
analytics, that would benefit from the combination of Spot
functionality and PB-scale data sets for analysis.

== Rationale ==

Although cybersecurity is its initial use case/data model, Spot is
intended to more generally tackle the dual challenges of facilitating
the development of big data-driven analytic solutions, while helping
vendors avoid having to create one-off infrastructure for each use
case. Spot will eliminate issues related to vendor data models that
create silos between solutions, and that make it difficult for users
to consume these innovations from multiple vendors. In summary, Spot
will accelerate the development of new massively scalable analytic
applications that give users more flexibility, and more choices.

As an initial effort, we are now seeking to build an ecosystem of
developers, data scientists, and security professionals to make Spot
the open, community-driven, cybersecurity platform standard it needs
to become. By bringing Spot to Apache, we hope to galvanize these
groups to cooperate in this highly matrixed effort, and to build a
global, and diverse, Spot community.

== Initial Goals ==

 * Move the existing codebase, website, documentation, and mailing lists
to Apache-hosted infrastructure
 * Work with the infrastructure team to implement and approve our build
and testing workflows in the context of the ASF
 * Incremental development and releases per Apache guidelines

== Current Status ==

=== Releases ===

Spot has undergone one public release (1.0). This initial release was
not performed in the typical ASF fashion; we will adopt the ASF source
release process upon joining the incubator.

=== Source ===

Spot’s source, including core platform and associated submodules, is
currently hosted in several GitHub repositories under the indicated
licenses:

 * Core (Apache License 2.0)
 * Oni-ingest (Apache License 2.0)
 * Oni-ml (Apache License 2.0)
 * Oni-oa (BSD & MIT)
 * Oni-setup (Apache License 2.0)
 * Oni-nfdump (BSD)
 * Oni-lda-c (GNU General Public License version 2)

The repositories will be transitioned to Apache’s git hosting during
incubation. Issues related to GPL code will be resolved during
incubation.

=== Issue Tracking ===

Spot’s bug and feature tracking is hosted on GitHub at:


Issue tracking will be transitioned to Apache’s JIRA instance during incubation.

=== Code review ===

Spot maintainers currently use “LGTM” (Looks Good to Me) in comments
on the code review to indicate acceptance, with at least three LGTMs
required to approve the merge.

=== Community discussion ===

A Spot Slack channel is available at:

 * (Invites
request via

Community discussion options will be expanded considerably when mailing lists are available.

=== Meritocracy ===

We intend to adhere to a meritocratic approach to electing new
committers and PMC members. We also believe that contributions can
come in forms other than just code. We will encourage contributions
and participation of all types, and ensure that contributors are
appropriately recognized and that PMC memberships are appropriately

=== Community ===

Though Spot is a relatively new project, it has already seen promising adoption:

 * Intel is the original development sponsor for Spot.
 * Cloudera is a strong advocate for open source cybersecurity solutions
and Apache Hadoop, and a supporter of Spot.
 * Cloudwick’s OAS cybersecurity solution is built on Spot.
 * Accenture’s Cyber Intelligence Platform solution is built on Spot.
 * Centrify has announced its intention to contribute identity-based
security features to Spot’s network-intrusion detection data model.
 * Webroot has announced its intention to contribute endpoint-security
 * Cybraics has announced its intention to contribute network-security
 * Jask has announced its intention to contribute network-security

As described in the “Rationale” section, we believe that building on
and expanding the Spot community will be a key aspect in its success.

=== Core Developers ===

Spot was initially developed as a project at Intel, and most of the
contributions to date have been from developers employed by that
company. By bringing Spot to Apache, we hope to diversify its
developer community more rapidly.

=== Alignment ===

Spot is built on Apache Hadoop, Apache Kafka, and Apache Spark, and as
more functionality is built out, integration with other Apache
projects is very likely.

== Known Risks ==

=== Orphaned products ===

The risk of Spot being abandoned is low. Intel has made substantial
investments already, Cloudera has publicly expressed the importance of
Spot as a “killer app” for Apache Hadoop, and Cloudwick and Accenture
both have offerings that are built on Spot/CDH.

=== Inexperience with Open Source ===

Most of Spot’s initial committers have experience in open source
development, although not necessarily within the ASF. Those Spot
developers who have little open source experience or are not Apache
committers are eager to learn ASF practices as a means of improving
project governance and diversifying the developer community.

=== Homogenous Developers ===

As mentioned previously, Intel developers are responsible for most of
the Spot code written to date. As a benefit of ASF governance, we
hope to scale up contributions from new developers and community
members and, eventually, develop them into committers by adhering to
Apache’s meritocratic principles.

=== Reliance on Salaried Developers ===

To date, all Spot code has been written by salaried developers
(chiefly employed by Intel).

=== Relationships with Other Apache Products ===

Spot is currently related to the following other Apache projects:

 * Apache Hadoop
 * Apache Spark
 * Apache Kafka

We look forward to continuing to integrate and collaborate with these

=== An Excessive Fascination with the Apache Brand ===

Although most (not all) of the initial committers are not currently
Apache committers, they are resolved to learn the Apache Way with the
help of the more experienced committers, project mentors, and
champion. We believe that adhering to these principles will be of great
value with respect to meeting long-term project goals, including
facilitating widespread adoption.

== Documentation ==

Spot functionality is divided into different repositories, with each
repository containing the relevant developer documentation:

 * oni-ingest
 * oni-ml
 * oni-oa
 * oni-setup
 * oni-nfdump
 * oni-lda-c

An Installation Guide is published in the project wiki:

The Spot (currently Open Network Insight) website is managed via a
WordPress instance hosted by Bluehost:

A Docker-based demo is available via Docker Hub:

== Initial Source ==

The Spot codebase is currently hosted on GitHub and will be
transitioned to the ASF repositories during incubation. Spot and its
submodules are currently licensed under several different licenses.

No trademarks or domain names for Spot have been registered to date,
and it will be up to the ASF’s discretion to do so. The project’s
current website at will be redirected to during incubation.

Some portions of the code are imported from other open source projects
under the Apache 2.0, BSD, or MIT licenses.

== External Dependencies ==

The full set of dependencies and their licenses is:
 * Jupyter: BSD
 * D3js: BSD
 * Nfdump: BSD
 * Wireshark: GNU General Public License version 2
 * Apache Hadoop: Apache License 2.0
 * Apache Spark: Apache License 2.0
 * JQuery: MIT
 * ReactJS: BSD
 * Bootstrap: MIT

Issues related to GPL dependencies will be resolved during incubation.

== Cryptography ==

Spot does not currently include any cryptography-related code.

== Required Resources ==

=== Developer and user mailing lists ===

 * (PMC)
 * (git push emails)
 * (JIRA issue feed)
 * (code reviews plus dev discussion)
 * (user questions)

=== Repository ===

 * git://

=== Issue Tracker ===

We would like to import our current JIRA project into the ASF JIRA,
such that our historical commit messages and code comments continue to
reference the appropriate bug numbers.

== Initial Committers ==

 * Grant Babb
 * Ricardo Barona
 * Cesar Berho
 * Jarek Jarcec Cecho
 * Michael Czerny
 * Nick Gamb
 * Sai Ganji
 * Gabriela Lima Garza
 * Victor Gonzalez
 * Mark Grover
 * Morris Hicks
 * Ritu Kama
 * Austin Leahy
 * Ashrith Mekala
 * Diego Ortiz
 * Sudharshan Rao PakalaSai
 * Srinivasa Reddy
 * Alan Ross
 * Everardo Lopez Sandoval
 * Nathan Segerlind
 * Vartika Singh
 * Nathanael Smith
 * Carlos Villavicencio

== Affiliations ==

 * Grant Babb: Jask
 * Ricardo Barona: Intel
 * Cesar Berho: Intel
 * Jarek Jarcec Cecho: StreamSets
 * Michael Czerny: Cybraics
 * Nick Gamb: Centrify
 * Sai Ganji: Cloudwick
 * Gabriela Lima Garza: Intel
 * Victor Gonzalez: Intel
 * Mark Grover: Cloudera
 * Morris Hicks: Cloudera
 * Ritu Kama: Intel
 * Austin Leahy: eBay
 * Ashrith Mekala: Cloudwick
 * Diego Ortiz: Intel
 * Sudharshan Rao PakalaSai: Cloudwick
 * Srinivasa Reddy: Cloudera
 * Alan Ross: Intel
 * Everardo Lopez Sandoval: Intel
 * Nathan Segerlind: Intel
 * Vartika Singh: Cloudera
 * Nathanael Smith: Intel
 * Carlos Villavicencio: Intel

== Sponsors ==

=== Champion ===

 * Doug Cutting - Cloudera

=== Nominated Mentors ===

 * Brock Noland - ASF Member, phData
 * Jarek Jarcec Cecho - ASF Member, StreamSets
 * Andrei Savu - Cloudera
 * Uma Maheswara Rao G - Intel

=== Sponsoring Entity ===

The Apache Incubator.
