incubator-general mailing list archives

From Luciano Resende
Subject [DISCUSS] Marvin-AI Incubator Proposal
Date Wed, 15 Aug 2018 19:13:36 GMT
We would like to start a discussion on accepting Marvin-AI as an Apache
Incubator project.

The proposal is available at the incubator wiki, and also copied below:

As part of the initial due diligence, we have done a preliminary name
search and the results are available on the JIRA below:

We are also looking for two additional mentors.

Thanks in advance for your time reviewing and providing feedback.


= Marvin-AI =

== Abstract ==

Marvin-AI is an open-source artificial intelligence (AI) platform that
helps data scientists prototype and productionize complex solutions with
a scalable, low-latency, language-agnostic, and standardized architecture
while simplifying the process of exploration and modeling.

== Proposal ==

Marvin helps inexperienced developers create industry-grade AI
applications. It has three core components: a development environment
used during data exploration and hypothesis validation (Toolbox), a
library that is extended to create Marvin engines, and a Scala
application server that interprets engines (Engine Executor).
A basic premise of Marvin is that it should be language-agnostic, able to
interpret engines implemented in different programming languages.
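
The split between an engine library and an executor can be illustrated
with a minimal sketch. Everything in it is hypothetical and is not taken
from the Marvin codebase: the names Engine, MeanThresholdEngine, and
serve are assumptions chosen for illustration, and the executor is
reduced to an in-process Python function, whereas the real Engine
Executor is a Scala application server.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Engine(ABC):
    """Hypothetical base class standing in for the engine library."""

    @abstractmethod
    def train(self, rows: List[Dict[str, float]]) -> None:
        """Fit the engine on a list of training rows."""

    @abstractmethod
    def predict(self, row: Dict[str, float]) -> Any:
        """Return a prediction for a single input row."""


class MeanThresholdEngine(Engine):
    """Toy engine: flags inputs whose 'value' exceeds the training mean."""

    def __init__(self) -> None:
        self.threshold = 0.0

    def train(self, rows: List[Dict[str, float]]) -> None:
        # The "model" is just the mean of the training values.
        self.threshold = sum(r["value"] for r in rows) / len(rows)

    def predict(self, row: Dict[str, float]) -> bool:
        return row["value"] > self.threshold


def serve(engine: Engine, request: Dict[str, float]) -> Dict[str, Any]:
    """Stand-in for the executor: it depends only on the Engine interface."""
    return {"result": engine.predict(request)}


engine = MeanThresholdEngine()
engine.train([{"value": 1.0}, {"value": 3.0}])
print(serve(engine, {"value": 2.5}))
```

Because serve only calls methods declared on the base class, any engine
that extends it can be served without changing the serving code. That is
the property the proposal relies on, here shown within a single language.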

== Background ==

The Marvin AI project was initiated as an internal project at B2W Digital
(Brazil), the largest e-commerce company in Latin America. Nowadays, it is
used by all data scientists within the B2W team. Oftentimes, data
scientists don't have an extensive background in software engineering, yet
are in charge of creating AI applications that need to scale to high
throughput and provide millisecond-level response times. At B2W, Marvin AI
plays an important role in this process, abstracting advanced software
engineering procedures, allowing data scientists to focus on their
knowledge domain.

== Rationale ==

With recent advances in computer architecture and a corresponding increase
in the amount of data generated by always-connected devices, AI algorithms
offer a solution to problems that have long troubled modern corporations.
Since AI developers come from various fields, such as statistics, physics,
and math, there exists a strong need for platforms which enable them to
move from prototypes to enterprise applications. Although some tools claim
to offer this service, in reality, there is no reliable open-source

== Initial Goals ==

The initial goals will most likely be to merge the existing codebase into a
single repository, migrate it to Apache, and then integrate with the Apache
development process. Furthermore, we plan for incremental development and
releases, as per Apache guidelines.

== Current Status ==

=== Meritocracy ===

Marvin already operates under principles of meritocracy, and some of its
contributors belong to other institutions. Although there is no formal
process for becoming a committer, contributors who make major changes or
improvements to the platform are naturally granted write access to the
repository.

=== Community ===

Acceptance into the Apache Software Foundation would substantially boost
both Marvin's user and developer communities. The current community
includes a few experienced developers who have either academic or
professional experience with AI. The community is largely composed of data
scientists working at B2W and other organizations such as Cloudera, MIT,
Qume Labs, and CBYK. There is also a meetup group of hundreds of users
who meet regularly to exchange ideas about Marvin and, more generally, AI.

Reference to the group:

=== Core Developers ===

The core developers for Marvin are listed in the contributors list and
initial PPMC below. These lists include B2W employees, MIT students, UFSCAR
researchers, independent contributors, and some employees of other
companies like Cloudera, Qume Labs, and CBYK.

=== Alignment ===

The initial committers strongly believe that by being part of the Apache
Software Foundation, Marvin AI will be part of a comprehensive suite for AI
applications that can process big data and enable enterprises to extract
value from their data lakes. We also hope that integrating with other
Apache projects, such as Apache Spark and Apache Hadoop, will foster
additional collaboration between these projects, furthering the existing
integration points and expanding the community of contributors.

== Known Risks ==

=== Orphaned products ===

Given the current maturity of Marvin and how well it has been received at
technical conferences, the risk of the project being abandoned is minimal.
AI is not academia-exclusive anymore, and as enterprises start to add
data-science pipelines to their applications, demand for Marvin will only

=== Inexperience with Open Source ===

Marvin AI has been an open-source project since October 2017. The project
was started in a company where open-source culture is foundational. B2W
Digital runs the largest e-commerce in Latin America on top of open-source

=== Reliance on Salaried Developers ===

Marvin AI receives substantial effort from salaried developers, a few of
whom were hired by companies to work exclusively on the project, but
the majority devote "after-hours" or spare time to this project. Some
developers are graduate students that contribute in their free time at

=== Relationships with Other Apache Products ===

Marvin integrates with several Apache products, such as Hadoop (HDFS) and
Spark. Marvin shares some features with PredictionIO, specifically
the model application server and a design pattern inspired by
DASE. Despite these similarities, Marvin targets a different
audience (data scientists), and for that reason it includes many critical
features that are not provided by PredictionIO.

=== An Excessive Fascination with the Apache Brand ===

While the ASF brand will undoubtedly help Marvin become a successful
project, Marvin is already gaining traction at companies around the globe.

== Documentation ==

== Initial Source ==

The current codebase is available at This is
practically the same code that will be migrated to the Apache Software
Foundation, the notable difference being that the multiple repositories
will be merged into a single repository (if necessary).

These are the main repositories, with a brief description of each one:

'''Main repositories'''

 * marvin-ai/marvin-python-toolbox - Data Science toolbox that helps in the
creation of new ML engines
 * marvin-ai/marvin-engine-executor - Component responsible for
interpreting, serving and managing Marvin engines
 * marvin-ai/marvin-public-engines - Marvin engine examples to help new
Marvin users to build engines
 * marvin-ai/marvin-platform-book - Documentation in GitHub book site format

'''Secondary repositories (Experimental and Initial)'''
 * marvin-ai/marvin-vagrant-dev - Development environment that uses
VirtualBox and Vagrant, for users who are not on macOS or Linux
 * marvin-ai/marvin-paper - LaTeX source of the first Marvin paper,
published at a conference in Boston
 * marvin-ai/marvin-cluster-admin - Admin module responsible for managing
the Marvin cluster
 * marvin-ai/marvin-automl - AutoML module responsible for helping data
scientists build machine learning models with a very simple visual

== External Dependencies ==

It is very likely that all our dependencies are using either the Apache or
MIT license. Upon acceptance to the incubator, we would begin a thorough
analysis of all transitive dependencies to verify this fact and introduce
license checking into the build and release process.

== Required Resources ==

=== Mailing lists ===

  * (with moderated subscriptions)

=== Git Repositories ===


=== Issue Tracking ===


== Initial Committers ==

 * Lucas Bonatto Miguel - Qume Labs (California -
 * Daniel Takabayashi - B2W Digital (São Paulo - BR) / (California - USA)
 * Bruno Piraja - B2W Digital (São Paulo - BR)
 * Zhang Yifei - B2W Digital (São Paulo - BR)
 * Harrison Wang - MIT (USA)
 * Brody West - MIT (USA)
 * Rafael Novello - B2W Digital (São Paulo - BR)
 * Willian Leite - CBYK (São Paulo - BR)
 * Danilo Nunes - Qume Labs (California - USA)
 * Alan Silva - Cloudera (USA)
 * Jeremy Elster - B2W Digital (São Paulo -

== Sponsors ==

=== Champion ===

 * Luciano Resende - (lresende)

=== Nominated Mentors ===

 * Luciano Resende - (lresende)

=== Sponsoring Entity ===
We propose that the Apache Incubator sponsor this project.

Luciano Resende
