incubator-general mailing list archives

From Simon Chan <si...@salesforce.com>
Subject Re: [DISCUSS] PredictionIO incubation proposal
Date Sun, 15 May 2016 04:14:49 GMT
Thanks Roman.

1. Apache Beam looks promising. I agree it could be extremely useful in,
for example, the Data Preparator of a DASE-architecture engine in
PredictionIO, so that it can leverage Spark, Flink, or Google Dataflow. I
look forward to hearing more about it.

2. The integration with Apache Zeppelin is definitely a great suggestion.
In fact, Lee Moon Soo, an initial committer of Zeppelin, is also listed as
a committer in this proposal. Some work has been done previously (
https://docs.prediction.io/datacollection/analytics-zeppelin/), but I
anticipate a tighter collaboration with Apache Zeppelin after PredictionIO
becomes an Apache project.
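
To make point 1 a little more concrete, here is a minimal sketch of the
idea, under stated assumptions. It models a DASE-style engine (Data source
and Data Preparator, Algorithm, Serving; Evaluation is omitted) as plain
functions, so the preparation step is the one place where an execution
backend could be swapped. Every name below (Engine, drop_invalid,
mean_rating_per_item, lookup) is hypothetical and is not a PredictionIO or
Beam API. Python is used only for brevity; PredictionIO itself is written
in Scala.

```python
# Hypothetical sketch of a DASE-style pipeline. None of these names are
# PredictionIO or Apache Beam APIs; they only illustrate the structure.
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Event = Tuple[str, str, float]  # (user, item, rating)


@dataclass
class Engine:
    read_events: Callable[[], Iterable[Event]]          # D: data source
    prepare: Callable[[Iterable[Event]], List[Event]]   # D: data preparator
    train: Callable[[List[Event]], Dict[str, float]]    # A: algorithm
    serve: Callable[[Dict[str, float], str], float]     # S: serving

    def train_model(self) -> Dict[str, float]:
        # The preparator sits between the data source and the algorithm,
        # so it is the natural place to plug in a different backend.
        return self.train(self.prepare(self.read_events()))


def drop_invalid(events: Iterable[Event]) -> List[Event]:
    """Preparator: keep only events whose rating is in [1, 5]."""
    return [e for e in events if 1.0 <= e[2] <= 5.0]


def mean_rating_per_item(events: List[Event]) -> Dict[str, float]:
    """Toy algorithm: the model is the mean rating of each item."""
    totals: Dict[str, Tuple[float, int]] = {}
    for _, item, rating in events:
        s, n = totals.get(item, (0.0, 0))
        totals[item] = (s + rating, n + 1)
    return {item: s / n for item, (s, n) in totals.items()}


def lookup(model: Dict[str, float], item: str) -> float:
    """Serving: return the predicted rating, or 0.0 for unknown items."""
    return model.get(item, 0.0)


engine = Engine(
    read_events=lambda: [("u1", "i1", 4.0), ("u2", "i1", 2.0), ("u3", "i2", 9.0)],
    prepare=drop_invalid,
    train=mean_rating_per_item,
    serve=lookup,
)
model = engine.train_model()
```

Because the engine only sees the preparator as a function, a different
implementation of that one step could be substituted without touching the
algorithm or serving code; whether Beam fits that role is the open question.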

Regards,
Simon

On Saturday, May 14, 2016, Andrew Purtell <andrew.purtell@gmail.com> wrote:

> Yikes, apologies for the formatting. It looked fine in Gmail when I sent
> it, alas.
>
> I must let the proposers respond to the technical questions but I think I
> can make the general observation that would-be contributors proposing and
> performing work on new and better Apache ecosystem integrations would be
> excellent for the health of the new podling and the ecosystem at large.
>
>
> > On May 14, 2016, at 5:32 PM, Roman Shaposhnik <roman@shaposhnik.org> wrote:
> >
> > Super excited to see this proposal! This will finally allow us to have
> > an ASF managed backend for next generation data-driven apps that I see
> > emerging quite rapidly.
> >
> > The proposal looks great to me (although I'd recommend calling out Scala
> > as an implementation language more prominently since it may attract
> > additional developers with affinity to it).
> >
> > I do have two questions about technology:
> >   1. Do you think it would be possible to leverage Apache Beam (incubating)
> >       for abstracting away the dependency on execution frameworks? My
> >       understanding is that PredictionIO currently only runs on Spark.
> >   2. Is there a potential integration with Apache Zeppelin?
> >
> > Thanks,
> > Roman.
> >
> >> On Fri, May 13, 2016 at 1:41 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >> Greetings,
> >>
> >> It is my pleasure to propose the PredictionIO project for incubation at
> >> the Apache Software Foundation.
> >>
> >> PredictionIO is a popular open source Machine Learning Server built on top
> >> of a state-of-the-art open source stack, including several Apache
> >> technologies, that enables developers to manage and deploy production-ready
> >> predictive services for various kinds of machine learning tasks, with more
> >> than 400 production deployments around the world and a growing contributor
> >> community.
> >>
> >>
> >> The text of the proposal is included below and is also available at
> >> https://wiki.apache.org/incubator/PredictionIO
> >>
> >> Best regards,
> >> Andrew Purtell
> >>
> >>
> >> = PredictionIO Proposal =
> >>
> >> === Abstract ===
> >> PredictionIO is an open source Machine Learning Server built on top of a
> >> state-of-the-art open source stack that enables developers to manage and
> >> deploy production-ready predictive services for various kinds of machine
> >> learning tasks.
> >>
> >> === Proposal ===
> >> The PredictionIO platform consists of the following components:
> >>
> >> * PredictionIO framework - provides the machine learning stack for
> >> building, evaluating and deploying engines with machine learning
> >> algorithms. It uses Apache Spark for processing.
> >>
> >> * Event Server - the machine learning analytics layer for unifying events
> >> from multiple platforms. It can use Apache HBase or any JDBC backend
> >> as its data store.
> >>
> >> The PredictionIO community also maintains a Template Gallery, a place to
> >> publish and download (free or proprietary) engine templates for different
> >> types of machine learning applications, which is a complementary part of
> >> the project. At this point we exclude the Template Gallery from the
> >> proposal, as it has a separate set of contributors and we’re not familiar
> >> with an Apache approved mechanism to maintain such a gallery.
> >>
> >> You can find the Template Gallery at https://templates.prediction.io/
> >>
> >> === Background ===
> >> PredictionIO was started with a mission to democratize and bring machine
> >> learning to the masses.
> >>
> >> Machine learning has traditionally been a luxury for big companies like
> >> Google, Facebook, and Netflix. There are ML libraries and tools lying
> >> around the internet, but putting them all together as a production-ready
> >> infrastructure is a very resource-intensive task that is barely within
> >> reach of individuals or small businesses.
> >>
> >> PredictionIO is a production-ready, full stack machine learning system
> >> that allows organizations of any scale to quickly deploy machine learning
> >> capabilities. It comes with official and community-contributed machine
> >> learning engine templates that are easy to customize.
> >>
> >> === Rationale ===
> >> As the usage of and number of contributors to PredictionIO have grown
> >> bigger and more diverse, we have sought an independent framework for the
> >> project to keep thriving. We believe the Apache foundation is a great fit.
> >> Joining Apache would ensure that tried and true processes and procedures
> >> are in place for the growing number of organizations interested in
> >> contributing to PredictionIO. PredictionIO is also a good fit for the
> >> Apache foundation. PredictionIO was built on top of several Apache projects
> >> (HBase, Spark, Hadoop). We are familiar with the Apache process and believe
> >> that the democratic and meritocratic nature of the foundation aligns with
> >> the project goals.
> >>
> >> === Initial Goals ===
> >> The initial milestones will be to move the existing codebase to Apache
> >> and integrate with the Apache development process. Once this is
> >> accomplished, we plan for incremental development and releases that follow
> >> the Apache guidelines, as well as growing our developer and user
> >> communities.
> >>
> >> === Current Status ===
> >> PredictionIO has undergone nine minor releases and many patches.
> >> PredictionIO is being used in production by Salesforce.com as well as
> >> many other organizations and apps. The PredictionIO codebase is currently
> >> hosted at GitHub, which will form the basis of the Apache git repository.
> >>
> >> ==== Meritocracy ====
> >> We plan to invest in supporting a meritocracy. We will discuss the
> >> requirements in an open forum. We intend to invite additional developers
> >> to participate. We will encourage and monitor community participation so
> >> that privileges can be extended to those that contribute.
> >>
> >> ==== Community ====
> >> Acceptance into the Apache foundation would bolster the already strong
> >> user and developer community around PredictionIO. That community
> >> includes many contributors from various other companies, and an active
> >> mailing list composed of hundreds of users.
> >>
> >> ==== Core Developers ====
> >> The core developers of our project are listed in our contributors and
> >> initial PPMC below. Though many are employed at Salesforce.com, there
> >> are also engineers from ActionML, and independent developers.
> >>
> >> === Alignment ===
> >> The ASF is the natural choice to host the PredictionIO project as its
> >> goal is democratizing Machine Learning by making it more easily accessible
> >> to every user/developer. PredictionIO is built on top of several top level
> >> Apache projects as outlined above.
> >>
> >> === Known Risks ===
> >>
> >> ==== Orphaned products ====
> >> PredictionIO has a solid and growing community. It is deployed on
> >> production environments by companies of all sizes to run various kinds
> >> of predictive engines.
> >>
> >> In addition to the community contribution to the PredictionIO framework,
> >> the community is also actively contributing new engines to the Template
> >> Gallery as well as SDKs and documentation for the project. Salesforce is
> >> committed to utilizing and advancing the PredictionIO code base and
> >> supporting its user community.
> >>
> >> ==== Inexperience with Open Source ====
> >> PredictionIO has existed as a healthy open source project for almost two
> >> years and is the most starred Scala project on GitHub. All of the
> >> proposed committers have contributed to ASF and Linux Foundation open
> >> source projects. Several current committers on Apache projects and Apache
> >> Members are involved in this proposal and intend to provide mentorship.
> >>
> >> ==== Homogeneous Developers ====
> >> The initial list of committers includes developers from several
> >> institutions, including Salesforce, ActionML, Channel4, USC as well as
> >> unaffiliated developers.
> >>
> >> ==== Reliance on Salaried Developers ====
> >> Like most open source projects, PredictionIO receives substantial
> >> support from salaried developers. PredictionIO development is partially
> >> supported by Salesforce.com, but there are many contributors from various
> >> other companies, and an active mailing list composed of hundreds of users.
> >> We will continue our efforts to ensure stewardship of the project to be
> >> independent of salaried developers by meritocratically promoting those
> >> contributors to committers.
> >>
> >> ==== Relationships with Other Apache Products ====
> >> PredictionIO relies heavily on top level Apache projects such as Apache
> >> Spark, HBase and Hadoop. However, it brings distinct functionality
> >> rather than just an abstraction: Machine Learning in a plug-and-play
> >> fashion.
> >>
> >> Compared to Apache Mahout, which focuses on the development of a wide
> >> variety of algorithms, PredictionIO offers a platform to manage the
> >> whole machine learning workflow, including data collection, data
> >> preparation, modeling, deployment and management of predictive services
> >> in production environments.
> >>
> >> ==== An Excessive Fascination with the Apache Brand ====
> >> PredictionIO is already a widely known open source project. This
> >> proposal is not for the purpose of generating publicity. Rather, the
> >> primary benefits to joining Apache are those outlined in the Rationale
> >> section.
> >>
> >> === Documentation ===
> >> PredictionIO boasts rich and live documentation. It is included in the
> >> code repo (docs/manual directory), built with Middleman, and publicly
> >> hosted at https://docs.prediction.io
> >>
> >> === Initial Source and Intellectual Property Submission Plan ===
> >> Currently, the PredictionIO codebase is distributed under the Apache 2.0
> >> License and hosted on GitHub:
> >> https://github.com/PredictionIO/PredictionIO
> >>
> >> === External Dependencies ===
> >> PredictionIO has the following external dependencies:
> >> * Apache Hadoop 2.4.0 (optional, required only if YARN and HDFS are
> >>   needed)
> >> * Apache Spark 1.3.0 for Hadoop 2.4
> >> * Java SE Development Kit 8
> >> * and one of the following sets:
> >>   * PostgreSQL 9.1
> >>
> >>   or
> >>
> >>   * MySQL 5.1
> >>
> >>   or
> >>
> >>   * Apache HBase 0.98.6
> >>   * Elasticsearch 1.4.0
> >>
> >> Upon acceptance to the incubator, we would begin a thorough analysis of
> >> all transitive dependencies to verify this information and introduce
> >> license checking into the build and release process by integrating with
> >> Apache RAT.
> >>
> >> === Cryptography ===
> >> PredictionIO does not include cryptographic code. We utilize standard
> >> JCE and JSSE APIs provided by the Java Runtime Environment.
> >>
> >> === Required Resources ===
> >> We request that the following resources be created for the project to use:
> >>
> >> ==== Mailing lists ====
> >>
> >> predictionio-private@incubator.apache.org (with moderated subscriptions)
> >>
> >> predictionio-dev
> >>
> >> predictionio-user
> >>
> >> predictionio-commits
> >>
> >> We will migrate the existing PredictionIO mailing lists.
> >>
> >> ==== Git repository ====
> >> The PredictionIO team would like to use Git for source control, due to
> >> our current use of GitHub.
> >>
> >> git://git.apache.org/incubator-predictionio
> >>
> >> ==== Documentation ====
> >> https://predictionio.incubator.apache.org/docs/
> >>
> >> ==== JIRA instance ====
> >> PredictionIO currently uses the GitHub issue tracking system associated
> >> with its repository: https://github.com/PredictionIO/PredictionIO/issues.
> >> We will migrate to Apache JIRA.
> >>
> >> JIRA PREDICTIONIO
> >> https://issues.apache.org/jira/browse/PREDICTIONIO
> >>
> >> ==== Other Resources ====
> >> * TravisCI for builds and test running.
> >>
> >> * PredictionIO's documentation, included in the code repo (docs/manual
> >> directory), is built with Middleman and publicly hosted at
> >> https://docs.prediction.io
> >>
> >> * A blog to drive adoption and excitement at https://blog.prediction.io
> >>
> >> === Initial Committers ===
> >>
> >> * Pat Ferrell
> >>
> >> * Tamas Jambor
> >>
> >> * Justin Yip
> >>
> >> * Xusen Yin
> >>
> >> * Lee Moon Soo
> >>
> >> * Donald Szeto
> >>
> >> * Kenneth Chan
> >>
> >> * Tom Chan
> >>
> >> * Simon Chan
> >>
> >> * Marco Vivero
> >>
> >> * Matthew Tovbin
> >>
> >> * Yevgeny Khodorkovsky
> >>
> >> * Felipe Oliveira
> >>
> >> * Vitaly Gordon
> >>
> >> === Affiliations ===
> >>
> >> * Pat Ferrell - ActionML
> >>
> >> * Tamas Jambor - Channel4
> >>
> >> * Justin Yip - independent
> >>
> >> * Xusen Yin - USC
> >>
> >> * Lee Moon Soo - NFLabs
> >>
> >> * Donald Szeto - Salesforce
> >>
> >> * Kenneth Chan - Salesforce
> >>
> >> * Tom Chan - Salesforce
> >>
> >> * Simon Chan - Salesforce
> >>
> >> * Marco Vivero - Salesforce
> >>
> >> * Matthew Tovbin - Salesforce
> >>
> >> * Yevgeny Khodorkovsky - Salesforce
> >>
> >> * Felipe Oliveira - Salesforce
> >>
> >> * Vitaly Gordon - Salesforce
> >>
> >> === Sponsors ===
> >>
> >> ==== Champion ====
> >>
> >> Andrew Purtell <apurtell at apache dot org>
> >>
> >> ==== Nominated Mentors ====
> >>
> >> * Andrew Purtell <apurtell at apache dot org>
> >>
> >> * James Taylor <jtaylor at apache dot org>
> >>
> >> * Lars Hofhansl <larsh at apache dot org>
> >>
> >> * Suneel Marthi <smarthi at apache dot org>
> >>
> >> * Xiangrui Meng <meng at apache dot org>
> >>
> >> * Luciano Resende <lresende at apache dot org>
> >>
> >> ==== Sponsoring Entity ====
> >>
> >> Apache Incubator PMC
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
>
