incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adunuthula, Seshu" <sadunuth...@ebay.com>
Subject Re: [DISCUSS] SystemML Incubator Proposal
Date Sat, 24 Oct 2015 13:20:02 GMT
Hello Luciano,

Recently heard the presentation on SystemML at Apache BigData conference
and it sounds exciting. Looking forward to Apache Incubation.

Regards
Seshu Adunuthula


On 10/23/15, 5:34 PM, "Luciano Resende" <luckbr1975@gmail.com> wrote:

>On Fri, Oct 23, 2015 at 5:30 PM, Henry Saputra <henry.saputra@gmail.com>
>wrote:
>
>> Hi Luciano,
>>
>> Good proposal, but looks like
>> https://wiki.apache.org/incubator/SystemM does not exist?
>>
>
>Good catch, it's a typo on the original link and it's missing the L at the
>end, here is the correct link
>
>https://wiki.apache.org/incubator/SystemML
>
>
>
>>
>> Also, Reynold Xin and Patrick Wendell are not member of IPMCs so I
>> don't they could be mentors of this project, yet.
>>
>> They can ask to be member of IPMCs since both are already member of
>> ASF. But for now need to remove it from proposal.
>>
>>
>>
>Yes, they are aware of the requirement, and this will be fixed before we
>call a vote on the proposal.
>
>
>
>> - Henry
>>
>> On Fri, Oct 23, 2015 at 4:34 PM, Luciano Resende <luckbr1975@gmail.com>
>> wrote:
>> > We would like to start a discussion on accepting SystemML as an Apache
>> > Incubator project.
>> >
>> > The proposal is available at :
>> > https://wiki.apache.org/incubator/SystemM
>> >
>> > And it's contents is also copied below.
>> >
>> > Thanks in Advance for you time reviewing and providing feedback.
>> >
>> > ==============
>> >
>> > = SystemML =
>> >
>> > == Abstract ==
>> >
>> > SystemML provides declarative large-scale machine learning (ML) that
>>aims
>> > at flexible specification of ML algorithms and automatic generation of
>> > hybrid runtime plans ranging from single node, in-memory
>>computations, to
>> > distributed computations on Apache Hadoop and  Apache Spark. ML
>> algorithms
>> > are expressed in an R-like syntax, that includes linear algebra
>> primitives,
>> > statistical functions, and ML-specific constructs. This high-level
>> language
>> > significantly increases the productivity of data scientists as it
>> provides
>> > (1) full flexibility in expressing custom analytics, and (2) data
>> > independence from the underlying input formats and physical data
>> > representations. Automatic optimization according to data
>>characteristics
>> > such as distribution on the disk file system, and sparsity as well as
>> > processing characteristics in the distributed environment like number
>>of
>> > nodes, CPU, memory per node, ensures both efficiency and scalability.
>> >
>> > == Proposal ==
>> >
>> > The goal of SystemML is to create a commercial friendly, scalable and
>> > extensible machine learning framework for data scientists to create or
>> > extend machine learning algorithms using a declarative syntax. The
>> machine
>> > learning framework enables data scientists to develop algorithms
>>locally
>> > without the need of a distributed cluster, and scale up and scale out
>>the
>> > execution of these algorithms to distributed Hadoop or Spark clusters.
>> >
>> > == Background ==
>> >
>> > SystemML started as a research project in the IBM Almaden Research
>>Center
>> > around 2010 aiming to enable data scientists to develop machine
>>learning
>> > algorithms independent of data and cluster characteristics.
>> >
>> > == Rationale ==
>> >
>> > SystemML enables the specification of machine learning algorithms
>>using a
>> > declarative machine learning (DML) language. DML includes linear
>>algebra
>> > primitives, statistical functions, and additional constructs. This
>> > high-level language significantly increases the productivity of data
>> > scientists as it provides (1) full flexibility in expressing custom
>> > analytics and (2) data independence from the underlying input formats
>>and
>> > physical data representations.
>> >
>> > SystemML computations can be executed in a variety of different
>>modes. It
>> > supports single node in-memory computations and large-scale
>>distributed
>> > cluster computations. This allows the user to quickly prototype new
>> > algorithms in local environments but automatically scale to large data
>> > sizes as well without changing the algorithm implementation.
>> >
>> > Algorithms specified in DML are dynamically compiled and optimized
>>based
>> on
>> > data and cluster characteristics using rule-based and cost-based
>> > optimization techniques. The optimizer automatically generates hybrid
>> > runtime execution plans ranging from in-memory single-node execution
>>to
>> > distributed computations on Spark or Hadoop. This ensures both
>>efficiency
>> > and scalability. Automatic optimization reduces or eliminates the
>>need to
>> > hand-tune distributed runtime execution plans and system
>>configurations.
>> >
>> > == Initial Goals ==
>> >
>> > The initial goals to move SystemML to the Apache Incubator is to
>>broaden
>> > the community foster the contributions from data scientists to develop
>> new
>> > machine learning algorithms and enhance the existing ones. Ultimately,
>> this
>> > may lead to the creation of an industry standard in specifying machine
>> > learning algorithms.
>> >
>> > == Current Status ==
>> >
>> > The initial code has been developed at the IBM Almaden Research
>>Center in
>> > California and has recently been made available in GitHub under the
>> Apache
>> > Software License 2.0. The project currently supports a single node (in
>> > memory computation) as well as distributed computations utilizing
>>Hadoop
>> or
>> > Spark clusters.
>> >
>> > === Meritocracy ===
>> >
>> > We plan to invest in supporting a meritocracy. We will discuss the
>> > requirements in an open forum. Several companies have already
>>expressed
>> > interest in this project, and we intend to invite additional
>>developers
>> to
>> > participate. We will encourage and monitor community participation so
>> that
>> > privileges can be extended to those that contribute operating to the
>> > standard of meritocracy that Apache emphasizes.
>> >
>> > === Community ===
>> >
>> > The need for a generic scalable and declarative machine learning
>>approach
>> > in the open source is tremendous, so there is a potential for a very
>> large
>> > community. We believe that SystemML┬╣s extensible architecture,
>> declarative
>> > syntax, cost based optimizer and its alignment with Spark will further
>> > encourage community participation not only in enhancing the
>> infrastructure
>> > but also speed up the creation of algorithms for a wide range of use
>> > cases.  We expect that over time SystemML will attract a large
>>community.
>> >
>> > === Alignment ===
>> >
>> > The initial committers strongly believe that a generic scalable and
>> > declarative machine learning approach for machine learning will gain
>> > broader adoption as an open source, community driven project, where
>>the
>> > community can contribute not only to the core components, but also to
>>a
>> > growing collection of algorithms which will leverage the optimizations
>> and
>> > ease of scaling in SystemML. Our hope is that the Apache Spark, Apache
>> > Hadoop and other communities will find tremendous value in SystemML
>>and
>> > this will foster further collaboration between these projects
>>furthering
>> > the already existing integration points.
>> >
>> > == Known Risks ==
>> >
>> > To-date, development has been sponsored by IBM and coordinated mostly
>>by
>> > the core team of researchers at the IBM Almaden Research Center.
>> >
>> > For SystemML to fully transition to an "Apache Way" governance model,
>>it
>> > needs to start embracing the meritocracy-centric way of growing the
>> > community of contributors.
>> >
>> > === Orphaned Products ===
>> >
>> > The SystemML developers and previous sponsor have a long-term
>>interest in
>> > use and maintenance of the code and there is also hope that growing a
>> > diverse community around the project will become a guarantee against
>>the
>> > project becoming orphaned. We feel that it is also important to put
>> formal
>> > governance in place both for the project and the contributors as the
>> > project expands. We feel ASF is the best location for this.
>> >
>> > === Inexperience with Open Source ===
>> >
>> > The current SystemML set of contributors are very diverse regarding
>> > participation in Open Source. While some initial members are
>>experiencing
>> > an open source project for the first time, others have been
>>contributing
>> > and mentoring various Apache and non-Apache open source projects.
>> >
>> > === Reliance on Salaried Developers ===
>> >
>> > SystemML currently receives substantial support from salaried
>>developers.
>> > However, they are all passionate about the project, and we are
>>confident
>> > that the project will continue even if no salaried developers
>>contribute
>> to
>> > the project. We are committed to recruiting additional committers
>> including
>> > non-salaried developers.
>> >
>> > === Relationships with Other Apache Products ===
>> >
>> > Currently, SystemML integrates with Apache Hadoop and Apache Spark as
>> > underlying computational distributed runtimes.
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> >
>> > SystemML solves a real need for generic scalable and declarative
>>machine
>> > learning approach for machine learning in the Apache Hadoop and Spark
>> > ecosystems, something that has been addressed in a very ad hoc manner
>>so
>> > far by multiple Apache projects. Our rationale for developing
>>SystemML as
>> > an Apache project is detailed in the Rationale section. We believe
>>that
>> the
>> > Apache brand and community process will help us attract more
>>contributors
>> > to this project, and help establish ubiquitous APIs.
>> >
>> > == Documentation ==
>> >
>> > Documentation regarding SystemML is available in the current GitHub
>> > repository
>> https://github.com/SparkTC/systemml/tree/master/system-ml/docs.
>> >
>> > == Initial Source ==
>> >
>> > Initial source is available on GitHub under the Apache License 2.0
>> >
>> > https://github.com/SparkTC/systemml
>> >
>> > == Source and Intellectual Property Submission Plan ==
>> >
>> > We know of no legal encumbrances in the transfer of source code and
>> rights
>> > to Apache. In fact, given the internal IBM due diligence performed on
>>the
>> > source code during open sourcing, we expect the code base to be free
>>from
>> > any IP issues.
>> >
>> > == External Dependencies ==
>> >
>> > SystemML is written in Java and currently supports Apache Hadoop
>> MapReduce
>> > and Apache Spark runtimes.
>> >
>> > To the best of our knowledge, all dependencies of SystemML are
>> distributed
>> > under Apache compatible licenses. Upon acceptance to the incubator, we
>> > would begin a thorough analysis of all transitive dependencies to
>>verify
>> > this fact and introduce license checking into the build and release
>> process
>> > (for instance integrating Apache Rat).
>> >
>> > Cryptography
>> > N/A
>> >
>> > == Required Resources ==
>> >
>> > === Mailing lists ===
>> >       * private@sysml.incubator.apache.org (moderated subscriptions)
>> >       * commits@sysml.incubator.apache.org
>> >       * dev@sysml.incubator.apache.org
>> >
>> > === Git Repository ===
>> >       * https://git-wip-us.apache.org/repos/asf/incubator-sysml.git
>> >
>> > === Issue Tracking ===
>> >       * JIRA (SYSML)
>> >
>> > == Initial Committers ==
>> >
>> >  * Luciano Resende (lresende AT apache DOT org)
>> >  * Berthold Reinwald (reinwald AT us DOT ibm DOT com)
>> >  * Matthias Boehm (mboehm AT us DOT ibm DOT com)
>> >  * Shirish Tatikonda (statiko AT us DOT ibm DOT com)
>> >  * Niketan Pansare (npansar AT us DOT ibm DOT com)
>> >  * Prithviraj Sen (senp AT us DOT ibm DOT com)
>> >  * Alexandre V Evfimievski (evfimi AT us DOT ibm DOT com)
>> >  * Fred Reiss (frreiss AT us DOT ibm DOT com)
>> >  * Deron Eriksson (deron AT us DOT ibm DOT com)
>> >  * Arvind Surve (asurve AT us DOT ibm DOT com)
>> >  * Mike Dusenberry (mwdusenb AT us DOT ibm DOT com)
>> >  * Reynold Xin   (rxin AT apache DOT org)
>> >  * Xiangrui Meng (meng AT apache DOT org)
>> >  * Joseph Bradley (jkbradley AT apache DOT org)
>> >  * Patrick Wendell (pwendell AT apache DOT org)
>> >  * Holden Karau (holden AT apache DOT org)
>> >  * DB Tsai (dbtsai AT apache DOT org)
>> >
>> > == Affiliations ==
>> >
>> >  * DataBricks: Reynold Xin, Xiangrui Meng, Joseph Bradley, Patrick
>> Wendell
>> >  * Alpine: Holden Karau
>> >  * Netflix: DB Tsai
>> >  * IBM: Luciano Resende, Berthold Reinwald, Matthias Boehm, Shirish
>> > Tatikonda, Niketan Pansare, Prithviraj Sen, Alexandre V Evfimievski,
>>Fred
>> > Reiss, Deron Eriksson, Arvind Surve and Mike Dusenberry.
>> >
>> > == Sponsors ==
>> >
>> > === Champion ===
>> >  * Luciano Resende
>> >
>> > === Nominated Mentors ===
>> >  * Luciano Resende
>> >  * Reynold Xin
>> >  * Patrick Wendell
>> >  * Rich Bowen
>> >
>> > === Sponsoring Entity ===
>> > We would like to propose the Apache Incubator to sponsor this project.
>> >
>> >
>> > --
>> > Luciano Resende
>> > http://people.apache.org/~lresende
>> > http://twitter.com/lresende1975
>> > http://lresende.blogspot.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
>
>
>-- 
>Luciano Resende
>http://people.apache.org/~lresende
>http://twitter.com/lresende1975
>http://lresende.blogspot.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Mime
View raw message