Hi Henri,
I am Larry Tang, working with Minjie Wang (@jermainewang) on the imperative programming part of
MXNet. Please add me to the list of committers for the MXNet project. I will work intensively
on merging a NumPy interface into MXNet as its imperative subsystem over the next few months.
My GitHub ID is: lryta
Affiliation: University of Michigan.
Best,
Larry
On 2017-01-06 00:12 (-0500), Henri Yandell <b...@apache.org> wrote:
> Hello Incubator,
>
> I'd like to propose a new incubator Apache MXNet podling.
>
> The existing MXNet project (http://mxnet.io - 1.5 years old, 15 committers,
> 200 contributors) is very interested in joining Apache. MXNet is an
> open-source deep learning framework that allows you to define, train, and
> deploy deep neural networks on a wide array of devices, from cloud
> infrastructure to mobile devices.
>
> The wiki proposal page is located here:
>
> https://wiki.apache.org/incubator/MXNetProposal
>
> I've included the text below in case anyone wants to focus on parts of it
> in a reply.
>
> Looking forward to your thoughts, and hoping that many interested Apache
> members will volunteer to mentor the project in addition to Sebastian and
> myself.
>
> The current list of committers is based on the currently active coders, so
> we would also love to hear from anyone else who wants to work on the
> project, whether a current or future contributor!
>
> Thanks,
>
> Hen
> On behalf of the MXNet project
>
> ---------
>
> = MXNet: Apache Incubator Proposal =
>
> == Abstract ==
>
> MXNet is a flexible and efficient library for deep learning.
>
> == Proposal ==
>
> MXNet is an open-source deep learning framework that allows you to define,
> train, and deploy deep neural networks on a wide array of devices, from
> cloud infrastructure to mobile devices. It is highly scalable, allowing for
> fast model training, and supports a flexible programming model and multiple
> languages. MXNet allows you to mix symbolic and imperative programming
> flavors to maximize both efficiency and productivity. MXNet is built on a
> dynamic dependency scheduler that automatically parallelizes both symbolic
> and imperative operations on the fly. A graph optimization layer on top of
> that makes symbolic execution fast and memory efficient. The MXNet library
> is portable and lightweight, and it scales to multiple GPUs and multiple
> machines.
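>
> To make the mixed-flavor point concrete, here is a minimal sketch against
> the current Python API (the shapes, layer names and values are arbitrary,
> for illustration only):
>
> {{{
> import mxnet as mx
>
> # Imperative flavor: NDArray operations execute eagerly, NumPy-style,
> # while the dependency scheduler parallelizes them behind the scenes.
> a = mx.nd.ones((2, 3))
> b = a * 2 + 1
>
> # Symbolic flavor: declare a graph first, then bind and execute it.
> x = mx.sym.Variable('x')
> net = mx.sym.FullyConnected(data=x, num_hidden=4, name='fc1')
> net = mx.sym.Activation(data=net, act_type='relu')
>
> ex = net.simple_bind(ctx=mx.cpu(), x=(2, 3))  # allocate, optimize graph
> ex.arg_dict['fc1_weight'][:] = 0.1            # set weights for the demo
> ex.forward(x=b)                               # feed the imperative result
> print(ex.outputs[0].asnumpy())
> }}}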
>
> == Background ==
>
> Deep learning is a subset of machine learning and refers to a class of
> algorithms that use a hierarchical approach with non-linearities to
> discover and learn representations within data. Deep learning has recently
> become very popular due to its applicability to, and advancement of,
> domains such as Computer Vision, Speech Recognition, Natural Language
> Understanding and Recommender Systems. With pervasive and cost-effective
> cloud computing, large labeled datasets and continued algorithmic
> innovation, deep learning has become one of the most popular classes of
> algorithms for machine learning practitioners in recent years.
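>
> As a toy illustration of the "hierarchical approach with non-linearities"
> (random weights and arbitrary sizes; a real network would learn these):
>
> {{{
> import mxnet as mx
>
> x  = mx.random.uniform(-1, 1, (1, 8))    # input features
> w1 = mx.random.uniform(-1, 1, (8, 16))   # first-layer weights
> w2 = mx.random.uniform(-1, 1, (16, 4))   # second-layer weights
>
> # Each layer applies a linear map followed by a non-linearity (ReLU),
> # so deeper layers build representations on top of shallower ones.
> h = mx.nd.maximum(mx.nd.dot(x, w1), 0)
> y = mx.nd.maximum(mx.nd.dot(h, w2), 0)
> print(y.shape)                           # (1, 4)
> }}}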
>
> == Rationale ==
>
> The adoption of deep learning is quickly expanding from the initial deep
> domain experts rooted in academia to data scientists and developers working
> to deploy intelligent services and products. Deep learning, however, has
> many challenges. These include model training time (which can take days to
> weeks), programmability (not everyone writes Python or C++, or likes
> symbolic programming) and balancing production readiness (support for
> things like failover) with development flexibility (the ability to program
> in different ways, support for new operators and model types) and speed of
> execution (fast and scalable model training). Other frameworks excel on
> some but not all of these aspects.
>
>
> == Initial Goals ==
>
> MXNet is a fairly established project on GitHub with its first code
> contribution in April 2015 and roughly 200 contributors. It is used by
> several large companies and some of the top research institutions on the
> planet. Initial goals would be the following:
>
> 1. Move the existing codebase(s) to Apache
> 1. Integrate with the Apache development process/sign CLAs
> 1. Ensure all dependencies are compliant with Apache License version 2.0
> 1. Incremental development and releases per Apache guidelines
> 1. Establish engineering discipline and a predictable release cadence of
> high quality releases
> 1. Expand the community beyond the current base of expert level users
> 1. Improve usability and the overall developer/user experience
> 1. Add additional functionality to address newer problem types and
> algorithms
>
>
> == Current Status ==
>
> === Meritocracy ===
>
> The MXNet project already operates on meritocratic principles. Today, MXNet
> has developers worldwide and has accepted multiple major patches from a
> diverse set of contributors within both industry and academia. We would
> like to follow ASF meritocratic principles to encourage more developers to
> contribute to this project. We know that only active and committed
> developers from a diverse set of backgrounds can make MXNet a successful
> project. We are also improving the documentation and code to help new
> developers get started quickly.
>
> === Community ===
>
> Acceptance into the Apache Foundation would bolster the growing user and
> developer community around MXNet. That community includes around 200
> contributors from academia and industry. The core developers of our project
> are listed below and are also represented by logos on the mxnet.io site,
> including Amazon, Baidu, Carnegie Mellon University, Turi, Intel, NYU,
> Nvidia, MIT, Microsoft, TuSimple, University of Alberta, University of
> Washington and Wolfram.
>
> === Core Developers ===
>
> (with GitHub logins)
>
> * Tianqi Chen (@tqchen)
> * Mu Li (@mli)
> * Junyuan Xie (@piiswrong)
> * Bing Xu (@antinucleon)
> * Chiyuan Zhang (@pluskid)
> * Minjie Wang (@jermainewang)
> * Naiyan Wang (@winstywang)
> * Yizhi Liu (@javelinjs)
> * Tong He (@hetong007)
> * Qiang Kou (@thirdwing)
> * Xingjian Shi (@sxjscience)
>
> === Alignment ===
>
> The ASF is already the home of many distributed platforms, e.g., Hadoop,
> Spark and Mahout, each of which targets a different application domain.
> MXNet, being a distributed platform for large-scale deep learning, focuses
> on another important domain which still lacks a scalable, programmable,
> flexible and fast open-source platform. The recent success of deep learning
> models, especially for vision and speech recognition tasks, has generated
> interest both in applying existing deep learning models and in developing
> new ones. Thus, an open-source platform for deep learning backed by some of
> the top industry and academic players will be able to attract a large
> community of users and developers. MXNet is a complex system needing many
> iterations of design, implementation and testing. Apache's collaboration
> framework, which encourages active contribution from developers, will help
> improve the quality of the system, as shown by the success of Hadoop,
> Spark, etc. Equally important is the community of users, which helps
> identify real-life applications of deep learning and helps evaluate the
> system's performance and ease of use. We hope to leverage the ASF to
> coordinate and promote both communities, and in return to benefit those
> communities with another useful tool.
>
> == Known Risks ==
>
> === Orphaned products ===
>
> Given the current level of investment in MXNet and the stakeholders using
> it, the risk of the project being abandoned is minimal. Amazon, for
> example, is actively working to use MXNet in many of its services, and
> many large corporations use it in their production applications.
>
> === Inexperience with Open Source ===
>
> MXNet has existed as a healthy open source project for more than a year.
> During that time, the project has attracted 200+ contributors.
>
> === Homogenous Developers ===
>
> The initial list of committers and contributors includes developers from
> several institutions and industry participants (see above).
>
> === Reliance on Salaried Developers ===
>
> Like most open source projects, MXNet receives substantial support from
> salaried developers. A large fraction of MXNet development is carried out
> by graduate students at various universities in the course of research
> degrees; this is closer to a "volunteer" relationship, since in most cases
> students contribute far more than is strictly necessary to support their
> research. In addition, those working within corporations devote significant
> time and effort to the project, and they come from several organizations.
>
> === An Excessive Fascination with the Apache Brand ===
>
> We are not choosing Apache for publicity; we have two purposes. First, we
> hope that Apache's known best practices for managing a mature open source
> project can help guide us. For example, we are feeling the growing pains
> of a successful open source project as we attempt a major refactor of the
> internals while customers are using the system in production. We seek
> guidance in communicating breaking API changes and version revisions.
> Also, as our involvement from major corporations increases, we want to
> assure our users that MXNet will stay open and not favor any particular
> platform or environment. These are some examples of the know-how and
> discipline we hope Apache can bring to our project.
>
> Second, we want to leverage Apache's reputation to recruit more developers
> and create a diverse community.
>
> === Relationship with Other Apache Products ===
>
> Apache Mahout and Apache Spark's MLlib are general machine learning
> systems. Deep learning algorithms can thus be implemented on these two
> platforms as well. However, in practice, the overlap will be minimal. Deep
> learning is so computationally intensive that it often requires specialized
> GPU hardware to accomplish tasks of meaningful size. Making efficient use
> of GPU hardware is complex because the hardware is so fast that the
> supporting systems around it must be carefully optimized to keep the GPU
> cores busy. Extending this capability to distributed multi-GPU and
> multi-host environments requires great care. This is a critical
> differentiator between MXNet and existing Apache machine learning systems.
>
> Mahout and Spark MLlib follow models where their nodes run synchronously.
> This is a fundamental difference from MXNet, which follows the parameter
> server framework and can run either synchronously or asynchronously. In
> addition, MXNet has optimizations for training a wide range of deep
> learning models using a variety of approaches (e.g., model parallelism and
> data parallelism), which makes MXNet much more efficient (near-linear
> speedup on state-of-the-art models). MXNet also supports both imperative
> and symbolic approaches, providing ease of programming for deep learning
> algorithms.
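>
> As a minimal sketch of the parameter-server interface (current Python API;
> the key, shape and values here are arbitrary):
>
> {{{
> import mxnet as mx
>
> # KVStore is MXNet's parameter-server abstraction. 'local' runs on one
> # machine; 'dist_sync' and 'dist_async' span multiple workers, the async
> # mode avoiding global barriers between updates.
> kv = mx.kvstore.create('local')
> shape = (2, 3)
> kv.init(3, mx.nd.ones(shape))        # key 3 holds a parameter
> kv.push(3, mx.nd.ones(shape) * 8)    # workers push updates for key 3
> out = mx.nd.zeros(shape)
> kv.pull(3, out=out)                  # pull the current aggregated value
> print(out.asnumpy())                 # [[8. 8. 8.], [8. 8. 8.]]
> }}}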
>
> Other Apache projects that are potentially complementary:
>
> Apache Arrow - reading data in Apache Arrow's internal format from MXNet
> would allow users to run ETL/preprocessing in Spark, save the results in
> Arrow's format and then run DL algorithms on them.
>
> Apache Singa - MXNet and Singa are both deep learning projects, and can
> benefit from a larger deep learning community at Apache.
>
> == Documentation ==
>
> Documentation has recently migrated to http://mxnet.io. We continue to
> refine and improve the documentation.
>
> == Initial Source ==
>
> We currently use GitHub to maintain our source code:
> https://github.com/MXNet
>
> == Source and Intellectual Property Submission Plan ==
>
> MXNet code is available under the Apache License, Version 2.0. We will work
> with the committers to get CLAs signed and review previous contributions.
>
> == External Dependencies ==
>
> * required by the core code base: GCC or Clang, any BLAS library
> (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which requires
> lib-zeromq), TBB
> * required for GPU usage: CUDA, cuDNN
> * required for Python usage: Python 2/3
> * required for the R module: R, Rcpp (GPLv2 licensing)
> * optional for image preparation and preprocessing: OpenCV
> * optional dependencies for additional features: torch7, numba, cython (in
> the NNVM branch)
>
> Rcpp and lib-zeromq are expected to require licensing discussions.
>
> == Cryptography ==
>
> Not Applicable
>
> == Required Resources ==
>
> === Mailing Lists ===
>
> There is currently no mailing list.
>
> === Issue Tracking ===
>
> We currently use GitHub to track issues and would like to continue doing
> so.
>
> == Committers and Affiliations ==
>
> * Tianqi Chen (UW)
> * Mu Li (AWS)
> * Junyuan Xie (AWS)
> * Bing Xu (Apple)
> * Chiyuan Zhang (MIT)
> * Minjie Wang (NYU)
> * Naiyan Wang (TuSimple)
> * Yizhi Liu (Mediav)
> * Tong He (Simon Fraser University)
> * Qiang Kou (Indiana U)
> * Xingjian Shi (HKUST)
>
> == Sponsors ==
>
> === Champion ===
>
> Henri Yandell (bayard at apache.org)
>
> === Nominated Mentors ===
>
> Sebastian Schelter (ssc@apache.org)
>
>
> === Sponsoring Entity ===
>
> We are requesting the Incubator to sponsor this project.
>