incubator-general mailing list archives

From Thejas Nair <thejas.n...@gmail.com>
Subject Re: [VOTE] Accept Apache Singa as incubator project
Date Tue, 10 Mar 2015 18:49:08 GMT
Thanks for raising this issue. I agree that committer diversity is
important for the long-term success of a project. I think that should be a
criterion for graduation from the incubator.
I think it will be easier to find new contributors as an Apache
incubator project.


On Tue, Mar 10, 2015 at 9:09 AM, jan i <jani@apache.org> wrote:

>
> +0 I am really concerned about the diversity of the initial committers.
> What happens if the university pulls the plug? I know we all say it will
> never happen, but it could happen.
>
> rgds
> jan i.
>
>
> On 10 March 2015 at 16:20, Alan Gates <alanfgates@gmail.com> wrote:
>
>> +1
>>
>> Alan.
>>
>> On March 10, 2015 at 7:33, Thejas Nair <thejas.nair@gmail.com> wrote:
>> The Singa Incubator Proposal document has been updated based on
>> feedback in the proposal thread.
>>
>> This vote proposes the inclusion of Apache Singa as an incubator project.
>> The vote will run for at least 72 hours.
>>
>> [ ] +1 Accept Apache Singa into the Incubator
>> [ ] +0 Don’t care.
>> [ ] -1 Don’t accept Apache Singa into the Incubator because..
>>
>> Please vote!
>>
>> Here is my +1.
>>
>> Link to the version of the proposal being voted on:
>> https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10
>>
>> The text is below
>> ----------------------------------------------
>>
>> = Singa Incubator Proposal =
>> == Abstract ==
>> SINGA is a distributed deep learning platform.
>>
>> == Proposal ==
>> SINGA is an efficient, scalable and easy-to-use distributed platform
>> for training deep learning models, e.g., Deep Convolutional Neural
>> Networks and Deep Belief Networks. It parallelizes the computation
>> (i.e., training) across a cluster of nodes by distributing the training
>> data and model automatically to speed up the training. Built-in training
>> algorithms like Back-Propagation and Contrastive Divergence are
>> implemented based on common abstractions of deep learning models. Users
>> can train their own deep learning models by simply customizing these
>> abstractions, much like implementing the Mapper and Reducer in Hadoop.
>>
>> == Background ==
>> Deep learning refers to a set of feature (or representation) learning
>> models that consist of multiple (non-linear) layers, where different
>> layers learn different levels of abstractions (representations) of the
>> raw input data. Larger (in terms of model parameters) and deeper (in
>> terms of number of layers) models have shown better performance, e.g.,
>> lower image classification error in Large Scale Visual Recognition
>> Challenge. However, a larger model requires more memory and larger
>> training data to reduce over-fitting. Complex numeric operations make
>> the training computation intensive. In practice, training large deep
>> learning models takes weeks or months on a single node (even with GPU).
>>
>> == Rationale ==
>> Deep learning has gained a lot of attention in both academia and
>> industry due to its success in a wide range of areas such as computer
>> vision and speech recognition. However, training of such models is
>> computationally expensive, especially for large and deep models (e.g.,
>> with billions of parameters and more than 10 layers). Both Google and
>> Microsoft have developed distributed deep learning systems to make the
>> training more efficient by distributing the computations within a
>> cluster of nodes. However, these systems are closed-source software.
>> Our goal is to leverage the community of open source developers to make
>> SINGA efficient, scalable and easy to use. SINGA is a full-fledged
>> distributed platform that could benefit the community and, in turn,
>> benefit from the community's contributions to further work in this
>> area. We believe the nature of SINGA and our vision for the system fit
>> naturally with Apache's philosophy and development framework.
>>
>> == Initial Goals ==
>> We have developed a system for SINGA running on a commodity computer
>> cluster. The initial goals include:
>> * improving the system in terms of scalability and efficiency, e.g.,
>> using Infiniband for network communication and multi-threading for one
>> node computation. We would consider extending SINGA to GPU clusters
>> later.
>> * benchmarking with larger datasets (hundreds of millions of training
>> instances) and models (billions of parameters).
>> * adding more built-in deep learning models. Users can train the
>> built-in models on their datasets directly.
>>
>>
>> == Current Status ==
>> === Meritocracy ===
>> We would like to follow ASF meritocratic principles to encourage more
>> developers to contribute to this project. We know that only active and
>> excellent developers can make SINGA a successful project. The committer
>> list and PMC will be updated based on developers' performance and
>> commitment. We are also improving the documentation and code to help new
>> developers get started quickly.
>>
>> === Community ===
>> SINGA is currently being developed in the Database System Research Lab
>> at the National University of Singapore (NUS) in collaboration with
>> Zhejiang University in China. Our lab has extensive experience in
>> building database related systems, including distributed systems. Six
>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng, Sheng
>> Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow (Anh
>> Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan)
>> have been working for a year on this project. We are open to recruiting
>> more developers from diverse backgrounds.
>>
>> === Core Developers ===
>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>> worked on distributed systems for more than 20 years. They have
>> collaborated with industry and have built various large scale systems.
>> Anh Dinh's research is also on distributed systems, albeit with more
>> focus on security aspects. Wei Wang's research is on deep learning
>> problems including deep learning applications and large scale training.
>> Sheng Wang and Jinyang are working on efficient indexing, querying of
>> large scale data and machine learning. Kaiping, Zhaojing and Zhongle
>> are new PhD students who joined SINGA recently. They will work on this
>> project for a longer time (next 4-5 years). While we share common
>> research interests, each member also brings diverse expertise to the
>> team.
>>
>> === Alignment ===
>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>> Spark and Mahout, each of which targets a different application domain.
>> SINGA, being a distributed platform for large-scale deep learning,
>> focuses on another important domain which still lacks a robust and
>> scalable open-source platform. The recent success of deep learning
>> models, especially for vision and speech recognition tasks, has
>> generated interest both in applying existing deep learning models and
>> in developing new ones. Thus, an open-source platform for deep learning
>> will be able to attract a large community of users and developers.
>> SINGA is a complex system needing many iterations of design,
>> implementation and testing. Apache's collaboration framework, which
>> encourages active contribution from developers, will inevitably help
>> improve the quality of the system, as shown in the success of Hadoop,
>> Spark, etc. Equally important is the community of users, which helps
>> identify real-life applications of deep learning and helps to evaluate
>> the system's performance and ease-of-use. We hope to leverage ASF for
>> coordinating and promoting both communities, and in return benefit the
>> communities with another useful tool.
>>
>> == Known Risks ==
>> === Orphaned products ===
>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>> the lab in two to four years' time. It is possible that some of them may
>> not have enough time to focus on this project after that. However, SINGA
>> is part of our other, bigger research projects on building an
>> infrastructure for data intensive applications, which include
>> health-care analytics and brain-inspired computing. Beng Chin and Kian
>> Lee would continue working on it and getting more people involved. For
>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined us
>> recently. We welcome individual developers to help make SINGA a diverse
>> community that is robust and independent of any single developer.
>>
>> === Inexperience with Open Source ===
>> All the developers are active users and followers of open source
>> projects. Our research lab has a strong commitment to open source, and
>> has released the source code of several systems under open source
>> licenses as a way of contributing back to the open source community.
>> However, we do not have much real experience in open source projects
>> with large and well organized communities like those in Apache. This is
>> one reason we chose Apache, which is experienced in open source project
>> incubation. We hope to get help from Apache (e.g., champion and mentors)
>> to establish a healthy path for SINGA.
>>
>> === Homogeneous Developers ===
>> Although the current developers are all university researchers, they
>> have different research interests and project experiences, as mentioned
>> in the section that introduces the core developers. We know that a
>> diverse community is helpful. Hence we are open to the idea of
>> recruiting developers from other regions and organizations.
>>
>> === Reliance on Salaried Developers ===
>> As a research project in the university, SINGA's current developer
>> community consists of professors, PhD students, research assistants and
>> postdoctoral fellows. They are driven by their interests to work on
>> this project and have contributed actively since the start of the
>> project. The research assistants and fellows are expected to leave when
>> their contracts expire. However, they are keen to continue to work on
>> the project voluntarily. Moreover, since this is a long-term research
>> project, new research assistants and fellows are likely to join.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> We did not choose Apache for publicity. We have two purposes. First, we
>> want to leverage Apache's reputation to recruit more developers to
>> build a diverse community. Second, we hope that Apache can help us to
>> establish a healthy path in developing SINGA. Beng Chin and Kian Lee
>> are established database and distributed system researchers, and
>> together with the other contributors, they sincerely believe that there
>> is a need for a widely accepted open source distributed deep learning
>> platform. The field of deep learning is still in its infancy, and an
>> open source platform will fuel the research in the area. Moreover, such
>> a platform will enable researchers to develop new models and
>> algorithms, rather than spending time implementing a deep learning
>> system from scratch. Furthermore, the need for scalability for such a
>> platform is obvious.
>>
>> === Relationship with Other Apache Products ===
>> Apache Mahout and Apache Spark's MLlib are general machine learning
>> systems. Deep learning algorithms can thus be implemented on these two
>> platforms as well. However, there are differences in training
>> efficiency, scalability and usability. Mahout and Spark MLlib follow
>> models where their nodes run synchronously. This is the fundamental
>> difference from Singa, which follows the parameter server framework
>> (like Google Brain and Microsoft Adam). Singa can run synchronously or
>> asynchronously. The asynchronous mode is superior to the synchronous
>> mode in terms of scalability. In addition, Singa has some optimizations
>> towards deep learning models (e.g., model parallelism, data parallelism
>> and hybrid parallelism) which make Singa more efficient. We also provide
>> an easy-to-use programming model for deep learning algorithms.
>>
>> There are also plans for integration with Apache Hadoop's HDFS as
>> storage, to handle large training data. Specifically, we will store the
>> training data (e.g., images or raw features of images) in HDFS, and
>> then (pre-)fetch it online. We will also explore integration with
>> Hadoop's YARN and Apache Mesos to do resource management.
>>
>>
>> == Documentation ==
>> The project is hosted at
>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>> Documentation can be found on the GitHub wiki page:
>> https://github.com/nusinga/singa/wiki.
>> We continue to refine and improve the documentation.
>>
>> == Initial Source ==
>> We use GitHub to maintain our source code:
>> https://github.com/nusinga/singa
>>
>> == Source and Intellectual Property Submission Plan ==
>> We plan to release our code base under the Apache License, Version 2.0.
>>
>> == External Dependencies ==
>> * required by the core code base: glog, gflags, google protobuf,
>> open-blas, mpich, armci-mpi.
>> * required by data preparation and preprocessing: opencv, hdfs, python.
>>
>> == Cryptography ==
>> Not Applicable
>>
>> == Required Resources ==
>> === Mailing Lists ===
>> Currently, we use a Google group for internal discussion. The mailing
>> address is nusinga@googlegroup.com. We will migrate the content to the
>> Apache mailing lists in the future.
>>
>> * singa-dev
>> * singa-user
>> * singa-commits
>> * singa-private (for private discussion within the PMC)
>>
>> === Git Repository ===
>> We want to continue using git for version control. Hence, a git repo
>> is required.
>>
>> === Issue Tracking ===
>> JIRA Singa (SINGA)
>>
>> == Initial Committers ==
>> * Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>> * Kian Lee Tan (tankl @comp.nus.edu.sg)
>> * Gang Chen (cg @zju.edu.cn)
>> * Wei Wang (wangwei @comp.nus.edu.sg)
>> * Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>> * Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>> * Sheng Wang (wangsh @comp.nus.edu.sg)
>> * Kaiping Zheng (kaiping @comp.nus.edu.sg)
>> * Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>> * Zhongle Xie (zhongle @comp.nus.edu.sg)
>>
>> == Affiliations ==
>> * Beng Chin Ooi, National University of Singapore
>> * Kian Lee Tan, National University of Singapore
>> * Gang Chen, Zhejiang University
>> * Wei Wang, National University of Singapore
>> * Dinh Tien Tuan Anh, National University of Singapore
>> * Jinyang Gao, National University of Singapore
>> * Sheng Wang, National University of Singapore
>> * Kaiping Zheng, National University of Singapore
>> * Zhaojing Luo, National University of Singapore
>> * Zhongle Xie, National University of Singapore
>>
>> == Sponsors ==
>> === Champion ===
>> Thejas Nair (thejas at apache.org)
>>
>> === Nominated Mentors ===
>> * Thejas Nair (thejas at apache.org)
>> * Alan Gates (gates at apache dot org)
>> * Daniel Dai (daijy at apache dot org)
>> * Ted Dunning (tdunning at apache dot org)
>>
>> === Sponsoring Entity ===
>> We are requesting the Incubator to sponsor this project.
>>
>>
>>
>
