incubator-general mailing list archives

From Henry Saputra <>
Subject Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
Date Thu, 05 Feb 2015 06:57:03 GMT
Several comments:
-) How many users are already using this project? I would recommend
dropping the request for a singa-user list at the beginning.
-) All the initial committers come from universities, and it seems like
some of them are already about to leave. I am not too sure that
this project will survive if all of the initial committers are at
universities as students.
-) We need to solicit more mentors if this project is ever to get into the Apache Incubator.

- Henry

On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <> wrote:
> The "Relationship with Other Apache Products" section has been
> updated. The reference to H2O in that section has been removed, and
> other projects have been added.
>  Thanks for the feedback!
> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <> wrote:
>> Thanks for pointing that out, Henry! Yes, it looks like H2O is not an
>> Apache project; I should have verified that.
>> I will edit that, and revisit that section along with the folks in
>> Singa community.
>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <> wrote:
>>> Quick immediate comment: "Apache H2O" is not really an Apache project.
>>> I assume you are referring to (or
>>> ?
>>> - Henry
>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <> wrote:
>>>> Hello everyone,
>>>> I would like to propose the inclusion of Singa as an Apache Incubator project.
>>>> Here is the proposal -
>>>> Please review the proposal and give feedback. I am planning to start a
>>>> vote after 7 days if the proposal looks good.
>>>> We are also seeking additional Apache mentors for the project.
>>>> Thanks,
>>>> Thejas
>>>> ==========================================================
>>>> Singa Incubator Proposal
>>>> Abstract
>>>> SINGA is a distributed deep learning platform.
>>>> Proposal
>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>> Network and Deep Belief Network. It parallelizes the computation
>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>> data and model automatically to speed up the training. Built-in
>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>> are implemented based on common abstractions of deep learning models.
>>>> Users can train their own deep learning models by simply customizing
>>>> these abstractions like implementing the Mapper and Reducer in Hadoop.
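
As a rough illustration of what customizing such an abstraction might look like, here is a minimal sketch in plain Python. It is not SINGA's API: the `Layer`, `Linear`, `Tanh` and `train_step` names, the method signatures and the squared-error loss are all assumptions made for this example. The point is only that a generic back-propagation loop can be written once against a base class, while users supply the per-layer forward and backward logic.

```python
import math
import random


class Layer:
    """Hypothetical base class: subclasses supply the forward and backward passes."""

    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_out, lr):
        raise NotImplementedError


class Linear(Layer):
    """Fully connected layer with plain-list weights (illustrative only)."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]

    def backward(self, grad_out, lr):
        # Gradient w.r.t. the input, computed before the weights are updated.
        grad_in = [sum(self.w[j][i] * grad_out[j] for j in range(len(self.w)))
                   for i in range(len(self.x))]
        for j, g in enumerate(grad_out):
            for i, xi in enumerate(self.x):
                self.w[j][i] -= lr * g * xi
            self.b[j] -= lr * g
        return grad_in


class Tanh(Layer):
    """Element-wise tanh non-linearity; it has no parameters to update."""

    def forward(self, x):
        self.y = [math.tanh(v) for v in x]
        return self.y

    def backward(self, grad_out, lr):
        return [g * (1.0 - y * y) for g, y in zip(grad_out, self.y)]


def train_step(layers, x, target, lr=0.1):
    """Generic back-propagation step that only relies on the Layer interface."""
    out = x
    for layer in layers:
        out = layer.forward(out)
    loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
    grad = [o - t for o, t in zip(out, target)]
    for layer in reversed(layers):
        grad = layer.backward(grad, lr)
    return loss
```

Under these assumptions, a small model is just a list such as `[Linear(2, 3), Tanh(), Linear(3, 1)]`, and calling `train_step` repeatedly on one example should drive the returned loss down.
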
>>>> Background
>>>> Deep learning refers to a set of feature (or representation) learning
>>>> models that consist of multiple (non-linear) layers, where different
>>>> layers learn different levels of abstractions (representations) of the
>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>> terms of number of layers) models have shown better performance, e.g.,
>>>> lower image classification error in Large Scale Visual Recognition
>>>> Challenge. However, a larger model requires more memory and larger
>>>> training data to reduce over-fitting. Complex numeric operations make
>>>> the training computation intensive. In practice, training large deep
>>>> learning models takes weeks or months on a single node (even with
>>>> GPU).
>>>> Rationale
>>>> Deep learning has gained a lot of traction in both academia and
>>>> industry due to its success in a wide range of areas such as computer
>>>> vision and speech recognition. However, training of such models is
>>>> computationally expensive, especially for large and deep models (e.g.,
>>>> with billions of parameters and more than 10 layers). Both Google and
>>>> Microsoft have developed distributed deep learning systems to make the
>>>> training more efficient by distributing the computations within a
>>>> cluster of nodes. However, these systems are closed-source software.
>>>> Our goal is to leverage the community of open source developers to
>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>>> full-fledged distributed platform that could benefit the community and,
>>>> in turn, benefit from the community's contributions to further work
>>>> in this area. We believe the nature of SINGA and
>>>> our visions for the system fit naturally to Apache's philosophy and
>>>> development framework.
>>>> Initial Goals
>>>> We have developed a system for SINGA running on a commodity computer
>>>> cluster. The initial goals include:
>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>>>   using Infiniband for network communication and multi-threading for
>>>>   one node computation. We would consider extending SINGA to GPU
>>>>   clusters later.
>>>> * benchmarking with larger datasets (hundreds of millions of training
>>>>   instances) and models (billions of parameters).
>>>> * adding more built-in deep learning models. Users can train the
>>>>   built-in models on their datasets directly.
>>>> Current Status
>>>> Meritocracy
>>>> We would like to follow ASF meritocratic principles to encourage more
>>>> developers to contribute to this project. We know that only active and
>>>> excellent developers can make SINGA a successful project. The
>>>> committer list and PMC will be updated based on developers'
>>>> performance and commitment. We are also improving the documentation
>>>> and code to help new developers get started quickly.
>>>> Community
>>>> SINGA is currently being developed in the Database System Research Lab
>>>> at the National University of Singapore (NUS) in collaboration with
>>>> Zhejiang University in China. Our lab has extensive experience in
>>>> building database related systems, including distributed systems. Six
>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
>>>> Lee Tan) have been working for a year on this project. We are open to
>>>> recruiting more developers from diverse backgrounds.
>>>> Core Developers
>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>> worked on distributed systems for more than 20 years. They have
>>>> collaborated with the industry and have built various large scale
>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>> learning problems including deep learning applications and large scale
>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>>> work on this project for a longer time (next 4-5 years). While we
>>>> share common research interests, each member also brings diverse
>>>> expertise to the team.
>>>> Alignment
>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>> Spark and Mahout, each of which targets a different application
>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>> learning, focuses on another important domain, one that still
>>>> lacks a robust and scalable open-source platform. The recent success
>>>> of deep learning models, especially for vision and speech recognition
>>>> tasks, has generated interest both in applying existing deep learning
>>>> models and in developing new ones. Thus, an open-source platform for
>>>> deep learning will be able to attract a large community of users and
>>>> developers. SINGA is a complex system needing many iterations of
>>>> design, implementation and testing. Apache's collaboration framework,
>>>> which encourages active contribution from developers, will inevitably
>>>> help improve the quality of the system, as shown by the success of
>>>> Hadoop, Spark, etc. Equally important is the community of users, which
>>>> helps identify real-life applications of deep learning, and helps to
>>>> evaluate the system's performance and ease-of-use. We hope to leverage
>>>> ASF for coordinating and promoting both communities, and in return
>>>> benefit the communities with another useful tool.
>>>> Known Risks
>>>> Orphaned products
>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>> the lab in two to four years' time. It is possible that some of them
>>>> may not have enough time to focus on this project after that. But,
>>>> SINGA is part of our other bigger research projects on building an
>>>> infrastructure for data intensive applications, which include
>>>> health-care analytics and brain-inspired computing. Beng Chin and Kian
>>>> Lee would continue working on it and getting more people involved. For
>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
>>>> us recently. Individual developers are welcome to make SINGA a diverse
>>>> community that is robust and independent from any single developer.
>>>> Inexperience with Open Source
>>>> All the developers are active users and followers of open source
>>>> projects. Our research lab has a strong commitment to open source, and
>>>> has released the source code of several systems under open source
>>>> license as a way of contributing back to the open source community.
>>>> But we do not have much real experience in open source projects with
>>>> large and well organized communities like those in Apache. This is one
>>>> reason we chose Apache, which is experienced in open source project
>>>> incubation. We hope to get help from Apache (e.g., champion and
>>>> mentors) to establish a healthy path for SINGA.
>>>> Homogenous Developers
>>>> Although the current developers are researchers in the universities,
>>>> they have different research interests and project experiences, as
>>>> mentioned in the section that introduces the core developers. We know
>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>> recruiting developers from other regions and organizations.
>>>> Reliance on Salaried Developers
>>>> As a research project in the university, SINGA's current developer
>>>> community consists of professors, PhD students, research assistants
>>>> and postdoctoral fellows. They are driven by their interests to work
>>>> on this project and have contributed actively since the start of the
>>>> project. The research assistants and fellows are expected to leave
>>>> when their contracts expire. However, they are keen to continue to
>>>> work on the project voluntarily. Moreover, as a long term research
>>>> project, new research assistants and fellows are likely to join the
>>>> project.
>>>> An Excessive Fascination with the Apache Brand
>>>> We did not choose Apache for publicity. We have two purposes. First, we
>>>> want to leverage Apache's reputation to recruit more developers to
>>>> make a diverse community. Second, we hope that Apache can help us to
>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>> are established database and distributed system researchers, and
>>>> together with the other contributors, they sincerely believe that
>>>> there is a need for a widely accepted open source distributed deep
>>>> learning platform. The field of deep learning is still in its infancy,
>>>> and an open source platform will fuel the research in the area.
>>>> Moreover, such a platform will enable researchers to develop new
>>>> models and algorithms, rather than spending time implementing a deep
>>>> learning system from scratch. Furthermore, the need for scalability
>>>> for such a platform is obvious.
>>>> Relationship with Other Apache Products
>>>> Apache H2O implemented two simple deep learning models, namely the
>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
>>>> against the training set. Model parameters trained by all
>>>> computing nodes are averaged as the final model parameters. This
>>>> training algorithm is different from the distributed training
>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>> synchronizes the parameters trained on different nodes. SINGA adopts
>>>> the parameter server framework to support a wide range of distributed
>>>> training algorithms and parallelization methods (e.g., data
>>>> parallelism, model parallelism and hybrid parallelism; H2O only
>>>> supports data parallelism). Second, in H2O, users are restricted to
>>>> using the two built-in models. In SINGA, we provide a simple programming
>>>> model to let users implement their own deep learning models. A new
>>>> deep learning model can be implemented by customizing the base Layer
>>>> class for each layer involved in the model. It is similar to writing
>>>> Hadoop programs where users only need to override the base Mapper and
>>>> Reducer. We also provide built-in models for users to use directly.
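
The first difference described above, averaging independently trained parameters once versus synchronizing them frequently through a parameter server, can be illustrated with a toy sketch. This is neither H2O nor SINGA code: the one-parameter model `y = w * x`, the `ParameterServer` class with its `pull`/`push` methods, and the sequential loop standing in for concurrent workers are all assumptions chosen to keep the example small.

```python
def grad(w, shard):
    """Mean-squared-error gradient for the 1-D model y = w * x on one data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def train_by_averaging(shards, steps, lr):
    """Each worker trains alone on its shard; parameters are averaged once at the end."""
    finals = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * grad(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


class ParameterServer:
    """Hypothetical server holding the shared parameter for all workers."""

    def __init__(self, w=0.0):
        self.w = w

    def pull(self):
        return self.w

    def push(self, g, lr):
        self.w -= lr * g


def train_with_server(shards, steps, lr):
    """Workers pull the current parameter and push a gradient at every step."""
    server = ParameterServer()
    for _ in range(steps):
        # Workers take turns here; a real system would run them concurrently.
        for shard in shards:
            w = server.pull()
            server.push(grad(w, shard), lr)
    return server.w


# Synthetic data for y = 3x, split into two shards (one per simulated worker).
data = [(x / 10.0, 3.0 * x / 10.0) for x in range(1, 21)]
shards = [data[:10], data[10:]]
w_avg = train_by_averaging(shards, steps=200, lr=0.1)
w_ps = train_with_server(shards, steps=200, lr=0.1)
```

On this convex toy problem both `w_avg` and `w_ps` should end up close to 3, so the sketch only shows the difference in mechanics; it says nothing about which scheme trains deep models better.
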
>>>> Documentation
>>>> The project is hosted at
>>>> Documentation can be found on the GitHub wiki page:
>>>> We continue to refine and
>>>> improve the documentation.
>>>> Initial Source
>>>> We use GitHub to maintain our source code,
>>>> Source and Intellectual Property Submission Plan
>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>> External Dependencies
>>>> required by the core code base: glog, gflags, google protobuf,
>>>> open-blas, mpich, armci-mpi.
>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>> Cryptography
>>>> Not Applicable
>>>> Required Resources
>>>> Mailing Lists
>>>> Currently, we use a Google group for internal discussion. The mailing
>>>> address is We will migrate the content to the
>>>> Apache mailing lists in the future.
>>>> singa-dev
>>>> singa-user
>>>> singa-commits
>>>> singa-private (for private discussion within the PMC)
>>>> Git Repository
>>>> We want to continue using git for version control. Hence, a git repo
>>>> is required.
>>>> Issue Tracking
>>>> JIRA Singa (SINGA)
>>>> Initial Committers
>>>> Beng Chin Ooi (ooibc
>>>> Kian Lee Tan (tankl
>>>> Gang Chen (cg
>>>> Wei Wang (wangwei
>>>> Dinh Tien Tuan Anh (dinhtta
>>>> Jinyang Gao (jinyang.gao
>>>> Sheng Wang (wangsh
>>>> Kaiping Zheng (kaiping
>>>> Zhaojing Luo (zhaojing
>>>> Zhongle Xie (zhongle
>>>> Affiliations
>>>> Beng Chin Ooi, National University of Singapore
>>>> Kian Lee Tan, National University of Singapore
>>>> Gang Chen, Zhejiang University
>>>> Wei Wang, National University of Singapore
>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>> Jinyang Gao, National University of Singapore
>>>> Sheng Wang, National University of Singapore
>>>> Kaiping Zheng, National University of Singapore
>>>> Zhaojing Luo, National University of Singapore
>>>> Zhongle Xie, National University of Singapore
>>>> Sponsors
>>>> Champion
>>>> Thejas Nair (thejas at - Hortonworks
>>>> Nominated Mentors
>>>> Thejas Nair (thejas at - Hortonworks
>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>> (Seeking more volunteers!)
>>>> Sponsoring Entity
>>>> We are requesting the Incubator to sponsor this project.
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:
>>>> For additional commands, e-mail: