incubator-general mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]
Date Fri, 27 Feb 2015 05:33:08 GMT
I strongly suggest you solicit more (diverse) mentors before starting the VOTE.

All initial committers are from the same org and all initial mentors are
from the same company (HW).

I am not sure this is a good start for an Apache podling.


- Henry

On Thu, Feb 26, 2015 at 9:12 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
> The incubator proposal has been updated with the feedback so far.
> We have 3 mentors now, but I think it would be good to have additional
> mentors. Please let me know if anyone is able to help mentor this
> project.
>
> I am planning to start a vote on the proposal in a day or two.
>
>
> On Fri, Feb 6, 2015 at 5:21 PM,  <ooibc@comp.nus.edu.sg> wrote:
>>
>> Regarding the number of users using this project -- at this moment, the
>> community is not big.  A few local start-ups have been trying to use it
>> (mainly due to an announcement on our seminar list), e.g., one is using it
>> for image recognition (given a photo snapped by a user, it wants to return
>> the same product and a list of similar products, such as a luxury bag
>> on a passerby).  Researchers from outside of NUS may have been using it
>> since we published an application paper on cross domain/modal retrieval in
>> VLDB 2014.
>>
>> We have not announced the project to the outside community yet -- we would
>> announce it in dbworld etc in due course.
>>
>> Thanks and have a good weekend.
>>
>> regards
>> beng chin
>>
>>>
>>> Thanks for the comments and suggestions.
>>> With permission from Thejas, I would like to respond to point 2.
>>>
>>> We have a huge team down at NUS (National University of Singapore) --
>>> we have about seven database/data mining professors (not including
>>> those in systems, networking, and machine learning).
>>> I myself have nine PhD students in a steady state, and I have a few large
>>> grants, with a total budget of about 15 million S$ (~12 million USD), that
>>> allows me to hire a number of research fellows and research assistants for
>>> the next few years.  In a constant state, I have about 20 people (PhD
>>> students/RA/RF) working with me alone.  Other professors have their own
>>> grants (unlike other countries, it is relatively easy to get large grants
>>> in Singapore; many overseas Universities, including UIUC, MIT, ETH etc
>>> have research labs funded by Singapore Research Foundation [equivalent of
>>> NSF]).
>>>
>>> SINGA is a long term project for us -- while it is a platform as it is, we
>>> are using it for healthcare predictive analytics (by working with a
>>> hospital associated with the University).  Therefore, we will be working
>>> on SINGA, not solely as a distributed DL platform, but as a tool that will
>>> enable us to do data analytics on some business domains (e.g., healthcare,
>>> consumer, etc.)
>>>
>>> For the initial set of committers, three are tenured professors, five are
>>> students, with 2-5 years to go before they complete their PhD.  Quite
>>> often, some would stay back as a research fellow for a couple of years
>>> before they start looking for a job outside.  We will work with mentors
>>> and new developers (from outside of NUS or Zhejiang University) in
>>> enhancing the system.
>>>
>>> The project should survive in that sense.
>>>
>>> (I have an on-going project CIIDAA that has been around since 2008; it was
>>> started as another project, epiC,  with a different grant, and then we
>>> continue the development with a new grant for CIIDAA --
>>> http://www.comp.nus.edu.sg/~ciidaa/
>>> )
>>>
>>> Thanks.
>>>
>>> regards
>>> beng chin
>>> ps: i am not sure if my email will get through to the group.
>>>
>>>
>>> ---------------------------- Original Message ----------------------------
>>> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
>>> From:    "Henry Saputra" <henry.saputra@gmail.com>
>>> Date:    Thu, February 5, 2015 2:57 pm
>>> To:      "general@incubator.apache.org" <general@incubator.apache.org>
>>> Cc:      ooibc@comp.nus.edu.sg
>>> --------------------------------------------------------------------------
>>>
>>> Several comments:
>>> -) How many users are already using this project? I would recommend
>>> dropping the request for a singa-user list at the beginning.
>>> -) All the initial committers come from a university, and it seems
>>> some of them are already about to leave it. I am not too sure this
>>> project would survive if all of the initial committers are from the
>>> university as students.
>>> -) You need to solicit more mentors if this project is ever to get into
>>> the Apache Incubator.
>>>
>>> - Henry
>>>
>>> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>>> The "Relationship with Other Apache Products" section has been
>>>> updated. The reference to H2O in that section has been removed, and
>>>> other projects have been added.
>>>>  Thanks for the feedback!
>>>>
>>>>
>>>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>>>> Thanks for pointing that out, Henry! Yes, it looks like H2O is not an
>>>>> Apache project; I should have verified that.
>>>>> I will edit that, and revisit that section along with the folks in
>>>>> Singa community.
>>>>>
>>>>>
>>>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
>>>>>> Quick immediate comment: "Apache H2O" is not really an Apache
>>>>>> project.
>>>>>>
>>>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>>>>> https://github.com/h2oai/h2o-dev) ?
>>>>>>
>>>>>> - Henry
>>>>>>
>>>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>>>>> project.
>>>>>>>
>>>>>>> Here is the proposal -
>>>>>>> https://wiki.apache.org/incubator/SingaProposal
>>>>>>>
>>>>>>> Please review the proposal and give feedback. I am planning to start a
>>>>>>> vote after 7 days if the proposal looks good.
>>>>>>> We are also seeking additional Apache mentors for the project.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Thejas
>>>>>>> ==========================================================
>>>>>>> Singa Incubator Proposal
>>>>>>>
>>>>>>> Abstract
>>>>>>>
>>>>>>> SINGA is a distributed deep learning platform.
>>>>>>>
>>>>>>> Proposal
>>>>>>>
>>>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>>>>> Network and Deep Belief Network. It parallelizes the computation
>>>>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>>>>> data and model automatically to speed up the training. Built-in
>>>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>>>>> are implemented based on common abstractions of deep learning models.
>>>>>>> Users can train their own deep learning models by simply customizing
>>>>>>> these abstractions like implementing the Mapper and Reducer in
>>>>>>> Hadoop.
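The proposal compares customizing SINGA's abstractions to overriding Hadoop's Mapper and Reducer. Below is a minimal Python sketch of that idea, assuming a base class with `forward` and `backward` hooks plus a generic back-propagation driver. The names `Layer`, `DenseLayer`, `ReLULayer` and `train_step` are hypothetical illustrations; they are not taken from SINGA's code base, which the proposal does not show.

```python
# Hypothetical sketch: Layer, DenseLayer, ReLULayer and train_step are
# illustrative names only, not SINGA's actual API.
import random


class Layer:
    """Base abstraction: a user-defined layer overrides forward() and backward()."""

    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_out, lr):
        """Take dLoss/dOutput, update own parameters, return dLoss/dInput."""
        raise NotImplementedError


class DenseLayer(Layer):
    """Fully connected layer y = W x + b on plain Python lists."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return [sum(wi * xi for wi, xi in zip(row, x)) + bj
                for row, bj in zip(self.w, self.b)]

    def backward(self, grad_out, lr):
        # Gradient w.r.t. the input is computed with the weights before the update.
        grad_in = [sum(self.w[j][i] * grad_out[j] for j in range(len(self.w)))
                   for i in range(len(self.x))]
        for j, g in enumerate(grad_out):
            for i, xi in enumerate(self.x):
                self.w[j][i] -= lr * g * xi
            self.b[j] -= lr * g
        return grad_in


class ReLULayer(Layer):
    """Element-wise max(0, x); has no parameters to update."""

    def forward(self, x):
        self.mask = [v > 0 for v in x]
        return [v if m else 0.0 for v, m in zip(x, self.mask)]

    def backward(self, grad_out, lr):
        return [g if m else 0.0 for g, m in zip(grad_out, self.mask)]


def train_step(layers, x, target, lr=0.05):
    """Generic back-propagation over any stack of Layer subclasses (squared loss)."""
    out = x
    for layer in layers:
        out = layer.forward(out)
    loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
    grad = [o - t for o, t in zip(out, target)]
    for layer in reversed(layers):
        grad = layer.backward(grad, lr)
    return loss  # loss measured before this step's update


if __name__ == "__main__":
    net = [DenseLayer(2, 4, seed=1), ReLULayer(), DenseLayer(4, 1, seed=2)]
    for _ in range(300):
        loss = train_step(net, [1.0, 2.0], [1.0])
    print(loss)
```

The driver never inspects concrete layer types, so supporting a new model means writing another `Layer` subclass, which mirrors the division of labour the proposal describes. Other built-in algorithms such as Contrastive Divergence would need a different driver and are not shown.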
>>>>>>>
>>>>>>> Background
>>>>>>>
>>>>>>> Deep learning refers to a set of feature (or representation) learning
>>>>>>> models that consist of multiple (non-linear) layers, where different
>>>>>>> layers learn different levels of abstractions (representations) of the
>>>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>>>>> terms of number of layers) models have shown better performance, e.g.,
>>>>>>> lower image classification error in the Large Scale Visual Recognition
>>>>>>> Challenge. However, a larger model requires more memory and larger
>>>>>>> training data to reduce over-fitting. Complex numeric operations make
>>>>>>> the training computation-intensive. In practice, training large deep
>>>>>>> learning models takes weeks or months on a single node (even with a
>>>>>>> GPU).
>>>>>>>
>>>>>>> Rationale
>>>>>>>
>>>>>>> Deep learning has attracted a lot of attention in both academia and
>>>>>>> industry due to its success in a wide range of areas such as computer
>>>>>>> vision and speech recognition. However, training such models is
>>>>>>> computationally expensive, especially for large and deep models (e.g.,
>>>>>>> with billions of parameters and more than 10 layers). Both Google and
>>>>>>> Microsoft have developed distributed deep learning systems to make the
>>>>>>> training more efficient by distributing the computations within a
>>>>>>> cluster of nodes. However, these systems are closed-source software.
>>>>>>> Our goal is to leverage the community of open source developers to
>>>>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>>>>>> full-fledged distributed platform that could benefit the community
>>>>>>> and also benefit from the community's contributions to further work
>>>>>>> in this area. We believe the nature of SINGA and our visions for the
>>>>>>> system fit naturally with Apache's philosophy and development
>>>>>>> framework.
>>>>>>>
>>>>>>> Initial Goals
>>>>>>>
>>>>>>> We have developed a system for SINGA running on a commodity computer
>>>>>>> cluster. The initial goals include:
>>>>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>>>>>>   using InfiniBand for network communication and multi-threading for
>>>>>>>   single-node computation. We would consider extending SINGA to GPU
>>>>>>>   clusters later.
>>>>>>> * benchmarking with larger datasets (hundreds of millions of training
>>>>>>>   instances) and models (billions of parameters).
>>>>>>> * adding more built-in deep learning models. Users can train the
>>>>>>>   built-in models on their datasets directly.
>>>>>>>
>>>>>>> Current Status
>>>>>>>
>>>>>>> Meritocracy
>>>>>>>
>>>>>>> We would like to follow ASF meritocratic principles to encourage more
>>>>>>> developers to contribute to this project. We know that only active and
>>>>>>> excellent developers can make SINGA a successful project. The
>>>>>>> committer list and PMC will be updated based on developers'
>>>>>>> performance and commitment. We are also improving the documentation
>>>>>>> and code to help new developers get started quickly.
>>>>>>>
>>>>>>> Community
>>>>>>>
>>>>>>> SINGA is currently being developed in the Database System Research Lab
>>>>>>> at the National University of Singapore (NUS) in collaboration with
>>>>>>> Zhejiang University in China. Our lab has extensive experience in
>>>>>>> building database related systems, including distributed systems. Six
>>>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow
>>>>>>> (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee
>>>>>>> Tan) have been working for a year on this project. We are open to
>>>>>>> recruiting more developers from diverse backgrounds.
>>>>>>>
>>>>>>> Core Developers
>>>>>>>
>>>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>>>>> worked on distributed systems for more than 20 years. They have
>>>>>>> collaborated with the industry and have built various large-scale
>>>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>>>>> learning problems including deep learning applications and large-scale
>>>>>>> training. Sheng Wang and Jinyang are working on efficient indexing and
>>>>>>> querying of large-scale data, and on machine learning. Kaiping,
>>>>>>> Zhaojing and Zhongle are new PhD students who joined SINGA recently.
>>>>>>> They will work on this project for a longer time (next 4-5 years).
>>>>>>> While we share common research interests, each member also brings
>>>>>>> diverse expertise to the team.
>>>>>>>
>>>>>>> Alignment
>>>>>>>
>>>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>>>>> Spark and Mahout, each of which targets a different application
>>>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>>>>> learning, focuses on another important domain, one that still lacks a
>>>>>>> robust and scalable open-source platform. The recent success of deep
>>>>>>> learning models, especially for vision and speech recognition tasks,
>>>>>>> has generated interest both in applying existing deep learning models
>>>>>>> and in developing new ones. Thus, an open-source platform for deep
>>>>>>> learning will be able to attract a large community of users and
>>>>>>> developers. SINGA is a complex system needing many iterations of
>>>>>>> design, implementation and testing. Apache's collaboration framework,
>>>>>>> which encourages active contribution from developers, will inevitably
>>>>>>> help improve the quality of the system, as shown in the success of
>>>>>>> Hadoop, Spark, etc. Equally important is the community of users,
>>>>>>> which helps identify real-life applications of deep learning and helps
>>>>>>> to evaluate the system's performance and ease of use. We hope to
>>>>>>> leverage ASF for coordinating and promoting both communities, and in
>>>>>>> return benefit the communities with another useful tool.
>>>>>>>
>>>>>>> Known Risks
>>>>>>>
>>>>>>> Orphaned products
>>>>>>>
>>>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>>>>> the lab in two to four years' time. It is possible that some of them
>>>>>>> may not have enough time to focus on this project after that. However,
>>>>>>> SINGA is part of our other, bigger research projects on building an
>>>>>>> infrastructure for data-intensive applications, which include
>>>>>>> health-care analytics and brain-inspired computing. Beng Chin and Kian
>>>>>>> Lee will continue working on it and getting more people involved. For
>>>>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
>>>>>>> us recently. Individual developers are welcome to help make SINGA a
>>>>>>> diverse community that is robust and independent from any single
>>>>>>> developer.
>>>>>>>
>>>>>>> Inexperience with Open Source
>>>>>>>
>>>>>>> All the developers are active users and followers of open source
>>>>>>> projects. Our research lab has a strong commitment to open source, and
>>>>>>> has released the source code of several systems under open source
>>>>>>> licenses as a way of contributing back to the open source community.
>>>>>>> But we do not have much real experience in open source projects with
>>>>>>> large and well-organized communities like those in Apache. This is
>>>>>>> one reason we chose Apache, which is experienced in open source
>>>>>>> project incubation. We hope to get help from Apache (e.g., the
>>>>>>> champion and mentors) to establish a healthy path for SINGA.
>>>>>>>
>>>>>>> Homogenous Developers
>>>>>>>
>>>>>>> Although the current developers are researchers in the universities,
>>>>>>> they have different research interests and project experiences, as
>>>>>>> mentioned in the section that introduces the core developers. We know
>>>>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>>>>> recruiting developers from other regions and organizations.
>>>>>>>
>>>>>>> Reliance on Salaried Developers
>>>>>>>
>>>>>>> As a research project in the university, SINGA's current development
>>>>>>> community consists of professors, PhD students, research assistants
>>>>>>> and postdoctoral fellows. They are driven by their interests to work
>>>>>>> on this project and have contributed actively since the start of the
>>>>>>> project. The research assistants and fellows are expected to leave
>>>>>>> when their contracts expire. However, they are keen to continue to
>>>>>>> work on the project voluntarily. Moreover, as this is a long-term
>>>>>>> research project, new research assistants and fellows are likely to
>>>>>>> join the project.
>>>>>>>
>>>>>>> An Excessive Fascination with the Apache Brand
>>>>>>>
>>>>>>> We chose Apache not for publicity. We have two purposes. First, we
>>>>>>> want to leverage Apache's reputation to recruit more developers to
>>>>>>> make a diverse community. Second, we hope that Apache can help us to
>>>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>>>>> are established database and distributed system researchers, and
>>>>>>> together with the other contributors, they sincerely believe that
>>>>>>> there is a need for a widely accepted open source distributed deep
>>>>>>> learning platform. The field of deep learning is still in its infancy,
>>>>>>> and an open source platform will fuel the research in the area.
>>>>>>> Moreover, such a platform will enable researchers to develop new
>>>>>>> models and algorithms, rather than spending time implementing a deep
>>>>>>> learning system from scratch. Furthermore, the need for scalability
>>>>>>> for such a platform is obvious.
>>>>>>>
>>>>>>> Relationship with Other Apache Products
>>>>>>>
>>>>>>> Apache H2O implemented two simple deep learning models, namely the
>>>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
>>>>>>> against the training set. Model parameters trained by all computing
>>>>>>> nodes are averaged as the final model parameters. This training
>>>>>>> algorithm is different from the distributed training algorithm used
>>>>>>> by DistBelief, Adam and SINGA, which frequently synchronizes the
>>>>>>> parameters trained on different nodes. SINGA adopts the parameter
>>>>>>> server framework to support a wide range of distributed training
>>>>>>> algorithms and parallelization methods (e.g., data parallelism, model
>>>>>>> parallelism and hybrid parallelism; H2O only supports data
>>>>>>> parallelism). Second, in H2O, users are restricted to using the two
>>>>>>> built-in models. In SINGA, we provide a simple programming model to
>>>>>>> let users implement their own deep learning models. A new deep
>>>>>>> learning model can be implemented by customizing the base Layer class
>>>>>>> for each layer involved in the model. It is similar to writing Hadoop
>>>>>>> programs, where users only need to override the base Mapper and
>>>>>>> Reducer. We also provide built-in models for users to use directly.
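To make the first difference concrete, here is a toy, single-process Python simulation under simplifying assumptions: a one-parameter linear model y = w * x, two workers holding one example each, and plain gradient descent. `train_then_average` mimics the train-independently-then-average scheme the proposal attributes to H2O; `ParameterServer` and `parameter_server_training` mimic synchronous parameter-server training, where workers pull fresh parameters and push gradients every step. All names are hypothetical; this is not code from SINGA, H2O, DistBelief or Adam.

```python
# Hypothetical single-process simulation; not SINGA, H2O, DistBelief or Adam code.


def grad(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_then_average(shards, w0, lr, steps):
    """Each worker trains alone on its shard; parameters are averaged once at the end."""
    finals = []
    for shard in shards:
        w = w0
        for _ in range(steps):
            w -= lr * grad(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


class ParameterServer:
    """Holds the shared parameter and applies gradients pushed by workers."""

    def __init__(self, w0, lr):
        self.w = w0
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, g):
        self.w -= self.lr * g


def parameter_server_training(shards, w0, lr, steps):
    """Workers pull fresh parameters and push gradients every step (synchronous rounds)."""
    server = ParameterServer(w0, lr)
    for _ in range(steps):
        w = server.pull()
        grads = [grad(w, shard) for shard in shards]
        server.push(sum(grads) / len(grads))
    return server.w


if __name__ == "__main__":
    shards = [[(1.0, 1.0)], [(3.0, 9.0)]]
    print(train_then_average(shards, 0.0, 0.05, 200))
    print(parameter_server_training(shards, 0.0, 0.05, 200))
```

With the shards {(1, 1)} and {(3, 9)}, the per-worker optima are 1 and 3, so one-shot averaging returns about 2.0, while synchronizing every step converges to about 2.8, the minimizer of the combined loss (w - 1)^2 + (3w - 9)^2. Real systems add asynchrony, stale parameters and model partitioning, none of which is modelled here.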
>>>>>>>
>>>>>>> Documentation
>>>>>>>
>>>>>>> The project is hosted at
>>>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>>>>> Documentation can be found at the GitHub Wiki page:
>>>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>>>>>> improve the documentation.
>>>>>>>
>>>>>>> Initial Source
>>>>>>>
>>>>>>> We use Github to maintain our source code:
>>>>>>> https://github.com/nusinga/singa
>>>>>>>
>>>>>>> Source and Intellectual Property Submission Plan
>>>>>>>
>>>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>>>>
>>>>>>> External Dependencies
>>>>>>>
>>>>>>> required by the core code base: glog, gflags, google protobuf,
>>>>>>> open-blas, mpich, armci-mpi.
>>>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>>>>>
>>>>>>> Cryptography
>>>>>>>
>>>>>>> Not Applicable
>>>>>>>
>>>>>>> Required Resources
>>>>>>>
>>>>>>> Mailing Lists
>>>>>>>
>>>>>>> Currently, we use a Google Group for internal discussion. The mailing
>>>>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>>>>>>> Apache mailing lists in the future.
>>>>>>>
>>>>>>> singa-dev
>>>>>>> singa-user
>>>>>>> singa-commits
>>>>>>> singa-private (for private discussion within the PMC)
>>>>>>>
>>>>>>> Git Repository
>>>>>>>
>>>>>>> We want to continue using git for version control. Hence, a git repo
>>>>>>> is required.
>>>>>>>
>>>>>>> Issue Tracking
>>>>>>>
>>>>>>> JIRA Singa (SINGA)
>>>>>>>
>>>>>>> Initial Committers
>>>>>>>
>>>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>>>>> Gang Chen (cg @zju.edu.cn)
>>>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>>>>
>>>>>>> Affiliations
>>>>>>>
>>>>>>> Beng Chin Ooi, National University of Singapore
>>>>>>> Kian Lee Tan, National University of Singapore
>>>>>>> Gang Chen, Zhejiang University
>>>>>>> Wei Wang, National University of Singapore
>>>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>>>>> Jinyang Gao, National University of Singapore
>>>>>>> Sheng Wang, National University of Singapore
>>>>>>> Kaiping Zheng, National University of Singapore
>>>>>>> Zhaojing Luo, National University of Singapore
>>>>>>> Zhongle Xie, National University of Singapore
>>>>>>>
>>>>>>> Sponsors
>>>>>>>
>>>>>>> Champion
>>>>>>>
>>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>>>
>>>>>>> Nominated Mentors
>>>>>>>
>>>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>>>>> (Seeking more volunteers!)
>>>>>>>
>>>>>>> Sponsoring Entity
>>>>>>>
>>>>>>> We are requesting the Incubator to sponsor this project.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>


