incubator-general mailing list archives

From "Alan D. Cabrera" <l...@toolazydogs.com>
Subject Re: [VOTE] Accept Concerted into the Apache Incubator
Date Sun, 11 Oct 2015 20:33:46 GMT
I’m not sure this needs to be resolved before the podling can be accepted into the Incubator.


Regards,
Alan


> On Oct 9, 2015, at 2:01 PM, Julian Hyde <jhyde@apache.org> wrote:
> 
> I have agreed to be a mentor to Concerted and I think it is an
> interesting idea. I am inclined to vote for it entering the incubator.
> 
> However since the project has not released any source code yet, there
> are a couple of questions I'd like to get answered for the record:
> 
> 1. How many lines of existing code are there? What is their approximate age?
> 
> 2. Concerted is in C/C++ but you mention interfacing with JVM-based
> products like Hive. How would you interface with other languages? Is
> it a goal of the project to create APIs to other languages such as
> Java? Would access from those languages be as efficient as native
> access?
> 
> I apologize that I didn't bring these up in the discussion thread.
> 
> Julian
> 
> 
> On Fri, Oct 9, 2015 at 11:53 AM, Ayrton Gomesz <com.ayrton@gmail.com> wrote:
>> +1
>> @henry.saputra thanks man
>> On Oct 9, 2015 5:50 PM, "Henry Saputra" <henry.saputra@gmail.com> wrote:
>> 
>>> +1 (binding)
>>> Good luck guys!
>>> 
>>> On Fri, Oct 9, 2015 at 8:55 AM, Atri Sharma <atri@apache.org> wrote:
>>>> Hi all,
>>>> 
>>>> Following the discussion about Concerted I would like to call a vote for
>>>> accepting Concerted as a new incubator project.
>>>> 
>>>> The proposal text is included below, and available on the wiki:
>>>> 
>>>> https://wiki.apache.org/incubator/ConcertedProposal
>>>> 
>>>> The vote is open for 72 hours:
>>>> 
>>>> [ ] +1 accept Concerted in the Incubator
>>>> [ ] ±0
>>>> [ ] -1 (please give reason)
>>>> 
>>>> Regards,
>>>> 
>>>> Atri
>>>> 
>>>> = Abstract =
>>>> 
>>>> Concerted is an in-memory, write-less, read-more engine that aims to
>>>> provide extreme read performance with a very high degree of concurrency
>>>> and scalability, while minimizing its own resource footprint.
>>>> 
>>>> = Proposal =
>>>> Concerted is built on the principle that a new type of workload is
>>>> dominating the scene and now needs to be supported: large-data-set
>>>> analytical workloads run on large clusters or high-powered machines.
>>>> Large analytical workloads depend on the ability to query large data
>>>> sets efficiently and with high concurrency while maintaining semantics
>>>> such as immediate consistency. An in-memory engine designed to support
>>>> extreme read queries while providing support for aggregation through
>>>> various features (such as multidimensional representation of tuples)
>>>> will accelerate many use cases around large-scale analytics.
>>>> 
>>>> Concerted believes that the best understanding of a user application
>>>> lies with its developer. Massive read scaling should be available on
>>>> demand and flexible enough that users can decide which representation
>>>> and access of data suit their current requirements. Hence, Concerted is
>>>> not built on a traditional client/server model. Concerted provides users
>>>> with an API which can be used to load, read, update and delete data.
>>>> The user chooses which data structure to use for the current
>>>> requirements. All API access is covered by Concerted's internal systems,
>>>> such as the lock manager, transaction manager and cache manager, which
>>>> ensure that reads scale to a high level in every API call.
>>>> 
>>>> Concerted is a Do-It-Yourself in-memory platform for building in-memory
>>>> supporting engines. The use case we have in mind is supporting big data
>>>> warehouses like Hive, but there are many other use cases for a custom,
>>>> highly scalable in-memory platform.
>>>> 
>>>> The goal of this proposal is to leverage an existing code base, available
>>>> on GitHub and licensed under the Apache License 2.0, to build a community
>>>> around the project. Currently the community consists of existing hackers
>>>> of Concerted, people who have been following and associated with the
>>>> project for a while, and database experts who are excited about building
>>>> a project like this. We hope that entering Apache will help us attract
>>>> more contributors and connect with existing big data projects like
>>>> Apache Hive, Apache HAWQ, Apache Storm, Apache Tajo, Apache Spark and
>>>> Apache Geode, leveraging their community base while assisting in their
>>>> use cases with Concerted. We had a discussion with the founders of
>>>> Apache Tajo and they showed interest in using Concerted for some of
>>>> their use cases.
>>>>
>>>> = Background =
>>>> Relational databases were built with the cost of physical memory in mind.
>>>> That cost is no longer very relevant, and physical memory is now
>>>> available on demand. Another driving factor behind Concerted is the
>>>> paradigm shift that has come with big data. Disk I/O speeds are more of
>>>> a bottleneck than ever before. By combining the read dominance of
>>>> analytical workloads with the speed of in-memory structures, Concerted
>>>> fits the current scene. Supporting OLAP workloads with in-memory support
>>>> for faster read constant queries and joins will also be useful.
>>>> 
>>>> = Rationale =
>>>> As explained above, large analytical workloads need a lightweight
>>>> in-memory engine which supports massive read concurrency, ground-level
>>>> support for aggregations and analytics, extreme scalability and high read
>>>> performance. Concerted aims to meet these needs. Concerted is designed
>>>> and built with three goals:
>>>> 
>>>> 
>>>> Performance
>>>>    To provide high performance access to data from a large number of
>>>>    rows, Concerted uses efficient representation and in-memory indexing
>>>>    of data coupled with high performance transactions, custom
>>>>    transactions, lightweight locking and lockless techniques, and an
>>>>    intelligent locking manager.
>>>> 
>>>> Scalability
>>>>    Concerted is built with extreme concurrency and scalability in mind.
>>>> 
>>>> Efficiency
>>>>    Concerted aims to give the expected performance under a wide variety
>>>>    of workloads and to have as low a footprint as possible.
>>>> 
>>>> = Initial Goals =
>>>> The initial goal is to leverage an existing code base and invest in
>>>> building a community around the project. We anticipate a lot of initial
>>>> restructuring of the existing code so that it becomes easier to bring in
>>>> new contributors and minimize ramp-up time. We plan to approach this
>>>> refactoring in a fully transparent, community-driven way, thus starting
>>>> to practice the "Apache Way" governance model from the get-go.
>>>> 
>>>> Various contributors are getting individual changes into branches in the
>>>> GitHub repository, and our initial major goal will be to merge all those
>>>> changes into the master repository.
>>>> 
>>>> = Current Status =
>>>> Concerted is currently being restructured to suit the needs of an open
>>>> source project. The current source is available at
>>>> https://github.com/atris/Concerted (please note that the updated code
>>>> base is not yet present on GitHub). Concerted is currently licensed under
>>>> the Apache License 2.0. Most of the code base is implemented in C and C++
>>>> and has the external dependencies listed later.
>>>> 
>>>> == Meritocracy ==
>>>> 
>>>> We plan to drive the technical roadmap and implementation in a fully
>>>> transparent, community-driven way, soliciting feedback from all of the
>>>> community members and building a consensus-driven approach to evolving
>>>> the code base and the community itself. Users and new contributors will
>>>> be treated with respect and welcomed. By participating in the community
>>>> and providing quality patches/support that move the project forward,
>>>> contributors will earn merit. They will also be encouraged to provide
>>>> non-code contributions (documentation, events, community management,
>>>> etc.) and will gain merit for doing so. Those with a proven support and
>>>> quality track record will be encouraged to become committers.
>>>> 
>>>> == Community ==
>>>> In-memory processing is the new cutting edge, and a new community around
>>>> performance-oriented systems, and around enhancing relational database
>>>> performance with complete in-memory OLTP engines, will greatly benefit
>>>> performance. So we expect interest from data warehousing projects and
>>>> communities as well as from projects and companies looking for high
>>>> OLTP performance. In addition, Ingenium Data Systems is building
>>>> products around Concerted and will have salaried developers contribute
>>>> to the project as part of their job responsibilities.
>>>> 
>>>> == Core Developers ==
>>>> The core developers are a diverse group, many of whom are very
>>>> experienced in open source and the Apache Hadoop ecosystem. Specifically,
>>>> Atri is an Apache Apex committer, and Atri and Pavel are major
>>>> contributors to the PostgreSQL project. Atri is also a committer for
>>>> other open source projects.
>>>> 
>>>> * Amrish <amrishs AT ingeniumsys DOT com>
>>>> * Nupur S <nupurs AT ingeniumsys DOT com>
>>>> * Pavel Stehule <pavel DOT stehule AT gmail.com>
>>>> * Atri Sharma <atri AT apache DOT org>
>>>> * Nishith Singhal <nishsinghal AT gmail DOT com>
>>>> * Michael Down <michael AT dowuk DOT com>
>>>> * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>>>> * Wang Albert <albertwang87 AT gmail DOT com>
>>>> * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>>>> * Kris Popat <krispopat AT apache DOT org>
>>>> * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>>>> 
>>>> == Alignment ==
>>>> Concerted will be helpful to systems like Tajo, which can benefit from
>>>> in-memory structures optimized for heavy reads and joins (dimension
>>>> tables). In addition, Concerted will benefit projects looking for an
>>>> in-memory relational database as a metadata store, which is the case for
>>>> most of the Apache Big Data projects. We expect Apache HAWQ
>>>> (incubating), Apache Hive, Apache Storm and Apache Tajo to use
>>>> Concerted as a supporting engine. For example, a data warehouse built on
>>>> HAWQ, Hive or Tajo can use Concerted as an in-memory engine for querying
>>>> and joining dimension tables.
>>>> 
>>>> = Known Risks =
>>>> 
>>>> == Orphaned Products ==
>>>> Most of the code is developed by a small group of core developers, and
>>>> this may pose a risk of an orphaned product. However, the code base is
>>>> simple compared to other open source projects, and interest in Concerted
>>>> has risen exponentially over the years, with many computer professionals
>>>> expressing interest in the project and building some use cases on it.
>>>> Specifically, there were some projects done around Concerted at JIIT,
>>>> Noida (an engineering school), and Wang is a student at Lehigh University
>>>> who has been following Concerted's progress over many years. The core
>>>> developers are aligned with this project and, since the code base is
>>>> simple, future committers will have a quick ramp-up and the risk will be
>>>> mitigated. Besides, Ingenium Data Systems is launching a product based on
>>>> Concerted and will have all its salaried developers contribute to
>>>> Concerted as part of their job functions.
>>>> 
>>>> == Inexperience with Open Source ==
>>>> Most of the initial committers have experience working on open source
>>>> projects. In particular, Atri is an active member of many open source
>>>> projects.
>>>> 
>>>> == Homogeneous Developers ==
>>>> Although the initial core developers were based in India, the community
>>>> now consists of computer professionals from various parts of the world,
>>>> so diversity should not be an issue. In addition, we will document the
>>>> internals of the project in public-facing documents, which should allow
>>>> more contributors to join in.
>>>> 
>>>> == Reliance on Salaried Developers ==
>>>> It is expected that Concerted development will occur on both salaried
>>>> time and volunteer time. Nupur and Amrish belong to Ingenium and are
>>>> committed to building this project along with their team. Atri, as the
>>>> originator of this project, will be actively working on it and is now
>>>> pushing Concerted into major data warehousing projects, since he is
>>>> involved in the architecture of data platforms. Developers are expected
>>>> to contribute in their volunteer time. In addition, we will be working
>>>> with various open source projects which will benefit from Concerted and
>>>> will involve those communities in Concerted's development as well. For
>>>> example, Apache Tajo has shown interest and will be supporting
>>>> development of the project.
>>>> 
>>>> == Relationships with Other Apache Products ==
>>>> Concerted has some overlapping functionality with Apache Geode
>>>> (incubating). However, Geode is an in-memory key-value store whereas
>>>> Concerted is a write-less, read-many engine. Concerted will complement
>>>> Geode and increase the use cases Geode can support.
>>>> 
>>>> A major objective for Concerted is supporting OLAP workloads and data
>>>> warehouses with in-memory performance and highly performant reads and
>>>> joins. Concerted will collaborate with many open source projects, such
>>>> as Apache HAWQ (incubating), Apache Hive and Apache Tajo, to support
>>>> their OLAP workloads, enabling them to support a larger set of use cases
>>>> with better throughput. For example, a star schema in Hive will benefit
>>>> from having dimension tables in Concerted, with highly efficient and
>>>> scalable reads and very fast joins. The same applies to similar
>>>> workloads in Tajo.
>>>> 
>>>> Concerted will fit many other use cases in the Apache spectrum as well.
>>>> For example, Concerted can be used with Apache Geode for in-memory
>>>> aggregation indexing. Concerted can also be used with Apache Flink to
>>>> stream real-time data into memory, perform in-memory aggregation and
>>>> then perform batch processing for efficiency.
>>>> 
>>>> 
>>>> == An Excessive Fascination with the Apache Brand ==
>>>> We believe that the "Apache Way" governance model will provide additional
>>>> help to us in finding contributors and growing the community. The
>>>> community and development process will make this project more stable and
>>>> help establish ubiquitous APIs. In addition, Concerted is looking to
>>>> support multiple Apache projects in their use cases and accelerate their
>>>> performance while soliciting their support in development of the project.
>>>> We will not use the Apache brand for excessive branding or for any
>>>> commercial aspects of Concerted. The Apache brand will primarily be used
>>>> for community building.
>>>> 
>>>> = Documentation =
>>>> Public documents are currently in development and will be published soon.
>>>> 
>>>> = Initial Source =
>>>> The initial source is written in C++ and is under heavy development. It
>>>> will be restructured and released publicly.
>>>> We understand that there might be concerns about the GitHub source being
>>>> developed by only a single person and about no development happening
>>>> after 2013. The source on GitHub is only the source initially developed
>>>> as an independent project, hence the limitation. However, because the
>>>> project has been on GitHub for a while now, it has attracted attention
>>>> and people have been using and developing it locally. For example,
>>>> Ingenium Data Systems took an interest in the project, developed it
>>>> locally and used it in an upcoming product they are going to release
>>>> soon. The project now wants to gather all independent development
>>>> efforts and help attract people to grow the community and project. We
>>>> are currently in the process of updating the GitHub repository and making
>>>> branches for all local development efforts.
>>>> 
>>>> = Source and Intellectual Property Submission Plan =
>>>> 
>>>> We intend the entire code base to be licensed under the Apache License,
>>>> Version 2.0.
>>>> 
>>>> = External Dependencies =
>>>> Currently, Concerted depends only on the g++ compiler and pthreads.
>>>> pthreads will be replaced by Boost in the next release.
>>>> 
>>>> = Cryptography =
>>>> 
>>>> N/A
>>>> 
>>>> = Required Resources =
>>>> == Mailing Lists ==
>>>> * private@concerted.incubator.apache.org (moderated subscriptions)
>>>> * commits@concerted.incubator.apache.org
>>>> * dev@concerted.incubator.apache.org
>>>> * issues@concerted.incubator.apache.org
>>>> 
>>>> == Git Repository ==
>>>> 
>>>> https://git-wip-us.apache.org/repos/asf/incubator-concerted.git
>>>> 
>>>> == Issue Tracking ==
>>>> Jira Concerted (CONCERTED)
>>>> 
>>>> == Other Resources ==
>>>> * Continuous Integration
>>>>  * Jenkins
>>>> * Wiki
>>>>  * cwiki.apache.org/confluence/display/CONCERTED
>>>> 
>>>> = Initial Committers =
>>>> * Roman Shaposhnik <rvs AT apache DOT org>
>>>> * Daniel Dai <daijy AT apache DOT org>
>>>> * Jake Farrell <jfarrell AT apache DOT org>
>>>> * Lars Hofhansl <larsh AT apache DOT org>
>>>> * Julian Hyde <jhyde AT apache DOT org>
>>>> * Chris Nauroth <cnauroth AT hortonworks DOT com>
>>>> * Pavel Stehule <pavel DOT stehule AT gmail.com>
>>>> * Amrish <amrishs AT ingeniumsys DOT com>
>>>> * Nupur S <nupurs AT ingeniumsys DOT com>
>>>> * Atri Sharma <atri AT apache DOT org>
>>>> * Nishith Singhal <nishsinghal AT gmail DOT com>
>>>> * Michael Down <michael AT dowuk DOT com>
>>>> * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>>>> * Wang Albert <albertwang87 AT gmail DOT com>
>>>> * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>>>> * Kris Popat <krispopat AT apache DOT org>
>>>> * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>>>> 
>>>> = Affiliations =
>>>> * Roman Shaposhnik (Pivotal)
>>>> * Daniel Dai (HortonWorks)
>>>> * Jake Farrell (Acquia)
>>>> * Lars Hofhansl (Salesforce)
>>>> * Julian Hyde (HortonWorks)
>>>> * Chris Nauroth (HortonWorks)
>>>> * Pavel Stehule (GoodData)
>>>> * Amrish (Ingenium Data Systems)
>>>> * Nupur S (Ingenium Data Systems)
>>>> * Atri Sharma (Barclays)
>>>> * Nishith Singhal (Wipro)
>>>> * Michael Down (Barclays)
>>>> * Vijayakumar Ramdoss (EMC)
>>>> * Wang Albert (Lehigh University)
>>>> * Hans-Jurgen Schonig (CyberTec)
>>>> * Kris Popat (CETIS LLP)
>>>> * Ayrton Gomesz (IQLabs)
>>>> 
>>>> The nominated mentors are employees of HortonWorks, Acquia, and
>>>> Salesforce.
>>>> 
>>>> * Daniel Dai (HortonWorks)
>>>> * Jake Farrell (Acquia)
>>>> * Lars Hofhansl (Salesforce)
>>>> * Julian Hyde (HortonWorks)
>>>> * Chris Nauroth (HortonWorks)
>>>> 
>>>> = Sponsors =
>>>> 
>>>> == Champion ==
>>>> 
>>>> * Roman Shaposhnik (rvs AT apache DOT org)
>>>> 
>>>> == Nominated Mentors ==
>>>> 
>>>> * Daniel Dai <daijy AT apache DOT org>
>>>> * Jake Farrell <jfarrell AT apache DOT org>
>>>> * Lars Hofhansl <larsh AT apache DOT org>
>>>> * Julian Hyde <jhyde AT apache DOT org>
>>>> * Chris Nauroth <cnauroth AT hortonworks DOT com>
>>>> 
>>>> == Sponsoring Entity ==
>>>> Apache Incubator
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org
>>> 
>>> 