incubator-general mailing list archives

From "Debo Dutta (dedutta)" <dedu...@cisco.com>
Subject Re: [VOTE] Accept DistributedLog into the Apache Incubator
Date Tue, 21 Jun 2016 07:05:20 GMT
+1




On 6/20/16, 10:11 PM, "Sijie Guo" <sijie@apache.org> wrote:

>Hello All,
>
>Following the discussion thread, I would like to call a VOTE on accepting
>DistributedLog into the Apache Incubator.
>
>[] +1 Accept DistributedLog into the Apache Incubator
>[] +0 Abstain.
>[] -1 Do not accept DistributedLog into the Apache Incubator because ...
>
>This vote will be open for at least 72 hours.
>
>The proposal follows; it is also available on the wiki page:
>https://wiki.apache.org/incubator/DistributedLogProposal
>
>Here is my +1.
>
>Thanks,
>Sijie
>
>= Abstract =
>DistributedLog is a high-performance replicated log service. It offers
>durability, replication and strong consistency, which provides a
>fundamental building block for building reliable distributed systems, e.g.
>replicated-state-machines, general pub/sub systems, distributed databases,
>and distributed queues.
>
>See “Building Distributedlog - Twitter’s high performance replicated log
>service” for details:
>https://blog.twitter.com/2015/building-distributedlog-twitter-s-high-performance-replicated-log-service
>
>= Proposal =
>We propose to contribute DistributedLog codebase and associated artifacts
>(e.g. documentation, web-site content etc.) to the Apache Software
>Foundation with the intent of forming a productive, meritocratic and open
>community around DistributedLog’s continued development, according to the
>‘Apache Way’.
>
>= Background =
>Engineers at Twitter began developing DistributedLog in early 2013.
>DistributedLog is described in a Twitter engineering blog post and
>presented at the Messaging Meetup in Sep 2015. It was released as an
>Apache-licensed open-source project on GitHub in May 2016.
>
>DistributedLog is a high-performance replicated log service, which provides
>simple stream-oriented abstractions over log-segments and offers
>durability, replication and strong consistency for building reliable
>distributed systems. The features offered by DistributedLog include:
>
> * Simple high-level, stream oriented interface
> * Naming and metadata scheme for managing streams and other entities
> * Log data management policies, including data segmentation and data
>retention
> * Fast write pipeline leveraging batching and compression
> * Fast read mechanism leveraging long-poll and read-ahead caching
> * Service tiers supporting writer fan-in and reader fan-out
> * Geo-replicated logs
>
>DistributedLog’s most important benefit is high performance with a strong
>durability guarantee, making it well suited to a range of
>workloads, from distributed database journaling to real-time stream
>computing. Its modern, layered architecture makes it easy to run the
>service tiers in multi-tenant datacenter environments such as Apache Mesos
>or cloud environments such as EC2.
>
>= Rationale =
>DistributedLog is designed to provide fundamental features such as
>high performance, durability and strong consistency to anyone who is
>building reliable distributed systems, in a simple and efficient way.
>
>We believe that the ASF is the right venue to foster an open-source
>community around DistributedLog’s development. We expect that
>DistributedLog will benefit from collaboration with related Apache
>projects, and under the auspices of the ASF will attract talented
>contributors who will push DistributedLog’s development forward at a faster
>pace.
>
>We believe that the timing is right for DistributedLog’s development to
>move to the ASF: DistributedLog has already run in production at Twitter
>for 3 years and served various workloads including a distributed database
>journal, reliable cross datacenter replication, search ingestion,
>and general pub/sub messaging. The project is stable. We are excited to see
>where an ASF-based community can take DistributedLog.
>
>= Current Status =
>DistributedLog is a stable project that has been used in production at
>Twitter for 3 years. The source code is public at github.com/twitter, which
>will seed the Apache git repository.
>
>= Meritocracy =
>We understand the central importance of meritocracy to the Apache Way. We
>will work to establish a welcoming, fair and meritocratic community.
>Several companies have already expressed interest in this project, and we
>intend to invite additional developers to participate. We look forward to
>growing a rich user and developer community.
>
>= Community =
>There is a large need for a performant replicated log service for
>applications such as distributed databases, distributed transactional
>systems, replicated-state-machines and pub/sub messaging/queuing. We want
>to attract more developers to the project, and we believe that the ASF’s
>open and meritocratic philosophy will help us with this. We note the
>success of other similar projects already part of the ASF, like Kafka.
>
>= Core Developers =
>DistributedLog is actively developed within Twitter. Most of the developers
>are from Twitter. Many of them are committers or PMC members of Apache
>BookKeeper. Others aren’t currently affiliated with the ASF, so they will
>require new ICLAs.
>
>= Alignment =
>DistributedLog is related to several other Apache projects:
>
> * DistributedLog stores log segments as Ledgers in Apache BookKeeper.
> * DistributedLog uses Apache ZooKeeper for naming and metadata management
>and tracking the ownership of logs.
> * DistributedLog uses Apache Thrift as its RPC and serialization framework.
> * In the long term, DistributedLog’s data will be stored in Apache Hadoop
>clusters powered by the HDFS filesystem for archives and backup.
>
>= Known Risks =
>== Orphaned Products ==
>DistributedLog is used as the fundamental messaging infrastructure at
>Twitter. It has been serving production traffic for online database
>systems, search ingestion and a general pub/sub system. Twitter remains
>committed to developing and supporting the project. Twitter has a strong
>track record in standing behind projects that were contributed to the ASF
>by its employees, including Apache Mesos, Apache Aurora, Apache BookKeeper
>and Apache Hadoop. Many companies are interested in using DistributedLog in
>production.
>
>== Inexperience with Open Source ==
>The core developers of DistributedLog are committers of Apache BookKeeper.
>Although others on the initial list are not yet ASF committers or have less
>experience with the ASF, they are already active in the Apache BookKeeper
>community. We are confident that the project can be run in accordance with
>Apache principles on an ongoing basis.
>
>== Homogeneous Developers ==
>The initial committers are from Twitter. We hope to encourage contributions
>from other developers and grow them into committers after they have had
>time to continue their contributions.
>
>== Reliance on Salaried Developers ==
>Many of DistributedLog’s initial set of committers work full-time on
>DistributedLog, and are paid to do so. However, as mentioned elsewhere, we
>anticipate growth in the developer community which we hope will include
>people from industry, hobbyists, and academics who have an interest in
>distributed messaging systems.
>
>== Relationships with Other Apache Products ==
>DistributedLog uses Apache BookKeeper to store log segments and Apache
>ZooKeeper to store log metadata and manage log namespaces. It provides an
>end-to-end solution for replicated logs, to make building reliable
>distributed systems much easier. Unlike Kafka or ActiveMQ, DistributedLog
>is not a full-fledged pub/sub, queuing or messaging system. Instead, it
>aims to provide a fundamental building block for other distributed
>systems, offering durability, replication and consistency. It could thus
>serve other distributed systems as a transactional log for replicated
>state machines (e.g., HDFS NameNode), a WAL for distributed databases (e.g.,
>HBase), a journal for in-memory services (e.g., Kestrel) or even a storage
>backend for a full-fledged messaging system.
>
>== An Excessive Fascination with the Apache Brand ==
>DistributedLog builds on two existing top-level projects, Apache BookKeeper
>and Apache ZooKeeper. Some of the core developers actively participate in
>both projects and understand well the implications of being hosted by
>Apache. We would like this project to build on the same core values of ASF
>and to grow a community based on meritocracy. Also, there are several other
>projects already hosted by ASF in this space of reliable messaging and that
>overlap with DistributedLog in interests and scope. Consequently, the
>combination of all these observations makes us believe that DistributedLog
>should be hosted by the ASF.
>
>= Documentation =
>Building DistributedLog: Twitter’s high performance replicated log service (
>https://blog.twitter.com/2015/building-distributedlog-twitter-s-high-performance-replicated-log-service
>)
>
>Documentation is located at http://distributedlog.io.
>
>= Initial Source =
>DistributedLog’s initial source contribution will come from
>http://github.com/twitter/distributedlog/.
>
>= External Dependencies =
>DistributedLog depends upon a number of third-party libraries, which we
>list below.
>
> * Apache BookKeeper (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * Apache Thrift (Apache Software License v2.0)
> * Apache ZooKeeper (Apache Software License v2.0)
> * Google Guava (Apache Software License v2.0)
> * Mockito (MIT License)
> * Junit (Eclipse Public License 1.0)
> * LZ4-java (Apache Software License v2.0)
> * SLF4J (MIT License)
> * Twitter Finagle (Apache Software License v2.0)
> * Twitter Scrooge (Apache Software License v2.0)
> * Twitter Util (Apache Software License v2.0)
>
>= Required Resources =
>We request that the following resources be created for the project to use:
>
>== Mailing lists ==
> * private@distributedlog.incubator.apache.org (moderated subscriptions)
> * commits@distributedlog.incubator.apache.org
> * dev@distributedlog.incubator.apache.org
> * user@distributedlog.incubator.apache.org
>
>== Git repository ==
>https://git.apache.org/distributedlog.git
>
>== JIRA instance ==
>JIRA project DLOG (DLOG or DL)
>
>= Initial Committers =
> * Sijie Guo (Apache BookKeeper Committer, Twitter)
> * Robin Dhamankar (Apache BookKeeper Committer)
> * Leigh Stewart (Twitter)
> * Dave Rusek (Twitter)
> * Honggang Zhang (Twitter)
> * Jordan Bull (Twitter)
> * Satish Kotha (Twitter)
> * Aniruddha Laud
> * Franck Cuny (Twitter)
> * Eitan Adler (Twitter)
>
>== Affiliations ==
>Most of the initial committers are employees of Twitter, except Robin
>Dhamankar and Aniruddha Laud.
>
>= Sponsors =
>== Champion ==
>Flavio Junqueira
>
>== Nominated Mentors ==
> * Flavio Junqueira
> * Chris Nauroth
> * Henry Saputra
>
>= Sponsoring Entity =
>We ask the Apache Incubator PMC to sponsor this proposal.