incubator-general mailing list archives

From Gavin Li <lyo.ga...@gmail.com>
Subject Re: [PROPOSAL]Pistachio
Date Mon, 22 Jun 2015 18:35:55 GMT
Wiki has been created for the proposal:
https://wiki.apache.org/incubator/PistachioProposal.

The comments here have been addressed and reflected in the wiki.

Thanks,
Gavin Li

On Fri, Jun 19, 2015 at 11:30 AM, Gavin Li <lyo.gavin@gmail.com> wrote:

> Henry,
>
> Thanks for the suggestion.
>
> We agree that at this early stage it is better to direct user discussion to
> the dev list to help develop the community. I'll update the proposal on the
> wiki once I have write access to it.
>
> Thanks,
> Gavin Li
>
> On Fri, Jun 19, 2015 at 10:51 AM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
>
>> Since it is mostly used at Yahoo, do you need a pistachio-user list for now?
>>
>> Usually an incubator project should focus all communication on the dev@
>> list to avoid email distractions.
>>
>>
>> - Henry
>>
>> On Thu, Jun 18, 2015 at 10:17 AM, Gavin Li <lyo.gavin@gmail.com> wrote:
>> > Hi,
>> >
>> > I want to propose the Pistachio project for entry into the Apache Incubator.
>> >
>> > Below please find the proposal.
>> >
>> > Thanks,
>> > Gavin Li
>> >
>> >
>> >
>> > = Pistachio =
>> >
>> > == Abstract ==
>> >
>> > Pistachio is a fault-tolerant, low-latency distributed storage system that
>> > makes it simple to embed computation in the storage layer to achieve the
>> > best data locality. It evolved from Yahoo’s global user profile storage
>> > system.
>> >
>> > == Proposal ==
>> >
>> > Pistachio is a distributed key-value store with fault tolerance and
>> > consistency guarantees. It supports multiple local storage engines,
>> > including in-memory, Kyoto Cabinet, RocksDB, etc. Pistachio is used as the
>> > user profile storage for massive-scale global ads products at Yahoo,
>> > storing 10+ billion user profiles. Its performance and reliability have
>> > been well proven in production.
>> >
>> > Pistachio can easily embed computation in the storage layer to achieve the
>> > best data locality, which improves computation performance significantly.
>> > This is an innovative model compared with the usual approach, where
>> > storage and compute are independent of each other.
>> >
>> > == Background ==
>> >
>> > Pistachio was originally designed and optimized for Yahoo’s large-scale
>> > global open RTB (real-time bidding) use cases, where latency is critical
>> > (the whole request needs to finish within 100 ms, including network round
>> > trips). It stores 10+ billion user profiles in 8 data centers.
>> >
>> > Then, because of its strong performance and the flexibility of local
>> > storage choices, we evolved it to do distributed compute. Rich callback
>> > interfaces were added to support easy compute directly on top of the
>> > storage system, local to the data partition. This model is quite different
>> > from the traditional distributed computation model, where storage and
>> > compute are separated and independent. In the new model we found that data
>> > locality can be improved significantly and many data access round trips
>> > can be eliminated during computation, so performance improves
>> > significantly.
>> >
>> > It was publicly announced in April 2015 and is currently hosted on
>> > GitHub.
>> >
>> > == Rationale ==
>> >
>> > As a key-value store, Pistachio is unique in offering low-latency
>> > access with fault tolerance and consistency guarantees. Its reliability,
>> > scalability, fault tolerance, and performance have been well proven in a
>> > global, large-scale, revenue-supporting production system at Yahoo.
>> >
>> > As a distributed computation system, it’s an innovative model where the
>> > compute layer is introduced on top of the storage layer natively and
>> > naturally to optimize the data locality of computation.
>> >
>> > Operating the project in the “Apache Way” aligns well with the long-term
>> > vision of this project and can greatly help the development of the
>> > community.
>> >
>> > == Current Status ==
>> >
>> > Pistachio was open-sourced and announced in April 2015 and is currently
>> > hosted on GitHub. It has mainly been developed by the team from Yahoo and
>> > has already attracted many external developers (20+ watches and forks on
>> > GitHub).
>> >
>> > == Meritocracy ==
>> >
>> > We plan to build an environment following the Apache meritocracy
>> > principles. Many companies, including LinkedIn, GF Securities, and
>> > Microsoft, as well as open source communities like deeplearning4j, have
>> > already expressed interest in or accepted invitations to participate in
>> > this project.
>> >
>> > == Community ==
>> >
>> > Since the announcement of Pistachio we have received a lot of interest,
>> > and the concept of embedding computation in storage has gained
>> > recognition as well. We have also started to work with other communities
>> > like deeplearning4j to build more application use cases with Pistachio.
>> > We believe the community will grow fast.
>> >
>> > == Core Developers ==
>> >
>> > This project was created by Gavin Li. Core developers are currently mainly
>> > at Yahoo.
>> >
>> > == Alignment ==
>> >
>> > Pistachio depends on many Apache projects, including Kafka, Helix,
>> > ZooKeeper, Curator, Apache Commons, etc.
>> >
>> > == Known Risks ==
>> >
>> > === Orphaned Products ===
>> >
>> > The risk of Pistachio being orphaned is small because Yahoo has invested
>> > heavily in this system. It’s the internal storage standard for Yahoo’s
>> > global ads products and is still being expanded. The cost of migrating
>> > away from this project is very high. We are also working with external
>> > communities like deeplearning4j and other companies to expand the
>> > applications.
>> >
>> > === Inexperience with Open Source ===
>> >
>> > Core developers are experienced open source contributors to many projects,
>> > including Druid, Spark, Storm, etc. Pistachio committers will be guided by
>> > mentors with strong Apache open source project backgrounds.
>> >
>> > === Homogeneous Developers ===
>> >
>> > The initial committers include developers from several institutions:
>> > Microsoft, GF Securities, LinkedIn, and Yahoo.
>> >
>> > === Reliance on Salaried Developers ===
>> >
>> > We work on Pistachio both on salaried time and after hours. Many developers
>> > from other institutions have already accepted the invitation to volunteer
>> > on Pistachio.
>> >
>> > === Relationships with Other Apache Products ===
>> >
>> > As mentioned earlier, Pistachio depends on Apache Kafka, Helix, ZooKeeper,
>> > Curator, etc.
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> >
>> > Generating publicity is not the purpose of this proposal. We mainly want to
>> > join the ASF in order to increase our contacts and visibility in the open
>> > source world to attract great developers.
>> >
>> > == Documentation ==
>> >
>> > Current documentation can be found here:
>> > https://github.com/yahoo/Pistachio.
>> >
>> > == Initial source ==
>> >
>> > Initial source can be found in the GitHub repo:
>> > https://github.com/yahoo/Pistachio.
>> >
>> > == External dependencies ==
>> >
>> > To the best of our knowledge, here is the list of dependencies:
>> > Rocks DB
>> > ICU4j
>> > Apache Curator
>> > netty
>> > google http client
>> > codahale.metrics
>> > apache helix
>> > apache zookeeper
>> > apache commons
>> > apache thrift
>> > apache kafka
>> > kyoto cabinet (GNU GPL)
>> > google protocol buffer
>> > kryo
>> > slf4j
>> >
>> > To the best of our knowledge, all of these except Kyoto Cabinet are
>> > distributed under Apache-compatible licenses:
>> > BSD
>> > ICU
>> > Apache License 2.0
>> > MIT
>> >
>> > Kyoto Cabinet is under the GNU GPL, but it is not a hard dependency of
>> > Pistachio; it’s an optional pluggable storage engine. It’s designed so
>> > that it’s fully pluggable and very loosely coupled. We can easily
>> > remove it at graduation.
>> >
>> > == Required Resources ==
>> >
>> > Mailing Lists
>> >
>> > pistachio-user
>> > pistachio-dev
>> > pistachio-commits
>> > pistachio-private (for private PMC discussions)
>> >
>> > Git
>> >
>> > The Pistachio team prefers Git for source version control:
>> > git://git.apache.org/pistachio
>> >
>> > Issue Tracking
>> >
>> > JIRA Pistachio (PISTACHIO)
>> >
>> > Other Resources
>> >
>> > Jenkins continuous integration testing
>> >
>> > == Initial Committers ==
>> >
>> > Gavin Li <lyo.gavin at gmail dot com>
>> > Lie Yang <lyang at yahoo-inc dot com>
>> > Jay Kim <pitecus at yahoo-inc dot com>
>> > Flavio Junqueira <fpj at apache dot org>
>> > Chihong Liang <chihong.liang at gmail dot com>
>> > Yong Liu <ly7110 at gmail dot com>
>> > Shengwu Yang <yangshengwu at gmail dot com>
>> >
>> > == Affiliations ==
>> >
>> > Gavin Li - Yahoo
>> > Flavio Junqueira - Microsoft
>> > Chihong Liang - GF Securities
>> > Yong Liu - Yingmi Asset Management Corp.
>> > Lie Yang - Yahoo
>> > Jay Kim - Yahoo
>> > Shengwu Yang - LinkedIn China
>> >
>> > == Sponsors ==
>> >
>> > === Champion ===
>> >
>> > Flavio Junqueira <fpj at apache dot org>
>> >
>> > === Nominated Mentors ===
>> >
>> > === Sponsoring Entity ===
>> >
>> > The Apache Incubator
