incubator-general mailing list archives

From "John D. Ament" <johndam...@apache.org>
Subject Re: [PROPOSAL]Pistachio
Date Tue, 23 Jun 2015 02:45:09 GMT
On Mon, Jun 22, 2015 at 10:26 PM Andrew Purtell <apurtell@apache.org> wrote:

> > Pistachio can embed computation in the storage layer to achieve the best
> > data locality, which significantly improves computation performance.
> > This is an innovative model compared with the usual approach, in which
> > storage and compute are independent of each other.
>
> Have you heard of something called Hadoop?
>

Regardless of whether he has or not - what's your point? The ASF has
historically not denied the entry of new projects just because their domain
intersects with another project's.


>
>
> On Thu, Jun 18, 2015 at 10:17 AM, Gavin Li <lyo.gavin@gmail.com> wrote:
>
> > Hi,
> >
> > I want to propose project Pistachio to enter Apache Incubator.
> >
> > Below please find the proposal.
> >
> > Thanks,
> > Gavin Li
> >
> >
> >
> > = Pistachio =
> >
> > == Abstract ==
> >
> > Pistachio is a fault-tolerant, low-latency distributed storage system
> > that makes it simple to embed computation in the storage layer to achieve
> > the best data locality. It evolved from Yahoo’s global user profile
> > storage system.
> >
> > == Proposal ==
> >
> > Pistachio is a distributed key-value store with fault tolerance and
> > consistency guarantees. It supports multiple local storage engines,
> > including in-memory, Kyoto Cabinet, and RocksDB. Pistachio is used as the
> > user profile storage for massive-scale global ads products at Yahoo,
> > storing 10+ billion user profiles. Its performance and reliability have
> > been well proven in production.
> >
> > Pistachio can embed computation in the storage layer to achieve the best
> > data locality, which significantly improves computation performance.
> > This is an innovative model compared with the usual approach, in which
> > storage and compute are independent of each other.
> >
> > == Background ==
> >
> > Pistachio was originally designed and optimized for Yahoo’s large-scale
> > global open RTB (real-time bidding) use cases, where latency is critical
> > (the whole request needs to finish within 100 ms, including network round
> > trips). It stores 10+ billion user profiles in 8 data centers.
> >
> > Then, because of its strong performance and the flexibility of local
> > storage choices, we evolved it to do distributed compute. Rich callback
> > interfaces were added to support computing directly on top of the storage
> > system, local to the data partition. This model differs from the
> > traditional distributed computation model, where storage and compute are
> > separated and independent. In the new model we found that data locality
> > improves significantly and many data access round trips are eliminated
> > during computation, which significantly improves performance.
> >
> > It was publicly announced in April 2015 and is currently hosted on
> > GitHub.
> >
> > == Rationale ==
> >
> > As a key-value store, Pistachio is unique in offering low-latency access
> > with fault tolerance and consistency guarantees. Its reliability,
> > scalability, fault tolerance and performance have been well proven in a
> > global, large-scale, revenue-supporting production system at Yahoo.
> >
> > As a distributed computation system, it’s an innovative model where the
> > compute layer is introduced on top of the storage layer natively and
> > naturally to optimize the data locality of computation.
> >
> > Operating the project in the “Apache way” aligns closely with the
> > long-term vision of this project and can greatly help the development of
> > the community.
> >
> > == Current Status ==
> >
> > Pistachio was open-sourced and announced in April 2015 and is currently
> > hosted on GitHub. It has mainly been developed by the team at Yahoo and
> > has already attracted external developers (20+ watches and forks on
> > GitHub).
> >
> > == Meritocracy ==
> >
> > We plan to build an environment following the Apache meritocracy
> > principles. Many companies, including LinkedIn, GF Securities and
> > Microsoft, and open source communities like deeplearning4j have already
> > expressed interest or accepted invitations to participate in this project.
> >
> > == Community ==
> >
> > Since the announcement of Pistachio we have received a lot of interest,
> > and the concept of embedding computation in storage has also received a
> > lot of recognition. We have also started working with other communities
> > like deeplearning4j to build more application use cases with Pistachio.
> > We believe the community will grow fast.
> >
> > == Core Developers ==
> >
> > This project was created by Gavin Li. Core developers are currently
> > mainly at Yahoo.
> >
> > == Alignment ==
> >
> > Pistachio depends on many Apache projects, including Kafka, Helix,
> > ZooKeeper, Curator, and Apache Commons.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > The risk of Pistachio being orphaned is small because Yahoo has invested
> > heavily in this system. It’s the internal storage standard for Yahoo’s
> > global ads products and is still being expanded. The cost of migrating
> > away from this project is very high. We are also working with external
> > communities like deeplearning4j and other companies to expand the
> > applications.
> >
> > === Inexperience with Open Source ===
> >
> > Core developers are experienced open source contributors in many projects
> > including Druid, Spark, Storm, etc. Pistachio committers will be guided
> > by mentors with strong Apache open source project backgrounds.
> >
> > === Homogeneous Developers ===
> >
> > The initial committers include developers from several institutions,
> > including Microsoft, GF Securities, LinkedIn and Yahoo.
> >
> > === Reliance on Salaried Developers ===
> >
> > We work on Pistachio both on salaried time and after hours. Many
> > developers from other institutions have already accepted the invitation
> > to volunteer on Pistachio.
> >
> > === Relationships with Other Apache Products ===
> >
> > As mentioned earlier, Pistachio depends on Apache Kafka, Helix,
> > ZooKeeper, Curator, etc.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Generating publicity is not the purpose of this proposal. We mainly want
> > to join the ASF in order to increase our contacts and visibility in the
> > open source world to attract great developers.
> >
> > == Document ==
> >
> > Current documentation can be found here:
> > https://github.com/yahoo/Pistachio.
> >
> > == Initial source ==
> >
> > Initial source can be found here in the Github repo:
> > https://github.com/yahoo/Pistachio.
> >
> > == External dependencies ==
> >
> > To the best of our knowledge, here is the list of dependencies:
> > Rocks DB
> > ICU4j
> > Apache Curator
> > netty
> > google http client
> > codahale.metrics
> > apache helix
> > apache zookeeper
> > apache commons
> > apache thrift
> > apache kafka
> > kyoto cabinet (GNU GPL)
> > google protocol buffer
> > kryo
> > slf4j
> >
> > To the best of our knowledge, all of these except Kyoto Cabinet are
> > distributed under Apache-compatible licenses:
> > BSD
> > ICU
> > Apache License 2.0
> > MIT
> >
> > Kyoto Cabinet is under the GNU GPL, but it is not a hard dependency of
> > Pistachio; it’s an optional pluggable storage engine. It’s designed to be
> > fully pluggable and very loosely coupled. We can easily remove it by
> > graduation.
> >
> > == Required Resources ==
> >
> > Mailing Lists
> >
> > pistachio-user
> > pistachio-dev
> > pistachio-commits
> > pistachio-private (for private PMC discussions)
> >
> > Git
> >
> > The Pistachio team prefers Git for source version control:
> > git://git.apache.org/pistachio
> >
> > Issue Tracking
> >
> > JIRA Pistachio (PISTACHIO)
> >
> > Other Resources
> >
> > Jenkins continuous integration testing
> >
> > == Initial Committers ==
> >
> > Gavin Li <lyo.gavin at gmail dot com>
> > Lie Yang <lyang at yahoo-inc dot com>
> > Jay Kim <pitecus at yahoo-inc dot com>
> > Flavio Junqueira <fpj at apache dot org>
> > Chihong Liang <chihong.liang at gmail dot com>
> > Yong Liu <ly7110 at gmail dot com>
> > Shengwu Yang <yangshengwu at gmail dot com>
> >
> > == Affiliations ==
> >
> > Gavin Li - Yahoo
> > Flavio Junqueira - Microsoft
> > Chihong Liang - GF Securities
> > Yong Liu - Yingmi Asset Management Corp.
> > Lie Yang - Yahoo
> > Jay Kim - Yahoo
> > Shengwu Yang - LinkedIn China
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> > Flavio Junqueira <fpj at apache dot org>
> >
> > === Nominated Mentors ===
> >
> > === Sponsoring Entity ===
> >
> > The Apache Incubator
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
