incubator-general mailing list archives

From Henry Saputra <>
Subject Re: [PROPOSAL] Stratosphere
Date Mon, 07 Apr 2014 13:47:52 GMT
Thanks Sebastian, I always love to see a project from an academic setting
materialize as an Apache project.

On Monday, April 7, 2014, Sebastian Schelter <> wrote:

> You're very welcome to join as a mentor, Henry!
> On 04/06/2014 07:34 PM, Henry Saputra wrote:
> Hi Guys,
> The proposal looks great and I would love to help to sign up as a
> Mentor if you guys still have space for one.
> - Henry
> On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <>
> wrote:
> I would like to propose Stratosphere as an Apache Incubator project.  I
> have posted the proposal to
> incubator/StratosphereProposal and posted the text of the proposal below.
> Alan.
> = Stratosphere =
> == Abstract ==
> Stratosphere is an open source system for parallel data analysis.
> Stratosphere deeply integrates MapReduce and database technologies to
> provide expressive and optimizable programming interfaces and at the same
> time efficient and scalable execution.
> == Proposal ==
> Stratosphere is an open source system for expressive, declarative, fast,
> and efficient data analysis. Stratosphere combines the scalability and
> programming flexibility of distributed MapReduce-like platforms with the
> efficiency, out-of-core execution, and query optimization capabilities
> found in parallel databases.
> == Background ==
> There is currently a need for general-purpose cluster computing platforms
> that are compatible with the Hadoop ecosystem, are more efficient, easier
> to use, and can support more applications than Hadoop MapReduce, but are
> not restricted to a specific data model and language (such as the
> relational model and a variant of SQL). Stratosphere fulfils these needs.
> Stratosphere exposes expressive APIs in Java and Scala (conceptually
> similar to Spark, Cascading, Scalding) that allow arbitrary user-defined
> functions in the same language and data model that the program is written
> in. Stratosphere programs pass through a cost-based optimizer that finds
> the best execution path for these programs depending on the data and
> cluster characteristics. The design and implementation of Stratosphere are
> based on research that generalizes query optimizers in relational
> databases. Stratosphere has a distributed runtime that is architected upon
> the principles of parallel databases, providing true pipelining (a basis
> for stream processing) and efficient out-of-core algorithms for grouping,
> sorting, joining, and aggregating data. Stratosphere provides first-class
> support for iterative algorithms via a built-in iterate operator, covering
> Machine Learning and graph analysis use cases. It achieves performance
> similar to Apache Giraph without being a specialized graph processing
> system.
> Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and
> some minor ones.
> == Rationale ==
> Stratosphere started out in 2008 as a research project by the Technical
> University of Berlin, the Humboldt University of Berlin, and the Hasso
> Plattner Institute, and has received subsequent funding from the German
> Research Council, the European Institute of Innovation and Technology, the
> European Commission, and industry.
> The traction of Stratosphere has by far exceeded our initial expectations,
> and we are therefore seeking an organizational long-term home for
> Stratosphere beyond the University walls that will house and further
> encourage contributors from companies and other organizations that are
> interested in Stratosphere. We believe that the Apache Software Foundation
> is the ideal home for Stratosphere. Stratosphere integrates with several
> existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is
> familiar with the Apache processes and fully subscribes to the Apache
> mission. One of the proposing members is a long-time Apache contributor and
> PMC member.
> == Initial Goals ==
>   * Move the existing codebase to Apache
>   * Integrate with the Apache development process
>   * Ensure all dependencies are compliant with Apache License version 2.0
>   * Incremental development and releases per Apache guidelines
> == Current Status ==
> === Meritocracy ===
> Stratosphere operated on meritocratic principles from the get-go. The
> initial project proposal submitted to the German Research Council
> in 2008 stated that all code developed in the project will be released as
> open source under the Apache 2 license. Currently, all the
> discussions pertaining to Stratosphere development are public on [[
>|GitHub]]  and our [[<!forum/stratosphere-dev%7Cmailing>
