incubator-general mailing list archives

From Sebastian Schelter <>
Subject Re: [PROPOSAL] Stratosphere
Date Mon, 07 Apr 2014 14:49:38 GMT
I added you to the mentors list in the proposal.

On 04/07/2014 03:47 PM, Henry Saputra wrote:
> Thanks Sebastian, I always love to see a project from an academic setting
> materialize as an Apache project
> On Monday, April 7, 2014, Sebastian Schelter <> wrote:
>> You're very welcome to join as a mentor, Henry!
>> On 04/06/2014 07:34 PM, Henry Saputra wrote:
>> Hi Guys,
>> The proposal looks great and I would love to help to sign up as a
>> Mentor if you guys still have space for one.
>> - Henry
>> On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <>
>> wrote:
>> I would like to propose Stratosphere as an Apache Incubator project.  I
>> have posted the proposal to
>> incubator/StratosphereProposal and posted the text of the proposal below.
>> Alan.
>> = Stratosphere =
>> == Abstract ==
>> Stratosphere is an open source system for parallel data analysis.
>> Stratosphere deeply integrates MapReduce and database technologies to
>> provide expressive and optimizable programming interfaces and at the same
>> time efficient and scalable execution.
>> == Proposal ==
>> Stratosphere is an open source system for expressive, declarative, fast,
>> and efficient data analysis. Stratosphere combines the scalability and
>> programming flexibility of distributed MapReduce-like platforms with the
>> efficiency, out-of-core execution, and query optimization capabilities
>> found in parallel databases.
>> == Background ==
>> There is currently a need for general-purpose cluster computing platforms
>> that are compatible with the Hadoop ecosystem, are more efficient, easier
>> to use, and can support more applications than Hadoop MapReduce, but are
>> not restricted to a specific data model and language (such as the
>> relational model and a variant of SQL). Stratosphere fulfils these needs.
>> Stratosphere exposes expressive APIs in Java and Scala (conceptually
>> similar to Spark, Cascading, Scalding) that allow arbitrary user-defined
>> functions in the same language and data model that the program is written
>> in. Stratosphere programs pass through a cost-based optimizer that finds
>> the best execution path for these programs depending on the data and
>> cluster characteristics. The design and implementation of Stratosphere are
>> based on research that generalizes query optimizers in relational
>> databases. Stratosphere has a distributed runtime that is architected upon
>> the principles of parallel databases, providing true pipelining (a basis
>> for stream processing) and efficient out-of-core algorithms for grouping,
>> sorting, joining, and aggregating data. Stratosphere provides first-class
>> support for iterative algorithms via a built-in iterate operator, covering
>> Machine Learning and graph analysis use cases. It achieves performance
>> similar to Apache Giraph without being a specialized graph
>> processing system.
>> Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and
>> some minor ones.
>> == Rationale ==
>> Stratosphere started out in 2008 as a research project by the Technical
>> University of Berlin, the Humboldt University of Berlin, and the Hasso
>> Plattner Institute, and has received subsequent funding from the German
>> Research Council, the European Institute of Innovation and Technology, the
>> European Commission, and industry.
>> The traction of Stratosphere has by far exceeded our initial expectations,
>> and we are therefore seeking an organizational long-term home for
>> Stratosphere beyond the University walls that will house and further
>> encourage contributors from companies and other organizations that are
>> interested in Stratosphere. We believe that the Apache Software Foundation
>> is the ideal home for Stratosphere. Stratosphere integrates with several
>> existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is
>> familiar with the Apache processes and fully subscribes to the Apache
>> mission. One of the proposing members is a long-time Apache contributor and
>> PMC member.
>> == Initial Goals ==
>>    * Move the existing codebase to Apache
>>    * Integrate with the Apache development process
>>    * Ensure all dependencies are compliant with Apache License version 2.0
>>    * Incremental development and releases per Apache guidelines
>> == Current Status ==
>> === Meritocracy ===
>> Stratosphere has operated on meritocratic principles from the get-go. The
>> initial project proposal submitted to the German Research Council
>> in 2008 stated that all code developed in the project will be released as
>> open source under the Apache 2 license. Currently, all the
>> discussions pertaining to Stratosphere development are public on [[
>>|GitHub]]  and our [[<!forum/stratosphere-dev%7Cmailing>
