incubator-general mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: [PROPOSAL] Superset Proposal for Apache Incubator
Date Thu, 13 Apr 2017 16:59:26 GMT
Hi Jean-Baptiste,

We are indeed looking for more mentors.

Should I update the wiki and replace all references to PMC with PPMC?

Thanks,

Max

On Wed, Apr 12, 2017 at 12:51 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
wrote:

> Hi Maxime,
>
> The proposal looks interesting.
>
> Just a note: it's PPMC (not PMC) during incubation.
>
> Are you seeking other mentors (I see you only have one mentor and one
> champion for now)?
>
> Regards
> JB
>
>
> On 04/12/2017 09:41 PM, Maxime Beauchemin wrote:
>
>> Hi all,
>>
>> We would love feedback on the proposal. Do the veterans on this mailing
>> list think that the proposal is ready for a vote!?
>>
>> Thanks,
>>
>> Max
>>
>> On Tue, Apr 4, 2017 at 5:26 PM, Luke Han <luke.hq@gmail.com> wrote:
>>
>>> Hi Jeff,
>>>     This is a great project which has been mentioned many times in the
>>> community. It looks cool and fun for data work.
>>>
>>>     Thanks for proposing Superset as an Apache Incubator project; please
>>> let me know if there's anything I can help with.
>>>
>>>     Thanks.
>>> Luke
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Sun, Apr 2, 2017 at 7:45 AM, Jeff Feng <jeff.feng@airbnb.com.invalid>
>>> wrote:
>>>
>>>> Dear Apache Incubator Community,
>>>>
>>>> We are excited to share our proposal for entering the Apache Incubator,
>>>> for discussion and feedback.  Superset is an enterprise-ready web
>>>> application for data exploration, data visualization and dashboarding.
>>>>
>>>> Our incubation proposal is on the following wiki page and is also copied
>>>> in the email below:
>>>>
>>>> https://wiki.apache.org/incubator/SupersetProposal
>>>>
>>>> We have an active Superset community including 400+ members and nearly
>>>> 200 topics.  The Google Group can be found below.  We plan to move the
>>>> discussion to the ASF:
>>>>
>>>> https://groups.google.com/forum/#!forum/airbnb_superset
>>>>
>>>> Thank you, and we look forward to the discussion!
>>>>
>>>> Jeff, Max & Alanna
>>>>
>>>>
>>>>
>>>>
>>>> = Superset =
>>>>
>>>> == Abstract ==
>>>>
>>>> Superset is an enterprise-ready web application for data exploration,
>>>> data visualization and dashboarding.
>>>>
>>>> == Proposal ==
>>>>
>>>> Superset is business intelligence (BI) software that helps modern
>>>> organizations visualize and interact with their data. Superset enables
>>>> users to explore data from a variety of databases, assemble beautiful
>>>> dashboards and share their findings.  Superset works neatly with all
>>>> modern SQL-speaking databases, and integrates with Druid.io to provide
>>>> real-time, interactive, blazing fast data access to large datasets.
>>>>
>>>> == Background ==
>>>>
>>>> Data is mission-critical. To succeed in this era, organizations need to
>>>> provide low-friction, intuitive and interactive access to data. It is
>>>> paramount for knowledge workers to be capable of answering their own
>>>> questions by querying, exploring and visualizing data.
>>>>
>>>> The entire business intelligence industry has pivoted from a model of
>>>> centralized top-down platforms driven by IT organizations to
>>>> self-service analytics and agile workflows by any user.  This shift
>>>> removes centralized service bottlenecks for creating data visualizations
>>>> while also creating an environment that is iterative and fast-moving.
>>>> This means that business intelligence software must also be easy and
>>>> delightful to use.  Self-service analytics doesn’t mean that admin and
>>>> governance features are not needed.
>>>>
>>>> Modern BI tools provide fine-grained access controls and auditing
>>>> capabilities to understand how data is being used.  Superset is a
>>>> solution that delivers on all of these vectors.
>>>>
>>>> The technology stack is also constantly morphing; vendors are struggling
>>>> to provide cheap, quick and easy solutions to access data.  Business
>>>> intelligence users are finding existing solutions lacking, as these
>>>> software products either disregard or react slowly to recent
>>>> game-changing technologies like Druid.io, PrestoDB, Apache Drill, Apache
>>>> Kylin, d3.js, React.js and IPython’s Jupyter, for instance.
>>>>
>>>> == Rationale ==
>>>>
>>>> Business intelligence is more relevant today than at any other point in
>>>> history.  Organizations currently have very limited options for open
>>>> source data visualization solutions, especially solutions that are both
>>>> self-service and enterprise-ready.  Every company informing their
>>>> decisions with data needs a BI tool.
>>>>
>>>> We believe that Superset will be a strong complement to existing Apache
>>>> Software Foundation technologies by offering scalable user interactions
>>>> with distributed storage and computation solutions.  Users will often
>>>> find that Superset can act as a catalyst for tooling that can visualize
>>>> the byproduct of data and computation infrastructure.
>>>>
>>>> Superset has many key design elements that help fill a gap in current
>>>> solutions for organizations:
>>>>
>>>> * Easy, low-friction access to data through a simple, web-based data
>>>> exploration interface.  Composing charts and dashboards is intuitive.
>>>> Eliminating the need to write code or SQL empowers anyone to use it.
>>>>
>>>> * Access to a wide array of rich, interactive data visualization types.
>>>>
>>>> * Enterprise-ready: Integration with different authentication mechanisms
>>>> and granular permissions centered around actions and data access.
>>>>
>>>> * Realtime & fast: Superset provides realtime analytics at the speed of
>>>> thought on very large datasets when integrated with Druid.io.
>>>>
>>>> * Broad data access: Consume data out of any SQL-speaking relational
>>>> database.
>>>>
>>>> * Extensible: Can be extended to talk to many NoSQL databases like
>>>> Apache Drill, Elasticsearch, and other popular database engines.
>>>>
>>>> * Fast loading dashboards with configurable web-scale caching.
>>>>
>>>> * Plug-in framework that enables organizations to build custom
>>>> analytical
>>>> applications with new UI/UX interfaces.
>>>>
>>>> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
>>>> with more flexibility.  SQL Lab integrates with the visualization engine
>>>> seamlessly.
>>>>
>>>> == Initial Goals ==
>>>>
>>>> The initial goals of the Superset project are several-fold:
>>>>
>>>> * Move the existing codebase to Apache and integrate with the Apache
>>>> development process.
>>>>
>>>> * Redesign the user interface and interaction model for creating
>>>> visualizations/dashboards and connecting to data sources.
>>>>
>>>> * Build robust support for security and governance of the tool,
>>>> including popular authorization modules (such as Apache Ranger and
>>>> Apache Sentry) and a more sophisticated permissions system.
>>>>
>>>> * Grow the extensibility of the project, both in terms of enhanced
>>>> connectivity to NoSQL-based data sources and creating a plug-in
>>>> framework that enables organizations to build custom analytical
>>>> applications which require a new UI/UX.
>>>>
>>>> == Current Status ==
>>>>
>>>> By many standards, Superset is already a successful open source project.
>>>> As of March 2017, Superset is officially used in production at about a
>>>> dozen companies, has received contributions from over one hundred
>>>> contributors on Github, and has 1500+ forks and 12k+ stars.
>>>>
>>>> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
>>>> significant contributions and expressed their commitment to the
>>>> project.  The product is feature-complete and has been viable for
>>>> months.  It already serves as the main interface for consuming data at
>>>> many companies of different sizes.
>>>>
>>>> While the product is usable, there’s room for improvement across the
>>>> board, starting with providing a smoother user experience around content
>>>> creation, making sure all features work out-of-the-box on more platforms
>>>> and databases, providing better user training guides and videos, having
>>>> a predictable release process, and increasing the overall quality of the
>>>> Superset releases.
>>>>
>>>> === Meritocracy ===
>>>>
>>>> We plan to invest in supporting a meritocracy. We will discuss the
>>>> requirements in an open forum. Several companies have expressed interest
>>>> in this project, and we intend to invite additional developers to
>>>> participate.  We will encourage and monitor community participation so
>>>> that privileges can be extended to those who contribute.
>>>>
>>>> === Community ===
>>>>
>>>> The need for an enterprise-ready data visualization and exploration
>>>> platform in the open source community is tremendous.  While Superset is
>>>> fairly well known, recognized and used within the Druid.io community,
>>>> adoption is currently limited outside of that niche. There is a huge
>>>> opportunity to grow the community to hundreds if not thousands of
>>>> organizations, and we are hoping that embracing “the Apache way” will
>>>> accelerate the growth of our community.
>>>>
>>>> We have already been actively seeking and inviting contributions, and
>>>> are planning to scale the project by investing time and building the
>>>> support structure needed to grow the community.
>>>>
>>>> === Core Developers ===
>>>>
>>>> The initial committers for Superset include experienced full stack,
>>>> front-end and data engineers:
>>>>
>>>> * Maxime Beauchemin (Airbnb)
>>>>
>>>> * Alanna Scott (Airbnb)
>>>>
>>>> * Bogdan Kyryliuk (Airbnb)
>>>>
>>>> * Vera Liu (Airbnb)
>>>>
>>>> * Jeff Feng (Airbnb)
>>>>
>>>> * Ashutosh Chauhan (Hortonworks)
>>>>
>>>> * Nishant Bangarwa (Hortonworks)
>>>>
>>>> * Slim Bouguerra (Hortonworks)
>>>>
>>>> * Priyank Shah (Hortonworks)
>>>>
>>>> * Sriharsha Chintalapani (Hortonworks)
>>>>
>>>> * Daniel Dai (Hortonworks)
>>>>
>>>> We realize that additional employer diversity is needed, and we will
>>>> work
>>>> aggressively to recruit developers from additional companies.
>>>>
>>>> === Alignment ===
>>>>
>>>> The initial committers strongly believe that a system for interactive
>>>> visualization of data will gain broader adoption as an open source,
>>>> community-driven project, where the community can contribute not only to
>>>> the core components, but also to a growing collection of connectors and
>>>> visualizations, and to improving integration with all potential data
>>>> sources.  Superset already integrates closely with Apache Hive and the
>>>> Hive metastore, as well as most SQL-speaking databases found in modern
>>>> data ecosystems.
>>>>
>>>> == Known Risks ==
>>>>
>>>> === Orphaned Products ===
>>>>
>>>> Superset is a vital component for visualizing, accessing and
>>>> democratizing data at Airbnb.  At Hortonworks, Superset is also a core
>>>> component of the DataFlow product offering.  Thus, the risk of the
>>>> project being orphaned is relatively low.  The project could be at risk
>>>> if Airbnb changes its approach to democratizing data or if Hortonworks
>>>> changes its strategy in the market.  In such an event, the committers
>>>> plan to continue working on the project on their own time, though
>>>> progress will likely be slower.  We plan to mitigate this risk by
>>>> recruiting additional committers.
>>>>
>>>> === Inexperience with Open Source ===
>>>>
>>>> The initial committers include veteran Apache members (committers and
>>>> PMC members) and other developers who have varying degrees of experience
>>>> with open source projects. All have been involved with source code that
>>>> has been released under an open source license, and several also have
>>>> experience developing code with an open source development process.
>>>>
>>>> === Homogenous Developers ===
>>>>
>>>> The initial committers are employed by Airbnb Inc. and Hortonworks.  We
>>>> are committed to recruiting additional committers from other companies.
>>>>
>>>> === Reliance on Salaried Developers ===
>>>>
>>>> It is expected that Superset development will occur on both salaried
>>>> time and volunteer time, after hours.  The majority of initial
>>>> committers are paid by their employer to contribute to this project.
>>>> However, they are all passionate about the project, and we are confident
>>>> that the project will continue even if no salaried developers contribute
>>>> to it.  We are committed to recruiting additional committers, including
>>>> non-salaried developers.
>>>>
>>>> === Relationships with Other Apache Products ===
>>>>
>>>> To the knowledge of the Initial Committers, there are no direct
>>>> competitors to Superset within the Apache Software Foundation.  That
>>>> said, Apache Zeppelin is an indirect competitor, but it solves a
>>>> different use case.
>>>>
>>>> Apache Zeppelin is a web-based notebook that enables interactive data
>>>> analytics. It enables the creation of beautiful data-driven, interactive
>>>> and collaborative documents with SQL, Scala and more.  Although a user
>>>> can create data visualizations using this project, it leverages a
>>>> notebook-style user interface and is geared towards the Spark community,
>>>> where Scala and SQL co-exist.
>>>>
>>>> We look forward to collaborating with those communities, as well as
>>>> other
>>>> Apache communities.
>>>>
>>>> === An Excessive Fascination with the Apache Brand ===
>>>>
>>>> Superset is solving two huge challenges:
>>>>
>>>> * The challenge of enabling every knowledge worker to make data-informed
>>>> decisions, particularly those who are not deeply skilled at writing SQL.
>>>>
>>>> * The challenge of visualizing huge amounts of data interactively and in
>>>> real time.
>>>>
>>>> Superset was first developed as a data visualization solution for
>>>> Druid.io, as a way to visualize billions of rows of data.  Since then,
>>>> usage of Superset has expanded to address data visualization use cases
>>>> across SQL-speaking data sources as well.
>>>>
>>>> Our rationale for developing Superset as an Apache project is detailed
>>>> in the Rationale section.  We believe that the Apache brand and community
>>>> process will help us attract more contributors to this project, and help
>>>> grow the footprint of the project through usage at other organizations
>>>> and within other applications.  Establishing consensus among users and
>>>> developers will result in a more valuable tool for everyone.
>>>>
>>>> == Documentation ==
>>>>
>>>> References to further reading material:
>>>>
>>>> * [[http://airbnb.io/superset/|Superset Documentation]]
>>>>
>>>> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: Airbnb’s Data Exploration Platform]]
>>>>
>>>> * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post: Superset: Scaling Data Access & Visual Insights at Airbnb]]
>>>>
>>>> == Initial Source ==
>>>>
>>>> The origin of the proposed code base can be found at
>>>> https://github.com/airbnb/superset.  The code base is primarily in
>>>> Python.
>>>>
>>>> == Source and Intellectual Property Submission Plan ==
>>>>
>>>> We do not expect any complications for the submission of the Superset
>>>> code base.  Our code is already on Github and there is only a single
>>>> code base.
>>>>
>>>> == External Dependencies ==
>>>>
>>>> List of Python packages from the Python Package Index (PyPI):
>>>>
>>>> * boto3
>>>>
>>>> * celery
>>>>
>>>> * cryptography
>>>>
>>>> * flask-appbuilder
>>>>
>>>> * flask-cache
>>>>
>>>> * flask-migrate
>>>>
>>>> * flask-script
>>>>
>>>> * flask-sqlalchemy
>>>>
>>>> * flask-testing
>>>>
>>>> * humanize
>>>>
>>>> * gunicorn
>>>>
>>>> * markdown
>>>>
>>>> * pandas
>>>>
>>>> * parsedatetime
>>>>
>>>> * pydruid
>>>>
>>>> * PyHive
>>>>
>>>> * python-dateutil
>>>>
>>>> * requests
>>>>
>>>> * simplejson
>>>>
>>>> * six
>>>>
>>>> * sqlalchemy
>>>>
>>>> * sqlalchemy-utils
>>>>
>>>> * sqlparse
>>>>
>>>> * thrift
>>>>
>>>> * thrift-sasl
>>>>
>>>> * werkzeug
>>>>
>>>> List of JavaScript packages from NPM:
>>>>
>>>> * autobind-decorator
>>>>
>>>> * bootstrap
>>>>
>>>> * bootstrap-datepicker
>>>>
>>>> * brace
>>>>
>>>> * brfs
>>>>
>>>> * cal-heatmap
>>>>
>>>> * classnames
>>>>
>>>> * d3
>>>>
>>>> * d3-cloud
>>>>
>>>> * d3-sankey
>>>>
>>>> * d3-scale
>>>>
>>>> * d3-tip
>>>>
>>>> * datamaps
>>>>
>>>> * datatables-bootstrap3-plugin
>>>>
>>>> * datatables.net-bs
>>>>
>>>> * font-awesome
>>>>
>>>> * gridster
>>>>
>>>> * immutability-helper
>>>>
>>>> * immutable
>>>>
>>>> * jquery
>>>>
>>>> * lodash.throttle
>>>>
>>>> * mapbox-gl
>>>>
>>>> * moment
>>>>
>>>> * moments
>>>>
>>>> * mustache
>>>>
>>>> * nvd3
>>>>
>>>> * react
>>>>
>>>> * react-ace
>>>>
>>>> * react-bootstrap
>>>>
>>>> * react-bootstrap-table
>>>>
>>>> * react-dom
>>>>
>>>> * react-draggable
>>>>
>>>> * react-gravatar
>>>>
>>>> * react-grid-layout
>>>>
>>>> * react-map-gl
>>>>
>>>> * react-redux
>>>>
>>>> * react-resizable
>>>>
>>>> * react-select
>>>>
>>>> * react-syntax-highlighter
>>>>
>>>> * reactable
>>>>
>>>> * redux
>>>>
>>>> * redux-localstorage
>>>>
>>>> * redux-thunk
>>>>
>>>> * shortid
>>>>
>>>> * style-loader
>>>>
>>>> * supercluster
>>>>
>>>> * topojson
>>>>
>>>> * victory
>>>>
>>>> * viewport-mercator-project
>>>>
>>>> == Cryptography ==
>>>>
>>>> The proposal does not include cryptographic code.
>>>>
>>>> == Required Resources ==
>>>>
>>>> === Mailing List ===
>>>>
>>>> There is a current mailing list, the Google Group “airbnb_superset”, that
>>>> we are planning to deprecate once the Apache.org mailing lists are ready
>>>> to serve our community.
>>>>
>>>> * superset-private
>>>>
>>>> * superset-dev
>>>>
>>>> * superset-user
>>>>
>>>> === Subversion Directory ===
>>>>
>>>> Git is the preferred source control system.
>>>> http://svn.apache.org/repos/asf/incubator/superset
>>>>
>>>> == Git Repository ==
>>>>
>>>> Git is the preferred source control system; we’re assuming
>>>> https://github.com/apache/incubator-superset based on the naming scheme.
>>>>
>>>> == Issue Tracking ==
>>>>
>>>> JIRA Superset (SUPERSET). We’d like to use Github issues & PRs to manage
>>>> our project as much as possible. It’s been said that there are ways to
>>>> keep Github’s issues in sync with Jira, allowing us to get the best of
>>>> both worlds. If that is not possible, we will comply with using Jira.
>>>>
>>>> == Other Resources ==
>>>>
>>>> We currently use a set of Github-integrated services that are free to
>>>> the open source community, like Travis-ci, Code Climate, Coveralls,
>>>> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep
>>>> using these services as they allow us to scale contributions and
>>>> optimize our development flows. These services require some elevated
>>>> rights on the Github repository in order to set up or tune, and we would
>>>> like the committers to have the required rights.
>>>>
>>>>
>>>> == Initial Committers ==
>>>>
>>>> * Maxime Beauchemin <maxime.beauchemin@airbnb.com> - PMC & Committer
>>>>
>>>> * Alanna Scott <alanna.scott@airbnb.com> - PMC & Committer
>>>>
>>>> * Bogdan Kyryliuk <b.kyryliuk@gmail.com> - PMC & Committer
>>>>
>>>> * Vera Liu <vera.liu@airbnb.com> - Committer
>>>>
>>>> * Jeff Feng <jeff.feng@airbnb.com> - PMC & Committer
>>>>
>>>> * Ashutosh Chauhan <hashutosh@apache.org> - Mentor & Committer
>>>>
>>>> * Nishant Bangarwa <nbangarwa@hortonworks.com> - PMC & Committer
>>>>
>>>> * Slim Bouguerra <sbouguerra@hortonworks.com> - Committer
>>>>
>>>> * Priyank Shah <pshah@hortonworks.com> - Committer
>>>>
>>>> * Harsha Chintalapani <schintalapani@hortonworks.com> - Committer
>>>>
>>>> * Daniel Dai <daijy@apache.org> - Champion & Committer
>>>>
>>>> == Affiliations ==
>>>>
>>>> The initial committers are employees of Airbnb Inc. and Hortonworks.
>>>>
>>>> == Sponsors ==
>>>>
>>>> === Champion ===
>>>>
>>>> Daniel Dai <daijy@apache.org>
>>>>
>>>> === Nominated Mentors ===
>>>>
>>>> Ashutosh Chauhan <hashutosh@apache.org>
>>>>
>>>> === Sponsoring Entity ===
>>>>
>>>> Incubator PMC
>>>>
>>>>
>>>> --
>>>>
>>>> *Jeff Feng*
>>>> Product Manager
>>>> m: (949)-610-5108
>>>> twitter: @jtfeng
>>>>
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
