incubator-general mailing list archives

From Tom Barber <...@spicule.co.uk>
Subject Re: [VOTE] Accept Science Data Analytics Platform (SDAP) into Apache Incubator WAS Re: [DISCUSS] Accept Science Data Analytics Platform (SDAP) into Apache Incubator
Date Tue, 17 Oct 2017 21:14:18 GMT
+1

Happy to help out if the vote passes.

On Tue, Oct 17, 2017 at 10:07 PM, Madhawa Kasun Gunasekara <
madhawa30@gmail.com> wrote:

> Here is my +1
>
> Thanks,
> Madhawa
>
> On Tue, Oct 17, 2017 at 4:04 PM, lewis john mcgibbney <lewismc@apache.org>
> wrote:
>
> > Hi Folks,
> > Having secured a mentorship team consisting of the following IPMC
> Members,
> > I am happy to open a formal VOTE thread on accepting the Science Data
> > Analytics Platform (SDAP) into Apache Incubator.
> >
> >    - Lewis John McGibbney (lewismc@apache.org)
> >    - Raphael Bircher (bircher at apache dot org)
> >    - Suneel Marthi (smarthi at apache dot org)
> >
> > Thank you to both Raphael and Suneel for coming forward. :)
> > The VOTE will be open for at least 72 hours.
> >
> > [ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache
> Incubator
> > [ ] +/-0 ... just because
> > [ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache
> > Incubator... because
> >
> > Thanks in advance to all participants.
> > Lewis
> >
> > P.S. Here is a binding +1 from me
> >
> > On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney <
> lewismc@apache.org
> > >
> > wrote:
> >
> > > Hi Folks,
> > > I would like to open a DISCUSS thread on the topic of accepting the
> > > Science Data Analytics Platform (SDAP)
> > > <https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
> > > I am CC'ing Thomas Huang from NASA JPL who I have been working with to
> > > build community around a kick-ass set of software projects under the
> SDAP
> > > umbrella.
> > > At this stage we would very much appreciate critical feedback from
> > general@
> > > community. We are also open to mentors who may have an interest in the
> > > project proposal.
> > > The proposal is pasted below.
> > > Thanks in advance,
> > > Lewis
> > >
> > > = Abstract =
> > > The Science Data Analytics Platform (SDAP) establishes an integrated
> data
> > > analytic center for Big Science problems. It focuses on technology
> > > integration, advancement and maturity.
> > >
> > > = Proposal =
> > > SDAP currently represents a collaboration between NASA Jet Propulsion
> > > Laboratory (JPL), Florida State University (FSU), the National Center
> for
> > > Atmospheric Research (NCAR), and George Mason University (GMU). SDAP
> > brings
> > > together a number of big data technologies including the NASA-funded
> > > OceanXtremes (Anomaly detection and ocean science), NEXUS (Deep data
> > > analytic platform), DOMS (Distributed in-situ to satellite matchup),
> > MUDROD
> > > (Search relevancy and discovery) and VQSS (Virtualized Quality
> Screening
> > > Service) under a single umbrella. Within the original Incubator
> proposal,
> > > VQSS will not be included; however, it is anticipated that a future
> source
> > > code donation will cover VQSS.
> > >
> > > = Background and Rationale =
> > > SDAP is a technology software solution currently geared to better
> enable
> > > scientists involved in advancing the study of the Earth's physical
> > > oceanography. With increasing global temperature, warming of the ocean,
> > and
> > > melting ice sheets and glaciers, the impacts can be observed from
> changes
> > > in anomalous ocean temperature and circulation patterns, to increasing
> > > extreme weather events and stronger/more frequent hurricanes, sea level
> > > rise and storm surges affecting coastlines, and may involve drastic
> > changes
> > > and shifts in marine ecosystems. Ocean science communities are relying
> on
> > > data distributed through data centers such as the JPL's Physical
> > > Oceanographic Data Active Archive Center (PO.DAAC) to conduct their
> > > research. In typical investigations, oceanographers follow a
> traditional
> > > workflow for using datasets: search, evaluate, download, and apply
> tools
> > > and algorithms to look for trends. While this workflow has been working
> > > very well historically for the oceanographic community, it cannot scale
> > if
> > > the research involves massive amounts of data. NASA's Surface Water and
> > > Ocean Topography (SWOT) mission, scheduled to launch in April of 2021,
> is
> > > expected to generate over 20PB of data for a nominal 3-year mission. This
> > will
> > > challenge all existing NASA Earth Science data archival/distribution
> > > paradigms. It will no longer be feasible for Earth scientists to
> download
> > > and analyze such volumes of data. SDAP was therefore developed
> primarily
> > as
> > > a Web-service platform for big ocean data science at the PO.DAAC with
> > open
> > > source solutions used to enable fast analysis of oceanographic data.
> SDAP
> > > has been developed collaboratively between JPL, FSU, NCAR, and GMU and
> is
> > > rapidly maturing to become the generic platform for the next generation
> > of
> > > big science data solutions. The platform is an orchestration of several
> > > previously funded NASA big ocean data solutions using cloud technology,
> > > which include data analysis (NEXUS), anomaly detection (OceanXtremes),
> > > matchup (DOMS), subsetting, discovery (MUDROD), and visualization
> (VQSS).
> > > SDAP will enable web-accessible, fast data analysis directly on huge
> > > scientific data archives to minimize data movement and provide access,
> > > including subset, only to the relevant data.
> > >
> > > = Science Data Analytics Platform Project Overview =
> > > SDAP consists of several loosely coupled, independently functioning
> > > sub-projects. The graphic below displays an overview of how these
> > > sub-projects fuse together. N.B., although the graphic uses terminology
> > > relating to OceanWorks, essentially the SDAP architecture is identical.
> > >
> > > {{attachment:sdap.png}}
> > >
> > > == OceanXtremes ==
> > > Oceanographic Data-Intensive Anomaly Detection and Analysis Portal. An
> > > application that allows you to view imagery and perform analysis on sea
> > > level rise data.
> > >
> > > '''Objective'''
> > > Develop an anomaly detection system which identifies items, events or
> > > observations which do not conform to an expected pattern.
> > >  * Mature and test domain-specific, multi-scale anomaly and feature
> > > detection algorithms.
> > >  * Identify unexpected correlations between key measured variables.
> > >
> > > Demonstrate value of technologies in this service:
> > >  * Adapted Map-Reduce data mining.
> > >  * Algorithm profiling service.
> > >  * Shared discovery and exploration search tools.
> > >  * Automatic notification of events of interest.
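
As a rough illustration of the objective above (flagging observations that do not conform to an expected pattern), here is a minimal sketch. It uses a plain z-score test, which is a stand-in technique chosen for brevity and not the OceanXtremes algorithm; the function name `find_anomalies` and the threshold value are assumptions made for this example.

```python
import numpy as np


def find_anomalies(values, threshold=3.0):
    """Return indices of values lying more than `threshold` standard
    deviations away from the mean of the series (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()
    if std == 0:
        # A constant series has no deviations to flag.
        return np.array([], dtype=int)
    z_scores = np.abs(values - mean) / std
    return np.flatnonzero(z_scores > threshold)


# Twenty ordinary sea surface temperatures followed by one outlier.
sst = [20.0] * 20 + [30.0]
outliers = find_anomalies(sst)
```

A production system would more likely compare each observation against a climatology for the same location and season instead of a global mean.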
> > >
> > > == NEXUS ==
> > > NEXUS is an emerging technology developed at JPL
> > >  * A Cloud-based/Cluster-based data platform that performs scalable
> > > handling of observational parameters analysis designed to scale
> > horizontally
> > >  * Leveraging high-performance indexed, temporal, and geospatial search
> > > solution
> > >  * Breaks data products into small chunks and stores them in a
> > Cloud-based
> > > data store
> > >
> > > ''Data Volumes Exploding''
> > >  * SWOT mission is coming
> > >  * File I/O is slow
> > >
> > > ''Scalable Store & Compute is Available''
> > >  * NoSQL cluster databases
> > >  * Parallel compute, in-memory map-reduce
> > >  * Bring Compute to Highly-Accessible Data (using Hybrid Cloud)
> > >
> > > ''Pre-Chunk and Summarize Key Variables''
> > >  * Easy statistics instantly (milliseconds)
> > >  * Harder statistics on-demand (in seconds)
> > >  * Visualize original data (layers) on a map quickly
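
To make the "pre-chunk and summarize" idea concrete, the sketch below splits a gridded variable into tiles and stores a few summary values per tile, so that a global statistic can be answered from the summaries alone. This is only an illustration under assumed names (`tile_and_summarize`, `mean_from_summaries`) and an in-memory list; it is not NEXUS code and does not reflect its actual data store.

```python
import numpy as np


def tile_and_summarize(grid, tile_size):
    """Split a 2-D array into square tiles and precompute per-tile summaries."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    tiles = []
    for r in range(0, rows, tile_size):
        for c in range(0, cols, tile_size):
            chunk = grid[r:r + tile_size, c:c + tile_size]
            tiles.append({
                "row": r,
                "col": c,
                # Count only valid (non-NaN) cells so means stay correct.
                "count": int(np.count_nonzero(~np.isnan(chunk))),
                "sum": float(np.nansum(chunk)),
                "min": float(np.nanmin(chunk)),
                "max": float(np.nanmax(chunk)),
            })
    return tiles


def mean_from_summaries(tiles):
    """Compute the overall mean from tile summaries without touching raw data."""
    count = sum(t["count"] for t in tiles)
    if count == 0:
        return float("nan")
    return sum(t["sum"] for t in tiles) / count


grid = np.arange(16).reshape(4, 4)
tiles = tile_and_summarize(grid, tile_size=2)
overall_mean = mean_from_summaries(tiles)
```

Because each tile carries its own count and sum, "easy" statistics such as a regional mean only need the summaries for the tiles that intersect the region.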
> > >
> > > == DOMS ==
> > > The Distributed Oceanographic Match-Up Service
> > > DOMS is designed to reconcile satellite and in situ datasets in support
> > of
> > > NASA's Earth Science mission. The service will provide a mechanism for
> > > users to input a series of geospatial references for satellite
> > observations
> > > and receive the in situ observations that are matched to the satellite
> > data
> > > within a selectable temporal and spatial domain. DOMS includes several
> > > characteristic in situ and satellite observation datasets - with an
> > initial
> > > focus on salinity, sea temperature, and winds. DOMS will be used by the
> > > marine and satellite research communities to support a range of
> > activities
> > > and several use cases will be described. The service is designed to
> > provide
> > > a community-accessible tool that dynamically delivers matched data and
> > > allows the scientist to only work with the subset of data where the
> > matches
> > > exist.
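
The match-up idea described above can be sketched as follows: for each satellite observation, keep the in situ observations that fall within a chosen distance and time window. This is a brute-force illustration under assumed record fields (`lat`, `lon`, `time`) and hypothetical function names; it is not the DOMS implementation, which would need spatial indexing to scale.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_up(satellite_obs, in_situ_obs, max_km, max_dt):
    """Pair each satellite record with in situ records inside both tolerances."""
    matches = []
    for sat in satellite_obs:
        for situ in in_situ_obs:
            close_in_time = abs(sat["time"] - situ["time"]) <= max_dt
            if close_in_time and haversine_km(
                    sat["lat"], sat["lon"], situ["lat"], situ["lon"]) <= max_km:
                matches.append((sat, situ))
    return matches


t0 = datetime(2017, 10, 17, 12, 0)
satellite = [{"lat": 10.0, "lon": 20.0, "time": t0}]
in_situ = [
    {"id": "near", "lat": 10.1, "lon": 20.0, "time": t0 + timedelta(hours=1)},
    {"id": "far", "lat": 15.0, "lon": 20.0, "time": t0},
    {"id": "late", "lat": 10.0, "lon": 20.0, "time": t0 + timedelta(days=2)},
]
pairs = match_up(satellite, in_situ, max_km=25.0, max_dt=timedelta(hours=6))
```

Only the record that is close in both space and time survives, which is the "subset of data where the matches exist" that the scientist would then work with.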
> > >
> > > == MUDROD ==
> > > Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to
> > > Improve Data Discovery and Access
> > > Data discovery accuracy is a challenging topic for both Earth science
> and
> > > other domains. It is especially true for scientific data sets that are
> > not
> > > as popular as Amazon or Google data. MUDROD is focused on mining
> oceanic
> > > knowledge from the PO.DAAC user log files to improve the end user data
> > > discovery experience at PO.DAAC. There are three steps in the research:
> > > a) extracting the oceanographic semantics from three resources: SWEET,
> > > the GCMD ontology, and the keywords used by end users when searching
> > > PO.DAAC datasets; b) mining the linkage among different vocabularies
> > > based on user data discovery sessions; and c) building the linkage among
> > > vocabularies using a comprehensive approach that considers domain de
> > > facto standards, e.g., SWEET and GCMD, together with the knowledge mined
> > > from the log files. The semantics are used to improve data discovery by
> > > ranking results, navigating among vocabularies, and recommending data
> > > based on user searches.
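
One simple way to picture step b) is to count how often two search terms occur in the same user session and treat frequent pairs as linked. The sketch below does exactly that with made-up sessions; the co-occurrence count is an assumed stand-in for whatever linkage measure MUDROD actually uses, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_linkage(sessions):
    """Count how often each pair of search terms appears in the same session."""
    pair_counts = Counter()
    for terms in sessions:
        # Normalise case and drop repeats so each pair counts once per session.
        unique_terms = sorted({term.lower() for term in terms})
        for a, b in combinations(unique_terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def related_terms(pair_counts, term, top_n=3):
    """Return the terms most often searched alongside `term`."""
    term = term.lower()
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


# Illustrative sessions only; real input would come from parsed log files.
sessions = [
    ["sst", "sea surface temperature"],
    ["SST", "sea surface temperature", "salinity"],
    ["salinity", "wind"],
]
linkage = term_linkage(sessions)
suggestions = related_terms(linkage, "sst", top_n=1)
```

The resulting counts could feed ranking or recommendation, for example by suggesting "sea surface temperature" to a user who searched for "sst".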
> > >
> > > = Current Status =
> > > All components of SDAP were originally designed and developed under
> > grants
> > > from the NASA-funded Advanced Information Systems and Technologies
> (AIST)
> > > program. The initiative to bring the components together under the
> > > SDAP umbrella was granted through an AIST-funded follow-on grant which
> > will
> > > run for approximately another 18 months.
> > > Currently no projects have made official releases so outside of
> community
> > > building, this will be our primary Incubating goal. All SDAP source
> code
> > is
> > > currently publicly available and licensed under the ALv2.0.
> > >
> > > = Meritocracy =
> > > The current developers are familiar with meritocratic open source
> > > development at Apache. The SDAP team consumes Apache products heavily
> > with
> > > members being part of several Apache user communities. SDAP itself has
> > > critical dependencies upon Apache products. Lewis McGibbney (JPL
> > employee),
> > > a Member of the ASF and V.P. of Apache Any23, Gora PMC Nutch, Tika,
> OODT,
> > > OCW, etc., is championing the effort to bring SDAP into and through the
> > > Apache Incubator and has been evangelizing the Apache Way to the
> current
> > > SDAP contributors such that the meritocratic process is well understood
> > and
> > > followed. Apache was chosen specifically because we want to encourage
> > this
> > > style of community development for the project and for it to sustain
> SDAP
> > > forward to become the generic platform for the next generation of big
> > > science data solutions.
> > >
> > > = Community =
> > > The SDAP project is a fairly new effort and our community is not yet
> > > fully/firmly established. Initial committers comprising the SDAP roster
> > > have only recently fully come together as a unified team; however, there
> > is a
> > > large degree of synergy between constituent members at JPL, FSU, NCAR,
> > and
> > > GMU. Therefore, community building and publicity continues to be a
> major
> > > thrust. With the activity and exposure regularly attained by several
> > > community members, we hope to grow the SDAP presence in and across
> > several
> > > (scientific) forums. The SDAP technology is generating interest within
> > > communities such as the Earth Science Information Partnership (ESIP),
> > > American Geophysical Union (AGU) and a plethora of science meetings
> around
> > > the globe. This in effect, we hope, will further contribute towards the
> > > possibility of SDAP being used across Government Agencies such as NASA,
> > > NOAA, USGS, EPA, DOI, etc. as well as by researchers and students in
> > > academic institutions around the globe.
> > > During incubation, we will explicitly seek to increase our adoption,
> with
> > > SDAP already being featured on the agenda for several high profile
> > globally
> > > significant scientific conferences and meetings.
> > >
> > > = Core Developers =
> > > The current set of core developers is relatively small, including
> > > full-time staff and students from across JPL, FSU, NCAR, and GMU. Initial
> > > community management and participation will be distributed across the
> > > entire team, most of whom have been involved with the constituent
> > projects
> > > for <2 years.
> > >
> > > = Alignment =
> > > All SDAP code is licensed under Apache v2.0.
> > >
> > > = Known Risks =
> > >
> > > == Orphaned products ==
> > > There are currently no orphaned products. Each component of SDAP has
> > > dedicated personnel leading and participating in its ongoing
> development.
> > > Additionally, there is substantial collaboration between projects
> > > facilitated by regular project meetings which are specific to the
> > initial
> > > member entities and focused on advancing physical oceanographic
> science.
> > >
> > > == Inexperience with Open Source ==
> > > JPL (in particular Lewis McGibbney) has been part of several efforts to
> > > transition to and grow project communities at Apache, e.g. Apache OODT,
> > > Apache Open Climate Workbench, Apache Joshua (Incubating), Apache
> > SensSoft
> > > (Incubating), Apache DRAT (Incubating). Most of the code developed
> under
> > > the SDAP umbrella was and is open source prior to the Incubator effort
> so
> > > we are well familiarized with the nuances of open source software.
> > >
> > > = Relationships with Other Apache Products =
> > > SDAP has strong dependency upon a number of high profile and smaller
> > > profile Apache products. Examples can be seen in the breakdown of
> > External
> > > Dependencies. As we continue to grow SDAP within the Incubator, we will
> > > make efforts to share community stories, software advancements and
> > possible
> > > improvements in our use of our Apache dependencies back to those
> project
> > > communities.
> > >
> > > = Developers =
> > > The SDAP project, and hence its developers, is currently funded through a
> NASA
> > > AIST follow-on grant with funding secured for the next ~18 months.
> There
> > > are currently no 100% time dedicated developers, however, the same core
> > > team that does work currently will continue to work on the project
> > > throughout the current funding period and after. There is
> currently
> > no
> > > business strategy aligned with SDAP; however, it is perceived that future,
> > > as yet unsecured funding may be directed to further feature advancement
> and
> > > project evangelism.
> > >
> > > = Documentation =
> > > Documentation is currently available in a number of locations e.g.
> Github
> > > wiki, Github pages, etc. with each repository under the oceanworks-aist
> > > Github Org maintaining documentation available through wikis attached
> to
> > > the repositories. Additionally, most of the SDAP sub-projects have been
> > > extensively documented within a plethora of formal academic publications
> > > across several academic communities. It would certainly be our intention,
> > > at least, to unify the Github wiki and Github pages documentation, most
> > > likely to make up the sdap.apache.org Website content.
> > >
> > > = Initial Source =
> > > Current source resides in several locations on Github and Bitbucket:
> > >  * https://github.com/dataplumber/nexus (NEXUS, OceanXtremes, DOMS)
> > >  * https://github.com/dataplumber/edge (EDGE)
> > >  * https://github.com/aist-oceanworks/mudrod (MUDROD)
> > >  * https://bitbucket.org/coaps_mdc/doms/src (DOMS)
> > >
> > > = External Dependencies =
> > > Each component of the Science Data Analytics Platform has its own
> > > dependencies. Documentation will be available for integrating them.
> > >
> > > == MUDROD ==
> > > '''Core'''
> > > com.google.code.gson gson 2.5 compile jar false
> > > org.jdom jdom 2.0.2 compile jar false
> > > org.elasticsearch elasticsearch 5.2.0 compile jar false
> > > org.elasticsearch elasticsearch-spark-20_2.11 5.2.0 compile jar false
> > > joda-time joda-time 2.9.4 compile jar false
> > > com.carrotsearch hppc 0.7.1 compile jar false
> > > org.apache.spark spark-core_2.11 2.1.0 compile jar false
> > > org.apache.spark spark-sql_2.11 2.1.0 compile jar false
> > > org.apache.spark spark-mllib_2.11 2.1.0 compile jar false
> > > org.scala-lang scala-library 2.11.8 compile jar false
> > > org.codehaus.jettison jettison 1.3.8 compile jar false
> > > commons-cli commons-cli 1.2 compile jar false
> > > net.sf.opencsv opencsv 2.3 compile jar false
> > > org.apache.jena jena-core 3.3.0 compile jar false
> > > junit junit 4.12 test jar false
> > >
> > > '''Service'''
> > > gov.nasa.jpl.mudrod mudrod-core 0.0.1-SNAPSHOT compile jar false
> > > javax.servlet javax.servlet-api 3.1.0 provided jar false
> > > com.google.code.gson gson 2.5 compile jar false
> > >
> > > '''Web'''
> > >  * AngularJS - MIT License
> > >  * BootstrapJS - MIT License
> > >  * jQueryJS - MIT License
> > >  * Underscore JS - MIT License
> > >
> > > == DOMS ==
> > >  * Apache Solr version 5.5.1 http://lucene.apache.org/solr/
> > >  * EDGE https://github.com/dataplumber/edge
> > >  * NetCDF4 http://unidata.github.io/netcdf4-python/
> > >  * Python 3.5 (NOTE: only partial support for py2.7)
> > >
> > > Non stdlib Python dependencies:
> > >  * Jinja2==2.9.5
> > >  * python-dateutil==2.6.0
> > >  * cython==0.25.2
> > >  * numpy==1.12.0
> > >  * scipy==0.18.1
> > >  * netCDF4==1.2.7
> > >  * solrpy3
> > >  * siphon==0.4.0
> > >  * neo4j-driver==1.1.0
> > >  * matplotlib==2.0.0
> > >  * requests==2.13.0
> > >  * shapely==1.5.17
> > >  * flask==0.12
> > >  * networkx==1.11
> > >  * pyproj==1.9.5.1
> > >  * blist==1.3.6
> > >
> > > == NEXUS ==
> > > '''Analysis'''
> > >  * https://github.com/dataplumber/nexus/blob/master/analysis/package-list.txt
> > >  * https://github.com/dataplumber/nexus/blob/master/analysis/requirements.txt
> > >
> > > '''Client'''
> > >  * https://github.com/dataplumber/nexus/blob/master/client/requirements.txt
> > >
> > > '''Climatology'''
> > >  * matplotlib
> > >  * numpy
> > >  * netCDF4
> > >  * pathos (https://pypi.python.org/pypi/pathos)
> > >
> > > '''Data-access'''
> > >  * https://github.com/dataplumber/nexus/blob/master/data-access/requirements.txt
> > >
> > > '''Nexus-ingest'''
> > > ''Dataset-tiler''
> > >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/dataset-tiler/build/reports
> > >
> > > ''developer-box''
> > >  * Just a collection of scripts/vagrant file used to stand up a
> developer
> > > instance of nexus ingestion. No dependencies to report
> > >
> > > ''Groovy-scripts''
> > >  * Collection of Groovy scripts that can be used as part of data
> > > ingestion. They only rely on the standard Groovy library and the
> > > ‘nexus-messages’ project
> > >
> > > ''Nexus-messages''
> > >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-messages/build/reports
> > >
> > > ''nexus-sink''
> > >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-sink/build/reports
> > >
> > > ''nexus-xd-python-modules''
> > >  * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/package-list.txt
> > >  * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/requirements.txt
> > >
> > > ''spring-xd-python''
> > >  * only python standard libraries are used
> > >
> > > ''tcp-shell''
> > >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/tcp-shell/build/reports
> > >
> > > '''tools/deletebyquery'''
> > >  * https://github.com/dataplumber/nexus/blob/master/tools/deletebyquery/requirements.txt
> > >
> > > = Required Resources =
> > > Mailing Lists
> > >  * private@sdap.incubator.apache.org
> > >  * dev@sdap.incubator.apache.org
> > >  * commits@sdap.incubator.apache.org
> > >
> > > Git Repos
> > >  * https://git-wip-us.apache.org/repos/asf/incubator-nexus.git
> > >  * https://git-wip-us.apache.org/repos/asf/incubator-doms.git
> > >  * https://git-wip-us.apache.org/repos/asf/incubator-mudrod.git
> > >
> > > Issue Tracking
> > >  * JIRA Science Data Analytics Platform (SDAP)
> > >
> > > Continuous Integration
> > >  * Jenkins builds on https://builds.apache.org/
> > >
> > > Web
> > >  * http://sdap.incubator.apache.org/
> > >  * wiki at http://cwiki.apache.org
> > >
> > > = Initial Committers =
> > > The following is a list of the planned initial Apache committers (the
> > > active subset of the committers for the current repository on Github).
> > >  * Lewis John McGibbney (lewismc@apache.org)
> > >  * Vardis M. Tsontos (vardis.m.tsontos@jpl.nasa.gov)
> > >  * Joseph C. Jacob (Joseph.C.Jacob@jpl.nasa.gov)
> > >  * Ed Armstrong (edward.m.armstrong@jpl.nasa.gov)
> > >  * Frank Greguska (greguska@jpl.nasa.gov)
> > >  * Brian Wilson (brian.wilson@jpl.nasa.gov)
> > >  * Chaowe Phil Yang (cyang3@gmu.edu)
> > >  * Yongyao Jiang (yjiang8@gmu.edu)
> > >  * Yun Li (yli38@gmu.edu)
> > >  * Shawn R. Smith (smith@coaps.fsu.edu)
> > >  * Jocelyn Elya (jelya@coaps.fsu.edu)
> > >  * Mark Bourassa (bourassa@coaps.fsu.edu)
> > >  * Thomas Cram (tcram@ucar.edu)
> > >  * Thomas Huang (thomas.huang@jpl.nasa.gov)
> > >  * Steven Worley (worley@ucar.edu)
> > >  * Zaihua Ji (zji@ucar.edu)
> > >
> > > = Affiliations =
> > > NASA JPL
> > >  * Lewis John McGibbney (lewismc@apache.org)
> > >  * Vardis M. Tsontos (vardis.m.tsontos@jpl.nasa.gov)
> > >  * Joseph C. Jacob (Joseph.C.Jacob@jpl.nasa.gov)
> > >  * Ed Armstrong (edward.m.armstrong@jpl.nasa.gov)
> > >  * Frank Greguska (greguska@jpl.nasa.gov)
> > >  * Thomas Huang (thomas.huang@jpl.nasa.gov)
> > >  * Brian Wilson (brian.wilson@jpl.nasa.gov)
> > >
> > > George Mason University
> > >  * Chaowe Phil Yang (cyang3@gmu.edu)
> > >  * Yongyao Jiang (yjiang8@gmu.edu)
> > >  * Yun Li (yli38@gmu.edu)
> > >
> > > Center for Ocean-Atmospheric Prediction Studies, Florida State
> University
> > >  * Shawn R. Smith (smith@coaps.fsu.edu)
> > >  * Jocelyn Elya (jelya@coaps.fsu.edu)
> > >  * Mark Bourassa (bourassa@coaps.fsu.edu)
> > >
> > > Computational Information Systems Laboratory (CISL) / National Center
> for
> > > Atmospheric Research (NCAR)
> > >  * Thomas Cram (tcram@ucar.edu)
> > >  * Zaihua Ji (zji@ucar.edu)
> > >  * Steven Worley (worley@ucar.edu)
> > >
> > > = Sponsors =
> > >
> > > = Champion =
> > > * Lewis McGibbney (NASA/JPL)
> > >
> > > = Nominated Mentors =
> > >  * TBD
> > >  * TBD
> > >  * TBD
> > >
> > > = Sponsoring Entity =
> > > The Apache Incubator
> > >
> > >
> > > --
> > > http://home.apache.org/~lewismc/
> > > @hectorMcSpector
> > > http://www.linkedin.com/in/lmcgibbney
> > >
> >
> >
> >
> > --
> > http://home.apache.org/~lewismc/
> > @hectorMcSpector
> > http://www.linkedin.com/in/lmcgibbney
> >
>
