incubator-general mailing list archives

From Tom Barber <tom.bar...@meteorite.bi>
Subject Re: [VOTE] Accept Joshua as an Apache Incubator Podling
Date Fri, 12 Feb 2016 19:57:02 GMT
You're making the presumption it's passed its vote! ;)

On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Yep, will send a result shortly.
>
> Lewis, after that, can you help me get the podling bootstrap tasks
> started?
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
> -----Original Message-----
> From: Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
> Date: Friday, February 12, 2016 at 11:31 AM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>
> >Hi Chris,
> >Is it time to close out this VOTE and bring Joshua on board?
> >Lewis
> >
> >On Wed, Feb 3, 2016 at 4:01 PM, <general-digest-help@incubator.apache.org> wrote:
> >
> >>
> >> From: Danese Cooper <danese@gmail.com>
> >> To: "general@incubator.apache.org" <general@incubator.apache.org>
> >> Cc: "post@cs.jhu.edu" <post@cs.jhu.edu>
> >> Date: Wed, 3 Feb 2016 07:43:11 -0800
> >> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
> >> +1 (binding) Accept Joshua as an Apache Incubator podling.
> >>
> >> D
> >>
> >> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) <
> >> chris.a.mattmann@jpl.nasa.gov> wrote:
> >> >
> >> > Hi Everyone,
> >> >
> >> > OK the discussion is now completed. Please VOTE to accept Joshua
> >> > into the Apache Incubator. I’ll leave the VOTE open for at least
> >> > the next 72 hours, with hopes to close it next Friday the 5th of
> >> > February, 2016.
> >> >
> >> > [ ] +1 Accept Joshua as an Apache Incubator podling.
> >> > [ ] +0 Abstain.
> >> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
> >> >
> >> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> >> > members are binding but all are welcome to VOTE!
> >> >
> >> > Cheers,
> >> > Chris
> >> >
> >> > -----Original Message-----
> >> > From: jpluser <chris.a.mattmann@jpl.nasa.gov>
> >> > Date: Tuesday, January 12, 2016 at 10:56 PM
> >> > To: "general@incubator.apache.org" <general@incubator.apache.org>
> >> > Cc: "post@cs.jhu.edu" <post@cs.jhu.edu>
> >> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
> >> >
> >> >> Hi Everyone,
> >> >>
> >> >> Please find attached for your viewing pleasure a proposed new project,
> >> >> Apache Joshua, a statistical machine translation toolkit. The proposal
> >> >> is in wiki draft form at:
> >> >> https://wiki.apache.org/incubator/JoshuaProposal
> >> >>
> >> >> Proposal text is copied below. I’ll leave the discussion open for a
> >> >> week and we are interested in folks who would like to be initial
> >> >> committers and mentors. Please discuss here on the thread.
> >> >>
> >> >> Thanks!
> >> >>
> >> >> Cheers,
> >> >> Chris (Champion)
> >> >>
> >> >> ———
> >> >>
> >> >> = Joshua Proposal =
> >> >>
> >> >> == Abstract ==
> >> >> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
> >> >> translation toolkit. It includes a Java-based decoder for translating
> >> >> with phrase-based, hierarchical, and syntax-based translation models, a
> >> >> Hadoop-based grammar extractor (Thrax), and an extensive set of tools
> >> >> and scripts for training and evaluating new models from parallel text.
> >> >>
> >> >> == Proposal ==
> >> >> Joshua is a state-of-the-art statistical machine translation system that
> >> >> provides a number of features:
> >> >>
> >> >> * Support for the two main paradigms in statistical machine translation:
> >> >> phrase-based and hierarchical / syntactic.
> >> >> * A sparse feature API that makes it easy to add new feature templates
> >> >> supporting millions of features
> >> >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
> >> >> * Support for lattice decoding, allowing upstream NLP tools to expose
> >> >> their hypothesis space to the MT system
> >> >> * An efficient representation for models, allowing for quick loading of
> >> >> multi-gigabyte model files
> >> >> * Fast decoding speed (on par with Moses and mtplz)
> >> >> * Language packs: precompiled models that allow the decoder to be run as
> >> >> a black box
> >> >> * Thrax, a Hadoop-based tool for learning translation models from
> >> >> parallel text
> >> >> * A suite of tools for constructing new models for any language pair for
> >> >> which sufficient training data exists
> >> >>
> >> >> == Background and Rationale ==
> >> >> A number of factors make this a good time for an Apache project focused
> >> >> on machine translation (MT): the quality of MT output (for many language
> >> >> pairs); the average computing resources available on computers, relative
> >> >> to the needs of MT systems; and the availability of a number of
> >> >> high-quality toolkits, together with a large base of researchers working
> >> >> on them.
> >> >>
> >> >> Over the past decade, machine translation (MT; the automatic translation
> >> >> of one human language to another) has become a reality. The research into
> >> >> statistical approaches to translation that began in the early nineties,
> >> >> together with the availability of large amounts of training data, and
> >> >> better computing infrastructure, have all come together to produce
> >> >> translation results that are “good enough” for a large set of language
> >> >> pairs and use cases. Free services like
> >> >> [[https://www.bing.com/translator|Bing Translator]] and
> >> >> [[https://translate.google.com|Google Translate]] have made these
> >> >> services available to the average person through direct interfaces and
> >> >> through tools like browser plugins, and sites across the world with
> >> >> higher translation needs use them to translate their pages
> >> >> automatically.
> >> >>
> >> >> MT does not require the infrastructure of large corporations in order to
> >> >> produce feasible output. Machine translation can be resource-intensive,
> >> >> but need not be prohibitively so. Disk and memory usage are mostly a
> >> >> matter of model size, which for most language pairs is a few gigabytes at
> >> >> most, at which size models can provide coverage on the order of tens or
> >> >> even hundreds of thousands of words in the input and output languages.
> >> >> The computational complexity of the algorithms used to search for
> >> >> translations of new sentences is typically linear in the number of words
> >> >> in the input sentence, making it possible to run a translation engine on
> >> >> a personal computer.
> >> >>
> >> >> The research community has produced many different open source
> >> >> translation projects for a range of programming languages and under a
> >> >> variety of licenses. These projects include the core “decoder”, which
> >> >> takes a model and uses it to translate new sentences between the
> >> >> language pair the model was defined for. They also typically include a
> >> >> large set of tools that enable new models to be built from large sets of
> >> >> example translations (“parallel data”) and monolingual texts. These
> >> >> toolkits are usually built to support the agendas of the (largely)
> >> >> academic researchers that build them: the repeated cycle of building new
> >> >> models, tuning model parameters against development data, and evaluating
> >> >> them against held-out test data, using standard metrics for testing the
> >> >> quality of MT output.
> >> >>
> >> >> Together, these three factors (the quality of machine translation
> >> >> output, the feasibility of translating on standard computers, and the
> >> >> availability of tools to build models) make it reasonable for end users
> >> >> to use MT as a black-box service, and to run it on their personal
> >> >> machine.
> >> >>
> >> >> These factors make it a good time for an organization with the status of
> >> >> the Apache Foundation to host a machine translation project.
> >> >>
> >> >> == Current Status ==
> >> >> Joshua was originally ported from David Chiang’s Python implementation of
> >> >> Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins
> >> >> University. The current version is maintained by Matt Post at Johns
> >> >> Hopkins’ Human Language Technology Center of Excellence. Joshua has made
> >> >> many releases with a list of over 20 source code tags. The last release
> >> >> of Joshua was 6.0.5 on November 5th, 2015.
> >> >>
> >> >> == Meritocracy ==
> >> >> The current developers are familiar with meritocratic open source
> >> >> development at Apache. Apache was chosen specifically because we want to
> >> >> encourage this style of development for the project.
> >> >>
> >> >> == Community ==
> >> >> Joshua is used widely across the world. Perhaps its biggest (known)
> >> >> research / industrial user is the Amazon research group in Berlin.
> >> >> Another user is the US Army Research Lab. No formal census has been
> >> >> undertaken, but posts to the Joshua technical support mailing list, along
> >> >> with the occasional contributions, suggest small research and academic
> >> >> communities spread across the world, many of them in India.
> >> >>
> >> >> During incubation, we will explicitly seek to increase our usage across
> >> >> the board, including academic research, industry, and other end users
> >> >> interested in statistical machine translation.
> >> >>
> >> >> == Core Developers ==
> >> >> The current set of core developers is fairly small, having fallen with
> >> >> the graduation from Johns Hopkins of some core student participants.
> >> >> However, Joshua is used fairly widely, as mentioned above, and there
> >> >> remains a commitment from the principal researcher at Johns Hopkins to
> >> >> continue to use and develop it. Joshua has seen a number of new community
> >> >> members become interested recently due to a potential for its projected
> >> >> use in a number of ongoing DARPA projects such as XDATA and Memex.
> >> >>
> >> >> == Alignment ==
> >> >> Joshua is currently Copyright (c) 2015, Johns Hopkins University, all
> >> >> rights reserved, and licensed under the BSD 2-clause license. It would
> >> >> of course be the intention to relicense this code under AL2.0, which
> >> >> would permit expanded and increased use of the software within Apache
> >> >> projects. There is currently an ongoing effort within the Apache Tika
> >> >> community to utilize Joshua within Tika’s Translate API, see
> >> >> [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
> >> >>
> >> >> == Known Risks ==
> >> >>
> >> >> === Orphaned products ===
> >> >> At the moment, regular contributions are made by a single contributor,
> >> >> the lead maintainer. He (Matt Post) plans to continue development for the
> >> >> next few years, but it is still a single point of failure, since the
> >> >> graduate students who worked on the project have moved on to jobs, mostly
> >> >> in industry. However, our goal is to address that risk by growing the
> >> >> community at Apache, starting at least with users and participants from
> >> >> NASA JPL.
> >> >>
> >> >> === Inexperience with Open Source ===
> >> >> The teams at both Johns Hopkins and NASA JPL have experience with many
> >> >> OSS projects at Apache and elsewhere. We understand "how it works" here
> >> >> at the foundation.
> >> >>
> >> >>
> >> >> == Relationships with Other Apache Products ==
> >> >> Joshua includes dependencies on Hadoop, and also is included as a plugin
> >> >> in Apache Tika. We are also interested in coordinating with other
> >> >> projects including Spark, and other projects needing MT services for
> >> >> language translation.
> >> >>
> >> >> == Developers ==
> >> >> Joshua only has one regular developer who is employed by Johns Hopkins
> >> >> University. NASA JPL (Mattmann and McGibbney) have been contributing
> >> >> lately including a Brew formula and other contributions to the project
> >> >> through the DARPA XDATA and Memex programs.
> >> >>
> >> >> == Documentation ==
> >> >> Documentation and publications related to Joshua can be found at
> >> >> joshua-decoder.org. The source for the Joshua documentation is currently
> >> >> hosted on Github at
> >> >> https://github.com/joshua-decoder/joshua-decoder.github.com
> >> >>
> >> >> == Initial Source ==
> >> >> Current source resides at Github: github.com/joshua-decoder/joshua (the
> >> >> main decoder and toolkit) and github.com/joshua-decoder/thrax (the
> >> >> grammar extraction tool).
> >> >>
> >> >> == External Dependencies ==
> >> >> Joshua has a number of external dependencies. Only BerkeleyLM (Apache
> >> >> 2.0) and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which
> >> >> is needed for translating sentences with pre-built models). The rest are
> >> >> dependencies for the build system and pipeline, used for constructing and
> >> >> training new models from parallel text.
> >> >>
> >> >> Apache projects:
> >> >> * Ant
> >> >> * Hadoop
> >> >> * Commons
> >> >> * Maven
> >> >> * Ivy
> >> >>
> >> >> There are also a number of other open-source projects with various
> >> >> licenses that the project depends on both dynamically (runtime), and
> >> >> statically.
> >> >>
> >> >> === GNU GPL 2 ===
> >> >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
> >> >>
> >> >> === LGPL 2.1 ===
> >> >> * KenLM: github.com/kpu/kenlm
> >> >>
> >> >> === Apache 2.0 ===
> >> >> * BerkeleyLM: https://code.google.com/p/berkeleylm/
> >> >>
> >> >> === GNU GPL ===
> >> >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
> >> >>
> >> >> == Required Resources ==
> >> >> * Mailing Lists
> >> >>  * private@joshua.incubator.apache.org
> >> >>  * dev@joshua.incubator.apache.org
> >> >>  * commits@joshua.incubator.apache.org
> >> >>
> >> >> * Git Repos
> >> >>  * https://git-wip-us.apache.org/repos/asf/joshua.git
> >> >>
> >> >> * Issue Tracking
> >> >>  * JIRA Joshua (JOSHUA)
> >> >>
> >> >> * Continuous Integration
> >> >>  * Jenkins builds on https://builds.apache.org/
> >> >>
> >> >> * Web
> >> >>  * http://joshua.incubator.apache.org/
> >> >>  * wiki at http://cwiki.apache.org
> >> >>
> >> >> == Initial Committers ==
> >> >> The following is a list of the planned initial Apache committers (the
> >> >> active subset of the committers for the current repository on Github).
> >> >>
> >> >> * Matt Post (post@cs.jhu.edu)
> >> >> * Lewis John McGibbney (lewismc@apache.org)
> >> >> * Chris Mattmann (mattmann@apache.org)
> >> >>
> >> >> == Affiliations ==
> >> >>
> >> >> * Johns Hopkins University
> >> >>  * Matt Post
> >> >>
> >> >> * NASA JPL
> >> >>  * Chris Mattmann
> >> >>  * Lewis John McGibbney
> >> >>
> >> >>
> >> >> == Sponsors ==
> >> >> === Champion ===
> >> >> * Chris Mattmann (NASA/JPL)
> >> >>
> >> >> === Nominated Mentors ===
> >> >> * Paul Ramirez
> >> >> * Lewis John McGibbney
> >> >> * Chris Mattmann
> >> >>
> >> >> == Sponsoring Entity ==
> >> >> The Apache Incubator
> >> >>
>
>
