madlib-user mailing list archives

From Rashmi Raghu <rra...@pivotal.io>
Subject Re: [VOTE] MADlib v1.14-rc1
Date Mon, 30 Apr 2018 20:11:09 GMT
Hi Frank,

Thank you for your response to my feedback and posting the pdf. :-)

Looking forward to using the new release!

Thanks,
Rashmi


On Mon, Apr 30, 2018 at 10:51 AM, Frank McQuillan <fmcquillan@pivotal.io>
wrote:

> Hi Rashmi,
>
> I attached the completed user docs for balanced data sets to the JIRA
> https://issues.apache.org/jira/browse/MADLIB-1168
> for your review.
>
> The doc is called "MADlib_Balanced Sampling.pdf"
>
> Your idea of posting the updated user docs for the upcoming release is a
> good one.
>
> Frank
>
> On Fri, Apr 27, 2018 at 6:56 PM, Srivatsan Ramanujam <vatsan.cs@utexas.edu>
> wrote:
>
> > Built from source and tested on Mac. (High Sierra - 10.13.3, cmake version
> > 3.11.0-rc2, Postgres 9.6.4)
> >
> > +1 (binding)
> >
> >
> >
> >
> > On Fri, Apr 27, 2018 at 6:09 PM, Jingyi Mei <jmei@pivotal.io> wrote:
> >
> > > Hi Rashmi,
> > >
> > > Thanks for the comments and feedback!
> > >
> > > The release page with a page-not-found error should not be there since we
> > > haven't made the actual release yet. We just removed the link in that page
> > > and it will be added again after the community has voted and we have an
> > > official release.
> > >
> > > Concerning the documentation links for new features, it is definitely a
> > > great idea to add them in the release notes and also vote email! Thanks for
> > > the recommendation and we will see if we can make it better in this release.
> > >
> > > Cheers,
> > > Jingyi Mei
> > >
> > > On Fri, Apr 27, 2018 at 3:19 PM, Rashmi Raghu <rraghu@pivotal.io>
> > > wrote:
> > >
> > >> Installed on Postgres 9.6 on MacOS using dmg.
> > >> Checked out the new additions to the summary function. Looks good. My
> > >> vote: +1 (binding).
> > >>
> > >> Some comments aside from the vote:
> > >>
> > >>    - I followed this link in the email:
> > >>    https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.14
> > >>    and then from there clicked on
> > >>    https://dist.apache.org/repos/dist/release/madlib/1.14/
> > >>    which gives a page-not-found error.
> > >>    - I didn't see a link to documentation associated with this release -
> > >>    it would be useful to have that also available (let me know if it was
> > >>    in the email and I missed it or if it is not standard practice). For
> > >>    instance, I wanted to briefly look at the new balanced datasets module
> > >>    and it would have been easy to look it up in the web version of the
> > >>    docs. I did find the docs through the function call, e.g.
> > >>    madlib.balance_sample('usage'), but that requires knowing roughly what
> > >>    function name to look for (not hard in this case but I can imagine
> > >>    other situations where it might not be straightforward)
> > >>
> > >> Great to see all the new features and bug fixes!
> > >>
> > >> Thanks,
> > >> Rashmi
> > >>
> > >>
> > >> On Fri, Apr 27, 2018 at 1:40 PM, Orhan Kislal <okislal@pivotal.io>
> > >> wrote:
> > >>
> > >>> Tested on PG 10.3 (src and dmg). Looks good. +1 (binding)
> > >>>
> > >>> Thanks for preparing the release Jingyi,
> > >>>
> > >>> Orhan Kislal
> > >>>
> > >>> On Fri, Apr 27, 2018 at 11:44 AM, Frank McQuillan
> > >>> <fmcquillan@pivotal.io> wrote:
> > >>>
> > >>>> Hi Jingyi,
> > >>>>
> > >>>> Thanks for posting the artifacts and sending out the vote.
> > >>>>
> > >>>> My findings:
> > >>>>
> > >>>> Installation and IC passed on postgres 9.6.7
> > >>>>
> > >>>> Also I tested a couple of the new features (personalized page rank and
> > >>>> mini-batch preprocessor)
> > >>>> and they worked OK for me with a small sample data set.
> > >>>>
> > >>>> +1 (binding)
> > >>>>
> > >>>> On Thu, Apr 26, 2018 at 2:57 PM, Jingyi Mei <jmei@pivotal.io>
> > >>>> wrote:
> > >>>>
> > >>>> > Hello Apache MADlib dev community,
> > >>>> >
> > >>>> > This is the vote for Apache MADlib 1.14 Release (RC1). It provides the
> > >>>> > source release tarball and convenience binaries. This is the third
> > >>>> > Apache MADlib release as an Apache Top Level Project (TLP).
> > >>>> >
> > >>>> > The vote will run for at least 72 working hours and will close on
> > >>>> > Tuesday, May 1st, 2018 @ 6pm PDT. A minimum of 3 binding +1 votes and
> > >>>> > more binding +1 than binding -1 are required to pass.
> > >>>> >
> > >>>> > The main goals of this release are:
> > >>>> >
> > >>>> > New features:
> > >>>> >
> > >>>> >    - New module - Balanced datasets: A sampling module to balance
> > >>>> >    classification datasets by resampling using various techniques
> > >>>> >    including undersampling, oversampling, uniform sampling or
> > >>>> >    user-defined proportion sampling (MADLIB-1168)
> > >>>> >    - Mini-batch: Added a mini-batch optimizer for MLP and a preprocessor
> > >>>> >    function necessary to create batches from the data (MADLIB-1200,
> > >>>> >    MADLIB-1206, MADLIB-1220, MADLIB-1224, MADLIB-1226, MADLIB-1227)
> > >>>> >    - k-NN: Added weighted averaging/voting by distance (MADLIB-1181)
> > >>>> >    - Summary: Added additional stats: number of positive, negative, zero
> > >>>> >    values and 95% confidence intervals for the mean (MADLIB-1167)
> > >>>> >    - Encode categorical: Updated to produce lower-case column names when
> > >>>> >    possible (MADLIB-1202)
> > >>>> >    - MLP: Added support for already one-hot encoded categorical dependent
> > >>>> >    variable in a classification task (MADLIB-1222)
> > >>>> >    - Pagerank: Added option for personalized vertices that allows higher
> > >>>> >    weightage for a subset of vertices which will have a higher jump
> > >>>> >    probability as compared to other vertices and a random surfer is more
> > >>>> >    likely to jump to these personalization vertices (MADLIB-1084)
> > >>>> >
> > >>>> > Bug fixes:
> > >>>> >
> > >>>> >    - Fixed issue with invalid calls of construct_array that led to
> > >>>> >    problems in PostgreSQL 10 (MADLIB-1185)
> > >>>> >    - Added newline between file concatenation during PGXN install
> > >>>> >    (MADLIB-1194)
> > >>>> >    - Fixed upgrade issues in knn (MADLIB-1197)
> > >>>> >    - Added fix to ensure RF variable importance values are always
> > >>>> >    non-negative
> > >>>> >    - Fixed inconsistency in LDA output and improved usability
> > >>>> >    (MADLIB-1160, MADLIB-1201)
> > >>>> >    - Fixed MLP and RF predict for models trained in earlier versions to
> > >>>> >    ensure missing optional parameters are given appropriate default
> > >>>> >    values (MADLIB-1207)
> > >>>> >    - Fixed a scenario in DT where the database crashed because no
> > >>>> >    features remained after categorical columns with a single level
> > >>>> >    were dropped
> > >>>> >    - Fixed step size initialization in MLP based on learning rate policy
> > >>>> >    (MADLIB-1212)
> > >>>> >    - Fixed PCA issue that leads to failure when grouping column is a TEXT
> > >>>> >    type (MADLIB-1215)
> > >>>> >    - Fixed cat levels output in DT when grouping is enabled (MADLIB-1218)
> > >>>> >    - Fixed and simplified initialization of model coefficients in MLP
> > >>>> >    - Removed source table dependency for predicting regression models in
> > >>>> >    MLP (MADLIB-1223)
> > >>>> >    - Print loss of first iteration in MLP (MADLIB-1228)
> > >>>> >    - Fixed MLP failure on GPDB 4.3 when verbose=True (MADLIB-1209)
> > >>>> >    - Fixed RF issue that showed up when var_importance=True with no
> > >>>> >    continuous features (MADLIB-1219)
> > >>>> >    - Fixed DT/RF issue for null_as_category=True and grouping enabled
> > >>>> >    (MADLIB-1217)
> > >>>> >
> > >>>> > Other:
> > >>>> >
> > >>>> >    - Reduced install-check runtime for PCA, DT, RF, elastic net
> > >>>> >    (MADLIB-1216)
> > >>>> >    - Added CentOS 7 PostgreSQL 9.6/10 docker files
> > >>>> >
> > >>>> > For additional information, please see:
> > >>>> > https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.14
> > >>>> >
> > >>>> > Here are the release artifact details:
> > >>>> >
> > >>>> > Source release tag to be voted on: rc/1.14-rc1, located here:
> > >>>> > https://git-wip-us.apache.org/repos/asf?p=madlib.git;a=tag;h=refs/tags/rc/1.14-rc1
> > >>>> >
> > >>>> > Source release tarball can be retrieved from the following locations:
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-src.tar.gz.sha512
> > >>>> >
> > >>>> > Convenience binary packages can be retrieved from the following
> > >>>> > locations:
> > >>>> >
> > >>>> > macOS: 10.* PostgreSQL 9.6 & 10.2
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Darwin.dmg.sha512
> > >>>> >
> > >>>> > CentOS* GPDB 4.3.5+
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux-GPDB43.rpm.sha512
> > >>>> >
> > >>>> > CentOS 6 &* GPDB 5.3.0, PostgreSQL 9.6 & 10.2
> > >>>> >
> > >>>> > Package:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm
> > >>>> > PGP Signature:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.asc
> > >>>> > SHA512 Hash:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/1.14-RC1/apache-madlib-1.14-bin-Linux.rpm.sha512
> > >>>> >
> > >>>> > The PGP KEYS file used to validate the signature of the release
> > >>>> > artifacts is available here:
> > >>>> > https://dist.apache.org/repos/dist/dev/madlib/KEYS
> > >>>> >
> > >>>> > To help in tallying the vote, PMC members please be sure to indicate
> > >>>> > “(binding)” with the vote.
> > >>>> >
> > >>>> > [ ] +1 approve
> > >>>> > [ ] +0 no opinion
> > >>>> > [ ] -1 disapprove (and reason why)
> > >>>> >
> > >>>> > Regards,
> > >>>> > Jingyi Mei
> > >>>> >
> > >>>> > Pivotal R&D Advanced Analytics
> > >>>> >
> > >>>> >
> > >>>>
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Rashmi Raghu, Ph.D.
> > >> Pivotal Data Science
> > >>
> > >
> > >
> >
>



-- 
Rashmi Raghu, Ph.D.
Pivotal Data Science
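
The vote email quoted above lists a "SHA512 Hash" file next to each release artifact. As a minimal sketch of how a voter might check a downloaded package against that hash, the Python below computes the digest locally and compares it with the published text. The function names sha512_hex and matches_published_hash are hypothetical, and the layout of the .sha512 file is an assumption: the thread does not say how it is formatted, so the comparison simply strips whitespace and ignores case. This does not replace checking the PGP signature against the KEYS file.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Return the SHA-512 digest of a file as lowercase hex, reading in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large packages are not loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_text):
    """Check a file's SHA-512 against the text of a published hash file.

    Assumption: the published text contains the hex digest, possibly split
    by spaces or newlines and accompanied by a file name. All whitespace is
    removed and the comparison is case-insensitive.
    """
    normalized = "".join(published_text.split()).lower()
    return sha512_hex(path) in normalized
```

For example, after downloading apache-madlib-1.14-src.tar.gz and its .sha512 file, one could read the hash file's text and pass both to matches_published_hash; a False result would be a reason to re-download before voting.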
