madlib-user mailing list archives

From Jingyi Mei
Subject [Announce] Apache MADlib v1.14 released
Date Thu, 03 May 2018 00:23:50 GMT
The Apache MADlib team is pleased to announce the immediate
availability of the 1.14 release.

The main goals of this release are:

New features:

   - New module - Balanced datasets: A sampling module to balance
   datasets by resampling using various techniques including undersampling,
   oversampling, uniform sampling or user-defined proportion sampling
   - Mini-batch: Added a mini-batch optimizer for MLP and a preprocessor
   necessary to create batches from the data (MADLIB-1200, MADLIB-1206,
   MADLIB-1220, MADLIB-1224, MADLIB-1226, MADLIB-1227)
   - k-NN: Added weighted averaging/voting by distance (MADLIB-1181)
   - Summary: Added additional stats: number of positive, negative, and zero
   values, and 95% confidence intervals for the mean (MADLIB-1167)
   - Encode categorical: Updated to produce lower-case column names when
   - MLP: Added support for an already one-hot encoded categorical dependent
   variable in a classification task (MADLIB-1222)
   - Pagerank: Added an option for personalized vertices. A chosen subset
   of vertices is given a higher jump probability than the other vertices,
   so a random surfer is more likely to jump to these personalization
   vertices (MADLIB-1084)
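
To illustrate the personalized Pagerank item above, here is a minimal
sketch of the general idea in plain Python. It is not MADlib's
implementation or API: the function name, its parameters, the damping
value of 0.85 and the handling of vertices without outgoing edges are all
assumptions made for this example. The only point it shows is that random
jumps land on the personalization vertices instead of on all vertices.

```python
def personalized_pagerank(edges, personalization, damping=0.85, iterations=100):
    """Power iteration for PageRank with a personalized jump distribution.

    edges: list of (src, dst) pairs (hypothetical input format).
    personalization: vertices that receive all of the random-jump probability.
    """
    targets = set(personalization)
    vertices = sorted({v for edge in edges for v in edge} | targets)

    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    # Random jumps land only on the personalization vertices, uniformly.
    jump = {v: 0.0 for v in vertices}
    for v in targets:
        jump[v] = 1.0 / len(targets)

    rank = dict(jump)
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a
        # personalization vertex.
        new_rank = {v: (1.0 - damping) * jump[v] for v in vertices}
        for v in vertices:
            if out_links[v]:
                # Otherwise the surfer follows an outgoing edge at random.
                share = damping * rank[v] / len(out_links[v])
                for dst in out_links[v]:
                    new_rank[dst] += share
            else:
                # Assumption: a vertex without outgoing edges hands its
                # rank back through the jump distribution.
                for u in vertices:
                    new_rank[u] += damping * rank[v] * jump[u]
        rank = new_rank
    return rank


edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 1)]
boosted = personalized_pagerank(edges, personalization=[4])
other = personalized_pagerank(edges, personalization=[2])
print(boosted[4] > other[4])
```

On this small graph, vertex 4 ends up with a higher rank when it is the
personalization vertex than when vertex 2 is, which is the effect the
release note describes.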

Bug fixes:

   - Fixed issue with invalid calls of construct_array that led to problems
   in PostgreSQL 10 (MADLIB-1185)
   - Added newline between file concatenation during PGXN install
   - Fixed upgrade issues in knn (MADLIB-1197)
   - Added fix to ensure RF variable importance values are always non-negative
   - Fixed inconsistency in LDA output and improved usability (MADLIB-1160,
   - Fixed MLP and RF predict for models trained in earlier versions to
   ensure missing optional parameters are given appropriate default values
   - Fixed a database crash in DT that occurred when no features remained
   because categorical columns with a single level were dropped
   - Fixed step size initialization in MLP based on learning rate policy
   - Fixed PCA issue that leads to failure when grouping column is a TEXT
   type (MADLIB-1215)
   - Fixed cat levels output in DT when grouping is enabled (MADLIB-1218)
   - Fixed and simplified initialization of model coefficients in MLP
   - Removed source table dependency for predicting regression models in
   MLP (MADLIB-1223)
   - Print loss of first iteration in MLP (MADLIB-1228)
   - Fixed MLP failure on GPDB 4.3 when verbose=True (MADLIB-1209)
   - Fixed RF issue that showed up when var_importance=True with no
   continuous features (MADLIB-1219)
   - Fixed DT/RF issue for null_as_category=True and grouping enabled


   - Reduced install-check runtime for PCA, DT, RF, elastic net
   - Added CentOS 7 PostgreSQL 9.6/10 docker files

All release changes can be found here:

You can download the source release and convenience binary packages
from Apache MADlib's download page here:

Alternatively, you can download through an ASF mirror near you:


Apache MADlib is an open-source library for scalable in-database
analytics. It provides data-parallel implementations of mathematical,
statistical and machine learning methods for structured and
unstructured data.

The MADlib mission: to foster widespread development of scalable
analytic skills, by harnessing efforts from commercial practice,
academic research, and open-source development.

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at


Thank you, everyone who contributed to the MADlib 1.14 release. We
look forward to continued community participation for the next release.

Jingyi Mei
