Subject: MADlib Board Report - 2020 April
Date: Sun, 05 Apr 2020 13:06:30 GMT
## Description:

- Apache MADlib is a scalable, big data, SQL-driven machine learning framework
  for data scientists.

## Issues:

- There are no issues requiring board attention at this time.

## Activity:

- 1.17 is code complete and its release is in progress (as of this writing);
  it will be the 7th release since MADlib became an Apache TLP

- Main 1.17 JIRAs include:
  * deep learning feature improvements, including training multiple models in
    parallel for model selection (hyperparameter tuning and model architecture
    search), inference on models trained outside of MADlib, and performance
    improvements to the mini-batch preprocessor and deep learning training
  * performance improvements to correlation/covariance, association rules, and
    the weakly connected components graph algorithm
  * perplexity-based stopping criteria for LDA
  * automatic selection of the number of centroids for k-means clustering
  * Postgres 12 support

- The next release will be 1.18, with JIRAs related to deep learning and other
  ML methods

- Frank McQuillan (MADlib committer and PMC member) presented the latest deep
  learning work at FOSDEM'20 in a talk titled "Efficient Model Selection for
  Deep Neural Networks on Massively Parallel Processing Databases"

- Several new Jupyter notebook examples have been published to the community
  artifacts repo

## Health report:

The community is relatively small but very engaged, with robust mailing list
traffic, interest in frequent releases, and new functionality being developed
by contributors.

The number of developers actively contributing to the code and documentation
was approximately 7 in the first quarter of calendar year 2020.

We are continually on the lookout for new community members to invite as
committers or PMC members.

## PMC changes:

- No changes in the last quarter. The PMC currently stands at 14 members.

## Committer base changes:

- Currently 17 committers; no new committers since the last report.

- The most recent committers added were Ekta Khanna (2019-07-27), Himanshu
  Pandey (2019-07-27), and Domino Valdano (2019-07-27)

## Releases:

- Next release: v1.18 planned for 2H 2020

- v1.17.0 released early April 2020

- v1.16.0 released on 2019-07-08

- v1.15.1 released on 2018-10-15

## Mailing list activity:

Average monthly mailing list activity over the last 3 months (Jan-Mar 2020)
was 56 posts to dev@ and 5 posts to user@.

## JIRA Statistics:

- 8 JIRA tickets created in the last month

- 15 JIRA tickets resolved in the last month

