From Justin Mclean <>
Subject Re: Confusion over NOTICE vs LICENSE files
Date Thu, 04 Feb 2016 07:11:42 GMT

I took a look at all the LICENSE, NOTICE and DISCLAIMER files in the non-documentation /
non-website GitHub repos of all incubating projects.

I was assisted by scripts and made a few assumptions for expediency, so I may have missed a
couple or included a graduated or retired project.

Some data points:
- 10 repos are missing a LICENSE file
- There are some (very) minor variations of text in the LICENSE appendix
- 39 repos use a boilerplate LICENSE file
- 1 LICENSE file is missing the Apache boilerplate text
- 1 repo is missing the LICENSE appendix part
- 2 repos have a non-standard LICENSE appendix (filled in copyright line)
- 10 LICENSE files have the long form of MIT/BSD licenses where the short form is preferred
- 1 LICENSE file oddly / verbosely lists out the MIT/BSD license of all individual files
- at least 1 LICENSE file lists Apache-licensed ASF software
- at least 8 LICENSE files list non-ASF Apache-licensed software
- 14 repos are missing a NOTICE file
- in the NOTICE file 14 repos use the name "Apache XXXX (incubating)", 55 use "Apache XXXX",
and 3 use just "XXX" (missing Apache)
- 29 repos have a NOTICE file copyright year before 2016
- 2 use the older “developed by” instead of “developed at” in the NOTICE file
- 2 have incorrect text in the NOTICE files
- at least 8 include licensing information in NOTICE that should be in LICENSE (IMO from
a quick look)
- at least 1 has excessive copyright lines which may be incorrect
- 21 repos are missing DISCLAIMER files
- There's some (minor) variation on the DISCLAIMER wording
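
The NOTICE naming and copyright year checks above could be approximated with something like
the following. This is only a minimal Python sketch, not the scripts used for these numbers.
It assumes the project name is on the first non-blank line of NOTICE and that the copyright
line looks like "Copyright 2015" or "Copyright 2014-2016"; the function names and category
labels are made up for illustration.

```python
import re


def classify_project_name(notice_text):
    """Classify the first non-blank NOTICE line (assumed to hold the project name)."""
    first = next((ln.strip() for ln in notice_text.splitlines() if ln.strip()), "")
    if not first.lower().startswith("apache "):
        return "missing-apache"      # e.g. just "XXX"
    if re.search(r"\(incubating\)", first, re.IGNORECASE):
        return "apache-incubating"   # e.g. "Apache XXXX (incubating)"
    return "apache"                  # e.g. "Apache XXXX"


def latest_copyright_year(notice_text):
    """Return the latest year on the first copyright line, or None if there is none."""
    match = re.search(
        r"Copyright\s+(?:\(c\)\s*)?(\d{4})(?:\s*-\s*(\d{4}))?",
        notice_text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    # For a range such as 2014-2016, the second group holds the end year.
    return int(match.group(2) or match.group(1))
```

A NOTICE file whose latest year is before 2016 would then count toward the "copyright year
before 2016" figure. Anything this flags still needs a manual look.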

Projects are works in progress: they may not have made a release yet, may not have updated
the files for the next release, or may keep the expected files somewhere other than the half
dozen places my scripts looked at.
Just take these numbers as a rough indication. I really didn’t want to spend too long on

A few NOTICE / LICENSE files have TODOs, which is nice to see. I would pass an IPMC vote
on a release if I saw this.

It looks like a few projects are getting confused about what goes in LICENSE and NOTICE. The
two issues seem to be adding MIT, BSD or Apache licenses to NOTICE when it is not required
and adding extra copyright notices to NOTICE. An update to the policy documentation to make it
clearer what goes in each file would help here, I think, and that is already under way.

There also seems to be some confusion around what to do with bundled Apache-licensed software.
The existing documentation is not entirely clear on how to handle non-ASF Apache software,
and this has come up on the list a few times with some differing opinions.

A few questions on incubator policy that may need to be clarified:
- A release must include a NOTICE file, but should a repo include one?
- Likewise should a DISCLAIMER file be present in the repo?
- I thought incubating projects should be named "Apache XXXX (incubating)" but the majority
are named "Apache XXXX", missing the "(incubating)" in the NOTICE file.
- What is the correct way to handle non-ASF Apache-licensed software? Current policy (AFAIK)
is not to add it to LICENSE, but it is not an error if you do so. What advice should we give to podlings

I think some of these issues are likely to come from copying and pasting from other projects' files.
Would it make sense when creating new source repos to add boiler plate LICENSE, NOTICE and

Does anyone have any other views / opinions / insights based on the above data?

Now I don’t want to look at a LICENSE or NOTICE file for a week or so and need a stiff drink.


PS If anyone is interested in the simple scripts/process to get those numbers just ask offline.
I used grep, wc and sort a fair bit to narrow down which files to look at.
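
For anyone wanting to try something similar, here is a minimal Python sketch of the "which
repos are missing a file" count. It is not the original grep/wc/sort process. It assumes all
repos are cloned side by side under one directory, and the list of subdirectories to search
is hypothetical, since the half dozen places the original scripts looked at are not listed
here.

```python
from pathlib import Path

EXPECTED = ("LICENSE", "NOTICE", "DISCLAIMER")

# Hypothetical search locations inside each repo; adjust to taste.
SEARCH_DIRS = (".", "src", "dist")


def find_file(repo, name, search_dirs=SEARCH_DIRS):
    """Return the first file called `name` (any case, any extension), or None."""
    for sub in search_dirs:
        folder = repo / sub
        if not folder.is_dir():
            continue
        for entry in sorted(folder.iterdir()):
            # Accept LICENSE, license, LICENSE.txt, License.md and so on.
            if entry.is_file() and name in (entry.name.upper(), entry.stem.upper()):
                return entry
    return None


def missing_counts(repos_root):
    """Count, per expected file, how many repos under `repos_root` lack it."""
    counts = {name: 0 for name in EXPECTED}
    for repo in sorted(p for p in repos_root.iterdir() if p.is_dir()):
        for name in EXPECTED:
            if find_file(repo, name) is None:
                counts[name] += 1
    return counts
```

Calling `missing_counts(Path("repos"))` on a directory of clones returns a dict such as
`{"LICENSE": ..., "NOTICE": ..., "DISCLAIMER": ...}`. A file kept outside the searched
locations is reported as missing, so the result is a rough indication only.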
