incubator-general mailing list archives

From: Marvin Humphrey
Subject: Re: Binary file inclusion (was [VOTE] Apache Toree (incubating) 0.1.0-rc4 as 0.1.0)
Date: Mon, 23 Jan 2017 03:24:23 GMT

On Sat, Jan 21, 2017 at 9:34 AM, John D. Ament wrote:
> On Sat, Jan 21, 2017 at 12:19 PM Marvin Humphrey wrote:
>> On Sat, Jan 21, 2017 at 6:41 AM, John D. Ament wrote:
>> > However, regarding the
>> > binaries.  In a recent discussion (on legal-discuss) it was decided that
>> > this was OK.  Ideally the NOTICE would include the information on the
>> > binary's source of origin (assuming that the source was eligible to be
>> > licensed this way).  In this case, the .tar.gz  is actually the
>> > distribution of Apache Spark R that looks like its required to build
>> > Toree.
>> I must have missed this on legal-discuss, and it's counter to my
>> understanding. Can you please provide a link?
>> Here is something I wrote to legal-discuss recently, which talks about
>> some of the security reasons why bundling a binary dependency is
>> problematic:
> Same thread.  Specifically Mark T's response [1] and Craig's affirmation [2]
> [1]:
> [2]:

Let me be clear: compiled code does not belong in our official source
releases.

Here's the relevant policy clause:

  The Apache Software Foundation produces open source software. All releases
  are in the form of the source materials needed to make changes to the
  software being released.

Creating releases which adhere to this policy is almost always
straightforward.  Just because there are some edge cases where we have to
apply judgment doesn't invalidate the policy and allow willy-nilly bundling of

The OpenWebBeans case from legal-discuss was just such an edge case.  The
.class file wasn't on the class path and was used only when running unit tests
for some bytecode stuff.  This is quite difficult to exploit.

The debate on legal-discuss was over whether it was worth doing anything
about, because it was more of a binary resource (like a .jpeg) than compiled
object form.  In the end we didn't even make a policy exception because the
project applied a workaround -- they extracted the bytecode out of the .class
file and encoded it as a static variable in the source file.

Now, from a test-time security standpoint, that's not really all that
different from having the .class file in a test dir outside the classpath or
renaming `Foo.class` to `Foo.dat` or `Foo.bin`.  It is better, though, from an
auditing perspective, because when changes are made there will be a
human-readable diff in the commit notification email.
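
To illustrate the auditing point, here is a minimal sketch of the general
technique of storing a small binary as a text constant in source.  It is
written in Python for brevity and is not the code the OpenWebBeans project
actually used; the names `render_constant` and `changed_lines`, and the
16-bytes-per-line hex layout, are assumptions made for the example.

```python
import difflib


def to_hex_lines(data: bytes, width: int = 16) -> list[str]:
    """Split the bytes into fixed-width hex strings, one per source line."""
    return [data[i:i + width].hex() for i in range(0, len(data), width)] or [""]


def render_constant(name: str, data: bytes) -> str:
    """Render the bytes as a source-level constant built from hex literals."""
    body = "\n".join(f'    "{line}"' for line in to_hex_lines(data))
    return f"{name} = bytes.fromhex(\n{body}\n)\n"


def changed_lines(old: bytes, new: bytes) -> list[str]:
    """Return the added/removed source lines a reviewer would see in a diff."""
    old_src = render_constant("DATA", old).splitlines()
    new_src = render_constant("DATA", new).splitlines()
    diff = difflib.unified_diff(old_src, new_src, lineterm="", n=0)
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical stand-in for the contents of a small compiled file.
original = bytes.fromhex("cafebabe") + bytes(range(40))

# Flip one byte, as an update to the embedded resource might.
patched = bytearray(original)
patched[20] ^= 0xFF

for line in changed_lines(original, bytes(patched)):
    print(line)
```

Because each source line covers a fixed run of bytes, a one-byte change shows
up as a single removed line and a single added line, which is what makes the
commit diff reviewable.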

And that brings me to the bundling of SparkR in Toree.  The standard procedure
would be for the user to fetch that dependency themselves.  By embedding it,
we actually make it *harder* for security-minded consumers to understand where
their dependencies are coming from.
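
As a sketch of what fetching the dependency yourself can look like on the
consumer side, the snippet below compares a downloaded archive against a
digest published by the upstream project.  The choice of SHA-512 and the
function names are assumptions made for illustration; the snippet is not part
of Toree's or Spark's tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, published_hex: str) -> bool:
    """Check a local archive against a digest obtained from upstream."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha512_of(path), expected)
```

A consumer would download the archive and the digest from the upstream
project's own distribution channel and refuse to build if
`matches_published_digest` returns `False`.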

I don't see a strong rationale for bundling this dependency.  It isn't
compiled code, it's compressed source -- but when it's updated, there's no
diff because the tar.gz is binary.  Why not treat it like any other
dependency?

Marvin Humphrey
