So it may be an object stored in your task that is not serializable.
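The serialization stack quoted further down points the same way: GroovySparkWordcount$1 has a this$0 field referring to the GroovySparkWordcount script object, and that object is the one that fails to serialize. Here is a minimal sketch of the same effect in plain Java, without Spark. WordcountScript, LineTask, StandaloneTask and CaptureDemo are hypothetical names made up for the illustration, and the sketch assumes that the anonymous class in the Groovy script behaves like a Java inner class in this respect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a serializable function interface.
interface LineTask extends Serializable {
    boolean test(String line);
}

// Hypothetical stand-in for the script class; deliberately NOT Serializable.
class WordcountScript {
    private final String keyword;

    WordcountScript(String keyword) {
        this.keyword = keyword;
    }

    // Anonymous inner class that reads a field of the enclosing instance:
    // it keeps a hidden reference (this$0) to the whole WordcountScript.
    LineTask innerTask() {
        return new LineTask() {
            @Override
            public boolean test(String line) {
                return line.contains(keyword);
            }
        };
    }

    // Static nested class: it carries only its own copy of the String.
    static final class StandaloneTask implements LineTask {
        private final String keyword;

        StandaloneTask(String keyword) {
            this.keyword = keyword;
        }

        @Override
        public boolean test(String line) {
            return line.contains(keyword);
        }
    }

    LineTask standaloneTask() {
        return new StandaloneTask(keyword);
    }
}

class CaptureDemo {
    // Returns null if the task serializes, otherwise the message of the
    // NotSerializableException (by default the name of the offending class).
    static String firstNonSerializable(Object task) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WordcountScript script = new WordcountScript("spark");
        System.out.println(firstNonSerializable(script.innerTask()));      // names WordcountScript
        System.out.println(firstNonSerializable(script.standaloneTask())); // null
    }
}
```

If the Groovy script is in the same situation, moving the task into a class that does not reference the script object, or making the enclosing object serializable, would be the two obvious things to try.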


On 26 Jul 2015 at 11:42, "tog" <> wrote:
Thanks Jeff for your quick answer.

Yes, the tasks have to be serializable, and I believe they are.

My test script has two tasks that do the same job: one is a closure, the other is a - and according to a small test in my script, both are serializable in Java/Groovy.
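For what it is worth, a check can report "serializable" and the task can still fail in Spark, depending on what the check does. Testing instanceof Serializable only looks at the declared type, whereas the JavaSerializer that appears in the stack trace writes the whole object graph. What the small test in the script actually does is not shown in this thread; the sketch below, with the hypothetical names LineFilter and SerializableCheck, only illustrates how the two kinds of check can disagree.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical task type: declares Serializable but may carry a field that is not.
class LineFilter implements Serializable {
    private static final long serialVersionUID = 1L;

    // Serialization succeeds only if the runtime value of this field is serializable.
    private final Object helper;

    LineFilter(Object helper) {
        this.helper = helper;
    }
}

class SerializableCheck {
    // Weak check: only inspects the declared type of the object itself.
    static boolean declaresSerializable(Object o) {
        return o instanceof Serializable;
    }

    // Strong check: pushes the whole object graph through Java serialization.
    static boolean survivesSerialization(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A plain java.lang.Object is not serializable, so the field breaks the task.
        LineFilter task = new LineFilter(new Object());
        System.out.println(declaresSerializable(task));  // true
        System.out.println(survivesSerialization(task)); // false
    }
}
```

Running the strong check on the exact task instance that is passed to filter, created in the same place in the script, would be the closest approximation of what Spark does.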

I am a bit puzzled/stuck here.

On 26 July 2015 at 10:34, Jeff MAURY <> wrote:
Spark distributes tasks across the cluster nodes, so the task needs to be serializable. It appears that your task is a Groovy closure, so you must make it serializable.


On Sun, Jul 26, 2015 at 11:12 AM, tog <> wrote:

I am starting to play with Apache Spark using Groovy. I have a small script that I use for that purpose.

When the script is turned into a class and launched with java, it works fine, but it fails when run as a script.

Any idea what I am doing wrong? Maybe some of you have already come across this problem.

$ groovy -version
Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac OS X

$ groovy GroovySparkWordcount.groovy
Caught: org.apache.spark.SparkException: Task not serializable
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
    at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
    at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
    at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
    at$filter$ Source)
Caused by: GroovySparkWordcount
Serialization stack:
    - object not serializable (class: GroovySparkWordcount, value: GroovySparkWordcount@57c6feea)
    - field (class: GroovySparkWordcount$1, name: this$0, type: class GroovySparkWordcount)
    - object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
    - field (class:$$anonfun$filter$1, name: f$1, type: interface
    - object (class$$anonfun$filter$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
    ... 12 more