groovy-users mailing list archives

From tog <guillaume.all...@gmail.com>
Subject Re: Apache Spark & Groovy
Date Sun, 26 Jul 2015 11:54:47 GMT
Thanks Cedric, I learnt something :-) and it solved my issue.
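
For the record, the fix was along these lines (a sketch only — `rdd` and the closure body are hypothetical stand-ins for my script):

```groovy
// Before: a filter closure defined in the script captures the script
// instance as its owner/thisObject, and the script is not serializable
def filterClosure = { String line -> line.contains('spark') }

// After: dehydrate() returns a copy with owner/delegate/thisObject nulled,
// so serializing the copy no longer drags the script into the object graph
def filtered = rdd.filter(filterClosure.dehydrate())
```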

A few additional questions, then:

In my script, should Serializable.isAssignableFrom(filterClosure.class)
return true only once I call dehydrate on it? (That does not seem to be the
case.)
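
To illustrate what I mean (a hedged sketch — groovy.lang.Closure implements Serializable, so the class-level check passes either way; what actually fails is serializing a particular instance because of what it captures):

```groovy
// Class-level check: always true for any Groovy closure,
// dehydrated or not, since Closure implements Serializable
assert Serializable.isAssignableFrom(filterClosure.class)

// Instance-level check: this is what Spark effectively does, and it
// fails when the closure still references the non-serializable script
try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(filterClosure)
} catch (NotSerializableException e) {
    println "not serializable: ${e.message}"   // the script class shows up here
}
```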

Would there be a way to automatically create "dehydrated" closures in a
script?

Or should I intercept all calls to map on JavaRDD to make sure the closure
is dehydrated before the actual method is called?
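
Something like intercepting the RDD methods via the metaclass is what I had in mind — an untested sketch, assuming the closure can be coerced to the Function interface Spark expects:

```groovy
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.api.java.function.Function

// Untested idea: wrap filter so any Closure argument is dehydrated
// before Spark tries to serialize it
def origFilter = JavaRDD.metaClass.getMetaMethod('filter', [Function] as Class[])
JavaRDD.metaClass.filter = { Closure c ->
    origFilter.invoke(delegate, c.dehydrate() as Function)
}
```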

On 26 July 2015 at 11:07, Cédric Champeau <cedric.champeau@gmail.com> wrote:

> A closure keeps a reference to its owner/thisObject, which is in your
> case the script. The script is not serializable. If you dehydrate the
> closure (call closure.dehydrate()) it will not keep a reference to the
> script anymore and it should be serializable.
>
> 2015-07-26 11:57 GMT+02:00 Jeff MAURY <jeffmaury@jeffmaury.com>:
> > So it may be an object stored in your task that is not serializable.
> >
> > Jeff
> >
> > On 26 Jul 2015 at 11:42, "tog" <guillaume.alleon@gmail.com> wrote:
> >>
> >> Thanks Jeff for your quick answer.
> >>
> >> Yes, the tasks shall be serializable and I believe they are.
> >>
> >> My test script has 2 tasks (doing the same job): one is a closure, the
> >> other is an org.apache.spark.api.java.function.Function - and according
> >> to a small test in my script both are serializable for Java/Groovy.
> >>
> >> I am a bit puzzled/stuck here.
> >>
> >> On 26 July 2015 at 10:34, Jeff MAURY <jeffmaury@jeffmaury.com> wrote:
> >>>
> >>> Spark distributes tasks to cluster nodes, so the task needs to be
> >>> serializable. It appears that your task is a Groovy closure, so you
> >>> must make it serializable.
> >>>
> >>> Jeff
> >>>
> >>> On Sun, Jul 26, 2015 at 11:12 AM, tog <guillaume.alleon@gmail.com>
> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> I am starting to play with Apache Spark using groovy. I have a small
> >>>> script that I use for that purpose.
> >>>>
> >>>> When the script is transformed in a class and launched with java, this
> >>>> is working fine but it fails when run as a script.
> >>>>
> >>>> Any idea what I am doing wrong? Maybe some of you have already come
> >>>> across that problem.
> >>>>
> >>>> $ groovy -version
> >>>>
> >>>> Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac
> >>>> OS X
> >>>>
> >>>> $ groovy GroovySparkWordcount.groovy
> >>>>
> >>>> class org.apache.spark.api.java.JavaRDD
> >>>>
> >>>> true
> >>>>
> >>>> true
> >>>>
> >>>> Caught: org.apache.spark.SparkException: Task not serializable
> >>>>
> >>>> org.apache.spark.SparkException: Task not serializable
> >>>>
> >>>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
> >>>> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
> >>>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
> >>>> at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
> >>>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
> >>>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
> >>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> >>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> >>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
> >>>> at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
> >>>> at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
> >>>> at org.apache.spark.api.java.JavaRDD$filter$0.call(Unknown Source)
> >>>> at GroovySparkWordcount.run(GroovySparkWordcount.groovy:27)
> >>>> Caused by: java.io.NotSerializableException: GroovySparkWordcount
> >>>> Serialization stack:
> >>>> - object not serializable (class: GroovySparkWordcount, value:
> >>>>   GroovySparkWordcount@57c6feea)
> >>>> - field (class: GroovySparkWordcount$1, name: this$0, type: class
> >>>>   GroovySparkWordcount)
> >>>> - object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
> >>>> - field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1,
> >>>>   name: f$1, type: interface org.apache.spark.api.java.function.Function)
> >>>> - object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1,
> >>>>   <function1>)
> >>>> at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
> >>>> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
> >>>> at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
> >>>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
> >>>> ... 12 more
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Jeff MAURY
> >>>
> >>>
> >>> "Legacy code" often differs from its suggested alternative by actually
> >>> working and scaling.
> >>>  - Bjarne Stroustrup
> >>>
> >>> http://www.jeffmaury.com
> >>> http://riadiscuss.jeffmaury.com
> >>> http://www.twitter.com/jeffmaury
> >>
> >>
> >>
> >>
> >> --
> >> PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net
>



-- 
PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net
