From: Jeff MAURY <jeffmaury@gmail.com>
To: users@groovy.incubator.apache.org
Date: Sun, 26 Jul 2015 11:57:24 +0200
Subject: Re: Apache Spark & Groovy

So it may be an object stored in your task that is not serializable.

Jeff

On 26 Jul 2015 at 11:42, "tog" wrote:

> Thanks Jeff for your quick answer.
>
> Yes, the tasks must be serializable and I believe they are.
>
> My test script has 2 tasks (doing the same job): one is a closure, the
> other is an org.apache.spark.api.java.function.Function - and according to
> a small test in my script both are serializable for Java/Groovy.
>
> I am a bit puzzled/stuck here.
>
> On 26 July 2015 at 10:34, Jeff MAURY wrote:
>
>> Spark distributes tasks to cluster nodes, so the task needs to be
>> serializable. It appears that your task is a Groovy closure, so you must
>> make it serializable.
>>
>> Jeff
>>
>> On Sun, Jul 26, 2015 at 11:12 AM, tog wrote:
>>
>>> Hi
>>>
>>> I am starting to play with Apache Spark using Groovy. I have a small
>>> script that I use for that purpose.
>>>
>>> When the script is transformed into a class and launched with java, this
>>> is working fine, but it fails when run as a script.
>>>
>>> Any idea what I am doing wrong? Maybe some of you have already come
>>> across that problem.
>>>
>>> $ groovy -version
>>>
>>> Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac OS X
>>>
>>> $ groovy GroovySparkWordcount.groovy
>>>
>>> class org.apache.spark.api.java.JavaRDD
>>>
>>> true
>>>
>>> true
>>>
>>> Caught: org.apache.spark.SparkException: Task not serializable
>>>
>>> org.apache.spark.SparkException: Task not serializable
>>>
>>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
>>> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
>>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
>>> at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
>>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>> at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>>> at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>>> at org.apache.spark.api.java.JavaRDD$filter$0.call(Unknown Source)
>>> at GroovySparkWordcount.run(GroovySparkWordcount.groovy:27)
>>>
>>> Caused by: java.io.NotSerializableException: GroovySparkWordcount
>>>
>>> Serialization stack:
>>> - object not serializable (class: GroovySparkWordcount, value: GroovySparkWordcount@57c6feea)
>>> - field (class: GroovySparkWordcount$1, name: this$0, type: class GroovySparkWordcount)
>>> - object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
>>> - field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, name: f$1, type: interface org.apache.spark.api.java.function.Function)
>>> - object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, <function1>)
>>>
>>> at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>>> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>>> at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
>>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
>>> ... 12 more
>>>
>>
>> --
>> Jeff MAURY
>>
>> "Legacy code" often differs from its suggested alternative by actually
>> working and scaling.
>>  - Bjarne Stroustrup
>>
>> http://www.jeffmaury.com
>> http://riadiscuss.jeffmaury.com
>> http://www.twitter.com/jeffmaury
>
> --
> PGP KeyID: 2048R/EA31CFC9 subkeys.pgp.net
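[Editor's note] The serialization stack in the thread points at the cause: the field `this$0` of `GroovySparkWordcount$1` means the anonymous Function is an inner class that holds a hidden reference to the enclosing script instance, and the script class itself is not serializable. A minimal Java sketch (not from the original thread; the `Task` interface and class names are made up for illustration) reproduces the effect without Spark:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableTaskDemo {

    // A serializable functional interface standing in for
    // org.apache.spark.api.java.function.Function.
    interface Task extends Serializable {
        boolean run(String s);
    }

    private final String word = "spark"; // outer state the inner task reads

    // Failing case: the anonymous class reads an outer field, so it keeps a
    // hidden this$0 reference to the enclosing, non-serializable instance --
    // the same shape as GroovySparkWordcount$1 in the stack trace.
    Task capturingTask() {
        return new Task() {
            public boolean run(String s) {
                return s.contains(word); // accessed via this$0
            }
        };
    }

    // Fix: a static nested class has no this$0; all its state is its own.
    static class ContainsWord implements Task {
        private final String word;
        ContainsWord(String word) { this.word = word; }
        public boolean run(String s) { return s.contains(word); }
    }

    // True if Java serialization accepts the object, false on the same
    // NotSerializableException that Spark's ClosureCleaner surfaces.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new SerializableTaskDemo().capturingTask())); // false
        System.out.println(serializes(new ContainsWord("spark")));                  // true
    }
}
```

For the Groovy script in the thread the analogous move would be to keep the task from referencing the script instance, for instance by defining the Function in its own class rather than inline in the script; for a closure, `Closure.dehydrate()` (which returns a copy with owner/delegate cleared) is another avenue to try.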