Just to close the loop on this...

I did not have time to experiment with other EMR versions, so I am going with emr-4.9.2 for the near future, since Pig Phoenix storage works as expected there when running the script from the command line.

However, I made an action item to try submitting the Pig script as an EMR step at a later date, to see if I get better results.
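
For reference, a step submission from the AWS CLI might look like the sketch below. The cluster ID, bucket, and step name are placeholders, not values from this thread, and the script would first have to be copied to S3. The function only assembles the --steps argument in the shorthand form that "aws emr add-steps" accepts for Pig steps, and the last line prints the command instead of running it.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from this thread); override via env.
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"
SCRIPT_S3="${SCRIPT_S3:-s3://my-bucket/scripts/try.pig}"

# Build the shorthand --steps value for a Pig step.
# $1 = step name (no spaces or commas), $2 = S3 location of the Pig script
build_pig_step() {
  printf 'Type=PIG,Name=%s,ActionOnFailure=CONTINUE,Args=[-f,%s]' "$1" "$2"
}

STEP="$(build_pig_step PhoenixStoreTest "$SCRIPT_S3")"

# Dry run: print the command. Remove "echo" to actually submit the step.
echo aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "$STEP"
```

Whether a step submitted this way avoids the errors seen with "pig try.pig" on 5.x is untested here.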

Thanks,
    Steve

On Mon, Aug 21, 2017 at 4:48 PM, Steve Terrell <sterrell@oculus360.us> wrote:
Thanks for the extra info!  Will let everyone know if I solve this.

On Mon, Aug 21, 2017 at 4:24 PM, anil gupta <anilgupta84@gmail.com> wrote:
And forgot to mention that we invoke our pig scripts through oozie.

On Mon, Aug 21, 2017 at 2:20 PM, anil gupta <anilgupta84@gmail.com> wrote:
Sorry, can't share the pig script.
Here is what we are registering:
REGISTER /usr/lib/phoenix/phoenix-4.7.0-HBase-1.2-client.jar;
REGISTER /usr/lib/pig/lib/piggybank.jar;


Following is the classpath of Hadoop and Yarn:
[hadoop@ip-52-143 ~]$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
[hadoop@ip-52-143 ~]$ yarn classpath
/etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*
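
Classpath strings like the two above are hard to compare by eye. A small helper along these lines (a sketch; the function name is made up for illustration) prints each entry once, one per line, so the output of "hadoop classpath" and "yarn classpath" from two clusters can be diffed:

```shell
#!/bin/sh
# Print each non-empty classpath entry once, one per line, in first-seen order.
# $1 = colon-separated classpath string
split_classpath() {
  printf '%s\n' "$1" | tr ':' '\n' | awk 'NF && !seen[$0]++'
}

# Example with a shortened classpath in the same shape as the output above.
split_classpath '/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*::/etc/tez/conf'
# prints:
# /etc/hadoop/conf
# /usr/lib/hadoop/lib/*
# /etc/tez/conf
```

On a cluster node the argument would be "$(hadoop classpath)" or "$(yarn classpath)".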



On Mon, Aug 21, 2017 at 11:21 AM, Steve Terrell <sterrell@oculus360.us> wrote:
Hmm...  just repeated my test on emr-5.2.0.  This time I went with the default EMR console selections for master and core nodes (2 of them).

When running my simple Pig Phoenix store script, still getting the errors I got for other 5.x.x versions:
2017-08-21 17:50:52,431 [ERROR] [main] |app.DAGAppMaster|: Error starting DAGAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.ContainerId.fromString(Ljava/lang/String;)Lorg/apache/hadoop/yarn/api/records/ContainerId;
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:179)
    at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2304)

The simple test script:
REGISTER /usr/lib/phoenix/phoenix-client.jar;
A = load '/steve/a.txt' as (TXT:chararray);
store A into 'hbase://A_TABLE' using org.apache.phoenix.pig.PhoenixHBaseStorage('10.0.100.51','-batchSize 2500');

I am calling it directly from the command line like this:
pig try.pig

Maybe other people are calling their Phoenix Pig scripts some other way (EMR steps) or with different parameters?  Details of a setup where this works would help a lot.
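
One parameter that can differ between setups is the execution engine. The stack trace above comes from org.apache.tez.dag.app.DAGAppMaster, so this run went through Tez. Pig's -x option selects the engine explicitly, so a comparison run could look like the sketch below. This is only a way to narrow things down, not a confirmed fix, and the PIG_BIN override is a made-up convenience for dry runs.

```shell
#!/bin/sh
# Run a Pig script with an explicitly chosen execution engine.
# $1 = engine (mapreduce or tez), $2 = path to the Pig script
# PIG_BIN is a hypothetical override so the function can be tried without Pig.
run_with_engine() {
  "${PIG_BIN:-pig}" -x "$1" "$2"
}

# Try the same script on both engines and report which one succeeds.
for engine in mapreduce tez; do
  if run_with_engine "$engine" try.pig; then
    echo "$engine: ok"
  else
    echo "$engine: failed"
  fi
done
```

If the MapReduce run succeeds where the Tez run fails, that would point at the Tez application master's classpath and not at the script itself.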

Thanks,
    Steve

On Mon, Aug 21, 2017 at 10:23 AM, Steve Terrell <sterrell@oculus360.us> wrote:
Anil,

That's good news (about 5.2).  Any chance I could see your
  • full pig command line
  • PIG_CLASSPATH env variable
  • pig script or at least the REGISTER and PhoenixHBaseStorage() lines?
Might help me figure out what I'm doing wrong or differently.

One thing I did not mention, because I thought it should not matter, is that to avoid extra costs while testing I was only running a master node with no slaves (no task or core nodes).  Maybe the lack of slaves causes problems not normally seen.  Interesting...

Thanks so much,
    Steve


On Sun, Aug 20, 2017 at 11:04 AM, anil gupta <anilgupta84@gmail.com> wrote:
Hey Steve,

We are currently using EMR5.2 and pig-phoenix is working fine for us. We are gonna try EMR5.8 next week.

HTH,
Anil

On Fri, Aug 18, 2017 at 9:00 AM, Steve Terrell <sterrell@oculus360.us> wrote:
More info...

By trial and error, I tested different EMR versions and made a little incomplete list of which ones support Pig Phoenix storage and which ones don't:

emr-5.8.0 JacksonJaxbJsonProvider error
emr-5.6.0 JacksonJaxbJsonProvider error
emr-5.4.0 JacksonJaxbJsonProvider error
emr-5.3.1 ContainerId.fromString() error
emr-5.3.0 ContainerId.fromString() error
emr-5.0.0 ContainerId.fromString() error
emr-4.9.2 Works!
emr-4.7.0 Works!

I ran out of time trying to get 5.8.0 working, so will start using 4.9.2.  But I would like to switch to 5.8.0 if anyone has a solution.  Meanwhile, I hope this list saves other people some time and headache.

Thanks,
    Steve

On Thu, Aug 17, 2017 at 2:40 PM, Steve Terrell <sterrell@oculus360.us> wrote:
I'm running EMR 5.8.0 with these applications installed:
Pig 0.16.0, Phoenix 4.11.0, HBase 1.3.1

Here is my pig script (try.pig):

REGISTER /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar;
A = load '/steve/a.txt' as (TXT:chararray);
store A into 'hbase://A_TABLE' using org.apache.phoenix.pig.PhoenixHBaseStorage('10.0.100.51','-batchSize 2500');

I run it like this from the command line:
pig try.pig

When it fails, I dig into the hadoop task logs and find this:
2017-08-17 19:11:37,539 [ERROR] [main] |app.DAGAppMaster|: Error starting DAGAppMaster
java.lang.NoClassDefFoundError: org/apache/phoenix/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:269)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.serviceInit(ATSHistoryLoggingService.java:102)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.tez.dag.history.HistoryEventHandler.serviceInit(HistoryEventHandler.java:73)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.tez.dag.app.DAGAppMaster.initServices(DAGAppMaster.java:1922)
    at org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:624)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.tez.dag.app.DAGAppMaster$8.run(DAGAppMaster.java:2557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2554)
    at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2359)
Caused by: java.lang.ClassNotFoundException: org.apache.phoenix.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 28 more

Has anyone been able to get org.apache.phoenix.pig.PhoenixHBaseStorage() to work on recent EMR versions? Please help if you can.

Thank you,
    Steve




--
Thanks & Regards,
Anil Gupta




