phoenix-user mailing list archives

From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject pig and phoenix
Date Fri, 05 Dec 2014 22:57:43 GMT
Hi, I wrote a series of Pig scripts to load data. They worked well with Phoenix 4.0, but since
upgrading to 4.2.x (currently 4.2.1) they fail.

Here is an example:

Table def:
CREATE TABLE IF NOT EXISTS t1_log_dns
(
  period BIGINT NOT NULL,
  deployment VARCHAR NOT NULL,
  file_id VARCHAR NOT NULL,
  recnum INTEGER NOT NULL,
  f1 VARCHAR,
  f2 VARCHAR,
  f3 VARCHAR,
  f4 BIGINT,
...
 CONSTRAINT pkey PRIMARY KEY (period, deployment, file_id, recnum)
) IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

-- some index definitions omitted; the same error occurs with or without them

Pig script:

register $phoenix_jar;
register $udf_jar;

Z = load '$data' as (
file_id,
recnum,
period,
deployment,
... more fields
);

-- put it all together and generate final output!
D = foreach Z generate
period,
deployment,
file_id,
recnum,
... more fields;

STORE D into 'hbase://$table_name' using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');

Error:
2014-12-05 14:24:06,450 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: Unable to process column RECNUM:INTEGER, innerMessage=java.lang.String cannot be coerced to INTEGER
2014-12-05 14:24:06,450 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-12-05 14:24:06,452 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0.2.1.5.0-695 0.12.1.2.1.5.0-695 perko 2014-12-05 14:23:17 2014-12-05 14:24:06 UNKNOWN

Based on the error, it would seem that some non-integer value cannot be cast to an integer.
But the data does not show this. Stepping through the Pig script and running "dump" on each
variable shows the data in the right place and of the right coercible type; for example, recnum
contains nothing but single digits in the sample data.

I have tried declaring "recnum" as an int in Pig, but this just pushes the error up to the
previous field, file_id:

ERROR 2999: Unexpected internal error. Unable to process column FILE_ID:VARCHAR, innerMessage=java.lang.Integer cannot be coerced to VARCHAR

Other times I get a different error:

Unable to process column _SALT:BINARY, innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. BINARY cannot be coerced to LONG
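Because the error moves to the neighbouring column when a Pig type changes, one hypothesis worth checking is a positional misalignment rather than bad data. Below is a minimal Python sketch of that idea, not Phoenix's actual code. It assumes (unverified) that the store function pairs Pig tuple fields with table columns purely by position, and that its column list might include the hidden leading `_SALT` column of a salted table. The helpers `pair_by_position` and `misaligned` are hypothetical, and the Pig field after `recnum` is assumed to be `f1` for illustration only.

```python
"""Hypothetical sketch: pair Pig tuple fields with Phoenix columns by position.

Assumption (not confirmed): field i of each Pig tuple is written into column i
of the table's column list. If that list also contains the hidden leading
_SALT column of a salted table, every field lands one column early.
"""


def pair_by_position(pig_fields, phoenix_columns):
    """Return (column, field) pairs the way a purely positional writer would."""
    # zip stops at the shorter list, so trailing columns may get no field.
    return list(zip(phoenix_columns, pig_fields))


def misaligned(pairs):
    """Return the pairs whose Pig field name differs from the column name."""
    return [(col, field) for col, field in pairs if col.lower() != field.lower()]


# Pig field order from the script's `generate` (remaining fields elided;
# "f1" is an illustrative assumption for the field after recnum).
PIG_FIELDS = ["period", "deployment", "file_id", "recnum", "f1"]
# Column order from the table definition (primary key first, then F1).
COLUMNS = ["PERIOD", "DEPLOYMENT", "FILE_ID", "RECNUM", "F1"]

if __name__ == "__main__":
    for label, cols in [("without _SALT", COLUMNS), ("with _SALT", ["_SALT"] + COLUMNS)]:
        print(label)
        for col, field in pair_by_position(PIG_FIELDS, cols):
            print(f"  {col:<10} <- {field}")
```

With the extra leading column, `recnum` would land in FILE_ID and the following field in RECNUM, the same two columns named in the first two errors. Whether 4.2.x actually builds its column list this way is an open question for the list.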

Is there something obvious I am doing wrong? Did something significant change between 4.0
and 4.2.x in this regard? I would not rule out some silly user error I inadvertently introduced
:-/

Thanks for your help
Ralph

