phoenix-user mailing list archives

From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject Re: pig and phoenix
Date Wed, 10 Dec 2014 22:00:11 GMT
Hi,

After adding the four primary-key column names to the HBase STORE command, the behavior I now see is that only those four fields are loaded (if I add the other column names, they are indeed loaded as well). Looking at the PhoenixHBaseStorage code, it appears that if no fields are passed in, the fields are fetched from Phoenix; otherwise it uses only what is passed in, ignoring everything else. Is this by design, or is it new behavior? I do not recall needing to do this in 4.0.
Is there a syntax to use all columns rather than specifying each column?
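
(For anyone comparing, the two STORE variants in question look roughly like this; the first is what now loads only the listed columns, and the second is the no-column-list form that previously pulled the full column set from Phoenix:)

    -- explicit column list after the table name: only these columns are written
    STORE D into 'hbase://$table_name/period,deployment,file_id,recnum'
        using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');

    -- no column list: column metadata is read from the Phoenix table
    STORE D into 'hbase://$table_name'
        using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');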

Thank you!
Ralph

From: Ravi Kiran <maghamravikiran@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Monday, December 8, 2014 at 8:13 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: pig and phoenix

Hi Ralph
   Glad that worked, at least partly. For the issue you are mentioning, I am not sure of any easy way out, as there could be some rows with null column values.

Regards
Ravi Magham.

On Monday, December 8, 2014, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
Ravi,

Your suggestion worked – thank you!

But I am now getting an org.apache.phoenix.schema.ConstraintViolationException on some data files:

"T1_LOG_DNS.PERIOD may not be null"

However, there is no record with a null value for this field.

I tried hardcoding a value in the Pig script to see if I could get past this error, and it just moved the error to the next field:

"T1_LOG_DNS.DEPLOYMENT may not be null"

This error is intermittent and does not happen with every file, but it does happen consistently with the same file.
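
A quick way to check for hidden nulls in the four key columns is to filter the relation before storing it (a diagnostic sketch, not from the original script; D is the relation being stored):

    -- keep only rows where any primary-key column is null
    NULLS = FILTER D BY period IS NULL OR deployment IS NULL
                     OR file_id IS NULL OR recnum IS NULL;
    DUMP NULLS;

An empty dump would suggest the nulls are introduced later in the pipeline rather than present in the input data.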

Thank you for the help

Ralph


__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
ralph.perko@pnnl.gov


From: Ravi Kiran <maghamravikiran@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Friday, December 5, 2014 at 3:20 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: pig and phoenix

Hi Ralph.
   Can you please try modifying the STORE command in your script to the following:

   STORE D into 'hbase://$table_name/period,deployment,file_id,recnum' using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');

Primarily, Phoenix generates the default UPSERT query for the table and assumes the column order to be that of the columns in your CREATE TABLE. In your case, I see you are reordering the columns in the STORE command. Hence, with the above change, Phoenix constructs the right UPSERT query for you from the columns you list after $table_name.

Also, to have a look at the query Phoenix generated, check the logs for an entry starting with "Phoenix Generic Upsert Statement:". That will give insight into the UPSERT query.
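
For the table definition below, the generated statement should look roughly like this (a sketch based on the column names; the actual log output may differ):

    UPSERT INTO T1_LOG_DNS (PERIOD, DEPLOYMENT, FILE_ID, RECNUM, F1, F2, F3, F4, ...)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ...)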

Happy to help!!

Regards
Ravi


On Fri, Dec 5, 2014 at 2:57 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
Hi, I wrote a series of Pig scripts to load data that worked well with 4.0 but, since upgrading to 4.2.x (currently 4.2.1), are now failing.

Here is an example:

Table def:
CREATE TABLE IF NOT EXISTS t1_log_dns
(
  period BIGINT NOT NULL,
  deployment VARCHAR NOT NULL,
  file_id VARCHAR NOT NULL,
  recnum INTEGER NOT NULL,
  f1 VARCHAR,
  f2 VARCHAR,
  f3 VARCHAR,
  f4 BIGINT,
...
 CONSTRAINT pkey PRIMARY KEY (period, deployment, file_id, recnum)
) IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

-- some index defs; the same error occurs with or without them

Pig script:

register $phoenix_jar;
register $udf_jar;

Z = load '$data' as (
file_id,
recnum,
period,
deployment,
... more fields
);

-- put it all together and generate final output!
D = foreach Z generate
period,
deployment,
file_id,
recnum ,
... more fields;

STORE D into 'hbase://$table_name' using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');

Error:
2014-12-05 14:24:06,450 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR:
Unable to process column RECNUM:INTEGER, innerMessage=java.lang.String cannot be coerced to
INTEGER
2014-12-05 14:24:06,450 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce
job(s) failed!
2014-12-05 14:24:06,452 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script
Statistics:

HadoopVersion: 2.4.0.2.1.5.0-695
PigVersion:    0.12.1.2.1.5.0-695
UserId:        perko
StartedAt:     2014-12-05 14:23:17
FinishedAt:    2014-12-05 14:24:06
Features:      UNKNOWN

Based on the error, it would seem that some non-integer value cannot be cast to an integer. But the data does not show this. Stepping through the Pig script and running "dump" on each variable shows the data in the right place and of the right coercible type; for example, recnum contains nothing but single digits in the sample data.

I have tried setting "recnum" to an int in Pig, but this just pushes the error up to the previous field, file_id:

ERROR 2999: Unexpected internal error. Unable to process column FILE_ID:VARCHAR, innerMessage=java.lang.Integer
cannot be coerced to VARCHAR
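
For reference, the cast attempt looked roughly like this (a hypothetical sketch; the actual schema line was not included in the thread, and the field types here are assumptions):

    -- declare types in the LOAD schema so Pig casts each field up front
    Z = load '$data' as (
        file_id:chararray,
        recnum:int,
        period:long,
        deployment:chararray,
        ... more fields
    );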

Other times I get a different error:

Unable to process column _SALT:BINARY, innerMessage=org.apache.phoenix.schema.TypeMismatchException:
ERROR 203 (22005): Type mismatch. BINARY cannot be coerced to LONG

Is there something obvious I am doing wrong? Did something significant change between 4.0 and 4.2.x in this regard? I would not rule out some silly user error that I inadvertently introduced :-/

Thanks for your help
Ralph


