phoenix-user mailing list archives

From Ravi Kiran <maghamraviki...@gmail.com>
Subject Re: pig and phoenix
Date Tue, 09 Dec 2014 04:13:46 GMT
Hi Ralph,
   Glad that worked, at least partly. For the issue you are mentioning, I am
not sure of an easy way out, as there could be some rows with null column
values.
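
If nulls are indeed the cause, one possible workaround (a sketch only,
assuming the relation D from your script; adjust names as needed) is to
filter out rows with a null primary-key column before the STORE:

   -- drop rows where any primary-key column is null
   CLEAN = FILTER D BY period IS NOT NULL AND deployment IS NOT NULL
                   AND file_id IS NOT NULL AND recnum IS NOT NULL;
   STORE CLEAN into 'hbase://$table_name/period,deployment,file_id,recnum'
   using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');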

Regards
Ravi Magham.

On Monday, December 8, 2014, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:

>  Ravi,
>
>  Your suggestion worked – thank you!
>
>  But I am now getting a
> org.apache.phoenix.schema.ConstraintViolationException on some data files.
>
>  "T1_LOG_DNS.PERIOD may not be null"
>
>  However there is no record with a null value for this field.
>
>  I tried hardcoding a value in the pig script to see if I could get past
> this error and it just moved the error to the next field:
>
>  "T1_LOG_DNS.DEPLOYMENT may not be null"
>
>  This error is intermittent and does not happen with every file, but it
> does happen consistently with the same file.
>
>  Thank you for the help
>
>  Ralph
>
>
>    __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
> (509) 375-2272
> ralph.perko@pnnl.gov
>
>
>   From: Ravi Kiran <maghamravikiran@gmail.com>
> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Date: Friday, December 5, 2014 at 3:20 PM
> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Subject: Re: pig and phoenix
>
>   Hi Ralph,
>     Can you please try modifying the STORE command in the script to the
> following:
>    STORE D into 'hbase://$table_name/period,deployment,file_id,recnum'
> using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize
> 1000');
>
>  By default, Phoenix generates the UPSERT query for the table assuming the
> column order matches the columns in your CREATE TABLE statement. In your
> case, you are reordering the columns in the STORE command. With the change
> above, Phoenix constructs the correct UPSERT query from the columns you
> list after $table_name.
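>
>  For illustration (a sketch; the full column list is assumed from your
> CREATE TABLE), the difference is roughly:
>
>  -- default: column order taken from CREATE TABLE
>  UPSERT INTO T1_LOG_DNS (PERIOD, DEPLOYMENT, FILE_ID, RECNUM, F1, ...)
>      VALUES (?, ?, ?, ?, ?, ...)
>  -- with the column list after $table_name, only those columns are used
>  UPSERT INTO T1_LOG_DNS (PERIOD, DEPLOYMENT, FILE_ID, RECNUM)
>      VALUES (?, ?, ?, ?)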
>
>  Also, to see the query Phoenix has generated, look for a log entry that
> starts with "Phoenix Generic Upsert Statement:". That will give some
> insight into the UPSERT query.
>
>  Happy to help!!
>
> Regards
> Ravi
>
>
> On Fri, Dec 5, 2014 at 2:57 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
>
>>  Hi, I wrote a series of Pig scripts to load data that were working well
>> with 4.0 but, since upgrading to 4.2.x (4.2.1 currently), are now failing.
>>
>>  Here is an example:
>>
>>  Table def:
>> CREATE TABLE IF NOT EXISTS t1_log_dns
>> (
>>   period BIGINT NOT NULL,
>>   deployment VARCHAR NOT NULL,
>>   file_id VARCHAR NOT NULL,
>>   recnum INTEGER NOT NULL,
>>   f1 VARCHAR,
>>   f2 VARCHAR,
>>   f3 VARCHAR,
>>   f4 BIGINT,
>> ...
>>  CONSTRAINT pkey PRIMARY KEY (period, deployment, file_id, recnum)
>> )
>> IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>>
>>  -- some index definitions; the same error occurs with or without them
>>
>>  Pig script:
>>
>>  register $phoenix_jar;
>> register $udf_jar;
>>
>>  Z = load '$data' as (
>> file_id,
>> recnum,
>> period,
>> deployment,
>> ... more fields
>> );
>>
>>  -- put it all together and generate final output!
>> D = foreach Z generate
>> period,
>> deployment,
>> file_id,
>> recnum ,
>> ... more fields;
>>
>>  STORE D into 'hbase://$table_name' using
>> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');
>>
>>  Error:
>> 2014-12-05 14:24:06,450 [main] ERROR
>> org.apache.pig.tools.pigstats.SimplePigStats - ERROR: Unable to process
>> column RECNUM:INTEGER, innerMessage=java.lang.String cannot be coerced to
>> INTEGER
>> 2014-12-05 14:24:06,450 [main] ERROR
>> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>> 2014-12-05 14:24:06,452 [main] INFO
>>  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>>
>>  HadoopVersion: 2.4.0.2.1.5.0-695
>> PigVersion: 0.12.1.2.1.5.0-695
>> UserId: perko
>> StartedAt: 2014-12-05 14:23:17
>> FinishedAt: 2014-12-05 14:24:06
>> Features: UNKNOWN
>>
>>  Based on the error, it would seem that some non-integer value cannot be
>> cast to an integer, but the data does not show this. Stepping through the
>> Pig script and running "dump" on each variable shows the data in the
>> right place and of the right coercible type: for example, recnum contains
>> nothing but single digits in the sample data.
>>
>>  I have tried setting "recnum" to an int in Pig, but this just pushes the
>> error up to the previous field, file_id:
>>
>>  ERROR 2999: Unexpected internal error. Unable to process column
>> FILE_ID:VARCHAR, innerMessage=java.lang.Integer cannot be coerced to VARCHAR
>>
>>  Other times I get a different error:
>>
>>  Unable to process column _SALT:BINARY,
>> innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203
>> (22005): Type mismatch. BINARY cannot be coerced to LONG
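>>
>>  For reference, here is the load with explicit field types (a sketch;
>> only the key fields shown), to keep Pig from defaulting every field to
>> bytearray:
>>
>>  Z = load '$data' as (
>> file_id:chararray,
>> recnum:int,
>> period:long,
>> deployment:chararray,
>> ... more fields
>> );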
>>
>>  Is there something obvious I am doing wrong?  Did something significant
>> change between 4.0 and 4.2.x in this regard?  I would not rule out some
>> silly user error I inadvertently introduced :-/
>>
>>  Thanks for your help
>>  Ralph
>>
>>
>
