phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Elser <els...@apache.org>
Subject Re: Exception of phoenix hive integration
Date Mon, 16 Apr 2018 15:34:59 GMT
Hi Wei,

Have you searched JIRA for issues that relate to the Phoenix-Hive 
integration? There have been a few in the recent past around invalid 
queries being generated, especially around column names.

On 4/14/18 7:39 PM, Lu Wei wrote:
> ## Version:
> 
> phoenix: 4.13.2-cdh5.11.2
> 
> hive: 1.1.0-cdh5.11.2
> 
> 
> There is an ColumnNotFoundException when joining a hive internal table 
> with an Phoenix external table.
> 
> 
> 
> ## Table1: phoenix external table "ext_tmp":
> 
> +-----------+------------+--------------------+--+
> 
> | col_name  | data_type  |      comment       |
> 
> +-----------+------------+--------------------+--+
> 
> | cola      | string     | from deserializer  |
> 
> | colb      | string     | from deserializer  |
> 
> +-----------+------------+--------------------+--+
> 
> 
> ### Backend Phoenix table "TMP":
> 
> select * from TMP;
> 
> *+-------+------------+*
> 
> *| **cola** |**colb** |*
> 
> *+-------+------------+*
> 
> *| *a * | *aaaa * |*
> 
> *| *b * | *bbaaaa * |*
> 
> *| *ccc * | *cccbbaaaa* |*
> 
> *+-------+------------+*
> 
> *
> *
> 
> ### Hive external table creation statement:
> 
> 
> +----------------------------------------------------+--+
> 
> |                   createtab_stmt                   |
> 
> +----------------------------------------------------+--+
> 
> | CREATE EXTERNAL TABLE `ext_tmp`(                   |
> 
> |   `cola` string COMMENT 'from deserializer',       |
> 
> |   `colb` string COMMENT 'from deserializer')       |
> 
> | ROW FORMAT SERDE                                   |
> 
> |   'org.apache.phoenix.hive.PhoenixSerDe'           |
> 
> | STORED BY                                          |
> 
> |   'org.apache.phoenix.hive.PhoenixStorageHandler'  |
> 
> | WITH SERDEPROPERTIES (                             |
> 
> |   'serialization.format'='1')                      |
> 
> | LOCATION                                           |
> 
> |   'hdfs://st:8020/data/user/hive/warehouse/ext_tmp'|
> 
> | TBLPROPERTIES (                                    |
> 
> |   'phoenix.column.mapping'='cola:cola,colb:colb',  |
> 
> |   'phoenix.rowkeys'='cola',                        |
> 
> |   'phoenix.table.name'='tmp',                      |
> 
> |   'phoenix.zookeeper.client.port'='2181',          |
> 
> |   'phoenix.zookeeper.quorum'='st1,st2,st3',        |
> 
> |   'phoenix.zookeeper.znode.parent'='/hbase',       |
> 
> |   'transient_lastDdlTime'='1523607352')            |
> 
> +----------------------------------------------------+--+
> 
> 
> 
> ## Table2: hive internal table "native1":
> 
> +-----------+------------+----------+--+
> 
> | col_name  | data_type  | comment  |
> 
> +-----------+------------+----------+--+
> 
> | cola      | string     |          |
> 
> | colb      | string     |          |
> 
> +-----------+------------+----------+--+
> 
> 
> 
> ## When join the two tables:
> 
> 
> select *  from native1  join  ext_tmp t on native1.cola= t.cola;
> 
> 
> 
> Exception:
> 
> org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): 
> Undefined column. columnName=TMP
> 
> 
> Detailed exception as below. There is an empty column "" as readcolumn 
> name, which is not exsit in Pheonix table at all. so the phoenix query 
> will never be correct: select /*+ NO_CACHE  */ "","cola","colb" from tmp 
> where "cola" is not null
> 
> 
> Any thoughts?
> 
> 
> 
> ---------
> 
> 2018-04-14 21:13:40,923 INFO 
>   org.apache.hadoop.hive.ql.io.HiveInputFormat: 
> [HiveServer2-Background-Pool: Thread-304]: hive.io.file.readcolumn.ids=
> 
> 2018-04-14 21:13:40,923 INFO 
>   org.apache.hadoop.hive.ql.io.HiveInputFormat: 
> [HiveServer2-Background-Pool: Thread-304]: 
> hive.io.file.readcolumn.names=,cola,colb
> 
> 2018-04-14 21:13:40,923 INFO 
>   org.apache.hadoop.hive.ql.io.HiveInputFormat: 
> [HiveServer2-Background-Pool: Thread-304]: Generating splits
> 
> 2018-04-14 21:13:40,924 INFO 
>   org.apache.phoenix.hive.query.PhoenixQueryBuilder: 
> [HiveServer2-Background-Pool: Thread-304]: Input query : select /*+ 
> NO_CACHE  */ "","cola","colb" from tmp where "cola" is not null
> 
> 2018-04-14 21:13:40,932 ERROR 
> org.apache.phoenix.hive.mapreduce.PhoenixInputFormat: 
> [HiveServer2-Background-Pool: Thread-304]: Failed to get the query plan 
> with error [ERROR 504 (42703): Undefined column. columnName=TMP]
> 
> 
> 
> Thanks,
> 
> Wei
> 

Mime
View raw message