Hi, after importing data with the CsvBulkLoadTool I've run into an issue querying it from sqlline.py. The bulk load itself completed successfully, with no errors, but when I attempt to query the data I get exceptions:

 

java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException

        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)

 

followed by many GlobalMemoryManager warnings:

 

WARN memory.GlobalMemoryManager: Orphaned chunk of xxxx bytes found during finalize

 

Most queries (though not all) produce this error, and it seems related to the existence of the secondary index tables:

 

select * from TABLE limit 10;  -- ERROR (index not used)

select <un-indexed field> from TABLE limit 10;  -- ERROR
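
In case the query plans are useful for diagnosis, I can capture them by running EXPLAIN against the same statements, e.g.:

EXPLAIN select * from TABLE limit 10;
EXPLAIN select <un-indexed field> from TABLE limit 10;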

 

If I run a query on an INTEGER column with a secondary index, I do not get this error:

 

select distinct(fieldx) from TABLE limit 10;  -- SUCCESS!

 

However, a similar query on an indexed VARCHAR field produces a timeout error:

java.lang.RuntimeException: … PhoenixIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?

        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)

 

select count(*) … times out as well.
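
To isolate whether the index tables are involved, the next experiment I plan to try is pinning the plan with optimizer hints (assuming the NO_INDEX and INDEX hints behave in 4.2.0 as documented; the index name below is one of the real ones from the DDL further down):

-- bypass all secondary indexes and scan only the data table
select /*+ NO_INDEX */ count(*) from t1_csv_data;

-- force the scan through one specific index table
select /*+ INDEX(t1_csv_data t1_csv_data_f2_idx) */ somefield2 from t1_csv_data limit 10;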

 

Details:

Total records imported: 7.2B

Cluster size: 30 nodes

Splits: 40 (salted)

 

Phoenix version: 4.2.0

HBase version: 0.98

HDP distro: 2.1.5

 

I can scan the data with no errors from the hbase shell.
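
From the SQL side, I can also check that the index tables came out of the bulk load in an active state. Assuming the SYSTEM.CATALOG layout in 4.2 (INDEX_STATE lives on the table-header rows, where COLUMN_NAME is null), something like:

-- list the data table and its indexes with their current index state
select TABLE_NAME, TABLE_TYPE, INDEX_STATE
from SYSTEM.CATALOG
where TABLE_NAME like 'T1_CSV_DATA%'
and COLUMN_NAME is null;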

 

Basic Phoenix table def:

 

CREATE TABLE IF NOT EXISTS t1_csv_data
(
    timestamp BIGINT NOT NULL,
    location VARCHAR NOT NULL,
    fileid VARCHAR NOT NULL,
    recnum INTEGER NOT NULL,
    field5 VARCHAR,
    ...
    field45 VARCHAR,
    CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=40,
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

 

-- indexes

CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
    COMPRESSION='SNAPPY', SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
    COMPRESSION='SNAPPY', SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
    COMPRESSION='SNAPPY', SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
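
For what it's worth, none of these indexes include the other columns, and my understanding is that Phoenix will only pick a global index when the query is fully covered by it, which would explain the "index not used" note above. If a covered index turns out to be the right fix, I'd expect it to look roughly like this sketch (field5 reused from the table definition; the index name is made up):

-- hypothetical covered index: INCLUDE makes field5 readable from the index table
CREATE INDEX t1_csv_data_f1_cov_idx ON t1_csv_data(somefield1) INCLUDE (field5)
    COMPRESSION='SNAPPY', SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';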

 

Thanks for your help,

Ralph