phoenix-user mailing list archives

From Alexander Lytchier <AlexanderLytch...@m800.com>
Subject Secondary Indexes - Missing Data in Phoenix
Date Thu, 25 Jul 2019 02:35:47 GMT
Hi,

We manage our Hadoop cluster with Cloudera and run Phoenix 4.7.0 (CLABS_PHOENIX) on
HBase 1.2.0-cdh5.7.6. Phoenix 4.7.0 appears to be the latest version Cloudera supports
(http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/), even though it’s old.

The table in question has a binary row key, pk BINARY(30): 1 byte for salting, 8 bytes for a
timestamp (Long), 20 bytes for a hash of other record fields, plus 1 extra byte added because of
an unknown issue with updating the schema in the future (not sure if relevant). We are currently
facing performance issues and are attempting to mitigate them by adding secondary indexes.
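
For readers who want to check the layout, here is a minimal Python sketch that splits a 30-byte key into the four fields described above. The sample key is the trailing 30 bytes of the index rows quoted later in this message. The function name, the big-endian byte order, and the reading of the timestamp as epoch milliseconds are assumptions for illustration, not something confirmed about the table.

```python
import struct
from datetime import datetime, timezone

# Trailing 30 bytes of the index row keys quoted in this message.
# HBase's \xNN notation is also valid inside a Python bytes literal.
ROW_KEY = (
    b"\x00"                       # 1 byte: salt
    b"\x00\x00\x01b\xB2s\xDB@"    # 8 bytes: timestamp (Long)
    b"\x1B\x94\xFA\xD4\x14c\x0Bd$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE"  # 20 bytes: hash
    b"\x00"                       # 1 byte: extra
)


def split_row_key(key: bytes) -> dict:
    """Split a BINARY(30) key into salt, timestamp, hash and extra byte.

    Assumes the timestamp is a big-endian signed 64-bit integer.
    """
    if len(key) != 30:
        raise ValueError(f"expected 30 bytes, got {len(key)}")
    salt, timestamp, digest, extra = struct.unpack(">Bq20sB", key)
    return {"salt": salt, "timestamp": timestamp, "hash": digest, "extra": extra}


if __name__ == "__main__":
    fields = split_row_key(ROW_KEY)
    print("salt:     ", fields["salt"])
    print("timestamp:", fields["timestamp"])
    # Assumption: the Long holds milliseconds since the Unix epoch.
    print("as date:  ", datetime.fromtimestamp(fields["timestamp"] / 1000, tz=timezone.utc))
    print("hash:     ", fields["hash"].hex())
    print("extra:    ", fields["extra"])
```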

When generating a local index synchronously with the following command:

CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type");

I can see that the resulting index table in Phoenix is populated and queries work as expected.
In HBase, the row key of the index table looks like this:

\x00\x171545413\x00\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0Bd$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00 column=cf:cf:type, timestamp=1563954319353, value=1545413

However, for the case where the index is created asynchronously, and then populated using
the IndexTool, with the following commands:

CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type") ASYNC;

sudo -u hdfs HADOOP_CLASSPATH=`hbase classpath` hadoop jar \
  /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hbase/bin/../lib/hbase-client-1.2.0-cdh5.7.1.jar \
  org.apache.phoenix.mapreduce.index.IndexTool \
  --data-table "MyTable" --index-table INDEX_TABLE \
  --output-path hdfs://nameservice1/

I get the following row-key in HBase:

\x00\x00\x00\x00\x00\x00\x column=cf:cf:type, timestamp=1563954000238, value=1545413
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
151545413\x00\x00\x
00\x00\x01b\xB2s\xDB@\x1B\
x94\xFA\xD4\x14c\x0Bd$\x82
\xAD\xE6\xB3\xDF\x06\xC9\x
07@\xB9\xAE\x00

It has 32 additional 0-bytes (\x00). Why is there a difference, and is one expected? What’s
more, the index table in Phoenix is empty (I guess Phoenix is not able to read the underlying
HBase index table with that key?), so any query that uses the local index in Phoenix returns
no rows.
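
To make the comparison concrete, the following Python sketch counts the leading \x00 bytes in each index row key and measures how many trailing bytes the two keys share. The keys are transcribed from the HBase output above by joining the wrapped lines, which assumes the wrapping did not drop or alter any bytes. The helper names are hypothetical.

```python
# Row-key tail common to both rows as printed: indexed value, separator, 30-byte data key.
TAIL = (
    b"1545413\x00"
    b"\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0Bd$\x82"
    b"\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00"
)

# Index row key written by the synchronous CREATE LOCAL INDEX.
SYNC_KEY = b"\x00\x17" + TAIL

# Index row key written by IndexTool after CREATE LOCAL INDEX ... ASYNC.
ASYNC_KEY = b"\x00" * 32 + b"\x15" + TAIL


def leading_zero_count(key: bytes) -> int:
    """Number of consecutive 0x00 bytes at the start of the key."""
    return len(key) - len(key.lstrip(b"\x00"))


def common_suffix_length(a: bytes, b: bytes) -> int:
    """Number of trailing bytes that are identical in both keys."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


if __name__ == "__main__":
    for name, key in (("sync", SYNC_KEY), ("async", ASYNC_KEY)):
        print(f"{name:5s} length={len(key)} leading zero bytes={leading_zero_count(key)}")
    shared = common_suffix_length(SYNC_KEY, ASYNC_KEY)
    print("shared trailing bytes:", shared)
    print("sync prefix: ", SYNC_KEY[:-shared].hex())
    print("async prefix:", ASYNC_KEY[:-shared].hex())
```

Printing the prefixes in hex makes it easy to see exactly which bytes precede the indexed value in each case.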

Do you have any suggestions? We must use the async method to populate the index table on production
because of the massive amounts of data, but if Phoenix is not able to read the index table
it cannot be used for queries.

Is it possible this issue has been fixed in a newer version?

Thanks