phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Secondary Indexes - Missing Data in Phoenix
Date Thu, 25 Jul 2019 14:00:25 GMT
Local indexes are stored in the same table as the data. They are "local" 
to the data.

I would not be surprised if you are running into issues because you are 
using such an old version of Phoenix.

On 7/24/19 10:35 PM, Alexander Lytchier wrote:
> Hi,
> 
> We are currently using Cloudera as a package manager for our Hadoop 
> Cluster with Phoenix 4.7.0 (CLABS_PHOENIX) and HBase 1.2.0-cdh5.7.6. 
> Phoenix 4.7.0 appears to be the latest version supported 
> (http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/) even 
> though it’s old.
> 
> The table in question has a binary row key, pk BINARY(30): 1 byte for 
> salting, 8 bytes for a timestamp (Long), 20 bytes for a hash of other 
> record fields, plus 1 extra byte reserved in case the schema needs 
> updating in the future (not sure if relevant). We are currently facing 
> performance issues and are attempting to mitigate them by adding 
> secondary indexes.
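
For readers following along, the 30-byte layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_row_key is hypothetical, SHA-1 is assumed solely because it produces 20 bytes (the message does not say which hash is used), and the trailing byte is assumed to be \x00, as it is in the HBase output quoted further down.

```python
import hashlib
import struct


def build_row_key(salt: int, timestamp_ms: int, record_fields: bytes) -> bytes:
    """Assemble a 30-byte key: salt (1) + timestamp (8) + hash (20) + spare (1)."""
    # Assumption: SHA-1, chosen only because its digest is exactly 20 bytes.
    digest = hashlib.sha1(record_fields).digest()
    # A Java long is 8 bytes, written big-endian so keys sort by time.
    timestamp = struct.pack(">q", timestamp_ms)
    # Assumption: the spare trailing byte is left as zero.
    return bytes([salt]) + timestamp + digest + b"\x00"


key = build_row_key(salt=0, timestamp_ms=1563954319353, record_fields=b"example")
print(len(key), key.hex())
```
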
> 
> When generating a local index synchronously with the following command:
> 
> CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type");
> 
> I can see that the resulting index table in Phoenix is populated. In 
> HBase I can see the row key of the index table, and queries work as expected:
> 
> \x00\x171545413\x00\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0Bd$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00
> column=cf:cf:type, timestamp=1563954319353, value=1545413
> 
> However, for the case where the index is created asynchronously, and 
> then populated using the IndexTool, with the following commands:
> 
> 
> CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type") ASYNC;
> 
> sudo -u hdfs HADOOP_CLASSPATH=`hbase classpath` hadoop jar \
>   /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hbase/bin/../lib/hbase-client-1.2.0-cdh5.7.1.jar \
>   org.apache.phoenix.mapreduce.index.IndexTool --data-table "MyTable" \
>   --index-table INDEX_TABLE --output-path hdfs://nameservice1/
> 
> I get the following row-key in HBase:
> 
> 
> \x00\x00\x00\x00\x00\x00\x column=cf:cf:type, timestamp=1563954000238, 
> value=1545413
> 
> 00\x00\x00\x00\x00\x00\x00
> 
> \x00\x00\x00\x00\x00\x00\x
> 
> 00\x00\x00\x00\x00\x00\x00
> 
> \x00\x00\x00\x00\x00\x00\x
> 
> 151545413\x00\x00\x
> 
> 00\x00\x01b\xB2s\xDB@\x1B\
> 
> x94\xFA\xD4\x14c\x0Bd$\x82
> 
> \xAD\xE6\xB3\xDF\x06\xC9\x
> 
> 07@\xB9\xAE\x00
> 
> It has 32 additional 0-bytes (\x00). Why is there a difference – is 
> one expected? What’s more, the index table in Phoenix is empty (I guess 
> it’s not able to read the underlying HBase index table with that key?), 
> so any queries that use the local index in Phoenix return no value.
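
The difference between the two keys can be checked mechanically. The sketch below is not part of Phoenix or HBase: the helper names are made up for illustration, and the parser assumes the HBase shell prints non-printable bytes as \xNN and printable ones as-is. The key strings are the two row keys quoted above, joined back into single values.

```python
import re

_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def parse_hbase_binary(text: str) -> bytes:
    """Turn HBase-shell style text (\\xNN escapes plus literal chars) into bytes."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("latin-1")  # literal characters
        out.append(int(match.group(1), 16))               # one escaped byte
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


def leading_zero_bytes(key: bytes) -> int:
    """Count the \\x00 bytes at the start of a key."""
    return len(key) - len(key.lstrip(b"\x00"))


# The 30-byte data row key that ends both index keys in the quoted output.
DATA_KEY_TEXT = (r"\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0B"
                 r"d$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00")
SYNC_KEY_TEXT = r"\x00\x171545413\x00" + DATA_KEY_TEXT
ASYNC_KEY_TEXT = r"\x00" * 32 + r"\x151545413\x00" + DATA_KEY_TEXT

sync_key = parse_hbase_binary(SYNC_KEY_TEXT)
async_key = parse_hbase_binary(ASYNC_KEY_TEXT)

print("sync :", len(sync_key), "bytes,", leading_zero_bytes(sync_key), "leading zeros")
print("async:", len(async_key), "bytes,", leading_zero_bytes(async_key), "leading zeros")
print("same trailing data key:", sync_key[-30:] == async_key[-30:])
```

Both keys end in the same 30-byte data row key, so only the prefix in front of the indexed value differs between the synchronous and IndexTool-built rows.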
> 
> Do you have any suggestions? We must use the async method to populate 
> the index table on production because of the massive amounts of data, 
> but if Phoenix is not able to read the index table it cannot be used for 
> queries.
> 
> Is it possible this issue has been fixed in a newer version?
> 
> Thanks
> 
