phoenix-user mailing list archives

From Vamsi Krishna <vamsi.attl...@gmail.com>
Subject How to deal with Phoenix Secondary Indexes & HBase snapshots in batch job
Date Tue, 28 Jun 2016 14:09:46 GMT
Team,

We are using HDP 2.3.2 (HBase 1.1.2, Phoenix 4.4.0).

We have two Phoenix tables, 'TABLE_A' and 'TABLE_B', and a Phoenix view,
'TABLE_VIEW'.
The view always points to one of these two tables, which we call the
Active table; the other one is the Standby table.
We have a batch job (an Oozie workflow) that runs every night to process
some data files and insert the data into a Phoenix table.
In one of the Oozie actions we do the following:
1. Figure out which Phoenix table is Active and which is Standby. We do
   this by querying the Phoenix meta table (SYSTEM.CATALOG) to check which
   Phoenix table the Phoenix view is pointing to.
2. Drop the Phoenix Standby table.
3. Create an HBase snapshot of the HBase Active table.
4. Clone that snapshot to create the HBase Standby table.
5. Create the Phoenix Standby table on top of the HBase Standby table
   cloned from the HBase Active table in the previous step.
At this point the Phoenix Standby table is in the same state as the
Phoenix Active table, without any actual movement of data.
Next, we process the new data files and insert the data into the Phoenix
Standby table using the Phoenix CsvBulkLoadTool.
Finally, we flip the Phoenix view to point to the Phoenix Standby table.
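
The workflow above can be summarised as an ordered plan. The following is
only a rough sketch that builds command strings and connects to nothing.
The helper names, the snapshot name, the standby_ddl placeholder and the
DROP VIEW / CREATE VIEW way of flipping the view are assumptions made for
illustration, not necessarily what the actual Oozie action does.

```python
# Sketch of the nightly refresh plan. It only builds command strings and
# does not talk to HBase or Phoenix.
TABLES = ("TABLE_A", "TABLE_B")
VIEW = "TABLE_VIEW"


def standby_of(active):
    """Return the table that the view is NOT currently pointing to."""
    if active not in TABLES:
        raise ValueError("unknown table: %s" % active)
    return TABLES[1] if active == TABLES[0] else TABLES[0]


def refresh_plan(active, standby_ddl, snapshot="active_snap"):
    """Return ordered (tool, command) steps.

    standby_ddl is a placeholder for the real Phoenix CREATE TABLE
    statement, which is not shown in this message.
    """
    standby = standby_of(active)
    return [
        ("phoenix", "DROP TABLE IF EXISTS %s" % standby),
        ("hbase", "snapshot '%s', '%s'" % (active, snapshot)),
        ("hbase", "clone_snapshot '%s', '%s'" % (snapshot, standby)),
        ("phoenix", standby_ddl),
        ("bulkload", "CsvBulkLoadTool --table %s --input <new data files>" % standby),
        # Assumed way of flipping the view: drop it and recreate it.
        ("phoenix", "DROP VIEW IF EXISTS %s" % VIEW),
        ("phoenix", "CREATE VIEW %s AS SELECT * FROM %s" % (VIEW, standby)),
    ]
```

For example, refresh_plan("TABLE_A", "<CREATE TABLE TABLE_B ...>") yields
the steps that rebuild TABLE_B from TABLE_A and then repoint TABLE_VIEW.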

*New requirement:*
To support a secondary access pattern, we are planning to add a secondary
index (a local index) on one of the columns of the Phoenix table.
In the Oozie action described above, recreating the HBase Standby table
from a snapshot of the HBase Active table and recreating the Phoenix
Standby table on top of the HBase Standby table is not going to create
the secondary index on the Phoenix Standby table.
This is because the data table and the index table are completely
independent in HBase. Please correct me if my assumption is wrong.
One option I can think of is to create the secondary index on the
Phoenix Standby table after processing the data files and inserting the
data using the Phoenix CsvBulkLoadTool.
However, as the table keeps growing, that step is going to take more and
more time.
What alternative solutions are there for this scenario?
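
For reference, the option mentioned above (building the index after the
bulk load) comes down to a single Phoenix DDL statement. A minimal sketch
follows; the column name and the generated index name are hypothetical,
since the actual column is not named here.

```python
def create_local_index_sql(table, column, index_name=None):
    """Build a Phoenix CREATE LOCAL INDEX statement (string only)."""
    # Hypothetical naming scheme, used only when no name is given.
    name = index_name or "IDX_%s_%s" % (table, column)
    return "CREATE LOCAL INDEX IF NOT EXISTS %s ON %s (%s)" % (name, table, column)
```

Phoenix has to read the whole data table to populate the new index when
this statement runs, which is why the cost grows with the table.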

*Idea:*
After recreating the HBase Standby table from a snapshot of the HBase
Active table and recreating the Phoenix Standby table on top of the HBase
Standby table, create the HBase index table for the Standby table from a
snapshot of the Active table's HBase index table.
Then create the secondary index on the Phoenix Standby table, pointing to
the HBase index table created above.
Is this possible?
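
To make the idea concrete, here is a minimal sketch of the HBase shell
commands it would involve. It assumes that the local index is stored in a
separate HBase table named "_LOCAL_IDX_" followed by the data table name;
that naming, the snapshot name and the helper itself are assumptions to be
verified on the cluster. Whether Phoenix will accept an index table cloned
this way is exactly the open question.

```python
# Sketch only: builds HBase shell commands and runs nothing.
# ASSUMPTION: the local index lives in a separate HBase table named
# "_LOCAL_IDX_<data table>"; check this on your cluster first.
LOCAL_INDEX_PREFIX = "_LOCAL_IDX_"


def index_clone_commands(active, standby, snapshot="active_idx_snap"):
    """Commands to snapshot the Active index table and clone it for Standby."""
    src = LOCAL_INDEX_PREFIX + active
    dst = LOCAL_INDEX_PREFIX + standby
    return [
        "snapshot '%s', '%s'" % (src, snapshot),
        "clone_snapshot '%s', '%s'" % (snapshot, dst),
    ]
```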


Thanks,

Vamsi Attluri
