phoenix-user mailing list archives

From Pedro Boado <pedro.bo...@gmail.com>
Subject Help: setting hbase row timestamp in phoenix upserts ?
Date Wed, 29 Nov 2017 19:46:21 GMT
Hi,

I'm looking for a little help shedding some light on
ROW_TIMESTAMP.

Some background on the problem (simplified): I'm working on a project
that needs to create an "enriched" replica of an RDBMS table based on a
stream of CDC changes from that table.

Each CDC event contains the timestamp of the change plus all the column
values 'before' and 'after' the change, and each event is pushed to a
Kafka topic. Because of certain "non-negotiable" design decisions, Kafka
guarantees delivering each event at least once, but doesn't guarantee
ordering for changes to the same row in the source table.

The final step of the Kafka-based flow is sinking the information into
HBase/Phoenix.

As I cannot get an in-order delivery guarantee from Kafka, I need to use the
CDC event timestamp to ensure that HBase keeps the latest change to a row.

This fits perfectly well with an HBase table design with VERSIONS=1,
using the source event timestamp as the HBase row/cell timestamp.
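
To illustrate the semantics I'm after, here is a minimal sketch in plain
Java. It does not use HBase or Phoenix at all: the class LatestWinsSketch
and its put/get methods are hypothetical stand-ins for a single-version
table where the writer supplies the cell timestamp. Keeping the incoming
value when timestamps are equal is my assumption for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class LatestWinsSketch {
    // One stored cell: a value plus the timestamp it was written with.
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Hypothetical stand-in for a table that keeps one version per row key.
    private final Map<String, Cell> rows = new HashMap<>();

    // Apply a CDC event. The event timestamp, not the arrival order,
    // decides which value survives. On equal timestamps the incoming
    // value is kept (an assumption), so redelivered duplicates are harmless.
    public void put(String rowKey, long eventTimestamp, String value) {
        rows.merge(rowKey, new Cell(eventTimestamp, value),
            (old, incoming) -> incoming.timestamp >= old.timestamp ? incoming : old);
    }

    // Returns the surviving value for the row, or null if the row is unknown.
    public String get(String rowKey) {
        Cell cell = rows.get(rowKey);
        return cell == null ? null : cell.value;
    }

    public static void main(String[] args) {
        LatestWinsSketch table = new LatestWinsSketch();
        // Events for the same source row, delivered out of order and
        // with one duplicate (at-least-once delivery).
        table.put("row-1", 300L, "v3");
        table.put("row-1", 100L, "v1");
        table.put("row-1", 300L, "v3");
        table.put("row-1", 200L, "v2");
        System.out.println(table.get("row-1")); // prints "v3"
    }
}
```

With this behaviour the sink needs no read-before-write: late or duplicated
events simply lose against the newer timestamp.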

The thing is that I cannot find a way to set the timestamp of the HBase
cell from a Phoenix UPSERT.

I came across the ROW_TIMESTAMP functionality, but I've just found (I'm
devastated now) that ROW_TIMESTAMP columns store the date in both
HBase's cell timestamp and the primary key, meaning that I cannot
leverage that functionality to keep only the latest change.

Is there a way of defining HBase's row timestamp when doing the UPSERT,
even by setting it through some obscure hidden JDBC property?

I want to avoid doing a checkAndPut by all means, as the volume of changes
is going to be quite big.



-- 
Un saludo.
Pedro Boado.
