phoenix-user mailing list archives

From James Taylor <>
Subject Re: Replication?
Date Thu, 11 Dec 2014 03:27:32 GMT
bq. Then we run a Phoenix DDL SQL script to create the views.
In your CREATE TABLE statements, use the IF NOT EXISTS syntax. This
will prevent an error from occurring if the replication of
SYSTEM.CATALOG rows for the table happens before the Phoenix DDL
statement is run.
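For example (the table and column names here are hypothetical):

```sql
-- IF NOT EXISTS makes this DDL a no-op if the table's SYSTEM.CATALOG
-- rows have already arrived via replication, instead of failing.
CREATE TABLE IF NOT EXISTS my_schema.event (
    event_id BIGINT NOT NULL PRIMARY KEY,
    payload  VARCHAR
);
```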

bq. There is no indexes yet but for sure we will want them too.
You mentioned that you're creating views, but if you're planning on
using indexes, I'd recommend creating tables, not views, so that your
secondary indexes are maintained against your data tables automatically
by Phoenix.

bq. So based on what you said below, can I "simply" add the
replication scope to ALL the tables including all Phoenix tables, both
ways?
Yes, that sounds correct, but I'm not familiar with the particular
HBase settings for replication.

bq. Regarding the sequence number, how can we bump it when we detect a
failure of the other cluster?
// Set autocommit to true on your connection first for better performance.
// connection.setAutoCommit(true);
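One possible approach (a sketch only; the sequence name below is
hypothetical): each time a client with an empty sequence cache requests
a value, Phoenix advances CURRENT_VALUE in SYSTEM.SEQUENCE by the full
CACHE amount, so repeated requests from fresh connections move the
sequence forward one chunk per RPC:

```sql
-- Hedged sketch: issued from a connection whose sequence cache is
-- empty, this advances CURRENT_VALUE by the sequence's CACHE size
-- (e.g. 1000), not by 1. Repeat from fresh connections until
-- CURRENT_VALUE is safely past any value the failed cluster could
-- have handed out.
SELECT NEXT VALUE FOR my_schema.my_seq;
```

Setting autocommit to true on the connection first, as noted above,
avoids the overhead of an explicit commit per statement.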

bq. And related question, how to safely detect the failure ;)
You mean how can you detect when a cluster has failed? Good question
for the HBase dev list :-)

In the spirit of "more information is better", let me try to
illustrate the potential problem with sequences when a cluster failure
occurs through an example.
1) assume table T and sequence S are being used in the application
with a statement like UPSERT INTO T VALUES(NEXT VALUE FOR S, ...)
2) to increment the sequence (that's what NEXT VALUE FOR will do),
Phoenix will make an RPC to allocate 1000 sequence values by
incrementing the CURRENT_VALUE of S in the SYSTEM.SEQUENCE table by
1000 (the sequence is represented by a row in the SYSTEM.SEQUENCE
table, with the 1000 coming from the CACHE value specified when you
create the sequence)
3) the client will dole out sequences from these 1000 and once
exhausted will rinse and repeat with (2) again.
4) let's say that the following sequence of events occurs:
a) the SYSTEM.SEQUENCE row S is incremented by 1000
b) the client commits rows that use one or more of these sequences
c) the rows that use these new sequence values are replicated to the
other cluster.
d) the cluster goes down after the commit, but before the replication
of the increment of the SYSTEM.SEQUENCE row
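To make steps (1)-(3) concrete, a sketch (table, column, and sequence
names hypothetical):

```sql
-- CACHE 1000 is what makes each allocation RPC reserve 1000 values.
CREATE SEQUENCE IF NOT EXISTS my_schema.s START WITH 1 CACHE 1000;

-- Draws from the client's cached range; only every 1000th call
-- triggers the RPC that increments CURRENT_VALUE in SYSTEM.SEQUENCE.
UPSERT INTO t (id, payload) VALUES (NEXT VALUE FOR my_schema.s, 'some data');
```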

Now you have an issue, because when you fail over to the other
cluster, the sequence value won't have been incremented, but the rows
using the new sequence values were replicated.

So, if you bump up the sequence values, you can lower the possibility
of this corner case occurring. Note that it's also possible that more
than one batch of 1000 sequence values was allocated before the
SYSTEM.SEQUENCE row was replicated. If the rate of data insertion is
very high, then in theory you wouldn't know by how much to bump up the
sequence values.

Andrew pointed out one way around this by allocating IDs through a
stateless mechanism (PHOENIX-1422) which seems like a good solution
for many use cases (often sequences don't need to be monotonically
increasing). Another solution if that doesn't work would be if the
SYSTEM.SEQUENCE table could be replicated synchronously (HBASE-12672).



On Wed, Dec 10, 2014 at 7:42 AM, Jean-Marc Spaggiari
<> wrote:
> Thanks James (and Andrew). I think there cannot be too much information. The
> more information we share, the more knowledge we get.
> So here is the situation.
> We have 2 clusters that we want to configure in master/master mode.
> Application is built using HBase 0.98 and Phoenix.
> We deploy our HBase schema with an RPM. This creates all the tables we need
> and activates the replication for all of them. Then we run a Phoenix DDL SQL
> script to create the views. We do this on both clusters so they are
> identical, only the peer ID changes.
> There are no indexes yet, but for sure we will want them too.
> The goal is to have 2 identical clusters with the same performance in case
> one of them fails.
> So based on what you said below, can I "simply" add the replication scope to
> ALL the tables including all Phoenix tables, both ways?
> Regarding the sequence number, how can we bump it when we detect a failure
> of the other cluster? And related question, how to safely detect the failure
> ;)
> Thanks,
> JM
> 2014-12-09 20:48 GMT-05:00 James Taylor <>:
>> No, we're not saying to avoid replication: at SFDC, we rely on
>> replication to provide an active/active configuration for failover.
>> Lars H. & co. can explain in more detail, but there are some nuances
>> of which you should be aware. For example, the HBase table metadata
>> needs to exist on both clusters. How is this done in your environment?
>> One way to do this is to run the Phoenix DDL statements on both
>> sides, but this requires some extra processing, as replication won't
>> know about Phoenix DDL.
>> Whether or not you replicate indexes depends on 1) how much your use
>> case depends on them - if they're not available, will crucial queries
>> become so slow that it's as if the system is down?, and 2) the size of
>> your data and how long it takes to regenerate the index. Our current
>> thinking is to replicate the indexes just as we replicate tables (an
>> index just looks like any other HBase table as far as HBase is
>> concerned), as we want to be able to failover immediately without
>> performance degradation.
>> As far as replicating the SYSTEM.CATALOG table, that's important
>> depending on your use case as well. If you're using views (including
>> multi-tenant tables) that are created dynamically/on-the-fly, then
>> you'd likely want to replicate this table as otherwise this DDL has
>> the potential to be lost. Adding the IF NOT EXISTS that Andrew
>> referred to would prevent an error message when running the DDL on the
>> secondary cluster if the row from the SYSTEM.CATALOG table was already
>> replicated.
>> For the SYSTEM.SEQUENCE table, as Andrew pointed out, we allocate
>> chunks of sequences and dole them out on the client. You'd want to
>> replicate this table, as otherwise when you switch to the other
>> cluster, you'd start repeating the same sequence values. Once
>> replicated, if the primary cluster goes down, then the sequences will
>> pick up at the value after the already allocated chunk (which is fine,
>> as it's fine to have "holes" in the sequence values that get doled
>> out). There is a potential for a race condition if the primary cluster
>> returns a batch of new sequences and then dies before replicating the
>> updated sequence value to the other cluster. This can be mitigated, as
>> Andrew points out by bumping up the sequence values on a failover
>> event.
>> HTH. Maybe more information than you wanted? Tell us more about how
>> you're relying on replication when you get a chance.
>> Thanks,
>> James
>> On Tue, Dec 9, 2014 at 5:00 PM, Jean-Marc Spaggiari
>> <> wrote:
>> > Hum. Thanks for all those updates.
>> >
>> > So are we saying that master/master HBase replication should be avoided
>> > when using Phoenix with the latest stable version?
>> >
>> > 2014-12-09 19:51 GMT-05:00 Andrew Purtell <>:
>> >
>> >> You also need to replicate the Phoenix system tables. It's still
>> >> necessary
>> >> to run DDL operations on both clusters to keep Phoenix schema and HBase
>> >> tables in sync. Use IF EXISTS or IF NOT EXISTS to avoid DDL statement
>> >> failures. Phoenix should do the right thing. If not, it's a bug.
>> >>
>> >> The sequence table is interesting. The Phoenix client caches a range of
>> >> sequence values to use when inserting data that include generated
>> >> sequence
>> >> values. You'll want to always grab a new cached range of sequence
>> >> values
>> >> when failing over from one site to another and back to avoid potential
>> >> duplication. It's possible upon site failure that the latest updates to
>> >> the
>> >> sequence table did not replicate. Or, PHOENIX-1422 would side-step
>> >> this issue if implemented.
>> >>
>> >>
>> >> On Mon, Dec 8, 2014 at 10:22 PM, Jeffrey Zhong <>
>> >> wrote:
>> >>>
>> >>>
>> >>> You need to enable replication on both the data & index tables at
>> >>> the HBase level when using Phoenix 4.2 (Phoenix versions before 4.2
>> >>> may have issues with local indexes). There is a test case,
>> >>> MutableIndexReplicationIT, where you can see some details. Ideally
>> >>> Phoenix should provide a custom replication sink so that a user
>> >>> doesn't have to set up replication on the index table.
>> >>>
>> >>> From: Jean-Marc Spaggiari <>
>> >>> Reply-To: <>
>> >>> Date: Monday, December 8, 2014 at 9:29 AM
>> >>> To: user <>
>> >>> Subject: Replication?
>> >>>
>> >>> Hi,
>> >>>
>> >>> How do we replicate data between 2 cluster when Phoenix is in the
>> >>> picture?
>> >>>
>> >>> Can we simply replicate the table we want from A to B and on cluster
>> >>> Phoenix will do the required re-indexing? Or should we also replicate
>> >>> the
>> >>> Phoenix tables too?
>> >>>
>> >>> Thanks,
>> >>>
>> >>> JM
>> >>>
>> >>> NOTICE: This message is intended for the use of the individual or
>> >>> entity
>> >>> to which it is addressed and may contain information that is
>> >>> confidential,
>> >>> privileged and exempt from disclosure under applicable law. If the
>> >>> reader of
>> >>> this message is not the intended recipient, you are hereby notified
>> >>> that any
>> >>> printing, copying, dissemination, distribution, disclosure or
>> >>> forwarding of
>> >>> this communication is strictly prohibited. If you have received this
>> >>> communication in error, please contact the sender immediately and
>> >>> delete it
>> >>> from your system. Thank You.
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >>
>> >>    - Andy
>> >>
>> >> Problems worthy of attack prove their worth by hitting back. - Piet
>> >> Hein
>> >> (via Tom White)
>> >
>> >
