phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Replication?
Date Wed, 10 Dec 2014 01:48:35 GMT
No, we're not saying to avoid replication: at SFDC, we rely on
replication to provide an active/active configuration for failover.
Lars H. & co. can explain in more detail, but there are some nuances
of which you should be aware. For example, the HBase table metadata
needs to exist on both clusters. How is this done in your environment?
One way to do this is to run the Phoenix DDL statements on both
sides, but this requires some extra processing, as replication won't
know about Phoenix DDL.
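
For illustration, here's a rough sketch of what running the same DDL on
both sides might look like from a plain Phoenix JDBC client. The ZooKeeper
quorums and the METRICS table below are made-up placeholders, not anything
from our setup:

import java.sql.Connection;
import java.sql.DriverManager;

public class DualClusterDdl {
    public static void main(String[] args) throws Exception {
        String ddl = "CREATE TABLE IF NOT EXISTS METRICS ("
                + " HOST VARCHAR NOT NULL,"
                + " TS DATE NOT NULL,"
                + " VALUE DOUBLE"
                + " CONSTRAINT PK PRIMARY KEY (HOST, TS))";
        // Run the identical statement against the primary and the secondary
        // cluster so the Phoenix metadata exists on both sides.
        for (String url : new String[] {
                "jdbc:phoenix:zk-primary:2181",
                "jdbc:phoenix:zk-secondary:2181" }) {
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.createStatement().execute(ddl);
            }
        }
    }
}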

Whether or not you replicate indexes depends on 1) how much your use
case depends on them (if they're not available, will crucial queries
become so slow that it's as if the system is down?), and 2) the size of
your data and how long it takes to regenerate the index. Our current
thinking is to replicate the indexes just as we replicate tables (an
index just looks like any other HBase table as far as HBase is
concerned), as we want to be able to failover immediately without
performance degradation.
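
If you do replicate an index, the replication scope has to be set on the
index table's column families just like on the data table. A minimal
sketch using the HBase admin API, assuming an HBase 0.98-era client; the
MY_INDEX table name is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class EnableIndexReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (HBaseAdmin admin = new HBaseAdmin(conf)) {
            TableName index = TableName.valueOf("MY_INDEX"); // placeholder index table
            HTableDescriptor desc = admin.getTableDescriptor(index);
            // Mark every column family for global replication, the same as
            // you would for the data table.
            for (HColumnDescriptor family : desc.getColumnFamilies()) {
                family.setScope(HConstants.REPLICATION_SCOPE_GLOBAL);
            }
            admin.disableTable(index);
            admin.modifyTable(index, desc);
            admin.enableTable(index);
        }
    }
}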

As far as replicating the SYSTEM.CATALOG table, that's important
depending on your use case as well. If you're using views (including
multi-tenant tables) that are created dynamically/on-the-fly, then
you'd likely want to replicate this table as otherwise this DDL has
the potential to be lost. Adding the IF NOT EXISTS that Andrew
referred to would prevent an error message when running the DDL on the
secondary cluster if the row from the SYSTEM.CATALOG table was already
replicated.
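
As a concrete (hypothetical) example, a tenant-specific view created with
IF NOT EXISTS can be replayed safely against the secondary cluster whether
or not its SYSTEM.CATALOG row has already arrived via replication. The
tenant id, view name, and EVENTS base table below are placeholders, and
EVENTS is assumed to be a multi-tenant table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class CreateTenantView {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("TenantId", "acme"); // placeholder tenant
        // Replaying this on the secondary is a no-op if the view's row has
        // already been replicated, thanks to IF NOT EXISTS.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-secondary:2181", props)) {
            conn.createStatement().execute(
                "CREATE VIEW IF NOT EXISTS ACME_EVENTS AS SELECT * FROM EVENTS");
        }
    }
}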

For the SYSTEM.SEQUENCE table, as Andrew pointed out, we allocate
chunks of sequences and dole them out on the client. You'd want to
replicate this table, as otherwise when you switch to the other
cluster, you'd start repeating the same sequence values. Once
replicated, if the primary cluster goes down, then the sequences will
pick up at the value after the already allocated chunk (which is fine,
as it's fine to have "holes" in the sequence values that get doled
out). There is a potential for a race condition if the primary cluster
returns a batch of new sequences and then dies before replicating the
updated sequence value to the other cluster. This can be mitigated, as
Andrew points out by bumping up the sequence values on a failover
event.
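
One (hypothetical) way to do that bump at failover time: read the last
replicated CURRENT_VALUE for the sequence from SYSTEM."SEQUENCE" on the
secondary, then recreate the sequence well past any chunk the old primary
might have handed out but not replicated. The sequence name and the safety
margin below are assumptions, not recommendations:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class BumpSequenceOnFailover {
    // Assumed to be larger than any chunk the dead primary could have allocated.
    private static final long SAFETY_MARGIN = 1000000L;

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-secondary:2181")) {
            long current = 0;
            try (ResultSet rs = conn.createStatement().executeQuery(
                    "SELECT CURRENT_VALUE FROM SYSTEM.\"SEQUENCE\""
                    + " WHERE SEQUENCE_SCHEMA = 'APP' AND SEQUENCE_NAME = 'ORDER_ID'")) {
                if (rs.next()) {
                    current = rs.getLong(1);
                }
            }
            // Recreate the sequence starting beyond the last possibly-allocated chunk.
            conn.createStatement().execute("DROP SEQUENCE APP.ORDER_ID");
            conn.createStatement().execute(
                "CREATE SEQUENCE APP.ORDER_ID START WITH " + (current + SAFETY_MARGIN)
                + " CACHE 100");
        }
    }
}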

HTH. Maybe more information than you wanted? Tell us more about how
you're relying on replication when you get a chance.

Thanks,
James



On Tue, Dec 9, 2014 at 5:00 PM, Jean-Marc Spaggiari
<jean-marc@spaggiari.org> wrote:
> Hum. Thanks for all those updates.
>
> So are we saying that master/master HBase replication should be avoided when
> using Phoenix with the latest stable version?
>
> 2014-12-09 19:51 GMT-05:00 Andrew Purtell <apurtell@apache.org>:
>
>> You also need to replicate the Phoenix system tables. It's still necessary
>> to run DDL operations on both clusters to keep Phoenix schema and HBase
>> tables in sync. Use IF EXISTS or IF NOT EXISTS to avoid DDL statement
>> failures. Phoenix should do the right thing. If not, it's a bug.
>>
>> The sequence table is interesting. The Phoenix client caches a range of
>> sequence values to use when inserting data that include generated sequence
>> values. You'll want to always grab a new cached range of sequence values
>> when failing over from one site to another and back to avoid potential
>> duplication. It's possible upon site failure that the latest updates to the
>> sequence table did not replicate. Or,
>> https://issues.apache.org/jira/browse/PHOENIX-1422 would side step this
>> issue if implemented.
>>
>>
>> On Mon, Dec 8, 2014 at 10:22 PM, Jeffrey Zhong <jzhong@hortonworks.com>
>> wrote:
>>>
>>>
>>> You need to enable replication on both the data & index tables at the HBase
>>> level when using Phoenix 4.2 (Phoenix versions before 4.2 may have issues
>>> with local indexes). There is a test case, MutableIndexReplicationIT, where
>>> you can see some details. Ideally Phoenix should provide a custom replication
>>> sink so that a user doesn't have to set up replication on the index table.
>>>
>>> From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
>>> Reply-To: <user@phoenix.apache.org>
>>> Date: Monday, December 8, 2014 at 9:29 AM
>>> To: user <user@phoenix.apache.org>
>>> Subject: Replication?
>>>
>>> Hi,
>>>
>>> How do we replicate data between 2 clusters when Phoenix is in the
>>> picture?
>>>
>>> Can we simply replicate the tables we want from A to B and have Phoenix on
>>> cluster B do the required re-indexing? Or should we also replicate the
>>> Phoenix tables?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>
>>
>>
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>
>
