phoenix-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Replication?
Date Wed, 10 Dec 2014 15:42:47 GMT
Thanks James (and Andrew). I think there cannot be too much information.
The more information we share, the more knowledge we get.

So here is the situation.

We have 2 clusters that we want to configure in master/master mode.
The application is built using HBase 0.98 and Phoenix.

We deploy our HBase Schema with an RPM. This creates all the tables we need
and activates replication for all of them. Then we run a Phoenix DDL SQL
script to create the views. We do this on both clusters so they are
identical; only the peer ID differs.
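
A minimal sketch of the replication half of that deployment step, written in Python for illustration. It only builds HBase shell command strings (`add_peer` and `alter ... REPLICATION_SCOPE`); the function name `replication_commands`, the table names, and the ZooKeeper cluster key are hypothetical. It assumes every table uses Phoenix's default column family "0", which may not hold for your schema (check with `describe`), and it marks the user tables, the index tables, and SYSTEM.CATALOG and SYSTEM.SEQUENCE, as suggested further down the thread. Depending on the HBase version and configuration, a table may need to be disabled before `alter`.

```python
from typing import Dict, List


def replication_commands(
    tables: Dict[str, List[str]],  # table name -> column families (hypothetical)
    peer_id: str,
    peer_cluster_key: str,  # "zk_quorum:zk_port:znode_parent" of the OTHER cluster
) -> List[str]:
    """Build HBase shell commands that register the other cluster as a
    replication peer and set REPLICATION_SCOPE => '1' on every family."""
    commands = [f"add_peer '{peer_id}', '{peer_cluster_key}'"]
    for table in sorted(tables):
        for family in tables[table]:
            # Scope 1 means edits to this family are shipped to the peers.
            commands.append(
                f"alter '{table}', {{NAME => '{family}', REPLICATION_SCOPE => '1'}}"
            )
    return commands


if __name__ == "__main__":
    # Hypothetical table names; "0" is assumed to be the column family.
    phoenix_tables = {
        "MY_TABLE": ["0"],
        "MY_TABLE_IDX": ["0"],  # index tables are plain HBase tables too
        "SYSTEM.CATALOG": ["0"],
        "SYSTEM.SEQUENCE": ["0"],
    }
    # On cluster A, point at cluster B (and mirror this on cluster B).
    for cmd in replication_commands(phoenix_tables, "1", "zk-b1,zk-b2,zk-b3:2181:/hbase"):
        print(cmd)
```

For master/master, the same generator would run on each cluster with the other cluster's key: A registers B as its peer and B registers A.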

There are no indexes yet, but we will certainly want them too.

The goal is to have 2 identical clusters with the same performance in case
one of them fails.

So based on what you said below, can I "simply" add the replication scope
to ALL the tables including all Phoenix tables, both ways?

Regarding the sequence number, how can we bump it when we detect a failure
of the other cluster? And a related question: how do we safely detect the
failure? ;)

Thanks,

JM



2014-12-09 20:48 GMT-05:00 James Taylor <jamestaylor@apache.org>:

> No, we're not saying to avoid replication: at SFDC, we rely on
> replication to provide an active/active configuration for failover.
> Lars H. & co. can explain in more detail, but there are some nuances
> of which you should be aware. For example, the HBase table metadata
> needs to exist on both clusters. How is this done in your environment?
> One way to do this is to run the Phoenix DDL statements on both
> sides, but this requires some extra processing, as replication won't
> know about Phoenix DDL.
>
> Whether or not you replicate indexes depends on 1) how much your use
> case depends on them (if they're not available, will crucial queries
> become so slow that it's as if the system is down?), and 2) the size of
> your data and how long it takes to regenerate the index. Our current
> thinking is to replicate the indexes just as we replicate tables (an
> index just looks like any other HBase table as far as HBase is
> concerned), as we want to be able to failover immediately without
> performance degradation.
>
> As far as replicating the SYSTEM.CATALOG table, that's important
> depending on your use case as well. If you're using views (including
> multi-tenant tables) that are created dynamically/on-the-fly, then
> you'd likely want to replicate this table as otherwise this DDL has
> the potential to be lost. Adding the IF NOT EXISTS that Andrew
> referred to would prevent an error message when running the DDL on the
> secondary cluster if the row from the SYSTEM.CATALOG table was already
> replicated.
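
The IF NOT EXISTS point can be illustrated with a toy model in Python. `Catalog`, `create_view`, `replicate`, and `TableAlreadyExistsError` are hypothetical stand-ins, not Phoenix APIs: SYSTEM.CATALOG is reduced to a dict of view definitions, and replication copies rows rather than replaying DDL. Once the row has arrived through replication, re-running the same DDL on the secondary is a no-op with IF NOT EXISTS and an error without it.

```python
from typing import Dict


class TableAlreadyExistsError(Exception):
    """Hypothetical stand-in for the error a plain CREATE raises on an existing name."""


class Catalog:
    """Toy stand-in for SYSTEM.CATALOG: view name -> view definition."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def create_view(self, name: str, definition: str, if_not_exists: bool = False) -> bool:
        """Return True if a row was written, False if IF NOT EXISTS skipped it."""
        if name in self.rows:
            if if_not_exists:
                return False  # row already present (e.g. replicated): no-op
            raise TableAlreadyExistsError(name)
        self.rows[name] = definition
        return True


def replicate(source: Catalog, target: Catalog) -> None:
    """Replication ships catalog rows, not the DDL statements themselves."""
    target.rows.update(source.rows)


if __name__ == "__main__":
    primary, secondary = Catalog(), Catalog()
    primary.create_view("MY_VIEW", "SELECT * FROM MY_TABLE")
    replicate(primary, secondary)
    # Re-running the deployment DDL on the secondary is now harmless.
    print(secondary.create_view("MY_VIEW", "SELECT * FROM MY_TABLE", if_not_exists=True))
```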
>
> For the SYSTEM.SEQUENCE table, as Andrew pointed out, we allocate
> chunks of sequences and dole them out on the client. You'd want to
> replicate this table, as otherwise when you switch to the other
> cluster, you'd start repeating the same sequence values. Once
> replicated, if the primary cluster goes down, then the sequences will
> pick up at the value after the already allocated chunk (which is fine,
> as it's fine to have "holes" in the sequence values that get doled
> out). There is a potential for a race condition if the primary cluster
> returns a batch of new sequences and then dies before replicating the
> updated sequence value to the other cluster. This can be mitigated, as
> Andrew points out, by bumping up the sequence values on a failover
> event.
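
The race and the failover bump described above can be reproduced with a simplified model in Python. This is not Phoenix's implementation: `SequenceRow`, `reserve_chunk`, `bump_on_failover`, `simulate_failover`, and the `max_lost_chunks` parameter are hypothetical, there is a single client, and asynchronous replication is modeled as the secondary holding a snapshot of the sequence row taken before the primary's last chunk reservation.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SequenceRow:
    next_value: int = 1  # first value no client has reserved yet


def reserve_chunk(row: SequenceRow, cache_size: int) -> List[int]:
    """A client reserves cache_size values in one update and doles them out locally."""
    start = row.next_value
    row.next_value += cache_size
    return list(range(start, start + cache_size))


def bump_on_failover(row: SequenceRow, cache_size: int, max_lost_chunks: int) -> None:
    """Skip past chunks the failed cluster may have reserved without replicating
    them. This leaves a hole in the sequence, which is acceptable."""
    row.next_value += cache_size * max_lost_chunks


def simulate_failover(cache_size: int, bump: bool) -> Set[int]:
    """Return the values handed out by both clusters (duplicates)."""
    primary = SequenceRow()
    issued = reserve_chunk(primary, cache_size)
    # Asynchronous replication: the secondary has only seen the first update.
    secondary = SequenceRow(primary.next_value)
    # The primary reserves another chunk, hands it out, then dies before
    # that update to the sequence row is replicated.
    issued += reserve_chunk(primary, cache_size)
    if bump:
        bump_on_failover(secondary, cache_size, max_lost_chunks=1)
    after_failover = reserve_chunk(secondary, cache_size)
    return set(issued) & set(after_failover)


if __name__ == "__main__":
    print(len(simulate_failover(100, bump=False)))  # duplicates without the bump
    print(len(simulate_failover(100, bump=True)))   # duplicates with the bump
```

With a cache size of 100, the run without the bump hands out 101..200 on both clusters; with the bump the secondary starts at 201 and nothing repeats. With many clients, the bump would have to cover every chunk that could have been reserved during the replication lag, so `max_lost_chunks` is a judgment call rather than a fixed number.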
>
> HTH. Maybe more information than you wanted? Tell us more about how
> you're relying on replication when you get a chance.
>
> Thanks,
> James
>
>
>
> On Tue, Dec 9, 2014 at 5:00 PM, Jean-Marc Spaggiari
> <jean-marc@spaggiari.org> wrote:
> > Hum. Thanks for all those updates.
> >
> > So are we saying that master/master HBase replication should be avoided
> > when using Phoenix with the latest stable version?
> >
> > 2014-12-09 19:51 GMT-05:00 Andrew Purtell <apurtell@apache.org>:
> >
> >> You also need to replicate the Phoenix system tables. It's still necessary
> >> to run DDL operations on both clusters to keep Phoenix schema and HBase
> >> tables in sync. Use IF EXISTS or IF NOT EXISTS to avoid DDL statement
> >> failures. Phoenix should do the right thing. If not, it's a bug.
> >>
> >> The sequence table is interesting. The Phoenix client caches a range of
> >> sequence values to use when inserting data that include generated sequence
> >> values. You'll want to always grab a new cached range of sequence values
> >> when failing over from one site to another and back to avoid potential
> >> duplication. It's possible upon site failure that the latest updates to the
> >> sequence table did not replicate. Or,
> >> https://issues.apache.org/jira/browse/PHOENIX-1422 would sidestep this
> >> issue if implemented.
> >>
> >>
> >> On Mon, Dec 8, 2014 at 10:22 PM, Jeffrey Zhong <jzhong@hortonworks.com>
> >> wrote:
> >>>
> >>>
> >>> You need to enable replication on both the data and index tables at the
> >>> HBase level, using Phoenix 4.2 (Phoenix versions before 4.2 may have
> >>> issues with local indexes). The test case MutableIndexReplicationIT
> >>> shows some of the details. Ideally Phoenix would provide a custom
> >>> replication sink so that users don't have to set up replication on
> >>> index tables.
> >>>
> >>> From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> >>> Reply-To: <user@phoenix.apache.org>
> >>> Date: Monday, December 8, 2014 at 9:29 AM
> >>> To: user <user@phoenix.apache.org>
> >>> Subject: Replication?
> >>>
> >>> Hi,
> >>>
> >>> How do we replicate data between 2 clusters when Phoenix is in the
> >>> picture?
> >>>
> >>> Can we simply replicate the tables we want from A to B and let Phoenix
> >>> on cluster B do the required re-indexing? Or should we replicate the
> >>> Phoenix tables too?
> >>>
> >>> Thanks,
> >>>
> >>> JM
> >>>
> >>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>
> >>    - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
> >
> >
>
