phoenix-user mailing list archives

From "Heather, James (ELS)" <james.heat...@elsevier.com>
Subject Re: How to tell when an insertion has "finished"
Date Fri, 29 Jul 2016 05:59:21 GMT
I don't really know enough about the low level details to know which replication I was referring
to...

Let me ask the higher level question:

1. Am I right in thinking that after you insert a large number of rows, the performance of
the cluster (and maybe of those rows in particular) will be initially slow while some stuff
is still happening at a lower level in the background?

2. If so, how do you tell when that stuff has finished, and when your query performance will
reach a steady state?

James

On 29 July 2016 12:05:30 a.m. James Taylor <jamestaylor@apache.org> wrote:

That's a good point, Mujtaba. Not sure which replication he meant either.

On Thu, Jul 28, 2016 at 4:02 PM, Mujtaba Chohan <mujtaba@apache.org> wrote:
Oh, sorry, I thought the OP was referring to HDFS-level replication.

On Thu, Jul 28, 2016 at 3:48 PM, James Taylor <jamestaylor@apache.org> wrote:
I believe you can also measure the depth of the replication queue to know what's pending.
HBase replication is asynchronous, so you're right that Phoenix would return while replication
may still be occurring.
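
One way to read the replication queue depth mentioned above is through each region server's JMX JSON output. The following is a minimal sketch, not a definitive implementation: the bean name, the attribute name `source.sizeOfLogQueue`, and the info port 16030 are assumptions that should be checked against the /jmx output of your own HBase version, and the helper function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumed bean and attribute names; verify them against your own /jmx output.
REPLICATION_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Replication"
QUEUE_ATTRIBUTE = "source.sizeOfLogQueue"


def replication_queue_depth(jmx_payload):
    """Return the pending log count reported in one region server's JMX JSON."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == REPLICATION_BEAN:
            return int(bean.get(QUEUE_ATTRIBUTE, 0))
    # No replication bean found: treat as nothing pending.
    return 0


def fetch_jmx(host, port=16030):
    """Fetch the JMX JSON from a region server's info port (assumed 16030)."""
    with urlopen(f"http://{host}:{port}/jmx") as response:
        return json.load(response)


def cluster_replication_backlog(hosts, fetch=fetch_jmx):
    """Sum the queue depth over the given region server hosts."""
    return sum(replication_queue_depth(fetch(host)) for host in hosts)
```

A backlog that stays at zero across a few successive readings would suggest that replication has caught up, although this says nothing about cache warmth or compaction.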

On Thu, Jul 28, 2016 at 12:06 PM, Mujtaba Chohan <mujtaba@apache.org> wrote:
A query run for the first time will be slower because the data is not yet in the HBase cache,
not because things have not settled. Replication shouldn't be putting load on the cluster; you
can check this by turning replication off. On the HBase side, the way to get things into an
optimal state before running perf queries is to run a major compaction and wait for it to complete.

- mujtaba
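
The "wait for compaction to complete" step can be sketched as a polling loop. This is only an illustration under stated assumptions: `get_state` is a hypothetical callable that you supply, returning the table's compaction state as a string ("NONE" when idle, matching HBase's CompactionState values), and the compaction itself is assumed to have been requested separately, for example with `major_compact` in the HBase shell. Because a compaction request is queued asynchronously, the sketch asks for several idle readings in a row rather than trusting the first one.

```python
import time


def wait_for_compaction(get_state, poll_seconds=5.0, timeout_seconds=3600.0,
                        idle_checks=3, sleep=time.sleep, clock=time.monotonic):
    """Block until get_state() returns "NONE" idle_checks times in a row.

    get_state is a hypothetical, caller-supplied function; how it obtains the
    state (Admin API, shell, REST) is outside this sketch.
    Returns True once the table looks idle, False if the timeout expires.
    """
    deadline = clock() + timeout_seconds
    consecutive_idle = 0
    while clock() < deadline:
        if get_state() == "NONE":
            consecutive_idle += 1
            if consecutive_idle >= idle_checks:
                return True
        else:
            # A compaction is running (or started again); reset the counter.
            consecutive_idle = 0
        sleep(poll_seconds)
    return False
```

Even after this returns True, the first run of each benchmark query may still be slower because of the cold cache, so it is worth discarding one or more warm-up runs before recording timings.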

On Thu, Jul 28, 2016 at 8:09 AM, Heather, James (ELS) <james.heather@elsevier.com> wrote:

If you upsert lots of rows into a table, presumably Phoenix will return as soon as HBase has
received the data, but before the data has been replicated?


Is there a way to tell when everything has "settled", i.e., when everything has finished replicating
or whatever it needs to do?


The reason I ask is that this might affect our benchmarking. If we add lots of rows, and then
run some sample queries straight away, they might return more slowly initially, if the replication
is still taking place.


(Does this make sense? I'm not completely clear on how HBase replication works anyway.)


James

________________________________

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5
1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.




