phoenix-user mailing list archives

From Kumar Palaniappan <kpalaniap...@marinsoftware.com>
Subject Re: Help tuning for bursts of high traffic?
Date Fri, 04 Dec 2015 16:11:56 GMT
I'm in the same exact position as Zack described. Appreciate your feedback.

So far we've tried tuning the call queue and the handler count, with no luck. Next we plan to try the off-heap cache.

Kumar Palaniappan   

> On Dec 4, 2015, at 6:45 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:
> 
> Thanks Satish,
>  
> To clarify: I’m not looking up single rows. I’m looking up the history of each widget, which returns hundreds to thousands of results per widget (per query).
>  
> Each query is a range scan; it’s just that I’m performing thousands of them.
>  
> From: Satish Iyengar [mailto:satysh@gmail.com] 
> Sent: Friday, December 04, 2015 9:43 AM
> To: user@phoenix.apache.org
> Subject: Re: Help tuning for bursts of high traffic?
>  
> Hi Zack,
>  
> Did you consider avoiding hitting HBase for every single row by doing that step in an offline mode? I was thinking you could take some kind of daily export of the HBase table and then use Pig to perform the join (a co-group, perhaps). Obviously this would only work when your HBase table is not maintained by a stream-based system. HBase is really good at range scans, but it may not be ideal for a large number of single-row lookups.
>  
> Thanks,
> Satish
>  
> On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:
> SHORT EXPLANATION: After several minutes of very heavy querying, a much higher percentage of queries to Phoenix become exceptionally slow.
>  
> LONGER EXPLANATION:
>  
> I’ve been using Phoenix for about a year as a data store for web-based reporting tools, and it works well.
>  
> Now, I’m trying to use the data in a different (much more request-intensive) way and encountering some issues.
>  
> The scenario is basically this:
>  
> Daily, ingest very large CSV files with data for widgets.
>  
> Each input file has hundreds of rows of data for each widget, and tens of thousands of unique widgets.
>  
> As a first step, I want to de-duplicate this data against my Phoenix-based DB (I can’t rely on just upserting the data for de-duplication, because the data goes through several ETL steps before being stored in Phoenix/HBase).
>  
> So, per widget, I perform a query against Phoenix (the table is keyed on the unique widget ID + sample point). I get all the data for a given widget ID within a certain period of time, and then I ingest only the rows for that widget that are new to me.
>  
> I’m doing this in Java in a single step: I loop through my input file and perform one query per widget, using the same Connection object to Phoenix.
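(A minimal sketch of the per-widget lookup pattern described above. The WIDGET_DATA table and its WIDGET_ID / SAMPLE_POINT columns are hypothetical stand-ins for the real schema; because the leading key column is pinned and the second is bounded, each query stays a range scan.)

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;
    import java.util.HashSet;
    import java.util.Set;

    public class WidgetHistoryLookup {
        // Hypothetical schema: WIDGET_DATA with primary key (WIDGET_ID, SAMPLE_POINT).
        private static final String SQL =
            "SELECT SAMPLE_POINT FROM WIDGET_DATA "
            + "WHERE WIDGET_ID = ? AND SAMPLE_POINT >= ? AND SAMPLE_POINT < ?";

        public static Set<Timestamp> existingSamples(Connection conn, String widgetId,
                                                     Timestamp from, Timestamp to) throws SQLException {
            Set<Timestamp> samples = new HashSet<>();
            try (PreparedStatement ps = conn.prepareStatement(SQL)) {
                ps.setString(1, widgetId);
                ps.setTimestamp(2, from);
                ps.setTimestamp(3, to);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        samples.add(rs.getTimestamp(1));
                    }
                }
            }
            return samples; // caller ingests only CSV rows whose sample point is absent
        }
    }

The sketch keeps the one-query-per-widget shape described in the thread; batching several widget IDs per query would cut round trips, but that changes the access pattern being discussed.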
>  
> THE ISSUE:
>  
> What I’m finding is that for the first several thousand queries, I almost always get a very fast (less than 10 ms) response (good).
>  
> But after 15-20 thousand queries, the responses start to get MUCH slower. Some queries respond as expected, but many take as long as 2-3 minutes, pushing the total time to prime the data structure into the 12-15 hour range, when it would take only 2-3 hours if all the queries were fast.
>  
> The exact same queries, when run manually rather than as part of this bulk process, return in the expected < 10 ms.
>  
> So it SEEMS like the burst of queries puts Phoenix into some sort of busy state that causes it to respond far too slowly.
>  
> The connection properties I’m setting are:
>  
> phoenix.query.timeoutMs: 90000
> phoenix.query.keepAliveMs: 90000
> phoenix.query.threadPoolSize: 256
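(For reference, one way these client-side properties can be passed when opening the connection; they can also be set in hbase-site.xml on the client classpath. The ZooKeeper quorum string is a placeholder. Note that the Phoenix client thread pool is sized when the client services are first created, so phoenix.query.threadPoolSize should be in place before the first connection is opened.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.Properties;

    public class PhoenixConnect {
        public static Connection open() throws SQLException {
            Properties props = new Properties();
            props.setProperty("phoenix.query.timeoutMs", "90000");
            props.setProperty("phoenix.query.keepAliveMs", "90000");
            props.setProperty("phoenix.query.threadPoolSize", "256");
            // "zk1,zk2,zk3:2181" is a placeholder for the real ZooKeeper quorum.
            return DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181", props);
        }
    }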
>  
> Our cluster has 9 (beefy) region servers, and the table I’m referencing has 511 regions. We went through a lot of pain to get the data split extremely well, and I don’t think schema design is the issue here.
>  
> Can anyone help me understand how to make this better? Is there a better approach I could take? A better set of configuration parameters? Is our cluster just too small for this?
>  
>  
> Thanks!
> 
> --
> Satish Iyengar
> 
> "Anyone who has never made a mistake has never tried anything new."
> Albert Einstein
