Thanks for your reply, James.

 

I apologize if I’m not being very clear.

 

To clarify, NO DATA is being modified (upserted or deleted) in HBase/Phoenix during this use case. Only queries during the day time.

 

Throughout the day, different files are consumed by this java process (performing the queries), and resulting files are generated.

 

Then, at night, long after this code is finished running, a series of batch processes take these files and eventually ingest their data into Hive. THEN, the Hive data is sent to Phoenix via CsvBulkUpload.

 

These processes are completely independent and do not run at the same time. Data in Phoenix is ONLY modified at night.

 

Finally, yes, the pattern that I’m seeing regarding CPU utilization and queries taking minutes instead of milliseconds is on a  CLIENT machine – NOT on a data node. My theory is that the Phoenix client is somehow getting behind or doing garbage collection in a way that causes the queries to get backed up.

 

Some comments (replies inline):

- 4.2.2 is a pretty old release at this point - we've up to 4.6.0 as our last stable release. I'd recommend upgrading.

--> Agreed. I’m going to follow the instructions here and try to upgrade to 4.6.0 today: https://phoenix.apache.org/upgrading.html

 

- If the problem is on the client-side, you should be able to profile the Java application during the period of slowness and get us the details we need to diagnose this.

--> Do you have a recommendation for a tool that I should go learn about for this?

 

- Phoenix executes a single query using many threads already, so your ~2000 thread calculations wouldn't be accurate. For a little bit more info on how Phoenix parallelized queries, see https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix will execute one thread per region involved in your query. You can control the level of parallelization through the phoenix.stats.guidepost.width config parameter. If you want to minimize the parallelization that Phoenix does, you can set this config parameter to a very larger value in the server side hbase-site.xml. The default is 104857600 (or 10MB). If you set it to your MAX_FILESIZE (the size you allow a region to grow to before it splits - default 20GB), then you're essentially disabling it. You could also try increasing it somewhere in between to maybe 5 or 10GB.

--> Thank you. This is helpful. I will experiment with this.

 

- Running 8 simultaneous instances on the same client will overload that machine. Run them on separate client machines - but start with a single one on one machine.

--> I only have one machine for this (for now), but it is a very beefy machine (8 very fast CPUs and almost 1 TB of RAM). I need to perform this input/dedup operation for about 500 files each day (one per customer), and each file will require 5,000-100,000 queries, depending on the size of the customer. Supposing an average of 10K queries/customer, that’s about 5 million queries per day. At 10ms per query, that’s about 14 hours, which I need to improve considerably, especially as our data grows. Based on some experimenting, I’ve found that I can get an overall average much closer to 2-3ms / query if I run several instances in parallel. However, this is when I start to encounter the thread limit issue. So I’ll continue to experiment with this and I appreciate any feedback or recommendations this community can provide.

 

Thanks!

 

 

From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Monday, December 07, 2015 6:34 PM
To: user
Subject: Re: Help tuning for bursts of high traffic?

 

Thanks for the additional information, Zack. I'd like to confirm I understand how you're using Phoenix:

- Data is not being upserted at the same time it is being queried. But I'm confused by your statement of "and THEN some of the Hive data is send to HBase/Phoenix". How is the data being sent? How much data? Is querying occurring while this data is being ingested?

- Querying happens *after* any data ingest. "So, per-widget, I perform a query against Phoenix". Are you determining duplicates through these queries and then issuing a Phoenix DELETE of the duplicates?

- The CPUs you mention being pegged are the CPUs on the *client* machine: "I see that, periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on this machine) go from mildly busy to almost totally pegged".

 

If you could confirm the above, that'd be super helpful. I can't think of any period process in Phoenix that runs every 60 or 90 seconds.

 

Some comments:

- 4.2.2 is a pretty old release at this point - we've up to 4.6.0 as our last stable release. I'd recommend upgrading.

- If the problem is on the client-side, you should be able to profile the Java application during the period of slowness and get us the details we need to diagnose this.

- Phoenix executes a single query using many threads already, so your ~2000 thread calculations wouldn't be accurate. For a little bit more info on how Phoenix parallelized queries, see https://phoenix.apache.org/tuning.html#Parallelization. At a minimum, Phoenix will execute one thread per region involved in your query. You can control the level of parallelization through the phoenix.stats.guidepost.width config parameter. If you want to minimize the parallelization that Phoenix does, you can set this config parameter to a very larger value in the server side hbase-site.xml. The default is 104857600 (or 10MB). If you set it to your MAX_FILESIZE (the size you allow a region to grow to before it splits - default 20GB), then you're essentially disabling it. You could also try increasing it somewhere in between to maybe 5 or 10GB.

- Running 8 simultaneous instances on the same client will overload that machine. Run them on separate client machines - but start with a single one on one machine.

 

On Mon, Dec 7, 2015 at 8:32 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:

Also, and somewhat related:

 

I’m trying to running 8 simultaneous instances of this code (on 8 separate input files), since I have 8 CPUs on the machine.

 

When I try this, I get java.lang.RuntimeException: java.lang.OutOfMemory: unable to create a new native thread

 

My phoenix connection has the “phoenix.query.threadPoolSize” set to “256”, which should result in 8x256 = ~2,000 threads being spawned by Phoenix.

 

Is that correct?

 

I’m running RH Linux 6, with my ulimit set to “unlimited”, so I should be able to handle thousands of threads.

 

Any ideas?

 

From: Andrew Purtell [mailto:andrew.purtell@gmail.com]
Sent: Friday, December 04, 2015 4:24 PM
To: user@phoenix.apache.org
Cc: Haisty, Geoffrey

Subject: Re: Help tuning for bursts of high traffic?

 

Any chance of stack dumps from the debug servlet? Impossible to get anywhere with 'pegged the CPU' otherwise. Thanks. 


On Dec 4, 2015, at 12:20 PM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:

James,

 

2 quick followups, for whatever they’re worth:

 

1 – There is nothing phoenix-related in /tmp

 

2 – I added a ton of logging, and played with the properties a bit, and I think I see a pattern:

 

Watching the logging and the system profiler side-by-side, I see that, periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on this machine) go from mildly busy to almost totally pegged.

 

They USUALLY stay pegged for 5-10 seconds, and then calm down.

 

However, occasionally, they stay pegged for around a minute. When this happens, I get the very slow queries. I added logic so that when I get a very slow response (> 1 second), I pause for 30 seconds.

 

This ‘fixes’ everything, in the sense that I’m usually able to get a couple thousand good queries before the whole pattern repeats.

 

For reference, there’s nothing external that should be causing those CPU spikes, so I’m guessing that it’s maybe java GC (?) or perhaps something that the phoenix client is doing ?

 

Can you guess at what Phoenix might do periodically that would peg the CPUs – and in such a way that a query has to wait as much as 2 minutes to execute (I’m guessing from the pattern that it’s not actually the query that is slow, but a very long between when it gets queued and when it actually gets executed).

 

Oh and the methods you mentioned aren’t in my version of PhoenixRuntime, evidently. I’m on 4.2.2.something.

 

Thanks for any further feedback you can provide on this. Hopefully the conversation is helpful to the whole Phoenix community.

 

From: Riesland, Zack
Sent: Friday, December 04, 2015 1:36 PM
To: user@phoenix.apache.org
Cc: Geoff.haisty@sensus.com
Subject: RE: Help tuning for bursts of high traffic?

 

Thanks, James

 

I'll work on gathering more information.

 

In the meantime, answers to a few of your questions inline below just narrow the scope a bit:

 


From: James Taylor [jamestaylor@apache.org]
Sent: Friday, December 04, 2015 12:21 PM
To: user
Subject: Re: Help tuning for bursts of high traffic?

Zack,

Thanks for reporting this and for the detailed description. Here's a bunch of questions and some things you can try in addition to what Andrew suggested:

1) Is this reproducible in a test environment (perhaps through Pherf: https://phoenix.apache.org/pherf.html) so you can experiment more?

-Will check

 

2) Do you get a sense of whether the bottleneck is on the client or the server? CPU, IO, or network? How many clients are you running and have you tried increasing this? Do you think your network is saturated by the data being returned?

-I'm no expert on this. When I look at the HBase dashboard on Ambari, everything looks good. When I look at the stats on the machine running the java code, it also looks good. Certainly no bottleneck related to memory or CPU. Network wise, the box is on the same rack as the cluster, with 10GB switches everywhere, so I'd be surprised if network latency were an issue.

 

3) From your description, it sounds like you're querying the data as your ingesting. When it gets slow, have you tried running a major compaction to see if that helps? Perhaps queries are getting slower because of the number of HFiles that need to be merged.

-Rereading my original email, I see where you get that. But actually, there is nothing being ingested by HBase during this process. At the end of the process, I generate a CSV file that is then consumed and altered by Pentaho, then consumed by Hive, and THEN some of the Hive data is send to HBase/Phoenix. So this is part of the ingest process, but a precursor to the cluster Ingesting any data.

 

4) If you bounce your cluster when it gets slow, does this have any impact?

-Can check. What should I expect to happen if I restart HBase-related services while trying to query Phoenix? Will the query just wait until everything is back up? Will I get strange exceptions? (Of course I'll go find this out myself)

 

5) What kinds of queries are running? Aggregation? Joins? Or just plain single table selects? Any ORDER BY clauses? Are you using secondary indexes, and if so, what kind?

-Very simple query: select x, y, z from my_table where key = 'my_key' and sample_point <= upper_range and sample_point >= lower_range. x, y, and z are integers.

 

6) Are you seeing GC pauses on the server during times of slowness (correlate time of slowness with your server logs)?

-Can look

 

7) Sounds like your queries are returning a lot of data. On the client-side, Phoenix will keep phoenix.query.spoolThresholdBytes in memory and then spool to disk as parallel execution happens. Are you seeing many spool files on the client side your /tmp directory (this is where Phoenix puts these by default with a name of ResultSpoolerXXX.bin). Try increasing this spool threshold if that's the case.

-I'll look into this

 

8) For the data ingest, are you using UPSERT VALUES? How big of batches are you committing? That's one thing to tune, especially if you're using secondary indexing.

-Again, nothing ingesting

 

9) Have you tried tuning the level of parallelization that Phoenix is doing for queries? This is controlled by the server-side phoenix.stats.guidepost.width parameter (assuming you haven't set the phoenix.stats.guidepost.per.region parameter) and defaults to 300MB. Try increasing it (you'll need to run a major compaction for this to take effect, and there's 15min lag to when the client sees it).

-This sounds interesting. I'll have to learn more about it.

 

10) If you're doing aggregation or join queries, try increasing the phoenix.query.maxGlobalMemorySize property on the server side. Both hash joins and aggregation are done in memory, up to this % limit. If the limit is reached, then on the server side, Phoenix will wait phoenix.query.maxGlobalMemoryWaitMs time for the usage to go below the limit (and then throw an exception if it doesn't). You can try tuning this wait time down to see if it has an impact.

-Not applicable

 

11) There a bunch of client-side metrics you can collect (but little documentation yet - keep your eye on PHOENIX-2486) that might help you diagnose this. See PhoenixRuntime.getGlobalPhoenixClientMetrics(), PhoenixRuntime.getOverAllReadRequestMetrics(), and other methods with Metrics in the name.

-I'll look into this

 

12) There's also tracing, which is end-to-end client/server, but it's in a bit on the raw side still: https://phoenix.apache.org/tracing.htmlhttps://phoenix.apache.org/tracing.html

-OK

 

There's more information on these tuning parameters here: https://phoenix.apache.org/tuning.html and you should take a look at Andrew's excellent tuning presentation here: https://phoenix.apache.org/resources.html.

 

Thanks,

James

 

 

On Fri, Dec 4, 2015 at 8:28 AM, Andrew Purtell <apurtell@apache.org> wrote:

Kumar - I believe you mentioned you are seeing this in a cluster of ~20 regionservers. 

 

Zack - Yours is smaller yet, at 9.

 

These clusters are small enough to make getting stack dumps through the HBase debug servlet during periods of unusually slow response possible. Perhaps you can write a script that queries all of the debug servlets (can use curl) and dumps the received output into per-regionserver files? Scrape every 10 or so seconds during the observed periods of slowness? Then compress them and make them available for Phoenix devs up on S3? Consider it a poor man's sampler. I don't know what we might find, but this could prove very helpful.

 

 

On Fri, Dec 4, 2015 at 8:11 AM, Kumar Palaniappan <kpalaniappan@marinsoftware.com> wrote:

I'm in the same exact position as Zack described. Appreciate your feedback.

 

So far we tried the call queue n the handlers, nope. Planned to try off-heap cache.

Kumar Palaniappan   

Thanks Satish,

 

To clarify: I’m not looking up single rows. I’m looking up the history of each widget, which returns hundreds-to-thousands of results per widget (per query).

 

Each query is a range scan, it’s just that I’m performing thousands of them.

 

From: Satish Iyengar [mailto:satysh@gmail.com]
Sent: Friday, December 04, 2015 9:43 AM
To: user@phoenix.apache.org
Subject: Re: Help tuning for bursts of high traffic?

 

 

On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <Zack.Riesland@sensus.com> wrote:

SHORT EXPLANATION: a much higher percentage of queries to phoenix return exceptionally slow after querying very heavily for several minutes.

 

LONGER EXPLANATION:

 

I’ve been using Pheonix for about a year as a data store for web-based reporting tools and it works well.

 

Now, I’m trying to use the data in a different (much more request-intensive) way and encountering some issues.

 

The scenario is basically this:

 

Daily, ingest very large CSV files with data for widgets.

 

Each input file has hundreds of rows of data for each widget, and tens of thousands of unique widgets.

 

As a first step, I want to de-duplicate this data against my Phoenix-based DB (I can’t rely on just upserting the data for de-dup because it will go through several ETL steps before being stored into Phoenix/HBase).

 

So, per-widget, I perform a query against Phoenix (the table is keyed against the unique widget ID + sample point). I get all the data for a given widget id, within a certain period of time, and then I only ingest rows for that widget that are new to me.

 

I’m doing this in Java in a single step: I loop through my input file and perform one query per widget, using the same Connection object to Phoenix.

 

THE ISSUE:

 

What I’m finding is that for the first several thousand queries, I almost always get a very fast (less than 10 ms) response (good).

 

But after 15-20 thousand queries, the response starts to get MUCH slower. Some queries respond as expected, but many take as many as 2-3 minutes, pushing the total time to prime the data structure into the 12-15 hour range, when it would only take 2-3 hours if all the queries were fast.

 

The same exact queries, when run manually and not part of this bulk process, return in the (expected) < 10 ms.

 

So it SEEMS like the burst of queries puts Phoenix into some sort of busy state that causes it to respond far too slowly.

 

The connection properties I’m setting are:

 

Phoenix.query.timeoutMs: 90000

Phoenix.query.keepAliveMs: 90000

Phenix.query.threadPoolSize: 256

 

Our cluster is 9 (beefy) region servers and the table I’m referencing is 511 regions. We went through a lot of pain to get the data split extremely well, and I don’t think Schema design is the issue here.

 

Can anyone help me understand how to make this better? Is there a better approach I could take? A better set of configuration parameters? Is our cluster just too small for this?

 

 

Thanks!

 

 

 

 

 

 

 

 

 

 



 

--



 

--