phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Help tuning for bursts of high traffic?
Date Thu, 10 Dec 2015 23:24:31 GMT
Slight correction to a couple of additional steps after truncating stats
table:
1) DELETE FROM SYSTEM.STATS; -- ensure autocommit is on which is the
default in sqlline
2) bounce your cluster (as we cache stats on server-side)
3) restart client (as we cache on client side as well)
4) reissue your queries and let us know if this makes a difference.
5) make sure SYSTEM.STATS is still empty - a major compaction would cause
stats to be regenerated
    SELECT sum(guide_posts_count) FROM SYSTEM.STATS -- should return 0


On Thu, Dec 10, 2015 at 12:33 PM, James Taylor <jamestaylor@apache.org>
wrote:

> Zack - have you tried tuning the stats-related parameters yet? You can
> start by just by truncating the stats table:
>
> DELETE FROM SYSTEM.STATS;
>
> If that solves the problem, then here's what you can do to prevent stats
> from being generated:
> - set phoenix.stats.guidepost.per.region to 1 on all region server
> hbase-site.xml
> - bounce your cluster
> - force a major compaction
>
> Thanks,
> James
>
>
> On Thu, Dec 10, 2015 at 12:15 PM, Riesland, Zack <Zack.Riesland@sensus.com
> > wrote:
>
>> Thanks Jain,
>>
>> The 3800 seconds was just for the executeQUery()
>>
>> I've seen it as high as 5900
>> ________________________________________
>> From: Samarth Jain [samarth.jain@gmail.com]
>> Sent: Thursday, December 10, 2015 2:31 PM
>> To: user@phoenix.apache.org
>> Subject: Re: Help tuning for bursts of high traffic?
>>
>> Thanks for the additional information, Zack. Looking at the numbers it
>> looks like the bottle-neck is probably not coming from the phoenix thread
>> pool.
>>
>> For request level metrics:
>>
>> TASK_QUEUE_WAIT_TIME - represents the length of time (wall clock)
>> phoenix scans had to wait in the thread pool's queue before they were
>> picked up for execution. A higher value of this would mean that you likely
>> need to increase your thread pool size.
>>
>> TASK_EXECUTION_TIME - represents the time taken by scans to complete.
>>
>> TASK_END_TO_END_TIME - is roughly task_queue_wait_time +
>> task_execution_time
>>
>> Did you get 3800ms for stmt.executeQuery() itself or did that time
>> include time spent in retrieving records via resultSet.next() too?
>>
>>
>> On Thu, Dec 10, 2015 at 7:38 AM, Riesland, Zack <Zack.Riesland@sensus.com
>> <mailto:Zack.Riesland@sensus.com>> wrote:
>> Thanks,
>>
>> I did some experimenting.
>>
>> Now, anytime I get a query that lasts longer than 500 ms, I call
>> getRequestReadMetrics() and print everything.
>>
>> The output suggests to me that Phoenix thinks that the query is still
>> reasonably fast:
>>
>> TASK_QUEUE_WAIT_TIME: 0
>> TASK_EXECUTION_TIME: 80
>> TASK_END_TO_END_TIME: 80
>>
>> Should I read this as: “The query did not have to wait at all and took 80
>> ms to execute” ?
>>
>> Comparing System.currentTimeMillis() immediately before and after calling
>> stmt.executeQuery() is about 3,800 ms for this particular example.
>>
>> The other (non-zero) counters, from getGlobalPhoenixClientMetrics() at
>> this time, are:
>>
>> QUERY_TIME: 347802
>> NUM_PARALLEL_SCANS: 5638
>> SCAN_BYTES: 3074163200<tel:3074163200>
>> TASK_QUEUE_WAIT_TIME: 12129
>> TASK_END_TO_END_TIME: 851524
>> TASK_EXECUTION_TIME: 839347
>> MUTATION_SQL_COUNTER: 93
>> SELECT_SQL_COUNTER: 5610 (this is the number of  total queries on this
>> connection)
>>
>> Can you please help me understand how to interpret all this and possibly
>> understand how to configure this scenario to make it faster?
>>
>> Right now, I’m only averaging about 5 queries/second, even though I’m
>> querying by the primary key.
>>
>> Before I upgraded, I was getting a lot closer to 100.
>>
>> Thanks!
>>
>>
>>
>> From: Samarth Jain [mailto:samarth@apache.org<mailto:samarth@apache.org>]
>> Sent: Wednesday, December 09, 2015 1:59 PM
>> To: user@phoenix.apache.org<mailto:user@phoenix.apache.org>
>> Subject: Re: Help tuning for bursts of high traffic?
>>
>> Zack,
>>
>> These stats are collected continuously and at the global client level. So
>> collecting them only when the query takes more than 1 second won't work. A
>> better alternative for you would be to report stats at a request level. You
>> could then conditionally report the metrics for queries that exceed 1
>> second execution. The metric you want to make that decision on is
>> WALL_CLOCK_TIME_MS.
>>
>> Request level metric collection isn't enabled by default. You can turn it
>> on/off for a PhoenixConnection by doing the following:
>>
>> Properties props = new Properties();
>> props.put(QueryServices.COLLECT_REQUEST_LEVEL_METRICS,
>> String.valueOf(true));
>> Connection conn = DriverManager.getConnection(url, props);
>>
>> Then see PhoenixRuntime.getRequestReadMetrics() on how to get hold of
>> these metrics.
>>
>> I have a JIRA filed - https://issues.apache.org/jira/browse/PHOENIX-2486
>> for adding the documentation for these metrics. Will get to it shortly.
>>
>>
>>
>>
>>
>>
>> On Wed, Dec 9, 2015 at 10:38 AM, Riesland, Zack <Zack.Riesland@sensus.com
>> <mailto:Zack.Riesland@sensus.com>> wrote:
>> Thanks a lot, James.
>>
>> And now that I’m using 4.6, I have access to some statistics.
>>
>> I added code to collect stats whenever a query takes more than 1 second.
>>
>> The first time this happened was after about 51,000 queries, and the
>> output is below, for whatever it’s worth.
>>
>> Can you point me to any documentation to describe these values, or to
>> utilize this information?
>>
>> I have a hunch that SCAN_BYTES and TASK_QUEUE_WAIT_TIME are the
>> interesting numbers here?
>>
>> PHOENIX STATS:
>> Batch sizes of mutations | MUTATION_BATCH_SIZE: 0, Number of samples: 0 |
>> 0
>> Size of mutations in bytes | MUTATION_BYTES: 0, Number of samples: 0 | 0
>> Time it took to commit mutations | MUTATION_COMMIT_TIME: 0, Number of
>> samples: 0 | 0
>> Query times | QUERY_TIME: 2424030, Number of samples: 51555 | 2424030
>> Number of scans that were executed in parallel | NUM_PARALLEL_SCANS:
>> 51769, Number of samples: 51554 | 51769
>> Number of bytes read by scans | SCAN_BYTES: 28193431552, Number of
>> samples: 220261184 | 28193431552
>> Size of spool files created in bytes | SPOOL_FILE_SIZE: 0, Number of
>> samples: 0 | 0
>> Number of bytes allocated by the memory manager | MEMORY_CHUNK_BYTES: 0,
>> Number of samples: 0 | 0
>> Number of milliseconds threads needed to wait for memory to be allocated
>> through memory manager | MEMORY_WAIT_TIME: 0, Number of samples: 0 | 0
>> Time in milliseconds tasks had to wait in the queue of the thread pool
>> executor | TASK_QUEUE_WAIT_TIME: 114349, Number of samples: 2235211 | 114349
>> Time in milliseconds spent by tasks from creation to completion |
>> TASK_END_TO_END_TIME: 6030797, Number of samples: 2235211 | 6030797
>> Time in milliseconds tasks took to execute | TASK_EXECUTION_TIME:
>> 5916067, Number of samples: 2235211 | 5916067
>> Counter for number of mutation sql statements | MUTATION_SQL_COUNTER: 93,
>> Number of samples: 93 | 93
>> Counter for number of sql queries | SELECT_SQL_COUNTER: 51555, Number of
>> samples: 51555 | 51555
>> Counter for number of tasks submitted to the thread pool executor |
>> TASK_EXECUTED_COUNTER: 2235211, Number of samples: 2235211 | 2235211
>> Counter for number of tasks that were rejected by the thread pool
>> executor | TASK_REJECTED_COUNTER: 0, Number of samples: 0 | 0
>> Number of times query timed out | QUERY_TIMEOUT_COUNTER: 0, Number of
>> samples: 0 | 0
>> Number of times query failed | QUERY_FAILED_COUNTER: 0, Number of
>> samples: 0 | 0
>> Number of spool files created | SPOOL_FILE_COUNTER: 0, Number of samples:
>> 0 | 0
>>
>> From: James Taylor [mailto:jamestaylor@apache.org<mailto:
>> jamestaylor@apache.org>]
>> Sent: Tuesday, December 08, 2015 7:10 PM
>>
>> To: user
>> Subject: Re: Help tuning for bursts of high traffic?
>>
>> Thanks for clarifying, Zack. For profiling on your client machine, you
>> can just use VisualVM as it's built into the JDK (
>> https://dzone.com/articles/best-kept-secret-jdk-visualvm).
>>
>>     James
>>
>> On Tue, Dec 8, 2015 at 6:31 AM, Riesland, Zack <Zack.Riesland@sensus.com
>> <mailto:Zack.Riesland@sensus.com>> wrote:
>> Thanks for your reply, James.
>>
>> I apologize if I’m not being very clear.
>>
>> To clarify, NO DATA is being modified (upserted or deleted) in
>> HBase/Phoenix during this use case. Only queries during the day time.
>>
>> Throughout the day, different files are consumed by this java process
>> (performing the queries), and resulting files are generated.
>>
>> Then, at night, long after this code is finished running, a series of
>> batch processes take these files and eventually ingest their data into
>> Hive. THEN, the Hive data is sent to Phoenix via CsvBulkUpload.
>>
>> These processes are completely independent and do not run at the same
>> time. Data in Phoenix is ONLY modified at night.
>>
>> Finally, yes, the pattern that I’m seeing regarding CPU utilization and
>> queries taking minutes instead of milliseconds is on a  CLIENT machine –
>> NOT on a data node. My theory is that the Phoenix client is somehow getting
>> behind or doing garbage collection in a way that causes the queries to get
>> backed up.
>>
>> Some comments (replies inline):
>> - 4.2.2 is a pretty old release at this point - we've up to 4.6.0 as our
>> last stable release. I'd recommend upgrading.
>> --> Agreed. I’m going to follow the instructions here and try to upgrade
>> to 4.6.0 today: https://phoenix.apache.org/upgrading.html
>>
>> - If the problem is on the client-side, you should be able to profile the
>> Java application during the period of slowness and get us the details we
>> need to diagnose this.
>> --> Do you have a recommendation for a tool that I should go learn about
>> for this?
>>
>> - Phoenix executes a single query using many threads already, so your
>> ~2000 thread calculations wouldn't be accurate. For a little bit more info
>> on how Phoenix parallelized queries, see
>> https://phoenix.apache.org/tuning.html#Parallelization. At a minimum,
>> Phoenix will execute one thread per region involved in your query. You can
>> control the level of parallelization through the
>> phoenix.stats.guidepost.width config parameter. If you want to minimize the
>> parallelization that Phoenix does, you can set this config parameter to a
>> very larger value in the server side hbase-site.xml. The default is
>> 104857600 (or 10MB). If you set it to your MAX_FILESIZE (the size you allow
>> a region to grow to before it splits - default 20GB), then you're
>> essentially disabling it. You could also try increasing it somewhere in
>> between to maybe 5 or 10GB.
>> --> Thank you. This is helpful. I will experiment with this.
>>
>> - Running 8 simultaneous instances on the same client will overload that
>> machine. Run them on separate client machines - but start with a single one
>> on one machine.
>> --> I only have one machine for this (for now), but it is a very beefy
>> machine (8 very fast CPUs and almost 1 TB of RAM). I need to perform this
>> input/dedup operation for about 500 files each day (one per customer), and
>> each file will require 5,000-100,000 queries, depending on the size of the
>> customer. Supposing an average of 10K queries/customer, that’s about 5
>> million queries per day. At 10ms per query, that’s about 14 hours, which I
>> need to improve considerably, especially as our data grows. Based on some
>> experimenting, I’ve found that I can get an overall average much closer to
>> 2-3ms / query if I run several instances in parallel. However, this is when
>> I start to encounter the thread limit issue. So I’ll continue to experiment
>> with this and I appreciate any feedback or recommendations this community
>> can provide.
>>
>> Thanks!
>>
>>
>> From: James Taylor [mailto:jamestaylor@apache.org<mailto:
>> jamestaylor@apache.org>]
>> Sent: Monday, December 07, 2015 6:34 PM
>>
>> To: user
>> Subject: Re: Help tuning for bursts of high traffic?
>>
>> Thanks for the additional information, Zack. I'd like to confirm I
>> understand how you're using Phoenix:
>> - Data is not being upserted at the same time it is being queried. But
>> I'm confused by your statement of "and THEN some of the Hive data is send
>> to HBase/Phoenix". How is the data being sent? How much data? Is querying
>> occurring while this data is being ingested?
>> - Querying happens *after* any data ingest. "So, per-widget, I perform a
>> query against Phoenix". Are you determining duplicates through these
>> queries and then issuing a Phoenix DELETE of the duplicates?
>> - The CPUs you mention being pegged are the CPUs on the *client* machine:
>> "I see that, periodically – maybe every 60 or 90 seconds – all of my CPUs
>> (there are 8 on this machine) go from mildly busy to almost totally pegged".
>>
>> If you could confirm the above, that'd be super helpful. I can't think of
>> any period process in Phoenix that runs every 60 or 90 seconds.
>>
>> Some comments:
>> - 4.2.2 is a pretty old release at this point - we've up to 4.6.0 as our
>> last stable release. I'd recommend upgrading.
>> - If the problem is on the client-side, you should be able to profile the
>> Java application during the period of slowness and get us the details we
>> need to diagnose this.
>> - Phoenix executes a single query using many threads already, so your
>> ~2000 thread calculations wouldn't be accurate. For a little bit more info
>> on how Phoenix parallelized queries, see
>> https://phoenix.apache.org/tuning.html#Parallelization. At a minimum,
>> Phoenix will execute one thread per region involved in your query. You can
>> control the level of parallelization through the
>> phoenix.stats.guidepost.width config parameter. If you want to minimize the
>> parallelization that Phoenix does, you can set this config parameter to a
>> very larger value in the server side hbase-site.xml. The default is
>> 104857600 (or 10MB). If you set it to your MAX_FILESIZE (the size you allow
>> a region to grow to before it splits - default 20GB), then you're
>> essentially disabling it. You could also try increasing it somewhere in
>> between to maybe 5 or 10GB.
>> - Running 8 simultaneous instances on the same client will overload that
>> machine. Run them on separate client machines - but start with a single one
>> on one machine.
>>
>> On Mon, Dec 7, 2015 at 8:32 AM, Riesland, Zack <Zack.Riesland@sensus.com
>> <mailto:Zack.Riesland@sensus.com>> wrote:
>> Also, and somewhat related:
>>
>> I’m trying to running 8 simultaneous instances of this code (on 8
>> separate input files), since I have 8 CPUs on the machine.
>>
>> When I try this, I get java.lang.RuntimeException: java.lang.OutOfMemory:
>> unable to create a new native thread
>>
>> My phoenix connection has the “phoenix.query.threadPoolSize” set to
>> “256”, which should result in 8x256 = ~2,000 threads being spawned by
>> Phoenix.
>>
>> Is that correct?
>>
>> I’m running RH Linux 6, with my ulimit set to “unlimited”, so I should be
>> able to handle thousands of threads.
>>
>> Any ideas?
>>
>> From: Andrew Purtell [mailto:andrew.purtell@gmail.com<mailto:
>> andrew.purtell@gmail.com>]
>> Sent: Friday, December 04, 2015 4:24 PM
>> To: user@phoenix.apache.org<mailto:user@phoenix.apache.org>
>> Cc: Haisty, Geoffrey
>> Subject: Re: Help tuning for bursts of high traffic?
>>
>> Any chance of stack dumps from the debug servlet? Impossible to get
>> anywhere with 'pegged the CPU' otherwise. Thanks.
>>
>> On Dec 4, 2015, at 12:20 PM, Riesland, Zack <Zack.Riesland@sensus.com
>> <mailto:Zack.Riesland@sensus.com>> wrote:
>> James,
>>
>> 2 quick followups, for whatever they’re worth:
>>
>> 1 – There is nothing phoenix-related in /tmp
>>
>> 2 – I added a ton of logging, and played with the properties a bit, and I
>> think I see a pattern:
>>
>> Watching the logging and the system profiler side-by-side, I see that,
>> periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8
>> on this machine) go from mildly busy to almost totally pegged.
>>
>> They USUALLY stay pegged for 5-10 seconds, and then calm down.
>>
>> However, occasionally, they stay pegged for around a minute. When this
>> happens, I get the very slow queries. I added logic so that when I get a
>> very slow response (> 1 second), I pause for 30 seconds.
>>
>> This ‘fixes’ everything, in the sense that I’m usually able to get a
>> couple thousand good queries before the whole pattern repeats.
>>
>> For reference, there’s nothing external that should be causing those CPU
>> spikes, so I’m guessing that it’s maybe java GC (?) or perhaps something
>> that the phoenix client is doing ?
>>
>> Can you guess at what Phoenix might do periodically that would peg the
>> CPUs – and in such a way that a query has to wait as much as 2 minutes to
>> execute (I’m guessing from the pattern that it’s not actually the query
>> that is slow, but a very long between when it gets queued and when it
>> actually gets executed).
>>
>> Oh and the methods you mentioned aren’t in my version of PhoenixRuntime,
>> evidently. I’m on 4.2.2.something.
>>
>> Thanks for any further feedback you can provide on this. Hopefully the
>> conversation is helpful to the whole Phoenix community.
>>
>> From: Riesland, Zack
>> Sent: Friday, December 04, 2015 1:36 PM
>> To: user@phoenix.apache.org<mailto:user@phoenix.apache.org>
>> Cc: Geoff.haisty@sensus.com<mailto:Geoff.haisty@sensus.com>
>> Subject: RE: Help tuning for bursts of high traffic?
>>
>> Thanks, James
>>
>> I'll work on gathering more information.
>>
>> In the meantime, answers to a few of your questions inline below just
>> narrow the scope a bit:
>>
>> ________________________________
>> From: James Taylor [jamestaylor@apache.org<mailto:jamestaylor@apache.org
>> >]
>> Sent: Friday, December 04, 2015 12:21 PM
>> To: user
>> Subject: Re: Help tuning for bursts of high traffic?
>> Zack,
>> Thanks for reporting this and for the detailed description. Here's a
>> bunch of questions and some things you can try in addition to what Andrew
>> suggested:
>> 1) Is this reproducible in a test environment (perhaps through Pherf:
>> https://phoenix.apache.org/pherf.html) so you can experiment more?
>> -Will check
>>
>> 2) Do you get a sense of whether the bottleneck is on the client or the
>> server? CPU, IO, or network? How many clients are you running and have you
>> tried increasing this? Do you think your network is saturated by the data
>> being returned?
>> -I'm no expert on this. When I look at the HBase dashboard on Ambari,
>> everything looks good. When I look at the stats on the machine running the
>> java code, it also looks good. Certainly no bottleneck related to memory or
>> CPU. Network wise, the box is on the same rack as the cluster, with 10GB
>> switches everywhere, so I'd be surprised if network latency were an issue.
>>
>> 3) From your description, it sounds like you're querying the data as your
>> ingesting. When it gets slow, have you tried running a major compaction to
>> see if that helps? Perhaps queries are getting slower because of the number
>> of HFiles that need to be merged.
>> -Rereading my original email, I see where you get that. But actually,
>> there is nothing being ingested by HBase during this process. At the end of
>> the process, I generate a CSV file that is then consumed and altered by
>> Pentaho, then consumed by Hive, and THEN some of the Hive data is send to
>> HBase/Phoenix. So this is part of the ingest process, but a precursor to
>> the cluster Ingesting any data.
>>
>> 4) If you bounce your cluster when it gets slow, does this have any
>> impact?
>> -Can check. What should I expect to happen if I restart HBase-related
>> services while trying to query Phoenix? Will the query just wait until
>> everything is back up? Will I get strange exceptions? (Of course I'll go
>> find this out myself)
>>
>> 5) What kinds of queries are running? Aggregation? Joins? Or just plain
>> single table selects? Any ORDER BY clauses? Are you using secondary
>> indexes, and if so, what kind?
>> -Very simple query: select x, y, z from my_table where key = 'my_key' and
>> sample_point <= upper_range and sample_point >= lower_range. x, y, and z
>> are integers.
>>
>> 6) Are you seeing GC pauses on the server during times of slowness
>> (correlate time of slowness with your server logs)?
>> -Can look
>>
>> 7) Sounds like your queries are returning a lot of data. On the
>> client-side, Phoenix will keep phoenix.query.spoolThresholdBytes in memory
>> and then spool to disk as parallel execution happens. Are you seeing many
>> spool files on the client side your /tmp directory (this is where Phoenix
>> puts these by default with a name of ResultSpoolerXXX.bin). Try increasing
>> this spool threshold if that's the case.
>> -I'll look into this
>>
>> 8) For the data ingest, are you using UPSERT VALUES? How big of batches
>> are you committing? That's one thing to tune, especially if you're using
>> secondary indexing.
>> -Again, nothing ingesting
>>
>> 9) Have you tried tuning the level of parallelization that Phoenix is
>> doing for queries? This is controlled by the server-side
>> phoenix.stats.guidepost.width parameter (assuming you haven't set the
>> phoenix.stats.guidepost.per.region parameter) and defaults to 300MB. Try
>> increasing it (you'll need to run a major compaction for this to take
>> effect, and there's 15min lag to when the client sees it).
>> -This sounds interesting. I'll have to learn more about it.
>>
>> 10) If you're doing aggregation or join queries, try increasing the
>> phoenix.query.maxGlobalMemorySize property on the server side. Both hash
>> joins and aggregation are done in memory, up to this % limit. If the limit
>> is reached, then on the server side, Phoenix will wait
>> phoenix.query.maxGlobalMemoryWaitMs time for the usage to go below the
>> limit (and then throw an exception if it doesn't). You can try tuning this
>> wait time down to see if it has an impact.
>> -Not applicable
>>
>> 11) There a bunch of client-side metrics you can collect (but little
>> documentation yet - keep your eye on PHOENIX-2486) that might help you
>> diagnose this. See PhoenixRuntime.getGlobalPhoenixClientMetrics(),
>> PhoenixRuntime.getOverAllReadRequestMetrics(), and other methods with
>> Metrics in the name.
>> -I'll look into this
>>
>> 12) There's also tracing, which is end-to-end client/server, but it's in
>> a bit on the raw side still:
>> https://phoenix.apache.org/tracing.htmlhttps://phoenix.apache.org/tracing.html
>> -OK
>>
>> There's more information on these tuning parameters here:
>> https://phoenix.apache.org/tuning.html and you should take a look at
>> Andrew's excellent tuning presentation here:
>> https://phoenix.apache.org/resources.html.
>>
>> Thanks,
>> James
>>
>>
>> On Fri, Dec 4, 2015 at 8:28 AM, Andrew Purtell <apurtell@apache.org
>> <mailto:apurtell@apache.org>> wrote:
>> Kumar - I believe you mentioned you are seeing this in a cluster of ~20
>> regionservers.
>>
>> Zack - Yours is smaller yet, at 9.
>>
>> These clusters are small enough to make getting stack dumps through the
>> HBase debug servlet during periods of unusually slow response possible.
>> Perhaps you can write a script that queries all of the debug servlets (can
>> use curl) and dumps the received output into per-regionserver files? Scrape
>> every 10 or so seconds during the observed periods of slowness? Then
>> compress them and make them available for Phoenix devs up on S3? Consider
>> it a poor man's sampler. I don't know what we might find, but this could
>> prove very helpful.
>>
>>
>> On Fri, Dec 4, 2015 at 8:11 AM, Kumar Palaniappan <
>> kpalaniappan@marinsoftware.com<mailto:kpalaniappan@marinsoftware.com>>
>> wrote:
>> I'm in the same exact position as Zack described. Appreciate your
>> feedback.
>>
>> So far we tried the call queue n the handlers, nope. Planned to try
>> off-heap cache.
>>
>> Kumar Palaniappan<http://about.me/kumar.palaniappan>   <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>
>> On Dec 4, 2015, at 6:45 AM, Riesland, Zack <Zack.Riesland@sensus.com>
>> wrote:<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Thanks Satish,<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> To clarify: I’m not looking up single rows. I’m looking up the history of
>> each widget, which returns hundreds-to-thousands of results per widget (per
>> query). <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Each query is a range scan, it’s just that I’m performing thousands of
>> them.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> From: Satish Iyengar [mailto:satysh@gmail.com]
>> Sent: Friday, December 04, 2015 9:43 AM
>> To: user@phoenix.apache.org
>> Subject: Re: Help tuning for bursts of high traffic?<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Hi Zack,<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Did you consider avoiding hitting hbase for every single row by doing
>> that step in an offline mode? I was thinking if you could have some kind of
>> daily export of hbase table and then use pig to perform join (co-group
>> perhaps) to do the same. Obviously this would work only when your hbase
>> table is not maintained by stream based system. Hbase is really good at
>> range scans and may not be ideal for single row (large number of).<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Thanks,<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Satish<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <Zack.Riesland@sensus.com>
>> wrote:<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> SHORT EXPLANATION: a much higher percentage of queries to phoenix return
>> exceptionally slow after querying very heavily for several minutes.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> LONGER EXPLANATION:<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> I’ve been using Pheonix for about a year as a data store for web-based
>> reporting tools and it works well.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Now, I’m trying to use the data in a different (much more
>> request-intensive) way and encountering some issues.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> The scenario is basically this:<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Daily, ingest very large CSV files with data for widgets. <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Each input file has hundreds of rows of data for each widget, and tens of
>> thousands of unique widgets.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> As a first step, I want to de-duplicate this data against my
>> Phoenix-based DB (I can’t rely on just upserting the data for de-dup
>> because it will go through several ETL steps before being stored into
>> Phoenix/HBase).<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> So, per-widget, I perform a query against Phoenix (the table is keyed
>> against the unique widget ID + sample point). I get all the data for a
>> given widget id, within a certain period of time, and then I only ingest
>> rows for that widget that are new to me.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> I’m doing this in Java in a single step: I loop through my input file and
>> perform one query per widget, using the same Connection object to Phoenix.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> THE ISSUE:<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> What I’m finding is that for the first several thousand queries, I almost
>> always get a very fast (less than 10 ms) response (good).<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> But after 15-20 thousand queries, the response starts to get MUCH slower.
>> Some queries respond as expected, but many take as many as 2-3 minutes,
>> pushing the total time to prime the data structure into the 12-15 hour
>> range, when it would only take 2-3 hours if all the queries were fast.<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> The same exact queries, when run manually and not part of this bulk
>> process, return in the (expected) < 10 ms. <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> So it SEEMS like the burst of queries puts Phoenix into some sort of busy
>> state that causes it to respond far too slowly. <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> The connection properties I’m setting are: <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Phoenix.query.timeoutMs: 90000<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Phoenix.query.keepAliveMs: 90000<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Phenix.query.threadPoolSize: 256<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Our cluster is 9 (beefy) region servers and the table I’m referencing is
>> 511 regions. We went through a lot of pain to get the data split extremely
>> well, and I don’t think Schema design is the issue here. <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Can anyone help me understand how to make this better? Is there a better
>> approach I could take? A better set of configuration parameters? Is our
>> cluster just too small for this?<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Thanks!<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>
>>
>> <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> -- <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Satish Iyengar
>>
>> "Anyone who has never made a mistake has never tried anything new."
>> Albert Einstein<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>
>>
>> <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> -- <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)<
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>  <
>> https://twitter.com/intent/follow?original_referer=https://twitter.com/about/resources/buttons&region=follow_link&screen_name=megamda&source=followbutton&variant=2.0
>> >
>>
>>
>>
>>
>>
>>
>

Mime
View raw message