phoenix-user mailing list archives

From Ankit Singhal <ankitsingha...@gmail.com>
Subject Re: Bad performance of the first resultset.next()
Date Fri, 21 Apr 2017 05:53:15 GMT
+1 for Jonathan's comment.
-- Take multiple jstack dumps of the client while the query is running and
check which thread stays busy for a long time. If you find that merge sort is
the bottleneck, then removing salting and using a SERIAL scan will help for
the query given above. Ensure that your queries are not causing hotspotting.
JFYI, merge sort is optimized in PHOENIX-2377.
-- If you are using a single Phoenix client for all your concurrent queries,
then increase phoenix.query.threadPoolSize for parallelism, and also
increase the regionserver and datanode handler counts if your server
capacity allows it.
-- In the case of parallel scans, the first next() call prepares the
scanners, opens them, and waits until the scan has completed on all the
regionservers, because you have specified ORDER BY on a salted table. Try
removing ORDER BY and see how the query performs.
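
To illustrate why the first next() cannot return early when a merge sort is
involved, here is a minimal, self-contained sketch of a k-way merge over
already-sorted chunks. It is not Phoenix's implementation; the class and
method names are made up for illustration, and plain lists of longs stand in
for the per-bucket scan results. The point is only that the merge needs the
head row of every chunk before it can emit its first row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not Phoenix code.
public class MergeSortSketch {
    /** The current head row of one sorted chunk, plus the rest of that chunk. */
    private static final class Head {
        final long value;
        final Iterator<Long> rest;

        Head(long value, Iterator<Long> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** Merges chunks that are each already sorted into one sorted list. */
    public static List<Long> merge(List<List<Long>> sortedChunks) {
        PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));
        // Every chunk has to deliver its first row before anything can be
        // emitted; with real scanners the slowest one sets the waiting time.
        for (List<Long> chunk : sortedChunks) {
            Iterator<Long> it = chunk.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.value);
            // Refill the heap from the chunk that just gave up its head.
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Long>> chunks = Arrays.asList(
            Arrays.asList(1L, 4L, 9L),
            Arrays.asList(2L, 3L, 10L),
            Arrays.asList(5L));
        System.out.println(merge(chunks)); // prints [1, 2, 3, 4, 5, 9, 10]
    }
}
```

With 7 salt buckets there are 7 such chunks, so a single slow or queued scan
delays the first row of the whole query.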

The tuning guide below should also help with tips on the same topic.
https://phoenix.apache.org/tuning_guide.html
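
To compare variants of the query (with and without ORDER BY, different fetch
sizes), something like the following rough sketch could be used to time the
first next() call in isolation. It uses only standard java.sql interfaces.
The class name, the method name, and the fetch size of 1000 are assumptions
made for illustration. The JDBC URL and the time range are taken from the
command line because they are not given in the thread, and how the driver
interprets the fetch size may depend on the Phoenix version.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Hypothetical helper, not part of Phoenix.
public class FirstNextTiming {
    /** Runs the statement and returns the milliseconds spent in the first next(). */
    public static long timeFirstNextMillis(PreparedStatement stmt, int fetchSize)
            throws SQLException {
        stmt.setFetchSize(fetchSize); // standard JDBC hint for rows per fetch
        try (ResultSet rs = stmt.executeQuery()) {
            long start = System.nanoTime();
            rs.next(); // the call under investigation
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws SQLException {
        // args[0]: JDBC URL of the cluster (placeholder, not from the thread)
        // args[1], args[2]: time range bounds, format "yyyy-mm-dd hh:mm:ss"
        // Same query as in the thread, but without ORDER BY.
        String sql = "SELECT timestamp, VALUE04, VALUE15 FROM T.TABELLE "
            + "WHERE id = 'ID1' AND type = 'A' "
            + "AND timestamp >= ? AND timestamp <= ?";
        try (Connection conn = DriverManager.getConnection(args[0]);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setTimestamp(1, Timestamp.valueOf(args[1]));
            stmt.setTimestamp(2, Timestamp.valueOf(args[2]));
            // 1000 is an arbitrary example value, not a recommendation.
            long millis = timeFirstNextMillis(stmt, 1000);
            System.out.println("first next() took " + millis + " ms");
        }
    }
}
```

Running this from several threads at once, and taking jstack dumps of the
client while it runs, should show whether the time is spent waiting in the
client thread pool or on the regionservers.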




On Thu, Apr 20, 2017 at 9:12 PM, ashish tapdiya <ashishtapdiya@gmail.com>
wrote:

> executeQuery() is asynchronous and returns immediately.
>
> next() has blocking semantics and that is why it waits for the result set
> to be generated by the server side.
>
>
> On Thu, Apr 20, 2017 at 10:18 AM, Jonathan Leech <jonathaz@gmail.com>
> wrote:
>
>> Client merge sort is just merging already sorted data from the parallel
>> scan. Look into the number of simultaneous queries vs the Phoenix thread
>> pool size and numActiveHandlers in Hbase region servers. Salting might not
>> be helping you. Also try setting the fetch size on the query in JDBC. Make
>> sure your regions for the table are spread around equally on the region
>> servers. Hbase does not do that by default.
>>
>> On Apr 20, 2017, at 5:45 AM, Binh Luong <blnr102@gmx.de> wrote:
>>
>> Hi Josh,
>> thank you for your answer.
>> Yes, I am using HDP 2.3.4. You're right, with the newer versions it may
>> improve the performance significantly. However, we are going to have a
>> release shortly, so now it's not possible for an upgrade. But yes, it
>> should happen in the upcoming application release.
>>
>> The table has 21 columns:
>> - the first 3 (id,type and timestamp) make up the PK
>> - the following 18 columns are unsigned int.
>>
>> No, there are no secondary indexes defined for the table.
>> An example query:
>> SELECT timestamp,VALUE04,VALUE15
>> FROM T.TABELLE
>> WHERE id='ID1' and type='A' and timestamp>=TO_TIMESTAMP('...')
>> and timestamp<=TO_TIMESTAMP('...')
>> ORDER BY id ASC, type ASC, timestamp ASC;
>>
>> Explain plan:
>> | CLIENT 7-CHUNK PARALLEL 7-WAY RANGE SCAN OVER T.TABELLE
>> [0,'ID1','A','2015-12-02 00:00:00.000'] - [0,'ID1','A','2017-01-01
>> 00:00:00.000']
>> |     SERVER FILTER BY (A.VALUE04 IS NOT NULL OR A.VALUE15 IS NOT NULL)
>> | CLIENT MERGE SORT
>>
>> It looks like you suspect that phoenix is firstly reading the data and
>> then post-filtering / sorting the data.
>>
>> But why does the first next() call sometimes take so much time?
>>
>> When I send the requests sequentially, the first next() always takes
>> less than about 200 ms. But when a large number of requests come in
>> parallel, the processing time increases significantly, to even more
>> than 20 or 30 secs.
>>
>> Is it something related to HBase, as the table is minor compacted from
>> time to time and that has an impact on read performance?
>> I am not sure how the next() call is implemented in Phoenix 4.4.0.
>> Which component can be the bottleneck in such a concurrent processing
>> scenario?
>>
>> Thanks in advance
>> Lee
>>
>>
>> <quote author="Josh Elser-2">
>> I'm guessing that you're using a version of HDP? If you're using those
>> versions from Apache, please update as they're dreadfully out of date.
>>
>> What is the DDL of the table you're reading from? Do you have any
>> secondary indexes on this table (if so, on what columns)? What kind of
>> query are you running? What is the output of `EXPLAIN <sql>` for these
>> queries?
>>
>> For example, this could be easily explained if Phoenix is reading the
>> data table and post-filtering records. It could take significant amounts
>> of time to read data that does not satisfy your query until you get to
>> some data which does...
>>
>> Lee wrote:
>> > Hi all,
>> >
>> > currently I am struggling with a performance issue in my REST API. The
>> > API receives loads of requests coming from the frontend in parallel and
>> > makes SQL queries using the Phoenix JDBC driver to fetch data from HBase.
>> > For each request, the API makes only 1 query to Phoenix/HBase.
>> >
>> > I found out that the very first ResultSet.next() always takes a long
>> > time to get data from HBase. As far as I know, it gets data in batches
>> > and stores them in main memory, which enables the following next() calls
>> > to get data directly from main memory and thus saves network overhead.
>> > The following next() calls usually take less than 10 ms to finish.
>> >
>> > Sometimes this first next() takes more than 10 seconds, and it increases
>> > from time to time to 30 or even 40 secs. For each query we expect at
>> > most 25000 rows.
>> > What can be the bottleneck for this behaviour?
>> >
>> > Some information regarding my setup:
>> > Hadoop: 2.7.1
>> > HBase: 1.1.2
>> > Phoenix: 4.4.0 Hbase 1.1
>> > Table has 605M rows - salted in 7 buckets - 26 regions across 10 region
>> > servers
>> > phoenix.query.threadPoolSize = 128 (default)
>> > phoenix.query.queueSize = 5000 (default)
>> >
>> > Thanks!
>> > Lee
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-phoenix-user-list.1124778.n5.nabble.com/Bad-performance-of-the-first-resultset-next-tp3424.html
>> > Sent from the Apache Phoenix User List mailing list archive at Nabble.com.
>> </quote>
>>
>>
>
