phoenix-user mailing list archives

From "Binh Luong" <blnr...@gmx.de>
Subject Bad performance of the first resultset.next()
Date Thu, 20 Apr 2017 11:45:11 GMT
<html><head></head><body><div style="font-family: Verdana;font-size:
12.0px;"><div>
<div>
<div>Hi Josh,</div>

<div>thank you for your answer.</div>

<div>Yes, I am using HDP 2.3.4. You&#39;re right, a newer version may
improve performance significantly. However, we have a release coming up shortly, so
an upgrade is not possible right now. But yes, it should happen in the upcoming application
release.</div>

<div>&nbsp;</div>

<div>The table has 21 columns:</div>

<div>- the first 3 (id,type and timestamp) make up the PK</div>

<div>- the following 18 columns are unsigned int.</div>

<div>&nbsp;</div>

<div>No, there are no secondary indexes defined for the table.</div>

<div>An example query:</div>

<div>SELECT timestamp,VALUE04,VALUE15</div>

<div>FROM T.TABELLE</div>

<div>WHERE id=&#39;ID1&#39; and type=&#39;A&#39; and timestamp&gt;=TO_TIMESTAMP(&#39;...&#39;)
and timestamp&lt;=TO_TIMESTAMP(&#39;...&#39;)</div>

<div>ORDER BY id ASC, type ASC, timestamp ASC;</div>

<div>&nbsp;</div>

<div>Explain plan:</div>

<div>&#124; CLIENT 7-CHUNK PARALLEL 7-WAY RANGE SCAN OVER T.TABELLE [0,&#39;ID1&#39;,&#39;A&#39;,&#39;2015-12-02
00:00:00.000&#39;] - [0,&#39;ID1&#39;,&#39;A&#39;,&#39;2017-01-01
00:00:00.000&#39;]</div>

<div>&#124; &nbsp; &nbsp; SERVER FILTER BY (A.VALUE04 IS NOT NULL OR A.VALUE15
IS NOT NULL)</div>

<div>&#124; CLIENT MERGE SORT</div>

<div>&nbsp;</div>

<div>It looks like you suspect that Phoenix first reads the data and then post-filters
/ sorts it.&nbsp;</div>

<div>&nbsp;</div>

<div>But why does the first next() call sometimes take so much time?</div>

<div>&nbsp;</div>

<div>When I send the requests sequentially, the first next() always takes less
than 200 ms. But when a large number of requests come in parallel, the
processing time increases significantly, to more than 20 or even 30 seconds.&nbsp;</div>

<div>&nbsp;</div>

<div>Is it related to HBase, since the table is minor-compacted from time to
time and that affects read performance?&nbsp;</div>

<div>I am not sure how the next() call is implemented in Phoenix 4.4.0. Which component
could be the bottleneck in such a concurrent processing scenario?&nbsp;</div>

<div>&nbsp;</div>

<div>Thanks in advance</div>

<div>Lee</div>
</div>

<div>&nbsp;</div>

<div>&nbsp;</div>

<div>&lt;quote author=&quot;Josh Elser-2&quot;&gt;</div>

<div>I&#39;m guessing that you&#39;re using a version of HDP? If you&#39;re
using those&nbsp;</div>

<div>versions from Apache, please update as they&#39;re dreadfully out of date.</div>

<div>&nbsp;</div>

<div>What is the DDL of the table you&#39;re reading from? Do you have any&nbsp;</div>

<div>secondary indexes on this table (if so, on what columns)? What kind of&nbsp;</div>

<div>query are you running? What is the output of &#96;EXPLAIN &lt;sql&gt;&#96;
for these&nbsp;</div>

<div>queries?</div>

<div>&nbsp;</div>

<div>For example, this could be easily explained if Phoenix is reading the&nbsp;</div>

<div>data table and post-filtering records. It could take significant amounts&nbsp;</div>

<div>of time to read data that does not satisfy your query until you get to&nbsp;</div>

<div>some data which does...</div>

<div>&nbsp;</div>

<div>Lee wrote:</div>

<div>&gt; Hi all,</div>

<div>&gt;</div>

<div>&gt; currently I am struggling with a performance issue in my REST API. The
API</div>

<div>&gt; receives loads of requests coming from the frontend in parallel and makes SQL</div>

<div>&gt; queries using the Phoenix JDBC driver to fetch data from HBase. For each</div>

<div>&gt; request, the API makes only 1 query to Phoenix/HBase.</div>

<div>&gt;</div>

<div>&gt; I found that the very first ResultSet.next() always takes a long time
to</div>

<div>&gt; get data from HBase. As far as I know, it fetches data in batches, stores
it in</div>

<div>&gt; main memory, and lets the following next() calls read directly from main</div>

<div>&gt; memory, thus saving network overhead. The following next() calls</div>

<div>&gt; usually take less than 10 ms to finish.</div>

<div>&gt;</div>

<div>&gt; Sometimes this first next() takes more than 10 seconds, and from time</div>

<div>&gt; to time this increases to 30 or even 40 seconds. For each query we expect at most</div>

<div>&gt; 25000 rows.</div>

<div>&gt; What could be the bottleneck behind this behaviour?</div>

<div>&gt;</div>

<div>&gt; Some information regarding my setup:</div>

<div>&gt; Hadoop: 2.7.1</div>

<div>&gt; HBase: 1.1.2</div>

<div>&gt; Phoenix: 4.4.0 Hbase 1.1</div>

<div>&gt; Table has 605M rows - salted in 7 buckets - 26 regions across 10 region</div>

<div>&gt; servers</div>

<div>&gt; phoenix.query.threadPoolSize = 128 (default)</div>

<div>&gt; phoenix.query.queueSize = 5000 (default)</div>

<div>&gt;</div>

<div>&gt; Thanks!</div>

<div>&gt; Lee</div>

<div>&gt;</div>

<div>&gt;</div>

<div>&gt;</div>

<div>&gt; --</div>

<div>&gt; View this message in context:&nbsp;<a href="http://apache-phoenix-user-list.1124778.n5.nabble.com/Bad-performance-of-the-first-resultset-next-tp3424.html"
target="_blank">http://apache-phoenix-user-list.1124778.n5.nabble.com/Bad-performance-of-the-first-resultset-next-tp3424.html</a></div>

<div>&gt; Sent from the Apache Phoenix User List mailing list archive at Nabble.com.</div>

<div>&lt;/quote&gt;</div>
</div></div></body></html>
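
The figures quoted in the message above allow a rough back-of-the-envelope check of the client-side scan pool. The sketch below is only an illustration, not a diagnosis: it assumes that each chunk of the "7-CHUNK PARALLEL 7-WAY" plan occupies one thread of the pool sized by phoenix.query.threadPoolSize, and that excess scan tasks wait in the queue bounded by phoenix.query.queueSize. The class name, the method names, and the figure of 100 concurrent requests are hypothetical; the thread does not establish whether this pool is the actual bottleneck.

```java
// Rough estimate of client-side scan pool pressure.
// Assumption (not confirmed in the thread): one pool thread per scan chunk.
public class ScanPoolEstimate {

    /** Scan tasks submitted when every query fans out into chunksPerQuery scans. */
    public static int totalScanTasks(int concurrentQueries, int chunksPerQuery) {
        return concurrentQueries * chunksPerQuery;
    }

    /** Number of queries whose chunks can all run at once without queueing. */
    public static int fullyParallelQueries(int threadPoolSize, int chunksPerQuery) {
        return threadPoolSize / chunksPerQuery;
    }

    /** Scan tasks left waiting once every pool thread is busy. */
    public static int queuedScanTasks(int concurrentQueries, int chunksPerQuery,
                                      int threadPoolSize) {
        int total = totalScanTasks(concurrentQueries, chunksPerQuery);
        return Math.max(0, total - threadPoolSize);
    }

    public static void main(String[] args) {
        int threadPoolSize = 128;    // phoenix.query.threadPoolSize, from the message
        int queueSize = 5000;        // phoenix.query.queueSize, from the message
        int chunksPerQuery = 7;      // "7-CHUNK PARALLEL 7-WAY" in the explain plan
        int concurrentQueries = 100; // hypothetical load, not stated in the thread

        int queued = queuedScanTasks(concurrentQueries, chunksPerQuery, threadPoolSize);
        System.out.println("Queries that fit in the pool at once: "
                + fullyParallelQueries(threadPoolSize, chunksPerQuery));
        System.out.println("Scan tasks waiting in the queue: " + queued);
        System.out.println("Within queue capacity: " + (queued <= queueSize));
    }
}
```

Under these assumptions, a sequential request always finds free threads, while a burst of parallel requests leaves part of the scans waiting before the first row can be returned. Measuring the pool and queue on the real client would be needed to confirm or rule this out.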
