phoenix-user mailing list archives

From "Binh Luong" <>
Subject Bad performance of the first
Date Thu, 20 Apr 2017 11:45:11 GMT
Hi Josh,

Thank you for your answer.

Yes, I am using HDP 2.3.4. You're right, the newer versions may improve the
performance significantly. However, we have a release coming up shortly, so an
upgrade is not possible right now. But yes, it should happen in an upcoming
application release.


The table has 21 columns:
- the first 3 (id, type and timestamp) make up the PK
- the following 18 columns are unsigned int.


No, there are no secondary indexes defined for the table.

An example query:

SELECT timestamp, VALUE04, VALUE15
FROM T.TABELLE
WHERE id='ID1' and type='A' and timestamp>=TO_TIMESTAMP('...')
ORDER BY id ASC, type ASC, timestamp ASC;


Explain plan:

| CLIENT 7-CHUNK PARALLEL 7-WAY RANGE SCAN OVER T.TABELLE [0,'ID1','A','2015-12-02 00:00:00.000'] - [0,'ID1','A','2017-01-01
|     SERVER FILTER BY (A.VALUE04 IS NOT NULL OR A.VALUE15 IS NOT NULL)
| CLIENT MERGE SORT
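
As a side note on the CLIENT MERGE SORT step: a k-way merge cannot hand out its
first row before every input has delivered its own first row, so the first
result has to wait for the slowest of the 7 parallel scans. The sketch below
only illustrates that general property in plain Java. It is not Phoenix's
actual implementation, and the class and method names are made up for this
example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Current head row of one input plus the iterator holding the rest of that input.
    private static final class Head {
        final long key;
        final Iterator<Long> rest;

        Head(long key, Iterator<Long> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    // Merges already-sorted inputs (hypothetical stand-ins for the per-bucket scans)
    // into one sorted list.
    public static List<Long> merge(List<List<Long>> sortedInputs) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.key, b.key));

        // The head of EVERY input is fetched here, before any row can be emitted.
        for (List<Long> input : sortedInputs) {
            Iterator<Long> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }

        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.key);
            // Refill the heap from the input that just produced a row.
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

Calling KWayMerge.merge with the three sorted inputs (1, 4, 9), (2, 3) and (5)
yields 1, 2, 3, 4, 5, 9. The point is the first loop: if one of the inputs is
slow to produce its head, nothing can be returned until it does.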


It looks like you suspect that Phoenix first reads the data and then
post-filters / sorts it.

But why does the first next() call sometimes take so much time?


When I send the requests sequentially, each one always takes less than 200 ms to
process. But when a large number of requests come in in parallel, the processing
time increases significantly, to even more than 20 or 30 secs.
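
To compare the sequential and the parallel case, it may help to time the first
next() separately from all later ones. The following is a minimal sketch that
assumes only a standard JDBC ResultSet. The class name FirstNextTimer and the
layout of the returned array are made up for this example.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

public class FirstNextTimer {
    // Returns { nanos spent in the first next(),
    //           nanos spent in all later next() calls,
    //           number of rows seen }.
    public static long[] timeNextCalls(ResultSet rs) {
        try {
            long start = System.nanoTime();
            boolean hasRow = rs.next();
            long firstNanos = System.nanoTime() - start;

            long rows = 0;
            long restStart = System.nanoTime();
            while (hasRow) {
                rows++;
                hasRow = rs.next();
            }
            long restNanos = System.nanoTime() - restStart;

            return new long[] { firstNanos, restNanos, rows };
        } catch (SQLException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException("next() failed", e);
        }
    }
}
```

Logging the three values per request, together with the number of requests in
flight at that moment, would show whether the first next() grows with the
concurrency while the later calls stay flat.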


Is it something related to HBase, since the table is minor-compacted from time
to time and that has an impact on the read performance?

I am not sure how the next() call is implemented in Phoenix 4.4.0. Which
component could be the bottleneck in such a concurrent processing scenario?
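
One client-side candidate that might be worth ruling out is the Phoenix client
thread pool. This is only a hypothesis, and it assumes that all API requests
run in one JVM and share one pool. The plan above fans out into 7 parallel
scans per query, and phoenix.query.threadPoolSize is 128. The back-of-the-
envelope sketch below is plain arithmetic, not a measurement, and the class
and method names are made up. It estimates how many scan tasks could not start
immediately for a given number of concurrent queries.

```java
public class ScanPoolEstimate {
    // Scan tasks submitted at once if every query fans out into the same number of scans.
    public static int totalScans(int concurrentQueries, int scansPerQuery) {
        return concurrentQueries * scansPerQuery;
    }

    // Scan tasks that have to wait in the queue because all pool threads are busy.
    public static int queuedScans(int concurrentQueries, int scansPerQuery, int poolSize) {
        return Math.max(0, totalScans(concurrentQueries, scansPerQuery) - poolSize);
    }

    // Number of rounds needed if each pool thread runs one scan at a time
    // (integer ceiling of totalScans / poolSize).
    public static int waves(int concurrentQueries, int scansPerQuery, int poolSize) {
        return (totalScans(concurrentQueries, scansPerQuery) + poolSize - 1) / poolSize;
    }
}
```

Under these assumptions, 18 concurrent queries (126 scans) still fit into the
pool, while 100 concurrent queries produce 700 scans, of which 572 would have to
queue. Whether this is what actually happens here would have to be confirmed by
measurement.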


Thanks in advance




Josh Elser wrote:

I'm guessing that you're using a version of HDP? If you're using those
versions from Apache, please update as they're dreadfully out of date.


What is the DDL of the table you're reading from? Do you have any
secondary indexes on this table (if so, on what columns)? What kind of
query are you running? What is the output of `EXPLAIN <sql>` for these
queries?



For example, this could be easily explained if Phoenix is reading the
data table and post-filtering records. It could take significant amounts
of time to read data that does not satisfy your query until you get to
some data which does...


Lee wrote:

> Hi all,


> currently I am struggling with a performance issue in my REST API. The API
> receives loads of requests coming from the frontend in parallel and makes SQL
> queries using the Phoenix JDBC driver to fetch data from HBase. For each
> request, the API makes only 1 query to Phoenix/HBase.


> I found out that the very first next() call always takes a long time to
> get data from HBase. As far as I know, it gets data in a batch and stores it in
> main memory, so that the following next() calls get data directly from main
> memory and thus save the network overhead. The following next() calls
> usually take less than 10 ms to finish.


> Sometimes this first next() takes more than 10 seconds, and from time to time
> it increases to 30 or even 40 secs. For each query we expect at most
> 25000 rows.
>
> What can be the bottleneck for this behaviour?


> Some information regarding my setup:
> Hadoop: 2.7.1
> HBase: 1.1.2
> Phoenix: 4.4.0 HBase 1.1
> Table has 605M rows - salted in 7 buckets - 26 regions across 10 region servers
> phoenix.query.threadPoolSize = 128 (default)
> phoenix.query.queueSize = 5000 (default)


> Thanks!
> Lee





