phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Update statistics made query 2-3x slower
Date Tue, 03 Mar 2015 22:17:42 GMT
Ah, so you actually *are* in the situation I mentioned (i.e. you had *NOT*
run a major compaction yet on the table, but you had run an UPDATE
STATISTICS). If you could verify that the problem goes (and stays) away
after running a major compaction, that'd be helpful. See PHOENIX-1693 for
more detail - we suspect it's a data locality issue exacerbated by the
parallelization we do when stats are in place.

See inline for comments on your other questions.

Thanks,
James

On Tue, Mar 3, 2015 at 4:23 AM, Ciureanu, Constantin (GfK) <
Constantin.Ciureanu@gfk.com> wrote:

> Hello James,
>
>
>
> Btw, I noticed some other issues:
>
> - My table key is (DATUM,  … ) ordered ascending by key (a LONG, in
> milliseconds) - I have lately changed that to seconds (/1000), but it's
> the same situation.
>
> - Querying min(DATUM) takes too long (the table has 24 salt buckets,
> 15M records). I imagine it's a full table scan, while it shouldn't be -
> it could just take the first row of each bucket and stop scanning that
> bucket.
>
Have you seen PHOENIX-505? It's true, we don't optimize that query, but we
should/could. In the meantime, use this query instead (and it'll operate as
you suggested):
SELECT DATUM FROM PURCHASES
ORDER BY DATUM
LIMIT 1;


> - Querying max(DATUM) - the same
>
Yep, same deal. Try this query instead (which will leverage the reverse
scan of HBase):
SELECT DATUM FROM PURCHASES
ORDER BY DATUM DESC
LIMIT 1;

> - Loading the table from a single JDBC connection / machine - speed is
> ~1500 rows / second
>
Is your DATUM time monotonically increasing? If so, make sure to salt your
table: http://phoenix.apache.org/salted.html
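
For example, salting is declared at table creation time; a minimal sketch (schema heavily abbreviated, with column names borrowed from the PURCHASES table below and the key structure hypothetical):

```sql
-- Sketch only: SALT_BUCKETS must be declared at CREATE TABLE time.
-- Phoenix prepends a one-byte hash of the row key, so rows with a
-- monotonically increasing DATUM spread over 24 buckets instead of
-- hot-spotting the region holding the highest keys.
CREATE TABLE PURCHASES_SALTED (
    DATUM  BIGINT NOT NULL,
    RECID  INTEGER NOT NULL,
    F2     DOUBLE,
    CONSTRAINT pk PRIMARY KEY (DATUM, RECID)
) SALT_BUCKETS = 24;
```

The trade-off: salting spreads the writes, but range scans over DATUM then have to touch every bucket.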

You have auto commit off, right? Play around with batching up more rows
before calling commit.

Have you profiled it to see if there are any obvious bottlenecks? You could
try using a PreparedStatement and binding the VALUES, as that would avoid
any parsing cost. Also, reuse the same statement over and over again, just
re-binding the values.
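
A sketch of the parameterized form (column list heavily abbreviated): prepare it once via JDBC, re-bind and execute it per row, and with auto-commit off call commit() only every few hundred to few thousand rows:

```sql
-- Prepared once as a JDBC PreparedStatement; the '?' placeholders are
-- re-bound for each row, so the statement is parsed only once rather
-- than on every UPSERT.
UPSERT INTO PURCHASES (PG, DATUM, GKNZ, F2, ARTIKEL)
VALUES (?, ?, ?, ?, ?);
```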

Other things to try (from http://phoenix.apache.org/tuning.html):
- increase phoenix.query.threadPoolSize

Not a whole lot going on for an UPSERT VALUES call, so those are my ideas.

> - Loading the table using an MR job (12 mappers running at the same
> time, 6x dual-core machines) - the speed is ~5000 rows / second. The
> rows are very long, but still: processing my rows alone (with the
> Phoenix call that obtains the KVs from the connection, which is rolled
> back for each row, commented out) shows a performance of ~25k rows /
> second. See the UPSERT below. Any hints how I can improve the Phoenix
> speed? (looks like a bottleneck somewhere there)
>
Gabriel Reid may be able to give you some tips for the MR job, if the above
doesn't help you.

>
>
> Thank you,
>
>   Constantin
>
>
>
> UPSERT INTO PURCHASES
>
> (PG, DATUM, GKNZ, F2 DOUBLE, ARTIKEL, F9676937 INTEGER, F37794748
> INTEGER, F23300220 DOUBLE, F23371669 INTEGER, F33886486 INTEGER, F26121274
> DOUBLE, F18858445 INTEGER,
>
> F18858446 INTEGER, F18259285 DOUBLE, F18858203 DOUBLE, F9676962 INTEGER,
> F8309982 INTEGER, F9676954 INTEGER, F9676958 INTEGER, F26121262 INTEGER,
> F10638524 INTEGER, F23300221 DOUBLE,
>
> F23371670 INTEGER, F10455231 INTEGER, F40975418 DOUBLE, F23349942 DOUBLE,
> F3 DOUBLE, F39212168 DOUBLE, F2053117 INTEGER, F9676946 INTEGER, F21632550
> INTEGER, F24353527 INTEGER,
>
> F23300218 DOUBLE, F23300219 DOUBLE, F23371668 INTEGER, F17577365 INTEGER,
> RECID, F24363786 DOUBLE, F20463789 DOUBLE, F25793009 DOUBLE, F23300222
> DOUBLE, F23371671 INTEGER,
>
> F37794699 INTEGER, F40021913 INTEGER, F25793010 INTEGER, F4 DOUBLE, F101
> INTEGER, F103 INTEGER, F104 INTEGER, F105 INTEGER, F106 INTEGER, F107
> INTEGER, F108 INTEGER, F110 INTEGER,
>
> F102 INTEGER, F109 INTEGER, F16 INTEGER, F6 DOUBLE, F5 DOUBLE, F2020918
> INTEGER, F2020921 INTEGER, F2020924 INTEGER, F2020927 INTEGER, F2020930
> INTEGER, F2021775 INTEGER,
>
> F8827515 INTEGER, F9772724 INTEGER, F9864110 INTEGER, F9889647 INTEGER,
> F10628350 INTEGER, F10667363 INTEGER, F10667364 INTEGER, F10954187 INTEGER,
> F11576113 INTEGER,
>
> F18062810 DOUBLE, F18386375 DOUBLE, F18386382 DOUBLE, F18386383 DOUBLE,
> F18386386 INTEGER, F24329490 DOUBLE, F25934458 INTEGER, F26155169 DOUBLE,
> F39422692 INTEGER,
>
> F39422693 INTEGER, F39422694 INTEGER, F39422695 INTEGER, F39422696
> INTEGER, F39717091 INTEGER, F39717094 INTEGER, F39717095 INTEGER, F39717097
> INTEGER, F39717100 INTEGER,
>
> F39717102 INTEGER, F39717103 INTEGER, F40268190 INTEGER, F9903934 INTEGER,
> F9954759 INTEGER, F21682770 INTEGER, F29180660 INTEGER, F12253475 INTEGER,
> F17476800 INTEGER,
>
> F24399116 INTEGER, F23298302 INTEGER, F23298245 INTEGER, F23298263
> INTEGER, F23298308 INTEGER, F39282151 INTEGER, F39282152 INTEGER, F39282154
> INTEGER, F39282155 INTEGER,
>
> F16607101 INTEGER, F17455481 INTEGER, F16607108 INTEGER, F17455450
> INTEGER, F10524806 INTEGER, F13416710 INTEGER, F17455109 INTEGER, F2099721
> INTEGER, F40278909 INTEGER,
>
> F17455388 INTEGER, F18860130 INTEGER, F11322454 INTEGER, F19488769
> INTEGER, F19488779 INTEGER, F19488501 INTEGER, F20473401 INTEGER, F10524733
> INTEGER, F17677213 INTEGER,
>
> F2099692 INTEGER, F17677232 INTEGER, F24399111 INTEGER, F24399112 INTEGER,
> F12253498 INTEGER, F17476822 INTEGER, F28842583 INTEGER, F28842630 INTEGER,
> F28842604 INTEGER,
>
> F28842589 INTEGER, F28842119 INTEGER, F28842128 DOUBLE, F28842129 INTEGER,
> F28842164 INTEGER, F28842122 INTEGER, F40278910 INTEGER, F40782521 INTEGER,
> F40319602 INTEGER,
>
> F38938528 INTEGER, F17459560 INTEGER, F18540330 INTEGER, F10524759
> INTEGER, F17455401 INTEGER, F40006953 INTEGER)
>
>
>
> VALUES (9892169, 1325894400 , 0490 , 1.0 , 345955 , 14 , 7 , 0.0 , 0 , 0 ,
> 1.0 , 31 , 0 , 2951.0 , 1.0 , 1968 , 8 , 3 , 2 , 4 , 2 , 100.0 , 1 ,
> 5408748 , 1.0 , 1.0,
>
> 375.0 , 1.0 , 7 , 2 , 1 , 31 , 1.19 , 100.0 , 1 , 0 , 1237084160 , 1.0 ,
> 1.0 , 1.0 , 0.0 , 0 , 1 , 6 , 14 , 119.0 , 737 , 25 , 13 , 9 , 7 , 5 , 3 ,
> 106 , 106 , 106 ,
>
> 25 , 1275.9697 , 0.96479213 , 7657 , 2 , 7 , 2 , 5079 , 3 , 283002 , 2 , 1
> , 17335 , 378 , 1561477 , 4 , 84 , 6 , 0.7024999856948853 ,
> 1.0478999614715576 ,
>
>  1.0549999475479126 , 1.0490000247955322 , 1 , 1.049299955368042 , 40 ,
> 0.6729999780654907 , 1 , 1 , 13 , 272 , 272 , 1 , 1 , 27 , 7 , 5 , 1 , 1 ,
>
>  1 , 11 , 30 , 2 , 4 , 1 , 1 , 3 , 3 , 8 , 3 , 5 , 3 , 16 , 1 , 6 , 1 , 1
> , 1 , 1 , 4 , 184 , 4 , 3 , 8 , 2 , 5 , 666 , 1 , 1 , 5 , 1 , 1 , 1 , 2 ,
>
>  1 , 5 , 5 , 1 , 1 , 3 , 3 , 4 , 4 , 1 , 0.0 , 3 , 4 , 2 , 17 , 15 , 15 ,
> 16 , 19 , 3 , 1 , 1 , 3 )
>
> *From:* James Taylor [mailto:jamestaylor@apache.org]
> *Sent:* Tuesday, March 03, 2015 2:20 AM
> *To:* user; Ciureanu, Constantin (GfK)
> *Subject:* Re: Update statistics made query 2-3x slower
>
>
>
> Constantin,
>
> I've filed PHOENIX-1693 for this issue, as we seem to be seeing a similar
> phenomena too. It seems to only occur if we've never run a major compaction
> on the table, though. Is that the case for you as well?
>
> Thanks,
>
> James
>
>
>
> On Mon, Feb 16, 2015 at 8:44 AM, Vasudevan, Ramkrishna S <
> ramkrishna.s.vasudevan@intel.com> wrote:
>
> Without update statistics - if we run select count(*), what is the PLAN
> that it executes?  One of the RSs has more data, I believe.
>
>
>
> Regards
>
> Ram
>
>
>
> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com]
> *Sent:* Monday, February 16, 2015 3:17 PM
>
>
> *To:* user@phoenix.apache.org
> *Subject:* RE: Update statistics made query 2-3x slower
>
>
>
> Hello,
>
>
>
> I forgot to add this info:
>
>
>
> 3. There are just a few requests while running select count(*) – no other
> query was running concurrently.
>
>
>
> ServerName                                   Request Per Second   Read Request Count   Write Request Count
> ip-10-2-3-11.lps.stage,60020,1424079136831                    3                    0                     0
> ip-10-2-3-12.lps.stage,60020,1423492976623                    5               571301                   106
> ip-10-2-3-13.lps.stage,60020,1423493014906                    1                18515                     3
> ip-10-2-4-11.lps.stage,60020,1423493039310                    1                31514                     5
> ip-10-2-4-12.lps.stage,60020,1423493067346                    2                80751                     3
> ip-10-2-4-13.lps.stage,60020,1423493620630                    3                13120                   226
>
>
> I tried to recreate the statistics, but this time it failed :(
>
>
>
> update statistics tableX;
>
> Error: ERROR 6000 (TIM01): Operation timed out . Query couldn't be
> completed in the alloted time: 600000 ms (state=TIM01,code=6000)
>
>
>
>
>
> Thank you,
>
>    Constantin
>
>
>
> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com]
> *Sent:* Monday, February 16, 2015 10:31 AM
> *To:* user@phoenix.apache.org
> *Subject:* RE: Update statistics made query 2-3x slower
>
>
>
> Hi Mujtaba,
>
>
>
> 1.     Update statistics was executed after loading the data into the
> table (and after the first select count(*)... of course).
>
> 2.     6 region servers x ~50 regions each = 310 regions (too many?)
>
> 3.       Here it is:
>
> explain select count(*) from tableX;
>
> +------------------------------------------+
>
> |                   PLAN                   |
>
> +------------------------------------------+
>
> | CLIENT 24-CHUNK PARALLEL 24-WAY FULL SCAN OVER tableX |
>
> |     SERVER FILTER BY FIRST KEY ONLY      |
>
> |     SERVER AGGREGATE INTO SINGLE ROW     |
>
> +------------------------------------------+
>
> 3 rows selected (0.287 seconds)
>
>
>
> Thank you,
>
>   Constantin
>
>
>
> *From:* Mujtaba Chohan [mailto:mujtaba@apache.org]
> *Sent:* Friday, February 13, 2015 7:21 PM
> *To:* user@phoenix.apache.org
> *Subject:* Re: Update statistics made query 2-3x slower
>
>
>
> Hi Constantin,
>
> This is useful info. Just to clarify the slowdown you see after update
> statistics: was *update statistics* executed after the initial data
> load, or after you upserted the data again?
>
> 2. How many regions/region servers (RS) did the data end up on?
>
> 3. 30/60 seconds for count(*) seems really high. Do you see lots of disk
> I/O during your count query? How wide are your rows and how much memory is
> available on your RS/HBase heap?
>
>
>
> 4. Can you also send the output of *explain select count(*) from
> tablex* for this case?
>
>
> Thanks,
>
> Mujtaba
>
>
>
> On Fri, Feb 13, 2015 at 12:34 AM, Ciureanu, Constantin (GfK) <
> Constantin.Ciureanu@gfk.com> wrote:
>
> Hello Mujtaba,
>
>
>
> Don’t worry - it was just *select count(*) from tableX* that was
> noticeably slowed down.
>
> I presume all the regular queries do actually benefit from using the STATS.
>
>
>
> Some other cases where I saw slowdown for “*select count(*) from tableX*”:
>
> - The first time, after loading 6M records, the time to obtain the
> count was ~30 sec.
>
> - After loading the *same* 6M records again, the time almost
> doubled :( I imagine the data is doubled, not yet compacted in HBase.
>
> - After deleting the 6M rows (delete from ...., not truncate) and
> loading the 6M rows again - the same doubled time; same comment as
> above.
>
> - After update statistics tableX, the time was around 2x the original
> (~60 seconds) - this I couldn't really explain (perhaps the fleet I use
> is undersized).
>
>
>
> I need to mention that I'm using 4.2.2 and I can't wait for 4.3 to be
> released, as it will fix some issues I have, e.g. one with SKIP SCAN:
>
> [1st part of PK between A and B] or [1st part of PK between C and D] or
> [….] was treated as a full table scan = painfully slow -> but this
> worked today after I added a /*+ SKIP_SCAN */ hint to the SELECT, which
> shouldn't be mandatory in my opinion.
>
>
>
> Regards,
>
>   Constantin
>
>
>
> *From:* Mujtaba Chohan [mailto:mujtaba@apache.org]
> *Sent:* Thursday, February 12, 2015 9:20 PM
>
>
> *To:* user@phoenix.apache.org
> *Subject:* Re: Update statistics made query 2-3x slower
>
>
>
> Constantin - If possible can you please share your schema, approx.
> row/columns width, number of region servers in your cluster plus their heap
> size, HBase/Phoenix version and any default property overrides so we can
> identify why stats are slowing things down in your case.
>
>
>
> Thanks,
>
> Mujtaba
>
>
>
> On Thu, Feb 12, 2015 at 12:56 AM, Ciureanu, Constantin (GfK) <
> Constantin.Ciureanu@gfk.com> wrote:
>
> It worked!
>
> Without stats it’s again faster (2-3x times) – but I do understand that
> all other normal queries might benefit from the stats.
>
>
>
> Thank you Mujtaba for the info,
>
> Thank you Vasudevan for the explanations. I have used HBase before and
> I agree it's hard to keep a counter of table rows (especially while the
> tombstones for deleted rows are still there - i.e. not compacted yet).
>
>
>
> Constantin
>
>
>
>
>
>
>
> *From:* Mujtaba Chohan [mailto:mujtaba@apache.org]
> *Sent:* Wednesday, February 11, 2015 8:54 PM
> *To:* user@phoenix.apache.org
> *Subject:* Re: Update statistics made query 2-3x slower
>
>
>
> To compare performance without stats, try deleting related rows from
> SYSTEM.STATS or an easier way, just truncate SYSTEM.STATS table from HBase
> shell and restart your region servers.
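
A sketch of the per-table variant from the Phoenix side (an assumption based on the 4.2.x SYSTEM.STATS layout, where PHYSICAL_NAME leads the key; verify the column name against your own SYSTEM.STATS schema before running):

```sql
-- Remove only the guideposts collected for one table instead of
-- truncating all of SYSTEM.STATS. Region servers cache the stats, so
-- they still need a restart (or cache expiry) to pick up the change.
DELETE FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'TABLEX';
```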
>
> //mujtaba
>
>
>
> On Wed, Feb 11, 2015 at 10:29 AM, Vasudevan, Ramkrishna S <
> ramkrishna.s.vasudevan@intel.com> wrote:
>
> Hi Constantin
>
>
>
> Before I explain the slowness, let me answer your 2nd question.
>
>
>
> Phoenix is on top of HBase, a distributed NoSQL DB. The data resides in
> logical entities called regions that are spread across different nodes
> (region servers). There is nothing like a table sitting in one location
> where you could keep updating a count of the rows being inserted.
>
>
>
> Which means that when you need count(*), you have to aggregate the
> count from every region distributed across the region servers. In other
> words, a table is not a single entity; it is a collection of regions.
>
>
>
> Coming to the slowness of your query: the statistics collected by
> update statistics allow a query to be parallelized into logical chunks
> within a single region. Suppose there are 100K rows in a region; the
> statistics would, for example, allow the query to execute in parallel
> on 10 equal chunks of 10,000 rows within that region.
>
>
>
> Have you modified any of the statistics-related parameters, such as
> ‘phoenix.stats.guidepost.width’?
>
>
>
>
>
> Regards
>
> Ram
>
> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com]
> *Sent:* Wednesday, February 11, 2015 2:51 PM
> *To:* user@phoenix.apache.org
> *Subject:* Update statistics made query 2-3x slower
>
>
>
> Hello all,
>
>
>
> 1.     Is there a good explanation why updating the statistics:
>
> *update statistics tableX;*
>
>
>
> made this query 2-3x slower?   (it was 27 seconds before; now it's
> somewhere between 60 and 90 seconds)
>
> *select count(*) from tableX;*
>
> +------------------------------------------+
>
> |                 COUNT(1)                 |
>
> +------------------------------------------+
>
> | 5786227                                  |
>
> +------------------------------------------+
>
> 1 row selected (62.718 seconds)
>
>
>
> (If possible :) ) how can I “drop” those statistics?
>
>
>
> 2. Why is there nothing (like a counter / attribute on the table) to
> obtain the number of rows in a table quickly?
>
>
>
> Thank you,
>
>    Constantin
