phoenix-user mailing list archives

From Mujtaba Chohan <mujt...@apache.org>
Subject Re: Update statistics made query 2-3x slower
Date Wed, 11 Feb 2015 19:54:06 GMT
To compare performance without stats, try deleting the related rows from
SYSTEM.STATS or, more simply, truncate the SYSTEM.STATS table from the HBase
shell and restart your region servers.

//mujtaba

On Wed, Feb 11, 2015 at 10:29 AM, Vasudevan, Ramkrishna S <
ramkrishna.s.vasudevan@intel.com> wrote:

> Hi Constantin,
>
> Before I explain the slowness, let me answer your second question.
>
>
>
> Phoenix runs on top of HBase, which is a distributed NoSQL database. A
> table's data lives in logical units called regions, and those regions are
> spread across different nodes (region servers). There is no single location
> for a table where a running count of inserted rows could be kept up to
> date.
>
>
>
> This means that when you need count(*), the counts from every region,
> distributed across the region servers, have to be aggregated. In other
> words, a table is not a single entity; it is a collection of regions.
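
To make the aggregation point concrete, here is a minimal sketch in plain Python. It is only a toy model: the `cluster` dictionary, the region server names and both functions are made up for illustration and are not HBase or Phoenix APIs. Each region is counted on its own and the partial counts are then summed, which is roughly the shape of the work a count(*) has to do.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one table: each region server hosts some regions,
# and each region is just a list of row keys. Nothing here is a real HBase API.
cluster = {
    "regionserver-1": [["a1", "a2", "a3"], ["b1", "b2"]],
    "regionserver-2": [["c1", "c2", "c3", "c4"]],
    "regionserver-3": [["d1"], ["e1", "e2"]],
}


def count_region(rows):
    """Scan a single region and return its partial row count."""
    return sum(1 for _ in rows)


def count_table(cluster):
    """Sum the partial counts from every region on every region server."""
    regions = [region for hosted in cluster.values() for region in hosted]
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(count_region, regions))
    return sum(partial_counts)


if __name__ == "__main__":
    print(count_table(cluster))
```

No single place holds the total; it only exists after every region has been scanned and the partial results have been combined.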
>
>
>
> As for the slow query: the statistics collected by update statistics allow
> a query to be parallelized into logical chunks within a single region.
> Suppose there are 100K rows in a region; the statistics would allow the
> query to run in parallel on, for example, 10 equal chunks of 10,000 rows
> within that region.
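
The 100K-row example can be sketched the same way. This is an assumption-laden toy, not Phoenix's implementation: `split_into_chunks` and `parallel_count` are hypothetical helpers, and the chunk boundaries are spaced by row count only because the example above is phrased in rows (the real spacing is governed by settings such as phoenix.stats.guidepost.width).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(region_rows, rows_per_chunk):
    """Cut one region's rows into consecutive chunks of rows_per_chunk rows."""
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be positive")
    return [
        region_rows[start:start + rows_per_chunk]
        for start in range(0, len(region_rows), rows_per_chunk)
    ]


def parallel_count(chunks):
    """Count each chunk on its own worker, then add up the partial counts."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(len, chunks))


if __name__ == "__main__":
    region = range(100_000)                      # one region holding 100K rows
    chunks = split_into_chunks(region, 10_000)   # 10 chunks of 10,000 rows
    print(len(chunks), parallel_count(chunks))
```

More chunks means more parallel scans per region, so whether this helps or hurts depends on how much spare capacity the cluster has for the extra work.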
>
>
>
> Have you modified any of the statistics-related parameters, such as
> ‘phoenix.stats.guidepost.width’?
>
>
>
>
>
> Regards
>
> Ram
>
> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com]
> *Sent:* Wednesday, February 11, 2015 2:51 PM
> *To:* user@phoenix.apache.org
> *Subject:* Update statistics made query 2-3x slower
>
>
>
> Hello all,
>
>
>
> 1.     Is there a good explanation why updating the statistics:
>
> *update statistics tableX;*
>
>
>
> made this query 2-3x slower? (It took 27 seconds before; now it takes
> somewhere between 60 and 90 seconds.)
>
> *select count(*) from tableX;*
>
> +------------------------------------------+
> |                 COUNT(1)                 |
> +------------------------------------------+
> | 5786227                                  |
> +------------------------------------------+
>
> 1 row selected (62.718 seconds)
>
>
>
> (If possible :) ) how can I “drop” those statistics?
>
>
>
> 2. Why is there nothing (like a counter / attribute for the table) to
> obtain the number of rows in a table quickly?
>
>
>
> Thank you,
>
>    Constantin
>
