phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Efficient way to get the row count of a table
Date Tue, 19 Dec 2017 23:18:23 GMT
If it needs to be 100% accurate, then count(*) is the only way. If your
data is write-once data, you might be able to track the row count at the
application level through some kind of atomic counter in a different table
(but this will likely be brittle). If you can live with an estimate, you
could enable statistics [1], optionally configuring Phoenix not to use
stats for parallelization [2], and query the SYSTEM.STATS table to get an
estimate [3].
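
A minimal sketch of the application-level counter idea, assuming Phoenix 4.9 or later (for the atomic ON DUPLICATE KEY UPDATE clause) and a hypothetical side table named ROW_COUNTS. The table and column names are illustrative only, not something Phoenix provides:

```sql
-- Hypothetical counter table: one row per tracked table.
CREATE TABLE IF NOT EXISTS ROW_COUNTS (
    TRACKED_TABLE VARCHAR NOT NULL PRIMARY KEY,
    CNT BIGINT
);

-- Run once per row the application writes to MY_SCHEMA.MY_TABLE.
-- The first call inserts CNT = 1; later calls increment it atomically.
UPSERT INTO ROW_COUNTS (TRACKED_TABLE, CNT)
VALUES ('MY_SCHEMA.MY_TABLE', 1)
ON DUPLICATE KEY UPDATE CNT = CNT + 1;

-- The monitoring job then reads a single row instead of scanning the table.
SELECT CNT FROM ROW_COUNTS WHERE TRACKED_TABLE = 'MY_SCHEMA.MY_TABLE';
```

The counter is not updated in the same atomic operation as the data write, so a retried or partially failed write can leave it out of step with the table. An upsert that overwrites an existing key would also be counted twice, which is why this only makes sense for write-once data.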

Another interesting alternative if you want the approximate row count when
you have a where clause would be to use the new table sampling feature [4].
You'd want stats enabled for this as well, since that makes the sample more accurate.
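
A rough sketch of what a sampled count might look like, assuming a 10 percent sampling rate and a hypothetical STATUS column in the where clause. The count of the sample is scaled back up by 100 / rate to approximate the full count:

```sql
-- TABLESAMPLE takes a percentage between 0 and 100.
-- With a 10 percent sample, multiply the sampled count by 100 / 10 = 10.
-- STATUS = 'ACTIVE' is a placeholder filter, not a real column in your table.
SELECT COUNT(*) * 10 AS APPROX_ROW_COUNT
FROM my_schema.my_table TABLESAMPLE (10)
WHERE STATUS = 'ACTIVE';
```

The result is only an approximation, and a lower sampling rate trades accuracy for less load on the cluster.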

Thanks,
James


[1] https://phoenix.apache.org/update_statistics.html
[2] phoenix.use.stats.parallelization=false
[3] select sum(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS
     where physical_name='my_schema.my_table'
     and COLUMN_FAMILY='my_first_column_family' -- necessary only if you have multiple column families
[4] https://phoenix.apache.org/tablesample.html
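
For the statistics route in [1] and [3], a sketch of how the pieces might fit together, assuming the table is my_schema.my_table as in [3]. Statistics are normally refreshed by HBase during major compactions; the manual UPDATE STATISTICS call is only there to show how to force a refresh. The second query assumes the LAST_STATS_UPDATE_TIME column of SYSTEM.STATS, so check it against your Phoenix version:

```sql
-- Force statistics collection for the table and its indexes.
UPDATE STATISTICS my_schema.my_table ALL;

-- Estimated row count from the guideposts, as in [3].
SELECT SUM(GUIDE_POSTS_ROW_COUNT) AS ESTIMATED_ROWS
FROM SYSTEM.STATS
WHERE PHYSICAL_NAME = 'my_schema.my_table';

-- Check how fresh the estimate is before trusting it for monitoring.
SELECT MAX(LAST_STATS_UPDATE_TIME) AS STATS_UPDATED_AT
FROM SYSTEM.STATS
WHERE PHYSICAL_NAME = 'my_schema.my_table';
```

The estimate only reflects the data as of the last statistics update, so rows written since then will not be included.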

On Tue, Dec 19, 2017 at 2:57 PM, Jins George <jins.george@aeris.net> wrote:

> Hi,
>
> Is there a way to get the total row count of a phoenix table without
> running select count(*) from table ?
> my use case is to monitor the record count in a table every x minutes, so
> didn't want to put load on the system by running a select count(*) query.
>
> Thanks,
> Jins George
>
