Thank You James. That's great news that 4.13 for CDH is coming. I will try the same on the new version once its released.
Thank you James and Mujtaba for the responses.
I am OK with an estimate count . So using SYSTEM.STATS table looks good in my case. But when I tried this, it gave me inconsistent results. for example.
select count(*) from myschema.mytable => 3474085
select sum(guide_posts_row_count) from system.stats where physical_name = 'myschema.device_data' => 3348090
Time 2 : ( Time1 + ~10 mins)
select count(*) from myschema.mytable => 3474106
select sum(guide_posts_row_count) from system.stats where physical_name = 'myschema.device_data' => 3348080
So I was expecting the stats count to go up but surprisingly, the count went down. Is there a specific configuration or something else that I am missing?
I am using phoenix 4.7( on CDH), So cannot try Table sampling feature.
On 12/19/2017 03:43 PM, Mujtaba Chohan wrote:
Another alternate outside Phoenix is to use http://hbase.apache.org/apidoc
On Tue, Dec 19, 2017 at 3:18 PM, James Taylor <firstname.lastname@example.org> wrote:
If it needs to be 100% accurate, then count(*) is the only way. If your data is write-once data, you might be able to track the row count at the application level through some kind of atomic counter in a different table (but this will likely be brittle). If you can live with an estimate, you could enable statistics , optionally configuring Phoenix not to use stats for parallelization , and query the SYSTEM.STATS table to get an estimate .
Another interesting alternative if you want the approximate row count when you have a where clause would be to use the new table sampling feature . You'd also want stats enabled for this to be more accurate too.
ization=false select sum(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS where physical_name='my_schema.my_ta ble'and COLUMN_FAMILY='my_first_c olumn_family' -- necessary only if you have multiple column families
On Tue, Dec 19, 2017 at 2:57 PM, Jins George <email@example.com> wrote:
Is there a way to get the total row count of a phoenix table without running select count(*) from table ?
my use case is to monitor the record count in a table every x minutes, so didn't want to put load on the system by running a select count(*) query.