phoenix-user mailing list archives

From "sunfl@certusnet.com.cn" <su...@certusnet.com.cn>
Subject Re: Re: rpc timeout when count on large table
Date Thu, 15 Jan 2015 11:27:47 GMT
Hi, James
I really appreciate your detailed explanation. I issued UPDATE STATISTICS <table>
and reran the count(*) query, and the query performance improved. Does statistics
collection affect all queries, or only aggregate queries? How does that command
improve query performance? It would be great if you could explain a little : )

The problem still occurs, though, and we need to investigate the query more deeply.
I will try the configurations you provided in later tests.

Thanks,
Sun
 




CertusNet 

From: James Taylor
Date: 2015-01-15 17:46
To: user
Subject: Re: Re: rpc timeout when count on large table
Those settings (one or the other - you wouldn't set both) drive the
amount of parallelization done (i.e. the number or byte size of each
parallel chunk).
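To make "number or byte size of each parallel chunk" concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model, not Phoenix's exact algorithm: with phoenix.stats.guidepost.per.region=N each region is cut into about N chunks, and with phoenix.stats.guidepost.width=W each region is cut into about ceil(size / W) chunks. The function name, the 10-region layout and the 100 MiB width are hypothetical values chosen for illustration only. The intuition is that more, smaller chunks mean each parallel scan does less work per RPC.

```python
import math


def estimate_parallel_chunks(region_sizes_bytes, guidepost_width_bytes=None,
                             guideposts_per_region=None):
    """Rough count of parallel scan chunks for a full-table query.

    Simplified model (an assumption, not Phoenix's exact algorithm):
    - guideposts_per_region=N cuts every region into about N chunks;
    - guidepost_width_bytes=W cuts each region into ceil(size / W) chunks.
    As in the thread, set one or the other, never both.
    """
    if (guidepost_width_bytes is None) == (guideposts_per_region is None):
        raise ValueError("set exactly one of guidepost_width_bytes "
                         "or guideposts_per_region")
    total = 0
    for size in region_sizes_bytes:
        if guideposts_per_region is not None:
            total += guideposts_per_region
        else:
            # Every region yields at least one chunk, even a tiny one.
            total += max(1, math.ceil(size / guidepost_width_bytes))
    return total


if __name__ == "__main__":
    # Hypothetical layout: the 17.3G table spread over 10 equal regions.
    regions = [int(17.3 * 1024**3) // 10] * 10
    by_width = estimate_parallel_chunks(regions,
                                        guidepost_width_bytes=100 * 1024**2)
    print(by_width, "chunks of roughly",
          sum(regions) // by_width // 1024**2, "MiB each")
    for n in (1, 2, 4, 8):
        print(n, "guidepost(s) per region ->",
              estimate_parallel_chunks(regions, guideposts_per_region=n),
              "chunks")
```

Real chunk boundaries come from the guideposts stored in SYSTEM.STATS, so treat these numbers only as an order-of-magnitude check.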
 
What do you get when you run the following queries?
SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = '<your full
table name>';
SELECT SUM(GUIDE_POSTS_COUNT) FROM SYSTEM.STATS WHERE PHYSICAL_NAME =
'<your full table name>';
 
As a test, try adding the following config parameter to
hbase-site.xml on each region server:
<property>
    <name>phoenix.stats.guidepost.per.region</name>
    <value>1</value>
</property>
 
After setting it, bounce your cluster and run the following to update
your stats:
UPDATE STATISTICS <your full table name>
 
Then run your count(*) query again and see if there's any impact. Try
setting the phoenix.stats.guidepost.per.region successively higher to
2, 4, 8 (following above steps) and see if it makes a difference in
your query performance.
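The test procedure above (set the property, restart, UPDATE STATISTICS, rerun the count, then repeat with 2, 4, 8) can be written out as a checklist. This is a minimal sketch: it only generates the hbase-site.xml fragment for each value with Python's standard xml.etree.ElementTree and prints the steps; the helper names guidepost_property and print_sweep_plan are hypothetical, and nothing here talks to HBase or Phoenix.

```python
import xml.etree.ElementTree as ET


def guidepost_property(per_region):
    """Build the <property> fragment for hbase-site.xml shown above."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "phoenix.stats.guidepost.per.region"
    ET.SubElement(prop, "value").text = str(per_region)
    return ET.tostring(prop, encoding="unicode")


def print_sweep_plan(values=(1, 2, 4, 8)):
    """Print one test round per setting, following the steps in the thread."""
    for n in values:
        print(f"Round with phoenix.stats.guidepost.per.region={n}:")
        print("  1. add to hbase-site.xml on every region server:")
        print("     " + guidepost_property(n))
        print("  2. restart (bounce) the cluster")
        print("  3. UPDATE STATISTICS <your full table name>")
        print("  4. rerun the count(*) query and record the elapsed time")


if __name__ == "__main__":
    print_sweep_plan()
```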
 
Thanks,
James
 
On Thu, Jan 15, 2015 at 1:23 AM, sunfl@certusnet.com.cn
<sunfl@certusnet.com.cn> wrote:
> Hi, James
> Yes, we are running 4.2.2.
> Neither of these two configs is overridden. Do these configurations only
> affect stats collection?
> I have not checked the region server logs to see whether any major
> compaction is running.
>
> I'm just curious about the query performance, because count queries on this
> table performed well previously.
>
> Thanks,
> Sun.
>
>
> From: James Taylor
> Date: 2015-01-15 17:10
> To: user
> Subject: Re: rpc timeout when count on large table
> You're on 4.2.2, Sun? Have you overridden either of
> phoenix.stats.guidepost.width or phoenix.stats.guidepost.per.region?
> These control the size of each parallel scan. I assume you've run a
> major compaction on the table at some point?
>
> Thanks,
> James
>
> On Wed, Jan 14, 2015 at 7:06 PM, sunfl@certusnet.com.cn
> <sunfl@certusnet.com.cn> wrote:
>> Hi, all
>>
>> When running a count on a large table, we got the following exception:
>>   org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=,
>> waitTime=69714 rpcTimetout=60000
>>
>> How can this be resolved? The table is 17.3G as reported by hdfs dfs -du,
>> with 90+ columns in a single column family F. The compression codec is
>> Snappy.
>>
>> Thanks,
>> Sun.
>>
>>
>
>
 
 
 