phoenix-user mailing list archives

From James Heather <james.heat...@mendeley.com>
Subject Re: DROP COLUMN timing out
Date Wed, 11 Nov 2015 15:14:26 GMT
I don't know what the answer is to your question, but I have hit this 
before.

It seems that adding a column is a lazy operation: it changes just the 
metadata, so it returns almost immediately. Dropping a column is not. In 
fact, if you add a column and then immediately drop it, the drop still 
takes ages, presumably because Phoenix has to check each row to see if 
there's anything it needs to remove.
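
As a rough illustration of the asymmetry (MEDIA is the table from your 
logs below; the column name is just a placeholder):

ALTER TABLE MEDIA ADD SCRATCH_COL VARCHAR;    -- metadata-only: returns almost immediately
ALTER TABLE MEDIA DROP COLUMN SCRATCH_COL;    -- scans the table and issues deletes, so time grows with row count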

I don't know if it would be possible to implement a lazy drop, so that 
the data isn't really removed from the row until the row is accessed. 
Obviously some care would be needed if a column was added with the same 
name before the previous one had been completely removed.
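
For instance (hypothetical column name), a lazy drop would have to 
guarantee that old cells never leak back into a re-added column:

ALTER TABLE MEDIA DROP COLUMN STATUS;  -- lazy: metadata gone, HBase cells still present
ALTER TABLE MEDIA ADD STATUS VARCHAR;  -- same name re-added before cleanup has finished
SELECT STATUS FROM MEDIA;              -- must not surface the old, logically dropped cells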

I suspect that this will be much improved if the Phoenix crew manage to 
implement the level of indirection that @JT mentioned for column names. 
This would mean that the columns in HBase would have uniquely generated 
names, and the Phoenix names would be mapped onto those HBase names. 
Lazy dropping would be easier in that world, because a column could 
never be accessed after it had been dropped, and any write to a row 
could also be made to clear out data from columns that no longer exist.
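
On your last question below (removing the column from the Phoenix 
metadata while keeping the values in HBase): an unsupported hack that 
sometimes gets mentioned is deleting the column's metadata row from 
SYSTEM.CATALOG directly. Entirely at your own risk: I haven't verified 
it, the names below are placeholders for your schema, and clients cache 
metadata, so existing connections may not see the change.

DELETE FROM SYSTEM.CATALOG
WHERE TENANT_ID IS NULL
AND TABLE_SCHEM IS NULL
AND TABLE_NAME = 'MEDIA'
AND COLUMN_NAME = 'YYY';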

James

On 11/11/15 15:08, Lukáš Lalinský wrote:
> When running "ALTER TABLE xxx DROP COLUMN yyy" on a table with about
> 6M rows (which I considered small enough), it always times out, and I
> can't see how to get it to execute successfully even once.
>
> I was getting some internal Phoenix timeouts at first, but after
> setting the following properties, the errors changed:
>
> hbase.client.scanner.timeout.period=6000000
> phoenix.query.timeoutMs=6000000
> hbase.rpc.timeout=6000000
>
> Now it fails with errors like this:
>
> Wed Nov 11 13:44:25 UTC 2015,
> RpcRetryingCaller{globalStartTime=1447246894248, pause=100,
> retries=35}, java.io.IOException: Call to XXX/XXX:16020 failed on
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
> Call id=1303, waitTime=60001, operationTimeout=60000 expired.
> Wed Nov 11 13:45:45 UTC 2015,
> RpcRetryingCaller{globalStartTime=1447246894248, pause=100,
> retries=35}, java.io.IOException: Call to XXX/XXX:16020 failed on
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
> Call id=1341, waitTime=60001, operationTimeout=60000 expired.
>
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
> at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
> ... 3 more
> Caused by: java.io.IOException: Call to XXX/XXX:16020 failed on local
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=1341, waitTime=60001, operationTimeout=60000 expired.
> at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1232)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
> at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372)
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:199)
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:343)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
> ... 4 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=1341, waitTime=60001, operationTimeout=60000 expired.
> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
> at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1174)
> ... 14 more
>
> While it's still running, I see log entries like this on region servers:
>
> 2015-11-11 15:00:49,059 WARN
> [B.defaultRpcServer.handler=9,queue=0,port=16020]
> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
> 1000 mutations for MEDIA
> 2015-11-11 15:00:49,259 WARN
> [B.defaultRpcServer.handler=12,queue=0,port=16020]
> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
> 1000 mutations for MEDIA
> 2015-11-11 15:00:49,537 WARN
> [B.defaultRpcServer.handler=9,queue=0,port=16020]
> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
> 1000 mutations for MEDIA
> 2015-11-11 15:00:49,766 WARN
> [B.defaultRpcServer.handler=12,queue=0,port=16020]
> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
> 1000 mutations for MEDIA
> 2015-11-11 15:00:49,960 WARN
> [B.defaultRpcServer.handler=9,queue=0,port=16020]
> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
> 1000 mutations for MEDIA
> 2015-11-11 15:00:50,212 WARN
> [B.defaultRpcServer.handler=12,queue=0,port=16020]
> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
> 1000 mutations for MEDIA
>
> Any ideas how to solve this? I'd even be fine with just having a way
> to remove the column from the Phoenix metadata and keep the values in
> HBase, but I don't see how to do that except by running DROP COLUMN
> and waiting for it to time out.
>
> Lukas
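
One more observation on the trace above: the failing calls still show 
operationTimeout=60000, i.e. the stock 60-second limit, which suggests 
the raised timeout settings never reached the client issuing the scan. 
These are client-side settings, so they need to be visible to the 
process running the ALTER, e.g. in its hbase-site.xml (a sketch, 
assuming the standard HBase configuration mechanism):

<configuration>
  <property><name>hbase.rpc.timeout</name><value>6000000</value></property>
  <property><name>hbase.client.scanner.timeout.period</name><value>6000000</value></property>
  <property><name>phoenix.query.timeoutMs</name><value>6000000</value></property>
</configuration>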

