phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject DROP COLUMN timing out
Date Wed, 11 Nov 2015 16:14:22 GMT
Try setting hbase.regionserver.lease.period on the region servers to a
higher value like 6000000.
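That property is read by the region servers themselves, so as a rough sketch
(assuming a standard setup) it would go into hbase-site.xml on each region
server, typically followed by a rolling restart; the 6000000 here is just the
example value from above:

<property>
  <name>hbase.regionserver.lease.period</name>
  <value>6000000</value>
</property>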

Also, PHOENIX-2357 (currently under review) will make this unnecessary. And
James was spot on about drop column: once we do the column name indirection
(PHOENIX-1598 and PHOENIX-1940), you'd likely still want to put delete markers
on each column cell, but you could do that asynchronously.
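To make the "delete markers on each column cell" part concrete, here is a
rough sketch at the plain HBase client level (this is not what Phoenix does
internally; the "0" column family is only the Phoenix default, and the "YYY"
qualifier and "MEDIA" table name are borrowed from this thread as
assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DropColumnMarkerSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        byte[] family = Bytes.toBytes("0");      // Phoenix's default column family
        byte[] qualifier = Bytes.toBytes("YYY"); // qualifier of the dropped column (assumed)
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("MEDIA"))) {
            Scan scan = new Scan();
            scan.addColumn(family, qualifier);   // only visit rows that actually have the cell
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    Delete d = new Delete(r.getRow());
                    d.addColumns(family, qualifier); // delete marker covering all versions of the cell
                    table.delete(d);                 // a background/async job could batch these instead
                }
            }
        }
    }
}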

Thanks,
James

On Wednesday, November 11, 2015, James Heather <james.heather@mendeley.com> wrote:

> I don't know what the answer is to your question, but I have hit this
> before.
>
> It seems that adding a column is a lazy operation, and results in changing
> just the metadata, so it returns almost immediately; but dropping a column
> is not. In fact, if you add a column and then immediately drop it, it takes
> ages to do the drop, presumably because Phoenix has to check each row to
> see if there's anything it needs to remove.
>
> I don't know if it would be possible to implement a lazy drop, so that the
> data isn't really removed from the row until the row is accessed. Obviously
> some care would be needed if a column was added with the same name before
> the previous one had been completely removed.
>
> I suspect that this will be much improved if the Phoenix crew manage to
> implement the level of indirection that @JT mentioned for column names.
> This would mean that the columns in HBase would have uniquely generated
> names, and the Phoenix names would be used to map to these HBase names.
> Lazy dropping would be easier in that world, because the column couldn't
> ever be accessed after it had been dropped, and any write to the column
> could be set to remove any data from columns that no longer exist.
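> Just to make that indirection idea concrete, a toy sketch (the class, the
> mapping and the generated qualifier names are entirely made up, not
> Phoenix's actual metadata):
>
> public class ColumnIndirectionSketch {
>     public static void main(String[] args) {
>         // Logical (Phoenix) column names map to generated physical qualifiers.
>         java.util.Map<String, String> logicalToPhysical = new java.util.HashMap<>();
>         logicalToPhysical.put("YYY", "c000017");   // generated, never reused
>
>         // "Lazy" drop: forget the mapping; the old cells become unreachable.
>         logicalToPhysical.remove("YYY");
>
>         // Re-adding the same logical name gets a fresh physical qualifier, so it
>         // can never collide with leftover data from the dropped column.
>         logicalToPhysical.put("YYY", "c000018");
>         System.out.println(logicalToPhysical);
>     }
> }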
>
> James
>
> On 11/11/15 15:08, Lukáš Lalinský wrote:
>
>> When running "ALTER TABLE xxx DROP COLUMN yyy" on a table with about
>> 6M rows (which I considered small enough), it always times out and
>> I can't see how to get it to execute at least once successfully.
>>
>> I was getting some internal Phoenix timeouts, but after setting the
>> following properties, it changed:
>>
>> hbase.client.scanner.timeout.period=6000000
>> phoenix.query.timeoutMs=6000000
>> hbase.rpc.timeout=6000000
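>>
>> For anyone trying to reproduce this, a minimal sketch of passing the same
>> values as client-side connection properties through the Phoenix JDBC
>> driver (they can also live in the client's hbase-site.xml; the ZooKeeper
>> host, table and column names below are placeholders):
>>
>> import java.sql.Connection;
>> import java.sql.DriverManager;
>> import java.sql.Statement;
>> import java.util.Properties;
>>
>> public class DropColumnWithTimeouts {
>>     public static void main(String[] args) throws Exception {
>>         Properties props = new Properties();
>>         // Client-side overrides, merged into the HBase configuration Phoenix uses.
>>         props.setProperty("phoenix.query.timeoutMs", "6000000");
>>         props.setProperty("hbase.rpc.timeout", "6000000");
>>         props.setProperty("hbase.client.scanner.timeout.period", "6000000");
>>
>>         try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host", props);
>>              Statement stmt = conn.createStatement()) {
>>             stmt.execute("ALTER TABLE MEDIA DROP COLUMN YYY");
>>         }
>>     }
>> }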
>>
>> Now it fails with errors like this:
>>
>> Wed Nov 11 13:44:25 UTC 2015,
>> RpcRetryingCaller{globalStartTime=1447246894248, pause=100,
>> retries=35}, java.io.IOException: Call to XXX/XXX:16020 failed on
>> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
>> Call id=1303, waitTime=60001, operationTimeout=60000 expired.
>> Wed Nov 11 13:45:45 UTC 2015,
>> RpcRetryingCaller{globalStartTime=1447246894248, pause=100,
>> retries=35}, java.io.IOException: Call to XXX/XXX:16020 failed on
>> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
>> Call id=1341, waitTime=60001, operationTimeout=60000 expired.
>>
>> at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
>> at
>> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
>> ... 3 more
>> Caused by: java.io.IOException: Call to XXX/XXX:16020 failed on local
>> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
>> id=1341, waitTime=60001, operationTimeout=60000 expired.
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1232)
>> at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
>> at
>> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>> at
>> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:199)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>> at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:343)
>> at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>> ... 4 more
>> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
>> id=1341, waitTime=60001, operationTimeout=60000 expired.
>> at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
>> at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1174)
>> ... 14 more
>>
>> While it's still running, I see log entries like this on region servers:
>>
>> 2015-11-11 15:00:49,059 WARN
>> [B.defaultRpcServer.handler=9,queue=0,port=16020]
>> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
>> 1000 mutations for MEDIA
>> 2015-11-11 15:00:49,259 WARN
>> [B.defaultRpcServer.handler=12,queue=0,port=16020]
>> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
>> 1000 mutations for MEDIA
>> 2015-11-11 15:00:49,537 WARN
>> [B.defaultRpcServer.handler=9,queue=0,port=16020]
>> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
>> 1000 mutations for MEDIA
>> 2015-11-11 15:00:49,766 WARN
>> [B.defaultRpcServer.handler=12,queue=0,port=16020]
>> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
>> 1000 mutations for MEDIA
>> 2015-11-11 15:00:49,960 WARN
>> [B.defaultRpcServer.handler=9,queue=0,port=16020]
>> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
>> 1000 mutations for MEDIA
>> 2015-11-11 15:00:50,212 WARN
>> [B.defaultRpcServer.handler=12,queue=0,port=16020]
>> coprocessor.UngroupedAggregateRegionObserver: Committing bactch of
>> 1000 mutations for MEDIA
>>
>> Any ideas on how to solve this? I'd even be fine with just a way to
>> remove the column from the Phoenix metadata and keep the values in
>> HBase, but I don't see how to do that except by running DROP COLUMN and
>> waiting for it to time out.
>>
>> Lukas
>>
>
>
