phoenix-user mailing list archives

From Sandeep Nemuri <nhsande...@gmail.com>
Subject Re: RegionServers shutdown randomly
Date Sun, 09 Aug 2015 07:56:00 GMT
As per your configs, the open-files limit is set to the default (1024). You
will have to increase the number of open files; a sketch of one way to do it
follows the ulimit output below.

Some configs:
-  ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 103227
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
*open files                      (-n) 1024*
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 103227
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
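A minimal sketch of raising it, assuming the HBase and HDFS daemons run as the
hbase and hdfs users and that pam_limits is applied at login; 32768 is an
illustrative value, not a tuned recommendation:

# /etc/security/limits.conf (or a drop-in under /etc/security/limits.d/)
hbase  -  nofile  32768
hdfs   -  nofile  32768

# re-login the service users (or restart the daemons), then verify:
su - hbase -c 'ulimit -n'

The HBase reference guide likewise recommends raising nofile well above 1024,
since a busy RegionServer and DataNode keep many store files and xceiver
sockets open at the same time.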


On Sun, Aug 9, 2015 at 1:23 PM, Sandeep Nemuri <nhsandeep6@gmail.com> wrote:

> Is HDFS operating normally ?
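> A quick way to check, for example (run as a user with HDFS access):
>
> hdfs dfsadmin -report | head -n 20   # live/dead datanodes, capacity summary
> hdfs fsck / | tail -n 20             # corrupt / missing / under-replicated blocks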
>
> On Sun, Aug 9, 2015 at 9:12 AM, anil gupta <anilgupta84@gmail.com> wrote:
>
>> 2015-08-06 14:11:13,640 ERROR
>> [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
>> Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
>> fatal error:
>> ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905:
>> Unrecoverable exception while closing region
>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>> still finishing close
>> Cause:
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>
>> Do you have a bad disk in your cluster? The above error looks like an
>> HDFS problem. How stable is your HDFS?
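>> For example, a couple of host-level checks for failing disks on each
>> datanode (illustrative; smartctl needs smartmontools, and /dev/sda is just
>> a placeholder device name):
>>
>> dmesg | grep -iE 'i/o error|ata.*error' | tail   # kernel-reported disk errors
>> sudo smartctl -H /dev/sda                        # SMART health of one drive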
>>
>> On Fri, Aug 7, 2015 at 2:06 AM, Adrià Vilà <avila@datknosys.com> wrote:
>>
>>> I do run other workloads, but not while this error happened, because I was
>>> testing it on purpose. I've noticed that the RegionServers fail randomly.
>>>
>>> NameNode heap: 4GB
>>> DataNode heap: 1GB
>>> NameNode threads: 100
>>>
>>> HDFS-site:
>>>     <property>
>>>       <name>dfs.blocksize</name>
>>>       <value>134217728</value>
>>>     </property>
>>>     <property>
>>>       <name>dfs.datanode.du.reserved</name>
>>>       <value>1073741824</value>
>>>     </property>
>>>
>>>
>>> HBase-site:
>>>     <property>
>>>       <name>hbase.client.keyvalue.maxsize</name>
>>>       <value>1048576</value>
>>>     </property>
>>>     <property>
>>>       <name>hbase.hregion.max.filesize</name>
>>>       <value>10737418240</value>
>>>     </property>
>>>     <property>
>>>       <name>hbase.hregion.memstore.block.multiplier</name>
>>>       <value>4</value>
>>>     </property>
>>>     <property>
>>>       <name>hbase.hregion.memstore.flush.size</name>
>>>       <value>134217728</value>
>>>     </property>
>>>     <property>
>>>       <name>hbase.regionserver.wal.codec</name>
>>>
>>> <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
>>>     </property>
>>>
>>> Next I attach as many logs as I could find!
>>>
>>> ---------------
>>> NameNode log:
>>> ---------------
>>> 2015-08-06 14:11:10,079 INFO  hdfs.StateChange
>>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>>> blk_1073766164_25847{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>>> for
>>> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta
>>> 2015-08-06 14:11:10,095 INFO  hdfs.StateChange
>>> (FSNamesystem.java:fsync(3975)) - BLOCK* fsync:
>>> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta
>>> for DFSClient_NONMAPREDUCE_774922977_1
>>> 2015-08-06 14:11:10,104 INFO  hdfs.StateChange
>>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>>> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>>> for
>>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc
>>> 2015-08-06 14:11:10,120 INFO  BlockStateChange
>>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>>> blockMap updated: 10.240.164.0:50010 is added to
>>> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>>> size 0
>>> 2015-08-06 14:11:10,120 INFO  BlockStateChange
>>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>>> blockMap updated: 10.240.187.182:50010 is added to
>>> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>>> size 0
>>> 2015-08-06 14:11:10,122 INFO  hdfs.StateChange
>>> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
>>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc
>>> is closed by DFSClient_NONMAPREDUCE_774922977_1
>>> 2015-08-06 14:11:10,226 INFO  hdfs.StateChange
>>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>>> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>>> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
>>> for
>>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0
>>> 2015-08-06 14:11:10,421 INFO  BlockStateChange
>>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>>> blockMap updated: 10.240.200.196:50010 is added to
>>> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>>> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
>>> size 0
>>> 2015-08-06 14:11:10,421 INFO  BlockStateChange
>>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>>> blockMap updated: 10.240.187.182:50010 is added to
>>> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>>> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
>>> size 0
>>> 2015-08-06 14:11:10,423 INFO  hdfs.StateChange
>>> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
>>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0
>>> is closed by DFSClient_NONMAPREDUCE_774922977_1
>>> 2015-08-06 14:11:13,623 INFO  hdfs.StateChange
>>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>>> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
>>> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
>>> for
>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>>> 2015-08-06 14:11:13,638 INFO  hdfs.StateChange
>>> (FSNamesystem.java:fsync(3975)) - BLOCK* fsync:
>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>>> for DFSClient_NONMAPREDUCE_722958591_1
>>> 2015-08-06 14:11:13,965 INFO  BlockStateChange
>>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>>> blockMap updated: 10.240.2.235:50010 is added to
>>> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
>>> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
>>> size 90
>>> 2015-08-06 14:11:13,966 INFO  BlockStateChange
>>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>>> blockMap updated: 10.240.164.0:50010 is added to
>>> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>>> primaryNodeIndex=-1,
>>> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
>>> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
>>> size 90
>>> 2015-08-06 14:11:13,968 INFO  hdfs.StateChange
>>> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>>> is closed by DFSClient_NONMAPREDUCE_722958591_1
>>>
>>> ---------------
>>> HBase master DataNode log:
>>> ---------------
>>> 2015-08-06 14:09:17,187 INFO  DataNode.clienttrace
>>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749044, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>>> 2015-08-06 14:09:17,273 INFO  DataNode.clienttrace
>>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749049, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>>> 2015-08-06 14:09:17,325 INFO  DataNode.clienttrace
>>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749051, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>>> 2015-08-06 14:09:46,810 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /
>>> 10.240.200.196:34789 dest: /10.240.200.196:50010
>>> 2015-08-06 14:09:46,843 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34789,
>>> dest: /10.240.200.196:50010, bytes: 6127, op: HDFS_WRITE, cliID:
>>> DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0,
>>> srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration:
>>> 29150527
>>> 2015-08-06 14:09:46,843 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-08-06 14:09:47,193 INFO  DataNode.clienttrace
>>> (DataXceiver.java:requestShortCircuitShm(468)) - cliID:
>>> DFSClient_NONMAPREDUCE_22636141_1, src: 127.0.0.1, dest: 127.0.0.1, op:
>>> REQUEST_SHORT_CIRCUIT_SHM, shmId: a70bde6b6e67f4a5394e209320b451f3, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>>> 2015-08-06 14:09:47,211 INFO  DataNode.clienttrace
>>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073766159, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>>> 2015-08-06 14:09:52,887 INFO  datanode.ShortCircuitRegistry
>>> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
>>> 1073766159_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
>>> Marking short-circuit slots as invalid: Slot(slotIdx=0,
>>> shm=RegisteredShm(a70bde6b6e67f4a5394e209320b451f3))
>>> 2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766159_25842 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>>> for deletion
>>> 2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>>> BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>>> 2015-08-06 14:09:53,562 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>>> error processing unknown operation  src: /127.0.0.1:35735 dst: /
>>> 127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:10:53,649 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>>> error processing unknown operation  src: /127.0.0.1:35826 dst: /
>>> 127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:11:02,088 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:46835,
>>> dest: /10.240.200.196:50010, bytes: 434, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration:
>>> 131662299497
>>> 2015-08-06 14:11:02,088 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:11:03,018 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /
>>> 10.240.164.0:47039 dest: /10.240.200.196:50010
>>> 2015-08-06 14:11:03,042 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:47039,
>>> dest: /10.240.200.196:50010, bytes: 30845, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration:
>>> 13150343
>>> 2015-08-06 14:11:03,042 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:11:03,834 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /
>>> 10.240.200.196:34941 dest: /10.240.200.196:50010
>>> 2015-08-06 14:11:03,917 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34941,
>>> dest: /10.240.200.196:50010, bytes: 47461, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration:
>>> 76492287
>>> 2015-08-06 14:11:03,917 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-08-06 14:11:04,887 INFO  datanode.ShortCircuitRegistry
>>> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
>>> 1073766154_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
>>> Marking short-circuit slots as invalid: Slot(slotIdx=2,
>>> shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
>>> 2015-08-06 14:11:04,887 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766154_25837 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154
>>> for deletion
>>> 2015-08-06 14:11:04,894 INFO  datanode.ShortCircuitRegistry
>>> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
>>> 1073766155_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
>>> Marking short-circuit slots as invalid: Slot(slotIdx=1,
>>> shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
>>> 2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766155_25838 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766155
>>> for deletion
>>> 2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766156_25839 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766156
>>> for deletion
>>> 2015-08-06 14:11:04,895 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>>> BP-369072949-10.240.200.196-1437998325049 blk_1073766154_25837 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154
>>> 2015-08-06 14:11:07,887 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766157_25840 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>>> for deletion
>>> 2015-08-06 14:11:07,889 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>>> BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>>> 2015-08-06 14:11:10,241 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849 src: /
>>> 10.240.187.182:60725 dest: /10.240.200.196:50010
>>> 2015-08-06 14:11:10,419 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:60725,
>>> dest: /10.240.200.196:50010, bytes: 1212340, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849, duration:
>>> 172298145
>>> 2015-08-06 14:11:10,419 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:11:53,594 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>>> error processing unknown operation  src: /127.0.0.1:35992 dst: /
>>> 127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:11:57,109 INFO  DataNode.clienttrace
>>> (DataXceiver.java:releaseShortCircuitFds(407)) - src: 127.0.0.1, dest:
>>> 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId:
>>> 0ee5b1d24e1d07dc72f681a4fbc06040, slotIdx: 0, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>>>
>>> ---------------
>>> HBase worker DataNode log:
>>> ---------------
>>> 2015-08-06 14:09:41,378 INFO  datanode.VolumeScanner
>>> (VolumeScanner.java:markSuspectBlock(665)) -
>>> VolumeScanner(/hadoop/hdfs/data, DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01):
>>> Not scheduling suspect block
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073749047_8295 for
>>> rescanning, because we rescanned it recently.
>>> 2015-08-06 14:09:46,812 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /
>>> 10.240.200.196:48711 dest: /10.240.164.0:50010
>>> 2015-08-06 14:09:46,842 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48711,
>>> dest: /10.240.164.0:50010, bytes: 6127, op: HDFS_WRITE, cliID:
>>> DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0,
>>> srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration:
>>> 28706244
>>> 2015-08-06 14:09:46,842 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:09:47,033 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55267,
>>> dest: /10.240.164.0:50010, bytes: 426792, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841, duration:
>>> 56549200180
>>> 2015-08-06 14:09:47,033 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-08-06 14:09:50,052 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766159_25842 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>>> for deletion
>>> 2015-08-06 14:09:50,053 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>>> BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>>> 2015-08-06 14:09:52,456 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:44663 dst: /127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:10:52,401 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:44709 dst: /127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:11:02,052 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766158_25841 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158
>>> for deletion
>>> 2015-08-06 14:11:02,053 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>>> BP-369072949-10.240.200.196-1437998325049 blk_1073766158_25841 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158
>>> 2015-08-06 14:11:02,089 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55265,
>>> dest: /10.240.164.0:50010, bytes: 434, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration:
>>> 131663017075
>>> 2015-08-06 14:11:02,089 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-08-06 14:11:02,992 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /
>>> 10.240.164.0:55469 dest: /10.240.164.0:50010
>>> 2015-08-06 14:11:03,043 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55469,
>>> dest: /10.240.164.0:50010, bytes: 30845, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration:
>>> 23219007
>>> 2015-08-06 14:11:03,043 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-08-06 14:11:03,288 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844 src: /
>>> 10.240.2.235:56643 dest: /10.240.164.0:50010
>>> 2015-08-06 14:11:03,319 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56643,
>>> dest: /10.240.164.0:50010, bytes: 9446, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_920026358_372, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844, duration:
>>> 29200871
>>> 2015-08-06 14:11:03,319 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:11:03,475 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845 src: /
>>> 10.240.187.182:41291 dest: /10.240.164.0:50010
>>> 2015-08-06 14:11:03,501 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41291,
>>> dest: /10.240.164.0:50010, bytes: 34213, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_-643601862_395, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845, duration:
>>> 24992990
>>> 2015-08-06 14:11:03,501 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:11:03,837 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /
>>> 10.240.200.196:48863 dest: /10.240.164.0:50010
>>> 2015-08-06 14:11:03,916 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48863,
>>> dest: /10.240.164.0:50010, bytes: 47461, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration:
>>> 78295982
>>> 2015-08-06 14:11:03,916 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:11:05,052 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>>> blk_1073766157_25840 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>>> for deletion
>>> 2015-08-06 14:11:05,053 INFO  impl.FsDatasetAsyncDiskService
>>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>>> BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>>> 2015-08-06 14:11:10,083 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /
>>> 10.240.187.182:41294 dest: /10.240.164.0:50010
>>> 2015-08-06 14:11:10,111 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848 src: /
>>> 10.240.187.182:41296 dest: /10.240.164.0:50010
>>> 2015-08-06 14:11:10,119 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41296,
>>> dest: /10.240.164.0:50010, bytes: 122645, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848, duration:
>>> 7281253
>>> 2015-08-06 14:11:10,119 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:11:13,627 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850 src: /
>>> 10.240.164.0:55473 dest: /10.240.164.0:50010
>>> 2015-08-06 14:11:13,965 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55473,
>>> dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_722958591_1, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850, duration:
>>> 332393414
>>> 2015-08-06 14:11:13,965 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-08-06 14:11:52,392 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:44757 dst: /127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:12:10,514 INFO  datanode.DataNode
>>> (BlockReceiver.java:receiveBlock(888)) - Exception for
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
>>> java.io.IOException: Premature EOF from inputStream
>>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1312)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847,
>>> type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
>>> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(837)) - opWriteBlock
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 received
>>> exception java.io.IOException: Premature EOF from inputStream
>>> 2015-08-06 14:12:10,515 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing
>>> WRITE_BLOCK operation  src: /10.240.187.182:41294 dst: /
>>> 10.240.164.0:50010
>>> java.io.IOException: Premature EOF from inputStream
>>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:12:10,519 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /
>>> 10.240.187.182:41319 dest: /10.240.164.0:50010
>>> 2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl
>>> (FsDatasetImpl.java:recoverRbw(1322)) - Recover RBW replica
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
>>> 2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl
>>> (FsDatasetImpl.java:recoverRbw(1333)) - Recovering ReplicaBeingWritten,
>>> blk_1073766164_25847, RBW
>>>   getNumBytes()     = 545
>>>   getBytesOnDisk()  = 545
>>>   getVisibleLength()= 545
>>>   getVolume()       = /hadoop/hdfs/data/current
>>>   getBlockFile()    =
>>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/rbw/blk_1073766164
>>>   bytesAcked=545
>>>   bytesOnDisk=545
>>> 2015-08-06 14:12:52,419 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:44786 dst: /127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:12:56,277 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852 src: /
>>> 10.240.2.235:56693 dest: /10.240.164.0:50010
>>> 2015-08-06 14:12:56,843 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56693,
>>> dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_241093651_1, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852, duration:
>>> 564768014
>>> 2015-08-06 14:12:56,843 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:13:10,579 INFO  datanode.DataNode
>>> (BlockReceiver.java:receiveBlock(888)) - Exception for
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851
>>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
>>> remote=/10.240.187.182:41319]
>>>         at
>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>         at
>>> java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1312)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851,
>>> type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
>>> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851,
>>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>>> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(837)) - opWriteBlock
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851 received
>>> exception java.net.SocketTimeoutException: 60000 millis timeout while
>>> waiting for channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
>>> remote=/10.240.187.182:41319]
>>> 2015-08-06 14:13:10,580 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing
>>> WRITE_BLOCK operation  src: /10.240.187.182:41319 dst: /
>>> 10.240.164.0:50010
>>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
>>> remote=/10.240.187.182:41319]
>>>         at
>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>         at
>>> java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:13:52,432 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:44818 dst: /127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:14:52,399 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:44848 dst: /127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-08-06 14:15:14,227 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856 src: /
>>> 10.240.187.182:41397 dest: /10.240.164.0:50010
>>> 2015-08-06 14:15:14,240 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41397,
>>> dest: /10.240.164.0:50010, bytes: 92789, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856, duration:
>>> 12340123
>>>
>>>
>>> Thank you for helping :)
>>> Adri
>>>
>>>
>>> *From*: "Vladimir Rodionov" <vladrodionov@gmail.com>
>>> *Sent*: Thursday, August 6, 2015 20:07
>>> *To*: user@phoenix.apache.org, avila@datknosys.com
>>> *Subject*: Re: RegionServers shutdown randomly
>>>
>>> What do the DN and NN logs say? Do you run any other workload on the same
>>> cluster? What is your cluster configuration?
>>> Max memory per RS, DN and other collocated processes?
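>>> For example, a rough way to see the configured heaps on one node:
>>>
>>> ps -ef | grep -E 'HRegionServer|DataNode' | grep -o -- '-Xmx[0-9]*[mMgG]'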
>>>
>>> -Vlad
>>>
>>> On Thu, Aug 6, 2015 at 8:42 AM, Adrià Vilà <avila@datknosys.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> HBase RegionServers fail once in a while:
>>>>
>>>> - it can be any RegionServer, not always the same one
>>>>
>>>> - it can happen when the whole cluster is idle (at least not executing
>>>> any human-launched task)
>>>>
>>>> - it can happen at any time, not always at the same time
>>>>
>>>>
>>>> The cluster versions:
>>>>
>>>> - Phoenix 4.4 (or 4.5)
>>>>
>>>> - HBase 1.1.1
>>>>
>>>> - Hadoop/HDFS 2.7.1
>>>>
>>>> - Zookeeper 3.4.6
>>>>
>>>>
>>>>
>>>> Some configs:
>>>> -  ulimit -a
>>>> core file size          (blocks, -c) 0
>>>> data seg size           (kbytes, -d) unlimited
>>>> scheduling priority             (-e) 0
>>>> file size               (blocks, -f) unlimited
>>>> pending signals                 (-i) 103227
>>>> max locked memory       (kbytes, -l) 64
>>>> max memory size         (kbytes, -m) unlimited
>>>> open files                      (-n) 1024
>>>> pipe size            (512 bytes, -p) 8
>>>> POSIX message queues     (bytes, -q) 819200
>>>> real-time priority              (-r) 0
>>>> stack size              (kbytes, -s) 10240
>>>> cpu time               (seconds, -t) unlimited
>>>> max user processes              (-u) 103227
>>>> virtual memory          (kbytes, -v) unlimited
>>>> file locks                      (-x) unlimited
>>>> - I have increased the default timeouts for: hbase rpc, zookeeper session,
>>>> dfs socket, regionserver lease and client scanner.
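>>>> (Assuming the standard property names in HBase 1.1 / Hadoop 2.7, those
>>>> would be: hbase.rpc.timeout, zookeeper.session.timeout,
>>>> dfs.client.socket-timeout, and hbase.regionserver.lease.period /
>>>> hbase.client.scanner.timeout.period; the exact names were not given here.)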
>>>>
>>>> Below you can find the logs for the master, the RegionServer that failed
>>>> first, another one that failed, and the DataNode logs for the master and a
>>>> worker.
>>>>
>>>> The timing was approximately:
>>>>
>>>> 14:05 start hbase
>>>> 14:11 w-0 down
>>>> 14:14 w-1 down
>>>> 14:15 stop hbase
>>>>
>>>>
>>>> -------------
>>>> hbase master log (m)
>>>> -------------
>>>> 2015-08-06 14:11:13,640 ERROR
>>>> [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
>>>> Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
>>>> fatal error:
>>>> ABORTING region server
>>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
>>>> while closing region
>>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>>> still finishing close
>>>> Cause:
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>>
>>>> --------------
>>>> hbase regionserver log (w-0)
>>>> --------------
>>>> 2015-08-06 14:11:13,611 INFO
>>>>  [PriorityRpcServer.handler=0,queue=0,port=16020]
>>>> regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving
>>>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>>>> 2015-08-06 14:11:13,615 INFO
>>>>  [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1]
>>>> regionserver.HStore: Closed 0
>>>> 2015-08-06 14:11:13,616 FATAL
>>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1]
>>>> wal.FSHLog: Could not append. Requesting close of wal
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing,
>>>> request close of wal
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>>> regionserver.HRegionServer: ABORTING region server
>>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
>>>> while closing region
>>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>>> still finishing close
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>>>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>>>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>>>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>>>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>>>> org.apache.phoenix.hbase.index.Indexer,
>>>> org.apache.phoenix.coprocessor.SequenceRegionObserver,
>>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
>>>> 2015-08-06 14:11:13,627 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>>>   "beans" : [ {
>>>>     "name" : "java.lang:type=Memory",
>>>>     "modelerType" : "sun.management.MemoryImpl",
>>>>     "Verbose" : true,
>>>>     "HeapMemoryUsage" : {
>>>>       "committed" : 2104754176,
>>>>       "init" : 2147483648,
>>>>       "max" : 2104754176,
>>>>       "used" : 262288688
>>>>     },
>>>>     "ObjectPendingFinalizationCount" : 0,
>>>>     "NonHeapMemoryUsage" : {
>>>>       "committed" : 137035776,
>>>>       "init" : 136773632,
>>>>       "max" : 184549376,
>>>>       "used" : 49168288
>>>>     },
>>>>     "ObjectName" : "java.lang:type=Memory"
>>>>   } ],
>>>>   "beans" : [ {
>>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>>>     "modelerType" : "RegionServer,sub=IPC",
>>>>     "tag.Context" : "regionserver",
>>>>     "tag.Hostname" : "hdp-w-0"
>>>>   } ],
>>>>   "beans" : [ {
>>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>>>     "modelerType" : "RegionServer,sub=Replication",
>>>>     "tag.Context" : "regionserver",
>>>>     "tag.Hostname" : "hdp-w-0"
>>>>   } ],
>>>>   "beans" : [ {
>>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>>>     "modelerType" : "RegionServer,sub=Server",
>>>>     "tag.Context" : "regionserver",
>>>>     "tag.Hostname" : "hdp-w-0"
>>>>   } ]
>>>> }
>>>> 2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing,
>>>> request close of wal
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:11:13,640 WARN
>>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>>>> through to close; java.io.IOException: All datanodes
>>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>>> are bad. Aborting...
>>>> 2015-08-06 14:11:13,641 ERROR
>>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>>> wal.ProtobufLogWriter: Got IOException while writing trailer
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:11:13,641 WARN
>>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>>> wal.FSHLog: Riding over failed WAL close of
>>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576,
>>>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>>>> SYNCED SO SHOULD BE OK
>>>> 2015-08-06 14:11:13,642 INFO
>>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>>> wal.FSHLog: Rolled WAL
>>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>>> with entries=101, filesize=30.38 KB; new WAL
>>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>>>> 2015-08-06 14:11:13,643 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>>>> region
>>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>>> still finishing close
>>>> 2015-08-06 14:11:13,643 INFO
>>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>>> wal.FSHLog: Archiving
>>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>>> to
>>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>>> 2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>>> executor.EventHandler: Caught throwable while processing event
>>>> M_RS_CLOSE_REGION
>>>> java.lang.RuntimeException: java.io.IOException: All datanodes
>>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>>> are bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>>>         at
>>>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>>
>>>> ------------
>>>> hbase regionserver log (w-1)
>>>> ------------
>>>> 2015-08-06 14:11:14,267 INFO  [main-EventThread]
>>>> replication.ReplicationTrackerZKImpl:
>>>> /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode
>>>> expired, triggering replicatorRemoved event
>>>> 2015-08-06 14:12:08,203 INFO  [ReplicationExecutor-0]
>>>> replication.ReplicationQueuesZKImpl: Atomically moving
>>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
>>>> 2015-08-06 14:12:56,252 INFO
>>>>  [PriorityRpcServer.handler=5,queue=1,port=16020]
>>>> regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving
>>>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>>>> 2015-08-06 14:12:56,260 INFO
>>>>  [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1]
>>>> regionserver.HStore: Closed 0
>>>> 2015-08-06 14:12:56,261 FATAL
>>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1]
>>>> wal.FSHLog: Could not append. Requesting close of wal
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing,
>>>> request close of wal
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>>> regionserver.HRegionServer: ABORTING region server
>>>> hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception
>>>> while closing region
>>>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>>>> still finishing close
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>>>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>>>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>>>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>>>> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint,
>>>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>>>> org.apache.phoenix.hbase.index.Indexer,
>>>> org.apache.phoenix.coprocessor.SequenceRegionObserver]
>>>> 2015-08-06 14:12:56,281 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>>>   "beans" : [ {
>>>>     "name" : "java.lang:type=Memory",
>>>>     "modelerType" : "sun.management.MemoryImpl",
>>>>     "ObjectPendingFinalizationCount" : 0,
>>>>     "NonHeapMemoryUsage" : {
>>>>       "committed" : 137166848,
>>>>       "init" : 136773632,
>>>>       "max" : 184549376,
>>>>       "used" : 48667528
>>>>     },
>>>>     "HeapMemoryUsage" : {
>>>>       "committed" : 2104754176,
>>>>       "init" : 2147483648,
>>>>       "max" : 2104754176,
>>>>       "used" : 270075472
>>>>     },
>>>>     "Verbose" : true,
>>>>     "ObjectName" : "java.lang:type=Memory"
>>>>   } ],
>>>>   "beans" : [ {
>>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>>>     "modelerType" : "RegionServer,sub=IPC",
>>>>     "tag.Context" : "regionserver",
>>>>     "tag.Hostname" : "hdp-w-1"
>>>>   } ],
>>>>   "beans" : [ {
>>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>>>     "modelerType" : "RegionServer,sub=Replication",
>>>>     "tag.Context" : "regionserver",
>>>>     "tag.Hostname" : "hdp-w-1"
>>>>   } ],
>>>>   "beans" : [ {
>>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>>>     "modelerType" : "RegionServer,sub=Server",
>>>>     "tag.Context" : "regionserver",
>>>>     "tag.Hostname" : "hdp-w-1"
>>>>   } ]
>>>> }
>>>> 2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing,
>>>> request close of wal
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:12:56,285 WARN
>>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>>>> through to close; java.io.IOException: All datanodes
>>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>>> are bad. Aborting...
>>>> 2015-08-06 14:12:56,285 ERROR
>>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>>> wal.ProtobufLogWriter: Got IOException while writing trailer
>>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>> 2015-08-06 14:12:56,285 WARN
>>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>>> wal.FSHLog: Riding over failed WAL close of
>>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359,
>>>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>>>> SYNCED SO SHOULD BE OK
>>>> 2015-08-06 14:12:56,287 INFO
>>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>>> wal.FSHLog: Rolled WAL
>>>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>>> with entries=100, filesize=30.73 KB; new WAL
>>>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
>>>> 2015-08-06 14:12:56,288 INFO
>>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>>> wal.FSHLog: Archiving
>>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>>> to
>>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>>> 2015-08-06 14:12:56,315 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>>>> region
>>>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>>>> still finishing close
>>>> 2015-08-06 14:12:56,315 INFO
>>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020]
>>>> regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
>>>> 2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>>> executor.EventHandler: Caught throwable while processing event
>>>> M_RS_CLOSE_REGION
>>>> java.lang.RuntimeException: java.io.IOException: All datanodes
>>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>>> are bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>>>         at
>>>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>>> bad. Aborting...
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>>
>>>> -------------
>>>> m datanode log
>>>> -------------
>>>> 2015-07-27 14:11:16,082 INFO  datanode.DataNode
>>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857,
>>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>>> 2015-07-27 14:11:16,132 INFO  datanode.DataNode
>>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /
>>>> 10.240.200.196:56767 dest: /10.240.200.196:50010
>>>> 2015-07-27 14:11:16,155 INFO  DataNode.clienttrace
>>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767,
>>>> dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID:
>>>> DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID:
>>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration:
>>>> 6385289
>>>> 2015-07-27 14:11:16,155 INFO  datanode.DataNode
>>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858,
>>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>>> 2015-07-27 14:11:16,267 ERROR datanode.DataNode
>>>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>>>> error processing unknown operation  src: /127.0.0.1:60513 dst: /
>>>> 127.0.0.1:50010
>>>> java.io.EOFException
>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>> 2015-07-27 14:11:16,405 INFO  datanode.DataNode
>>>> (DataNode.java:transferBlock(1943)) - DatanodeRegistration(
>>>> 10.240.200.196:50010,
>>>> datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075,
>>>> infoSecurePort=0, ipcPort=8010,
>>>> storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0)
>>>> Starting thread to transfer
>>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to
>>>> 10.240.2.235:50010 10.240.164.0:50010
>>>>
>>>> -------------
>>>> w-0 datanode log
>>>> -------------
>>>> 2015-07-27 14:11:25,019 ERROR datanode.DataNode
>>>> (DataXceiver.java:run(278)) -
>>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>>> operation  src: /127.0.0.1:47993 dst: /127.0.0.1:50010
>>>> java.io.EOFException
>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>> 2015-07-27 14:11:25,077 INFO  DataNode.clienttrace
>>>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>>>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID:
>>>> a5eea5a8-5112-46da-9f18-64274486c472, success: true
>>>>
>>>> -----------------------------
>>>> Thank you in advance,
>>>>
>>>> Adrià
>>>>
>>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>
>
>
> --
> *  Regards*
> *  Sandeep Nemuri*
>



-- 
*  Regards*
*  Sandeep Nemuri*
