phoenix-user mailing list archives

From: anil gupta <anilgupt...@gmail.com>
Subject: Re: RegionServers shutdown randomly
Date: Sun, 09 Aug 2015 03:42:20 GMT
2015-08-06 14:11:13,640 ERROR
[PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
fatal error:
ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905:
Unrecoverable exception while closing region
SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
still finishing close
Cause:
java.io.IOException: All datanodes DatanodeInfoWithStorage[
10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad.
Aborting...
        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)

Do you have a bad disk in your cluster? The error above looks like an HDFS
problem. How stable is your HDFS?
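
If you want to sanity check HDFS first, the stock command line tools are usually
enough. A rough sketch (the path /apps/hbase/data and the node 10.240.187.182 are
taken from your logs):

    # Block/replication health of the HBase root dir; look for corrupt or
    # under-replicated blocks:
    hdfs fsck /apps/hbase/data -files -blocks -locations

    # Per-DataNode report: capacity, failed volumes, last contact. A node that
    # keeps dropping out of write pipelines usually stands out here.
    hdfs dfsadmin -report

    # On the suspect worker (10.240.187.182), check the kernel log for disk
    # I/O errors:
    dmesg | grep -iE 'i/o error|ata|sd[a-z]'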

On Fri, Aug 7, 2015 at 2:06 AM, Adrià Vilà <avila@datknosys.com> wrote:

> I run other workloads, but not while this error happened, because I was
> testing it on purpose. I've noticed that the RegionServers fail
> randomly.
>
> NameNode heap: 4GB
> DataNode heap: 1GB
> NameNode threads: 100
>
> HDFS-site:
>     <property>
>       <name>dfs.blocksize</name>
>       <value>134217728</value>
>     </property>
>    <property>
>       <name>dfs.datanode.du.reserved</name>
>       <value>1073741824</value>
>     </property>
>
>
> HBase-site:
>     <property>
>       <name>hbase.client.keyvalue.maxsize</name>
>       <value>1048576</value>
>     </property>
>     <property>
>       <name>hbase.hregion.max.filesize</name>
>       <value>10737418240</value>
>     </property>
>     <property>
>       <name>hbase.hregion.memstore.block.multiplier</name>
>       <value>4</value>
>     </property>
>     <property>
>       <name>hbase.hregion.memstore.flush.size</name>
>       <value>134217728</value>
>     </property>
>     <property>
>       <name>hbase.regionserver.wal.codec</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
>     </property>
>
> Next I attach as many logs as I could find!
>
> ---------------
> NameNode log:
> ---------------
> 2015-08-06 14:11:10,079 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
> blk_1073766164_25847{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
> for
> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta
> 2015-08-06 14:11:10,095 INFO  hdfs.StateChange
> (FSNamesystem.java:fsync(3975)) - BLOCK* fsync:
> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta
> for DFSClient_NONMAPREDUCE_774922977_1
> 2015-08-06 14:11:10,104 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
> for
> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc
> 2015-08-06 14:11:10,120 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
> blockMap updated: 10.240.164.0:50010 is added to
> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
> size 0
> 2015-08-06 14:11:10,120 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
> blockMap updated: 10.240.187.182:50010 is added to
> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
> size 0
> 2015-08-06 14:11:10,122 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc
> is closed by DFSClient_NONMAPREDUCE_774922977_1
> 2015-08-06 14:11:10,226 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
> for
> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0
> 2015-08-06 14:11:10,421 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
> blockMap updated: 10.240.200.196:50010 is added to
> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
> size 0
> 2015-08-06 14:11:10,421 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
> blockMap updated: 10.240.187.182:50010 is added to
> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
> size 0
> 2015-08-06 14:11:10,423 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0
> is closed by DFSClient_NONMAPREDUCE_774922977_1
> 2015-08-06 14:11:13,623 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
> for
> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
> 2015-08-06 14:11:13,638 INFO  hdfs.StateChange
> (FSNamesystem.java:fsync(3975)) - BLOCK* fsync:
> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
> for DFSClient_NONMAPREDUCE_722958591_1
> 2015-08-06 14:11:13,965 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
> blockMap updated: 10.240.2.235:50010 is added to
> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
> size 90
> 2015-08-06 14:11:13,966 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
> blockMap updated: 10.240.164.0:50010 is added to
> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
> size 90
> 2015-08-06 14:11:13,968 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
> is closed by DFSClient_NONMAPREDUCE_722958591_1
>
> ---------------
> HBase master DataNode log:
> ---------------
> 2015-08-06 14:09:17,187 INFO  DataNode.clienttrace
> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749044, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
> 2015-08-06 14:09:17,273 INFO  DataNode.clienttrace
> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749049, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
> 2015-08-06 14:09:17,325 INFO  DataNode.clienttrace
> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749051, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
> 2015-08-06 14:09:46,810 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /
> 10.240.200.196:34789 dest: /10.240.200.196:50010
> 2015-08-06 14:09:46,843 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34789,
> dest: /10.240.200.196:50010, bytes: 6127, op: HDFS_WRITE, cliID:
> DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0,
> srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration:
> 29150527
> 2015-08-06 14:09:46,843 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-08-06 14:09:47,193 INFO  DataNode.clienttrace
> (DataXceiver.java:requestShortCircuitShm(468)) - cliID:
> DFSClient_NONMAPREDUCE_22636141_1, src: 127.0.0.1, dest: 127.0.0.1, op:
> REQUEST_SHORT_CIRCUIT_SHM, shmId: a70bde6b6e67f4a5394e209320b451f3, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
> 2015-08-06 14:09:47,211 INFO  DataNode.clienttrace
> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073766159, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
> 2015-08-06 14:09:52,887 INFO  datanode.ShortCircuitRegistry
> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
> 1073766159_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
> Marking short-circuit slots as invalid: Slot(slotIdx=0,
> shm=RegisteredShm(a70bde6b6e67f4a5394e209320b451f3))
> 2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766159_25842 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
> for deletion
> 2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
> BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
> 2015-08-06 14:09:53,562 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
> error processing unknown operation  src: /127.0.0.1:35735 dst: /
> 127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:10:53,649 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
> error processing unknown operation  src: /127.0.0.1:35826 dst: /
> 127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:11:02,088 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:46835,
> dest: /10.240.200.196:50010, bytes: 434, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration:
> 131662299497
> 2015-08-06 14:11:02,088 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:11:03,018 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /
> 10.240.164.0:47039 dest: /10.240.200.196:50010
> 2015-08-06 14:11:03,042 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:47039,
> dest: /10.240.200.196:50010, bytes: 30845, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration:
> 13150343
> 2015-08-06 14:11:03,042 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:11:03,834 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /
> 10.240.200.196:34941 dest: /10.240.200.196:50010
> 2015-08-06 14:11:03,917 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34941,
> dest: /10.240.200.196:50010, bytes: 47461, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration:
> 76492287
> 2015-08-06 14:11:03,917 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-08-06 14:11:04,887 INFO  datanode.ShortCircuitRegistry
> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
> 1073766154_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
> Marking short-circuit slots as invalid: Slot(slotIdx=2,
> shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
> 2015-08-06 14:11:04,887 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766154_25837 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154
> for deletion
> 2015-08-06 14:11:04,894 INFO  datanode.ShortCircuitRegistry
> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
> 1073766155_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
> Marking short-circuit slots as invalid: Slot(slotIdx=1,
> shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
> 2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766155_25838 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766155
> for deletion
> 2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766156_25839 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766156
> for deletion
> 2015-08-06 14:11:04,895 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
> BP-369072949-10.240.200.196-1437998325049 blk_1073766154_25837 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154
> 2015-08-06 14:11:07,887 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766157_25840 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
> for deletion
> 2015-08-06 14:11:07,889 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
> BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
> 2015-08-06 14:11:10,241 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849 src: /
> 10.240.187.182:60725 dest: /10.240.200.196:50010
> 2015-08-06 14:11:10,419 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:60725,
> dest: /10.240.200.196:50010, bytes: 1212340, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849, duration:
> 172298145
> 2015-08-06 14:11:10,419 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:11:53,594 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
> error processing unknown operation  src: /127.0.0.1:35992 dst: /
> 127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:11:57,109 INFO  DataNode.clienttrace
> (DataXceiver.java:releaseShortCircuitFds(407)) - src: 127.0.0.1, dest:
> 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId:
> 0ee5b1d24e1d07dc72f681a4fbc06040, slotIdx: 0, srvID:
> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>
> ---------------
> HBase worker DataNode log:
> ---------------
> 2015-08-06 14:09:41,378 INFO  datanode.VolumeScanner
> (VolumeScanner.java:markSuspectBlock(665)) -
> VolumeScanner(/hadoop/hdfs/data, DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01):
> Not scheduling suspect block
> BP-369072949-10.240.200.196-1437998325049:blk_1073749047_8295 for
> rescanning, because we rescanned it recently.
> 2015-08-06 14:09:46,812 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /
> 10.240.200.196:48711 dest: /10.240.164.0:50010
> 2015-08-06 14:09:46,842 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48711,
> dest: /10.240.164.0:50010, bytes: 6127, op: HDFS_WRITE, cliID:
> DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0,
> srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration:
> 28706244
> 2015-08-06 14:09:46,842 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:09:47,033 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55267,
> dest: /10.240.164.0:50010, bytes: 426792, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841, duration:
> 56549200180
> 2015-08-06 14:09:47,033 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-08-06 14:09:50,052 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766159_25842 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
> for deletion
> 2015-08-06 14:09:50,053 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
> BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
> 2015-08-06 14:09:52,456 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
> operation  src: /127.0.0.1:44663 dst: /127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:10:52,401 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
> operation  src: /127.0.0.1:44709 dst: /127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:11:02,052 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766158_25841 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158
> for deletion
> 2015-08-06 14:11:02,053 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
> BP-369072949-10.240.200.196-1437998325049 blk_1073766158_25841 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158
> 2015-08-06 14:11:02,089 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55265,
> dest: /10.240.164.0:50010, bytes: 434, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration:
> 131663017075
> 2015-08-06 14:11:02,089 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-08-06 14:11:02,992 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /
> 10.240.164.0:55469 dest: /10.240.164.0:50010
> 2015-08-06 14:11:03,043 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55469,
> dest: /10.240.164.0:50010, bytes: 30845, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration:
> 23219007
> 2015-08-06 14:11:03,043 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-08-06 14:11:03,288 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844 src: /
> 10.240.2.235:56643 dest: /10.240.164.0:50010
> 2015-08-06 14:11:03,319 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56643,
> dest: /10.240.164.0:50010, bytes: 9446, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_920026358_372, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844, duration:
> 29200871
> 2015-08-06 14:11:03,319 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:11:03,475 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845 src: /
> 10.240.187.182:41291 dest: /10.240.164.0:50010
> 2015-08-06 14:11:03,501 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41291,
> dest: /10.240.164.0:50010, bytes: 34213, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_-643601862_395, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845, duration:
> 24992990
> 2015-08-06 14:11:03,501 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:11:03,837 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /
> 10.240.200.196:48863 dest: /10.240.164.0:50010
> 2015-08-06 14:11:03,916 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48863,
> dest: /10.240.164.0:50010, bytes: 47461, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration:
> 78295982
> 2015-08-06 14:11:03,916 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:11:05,052 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
> blk_1073766157_25840 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
> for deletion
> 2015-08-06 14:11:05,053 INFO  impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
> BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
> 2015-08-06 14:11:10,083 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /
> 10.240.187.182:41294 dest: /10.240.164.0:50010
> 2015-08-06 14:11:10,111 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848 src: /
> 10.240.187.182:41296 dest: /10.240.164.0:50010
> 2015-08-06 14:11:10,119 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41296,
> dest: /10.240.164.0:50010, bytes: 122645, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848, duration:
> 7281253
> 2015-08-06 14:11:10,119 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:11:13,627 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850 src: /
> 10.240.164.0:55473 dest: /10.240.164.0:50010
> 2015-08-06 14:11:13,965 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55473,
> dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_722958591_1, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850, duration:
> 332393414
> 2015-08-06 14:11:13,965 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-08-06 14:11:52,392 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
> operation  src: /127.0.0.1:44757 dst: /127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:12:10,514 INFO  datanode.DataNode
> (BlockReceiver.java:receiveBlock(888)) - Exception for
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
> java.io.IOException: Premature EOF from inputStream
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
> (BlockReceiver.java:run(1312)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847,
> type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(837)) - opWriteBlock
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 received
> exception java.io.IOException: Premature EOF from inputStream
> 2015-08-06 14:12:10,515 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing
> WRITE_BLOCK operation  src: /10.240.187.182:41294 dst: /10.240.164.0:50010
> java.io.IOException: Premature EOF from inputStream
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:12:10,519 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /
> 10.240.187.182:41319 dest: /10.240.164.0:50010
> 2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl
> (FsDatasetImpl.java:recoverRbw(1322)) - Recover RBW replica
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
> 2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl
> (FsDatasetImpl.java:recoverRbw(1333)) - Recovering ReplicaBeingWritten,
> blk_1073766164_25847, RBW
>   getNumBytes()     = 545
>   getBytesOnDisk()  = 545
>   getVisibleLength()= 545
>   getVolume()       = /hadoop/hdfs/data/current
>   getBlockFile()    =
> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/rbw/blk_1073766164
>   bytesAcked=545
>   bytesOnDisk=545
> 2015-08-06 14:12:52,419 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
> operation  src: /127.0.0.1:44786 dst: /127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:12:56,277 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852 src: /
> 10.240.2.235:56693 dest: /10.240.164.0:50010
> 2015-08-06 14:12:56,843 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56693,
> dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_241093651_1, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852, duration:
> 564768014
> 2015-08-06 14:12:56,843 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:13:10,579 INFO  datanode.DataNode
> (BlockReceiver.java:receiveBlock(888)) - Exception for
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
> remote=/10.240.187.182:41319]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
> (BlockReceiver.java:run(1312)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851,
> type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
> (BlockReceiver.java:run(1348)) - PacketResponder:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(837)) - opWriteBlock
> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851 received
> exception java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
> remote=/10.240.187.182:41319]
> 2015-08-06 14:13:10,580 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing
> WRITE_BLOCK operation  src: /10.240.187.182:41319 dst: /10.240.164.0:50010
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
> remote=/10.240.187.182:41319]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:13:52,432 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
> operation  src: /127.0.0.1:44818 dst: /127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:14:52,399 ERROR datanode.DataNode
> (DataXceiver.java:run(278)) -
> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
> operation  src: /127.0.0.1:44848 dst: /127.0.0.1:50010
> java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-08-06 14:15:14,227 INFO  datanode.DataNode
> (DataXceiver.java:writeBlock(655)) - Receiving
> BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856 src: /
> 10.240.187.182:41397 dest: /10.240.164.0:50010
> 2015-08-06 14:15:14,240 INFO  DataNode.clienttrace
> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41397,
> dest: /10.240.164.0:50010, bytes: 92789, op: HDFS_WRITE, cliID:
> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
> BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856, duration:
> 12340123
>
>
> Thank you for helping :)
> Adri
>
>
> *From*: "Vladimir Rodionov" <vladrodionov@gmail.com>
> *Sent*: Thursday, August 6, 2015 20:07
> *To*: user@phoenix.apache.org, avila@datknosys.com
> *Subject*: Re: RegionServers shutdown randomly
>
> What do the DN and NN logs say? Do you run any other workloads on the same
> cluster? What is your cluster configuration?
> What is the max memory per RS, DN and other collocated processes?
>
> -Vlad
>
> On Thu, Aug 6, 2015 at 8:42 AM, Adrià Vilà <avila@datknosys.com> wrote:
>>
>> Hello,
>>
>> HBase RegionServers fail once in a while:
>>
>> - it can be any RegionServer, not always the same one
>>
>> - it can happen when the whole cluster is idle (at least not executing any
>> human-launched task)
>>
>> - it can happen at any time, not always the same time
>>
>>
>> The cluster versions:
>>
>> - Phoenix 4.4 (or 4.5)
>>
>> - HBase 1.1.1
>>
>> - Hadoop/HDFS 2.7.1
>>
>> - Zookeeper 3.4.6
>>
>>
>>
>> Some configs:
>> -  ulimit -a
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 103227
>> max locked memory       (kbytes, -l) 64
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 1024
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) 10240
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 103227
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
>> - have increased the default timeouts for: HBase RPC, ZooKeeper session, DFS
>> socket, RegionServer lease and client scanner (see the sketch below).
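>>
>> For reference, those overrides live in hbase-site.xml (and hdfs-site.xml for
>> the socket timeout). A rough sketch with the standard property names; the
>> values here are only illustrative:
>>     <property>
>>       <name>hbase.rpc.timeout</name>
>>       <value>120000</value> <!-- HBase RPC, ms -->
>>     </property>
>>     <property>
>>       <name>zookeeper.session.timeout</name>
>>       <value>120000</value> <!-- ZooKeeper session, ms -->
>>     </property>
>>     <property>
>>       <name>dfs.client.socket-timeout</name>
>>       <value>120000</value> <!-- DFS socket, ms -->
>>     </property>
>>     <property>
>>       <name>hbase.client.scanner.timeout.period</name>
>>       <value>120000</value> <!-- RegionServer lease / client scanner, ms -->
>>     </property>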
>>
>> Below you can find the logs for the master, the RegionServer that failed
>> first, another RegionServer that failed, and the DataNode logs for the master
>> and a worker.
>>
>> The timing was approximately:
>>
>> 14:05 start hbase
>> 14:11 w-0 down
>> 14:14 w-1 down
>> 14:15 stop hbase
>>
>>
>> -------------
>> hbase master log (m)
>> -------------
>> 2015-08-06 14:11:13,640 ERROR
>> [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
>> Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
>> fatal error:
>> ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905:
>> Unrecoverable exception while closing region
>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>> still finishing close
>> Cause:
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>
>> --------------
>> hbase regionserver log (w-0)
>> --------------
>> 2015-08-06 14:11:13,611 INFO
>>  [PriorityRpcServer.handler=0,queue=0,port=16020]
>> regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving
>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>> 2015-08-06 14:11:13,615 INFO
>>  [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1]
>> regionserver.HStore: Closed 0
>> 2015-08-06 14:11:13,616 FATAL
>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1]
>> wal.FSHLog: Could not append. Requesting close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing, request
>> close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>> regionserver.HRegionServer: ABORTING region server
>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
>> while closing region
>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>> still finishing close
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>> org.apache.phoenix.hbase.index.Indexer,
>> org.apache.phoenix.coprocessor.SequenceRegionObserver,
>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
>> 2015-08-06 14:11:13,627 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>   "beans" : [ {
>>     "name" : "java.lang:type=Memory",
>>     "modelerType" : "sun.management.MemoryImpl",
>>     "Verbose" : true,
>>     "HeapMemoryUsage" : {
>>       "committed" : 2104754176,
>>       "init" : 2147483648,
>>       "max" : 2104754176,
>>       "used" : 262288688
>>     },
>>     "ObjectPendingFinalizationCount" : 0,
>>     "NonHeapMemoryUsage" : {
>>       "committed" : 137035776,
>>       "init" : 136773632,
>>       "max" : 184549376,
>>       "used" : 49168288
>>     },
>>     "ObjectName" : "java.lang:type=Memory"
>>   } ],
>>   "beans" : [ {
>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>     "modelerType" : "RegionServer,sub=IPC",
>>     "tag.Context" : "regionserver",
>>     "tag.Hostname" : "hdp-w-0"
>>   } ],
>>   "beans" : [ {
>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>     "modelerType" : "RegionServer,sub=Replication",
>>     "tag.Context" : "regionserver",
>>     "tag.Hostname" : "hdp-w-0"
>>   } ],
>>   "beans" : [ {
>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>     "modelerType" : "RegionServer,sub=Server",
>>     "tag.Context" : "regionserver",
>>     "tag.Hostname" : "hdp-w-0"
>>   } ]
>> }
>> 2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing, request
>> close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,640 WARN
>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>> through to close; java.io.IOException: All datanodes
>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>> are bad. Aborting...
>> 2015-08-06 14:11:13,641 ERROR
>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>> wal.ProtobufLogWriter: Got IOException while writing trailer
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:11:13,641 WARN
>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>> wal.FSHLog: Riding over failed WAL close of
>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576,
>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>> SYNCED SO SHOULD BE OK
>> 2015-08-06 14:11:13,642 INFO
>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>> wal.FSHLog: Rolled WAL
>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>> with entries=101, filesize=30.38 KB; new WAL
>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>> 2015-08-06 14:11:13,643 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>> region
>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>> still finishing close
>> 2015-08-06 14:11:13,643 INFO
>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>> wal.FSHLog: Archiving
>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>> to
>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>> 2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0]
>> executor.EventHandler: Caught throwable while processing event
>> M_RS_CLOSE_REGION
>> java.lang.RuntimeException: java.io.IOException: All datanodes
>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>> are bad. Aborting...
>>         at
>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>         at
>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>
>> ------------
>> hbase regionserver log (w-1)
>> ------------
>> 2015-08-06 14:11:14,267 INFO  [main-EventThread]
>> replication.ReplicationTrackerZKImpl:
>> /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode
>> expired, triggering replicatorRemoved event
>> 2015-08-06 14:12:08,203 INFO  [ReplicationExecutor-0]
>> replication.ReplicationQueuesZKImpl: Atomically moving
>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
>> 2015-08-06 14:12:56,252 INFO
>>  [PriorityRpcServer.handler=5,queue=1,port=16020]
>> regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving
>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>> 2015-08-06 14:12:56,260 INFO
>>  [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1]
>> regionserver.HStore: Closed 0
>> 2015-08-06 14:12:56,261 FATAL
>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1]
>> wal.FSHLog: Could not append. Requesting close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing, request
>> close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>> regionserver.HRegionServer: ABORTING region server
>> hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception
>> while closing region
>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>> still finishing close
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint,
>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>> org.apache.phoenix.hbase.index.Indexer,
>> org.apache.phoenix.coprocessor.SequenceRegionObserver]
>> 2015-08-06 14:12:56,281 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>   "beans" : [ {
>>     "name" : "java.lang:type=Memory",
>>     "modelerType" : "sun.management.MemoryImpl",
>>     "ObjectPendingFinalizationCount" : 0,
>>     "NonHeapMemoryUsage" : {
>>       "committed" : 137166848,
>>       "init" : 136773632,
>>       "max" : 184549376,
>>       "used" : 48667528
>>     },
>>     "HeapMemoryUsage" : {
>>       "committed" : 2104754176,
>>       "init" : 2147483648,
>>       "max" : 2104754176,
>>       "used" : 270075472
>>     },
>>     "Verbose" : true,
>>     "ObjectName" : "java.lang:type=Memory"
>>   } ],
>>   "beans" : [ {
>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>     "modelerType" : "RegionServer,sub=IPC",
>>     "tag.Context" : "regionserver",
>>     "tag.Hostname" : "hdp-w-1"
>>   } ],
>>   "beans" : [ {
>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>     "modelerType" : "RegionServer,sub=Replication",
>>     "tag.Context" : "regionserver",
>>     "tag.Hostname" : "hdp-w-1"
>>   } ],
>>   "beans" : [ {
>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>     "modelerType" : "RegionServer,sub=Server",
>>     "tag.Context" : "regionserver",
>>     "tag.Hostname" : "hdp-w-1"
>>   } ]
>> }
>> 2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing, request
>> close of wal
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,285 WARN
>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>> through to close; java.io.IOException: All datanodes
>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>> are bad. Aborting...
>> 2015-08-06 14:12:56,285 ERROR
>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>> wal.ProtobufLogWriter: Got IOException while writing trailer
>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>> 2015-08-06 14:12:56,285 WARN
>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>> wal.FSHLog: Riding over failed WAL close of
>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359,
>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>> SYNCED SO SHOULD BE OK
>> 2015-08-06 14:12:56,287 INFO
>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>> wal.FSHLog: Rolled WAL
>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>> with entries=100, filesize=30.73 KB; new WAL
>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
>> 2015-08-06 14:12:56,288 INFO
>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>> wal.FSHLog: Archiving
>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>> to
>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>> 2015-08-06 14:12:56,315 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>> region
>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>> still finishing close
>> 2015-08-06 14:12:56,315 INFO  [regionserver/hdp-w-1.c.dks-hadoop.internal/
>> 10.240.2.235:16020] regionserver.SplitLogWorker: Sending interrupt to
>> stop the worker thread
>> 2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0]
>> executor.EventHandler: Caught throwable while processing event
>> M_RS_CLOSE_REGION
>> java.lang.RuntimeException: java.io.IOException: All datanodes
>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>> are bad. Aborting...
>>         at
>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>         at
>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>> bad. Aborting...
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>
>> -------------
>> m datanode log
>> -------------
>> 2015-07-27 14:11:16,082 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-07-27 14:11:16,132 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /
>> 10.240.200.196:56767 dest: /10.240.200.196:50010
>> 2015-07-27 14:11:16,155 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767,
>> dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration:
>> 6385289
>> 2015-07-27 14:11:16,155 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-07-27 14:11:16,267 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>> error processing unknown operation  src: /127.0.0.1:60513 dst: /
>> 127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-07-27 14:11:16,405 INFO  datanode.DataNode
>> (DataNode.java:transferBlock(1943)) - DatanodeRegistration(
>> 10.240.200.196:50010, datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81,
>> infoPort=50075, infoSecurePort=0, ipcPort=8010,
>> storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0)
>> Starting thread to transfer
>> BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to
>> 10.240.2.235:50010 10.240.164.0:50010
>>
>> -------------
>> w-0 datanode log
>> -------------
>> 2015-07-27 14:11:25,019 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>> operation  src: /127.0.0.1:47993 dst: /127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-07-27 14:11:25,077 INFO  DataNode.clienttrace
>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, success: true
>>
>> -----------------------------
>> Thank you in advance,
>>
>> Adrià
>>
>>
>


-- 
Thanks & Regards,
Anil Gupta
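
For readers hitting the same "All datanodes ... are bad. Aborting..." pipeline failures, a minimal sketch of how to poll datanode health from the client side is below. It assumes the Hadoop 2.x client API that ships with HDP of this era; the class name DatanodeHealthCheck is made up for illustration, and the shell command `hdfs dfsadmin -report` prints essentially the same report without any code.

    // Hypothetical diagnostic sketch: list every datanode the NameNode knows
    // about, with remaining disk space and time since last heartbeat.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

    public class DatanodeHealthCheck {
      public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
          System.err.println("Not an HDFS filesystem: " + fs.getUri());
          return;
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.ALL)) {
          long remainingGb = dn.getRemaining() / (1024L * 1024 * 1024);
          long secsSinceHeartbeat =
              (System.currentTimeMillis() - dn.getLastUpdate()) / 1000;
          System.out.printf("%s remaining=%dGB xceivers=%d lastContact=%ds%n",
              dn.getHostName(), remainingGb, dn.getXceiverCount(),
              secsSinceHeartbeat);
        }
      }
    }

A datanode that shows little remaining space or a large lastContact value is a likely candidate for the pipeline aborts seen in the logs above.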
