phoenix-user mailing list archives

From "Adrià Vilà" <av...@datknosys.com>
Subject Re: RegionServers shutdown randomly
Date Fri, 07 Aug 2015 09:06:30 GMT
I run other workloads, but not while this error happened, because I was testing it on purpose. I've noticed that the RegionServers fail randomly.
  
 NameNode heap: 4GB
 DataNode heap: 1GB
 NameNode threads: 100
  
 HDFS-site:
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>
    </property>
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>1073741824</value>
    </property>
     
  
 HBase-site:
    <property>
      <name>hbase.client.keyvalue.maxsize</name>
      <value>1048576</value>
    </property>
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>10737418240</value>
    </property>
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>4</value>
    </property>
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>134217728</value>
    </property>
    <property>
      <name>hbase.regionserver.wal.codec</name>
      <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
    </property>
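
 For readability, here is a small sketch (values copied verbatim from the snippets above) that converts the raw byte values into human-readable sizes, so the settings are easier to sanity-check at a glance:

```python
# Convert the raw byte values from the hdfs-site/hbase-site snippets above
# into human-readable sizes. Values are copied verbatim from this mail.
settings = {
    "dfs.blocksize": 134217728,
    "dfs.datanode.du.reserved": 1073741824,
    "hbase.client.keyvalue.maxsize": 1048576,
    "hbase.hregion.max.filesize": 10737418240,
    "hbase.hregion.memstore.flush.size": 134217728,
}

def human(n):
    # Step through binary units until the value drops below 1024.
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:g} {unit}"
        n /= 1024
    return f"{n:g} TB"

for key, value in settings.items():
    print(f"{key} = {human(value)}")
# dfs.blocksize and the memstore flush size come out to 128 MB,
# du.reserved to 1 GB, keyvalue.maxsize to 1 MB, max.filesize to 10 GB.
```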
  
 Below I attach all the logs I could find!

  
 ---------------
 NameNode log:
 ---------------
 2015-08-06 14:11:10,079 INFO  hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate blk_1073766164_25847{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW], ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]} for /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta
2015-08-06 14:11:10,095 INFO  hdfs.StateChange (FSNamesystem.java:fsync(3975)) - BLOCK* fsync: /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta for DFSClient_NONMAPREDUCE_774922977_1
2015-08-06 14:11:10,104 INFO  hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW], ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]} for /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc
2015-08-06 14:11:10,120 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock: blockMap updated: 10.240.164.0:50010 is added to blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW], ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]} size 0
2015-08-06 14:11:10,120 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock: blockMap updated: 10.240.187.182:50010 is added to blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW], ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]} size 0
2015-08-06 14:11:10,122 INFO  hdfs.StateChange (FSNamesystem.java:completeFile(3493)) - DIR* completeFile: /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc is closed by DFSClient_NONMAPREDUCE_774922977_1
2015-08-06 14:11:10,226 INFO  hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW], ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]} for /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0
2015-08-06 14:11:10,421 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock: blockMap updated: 10.240.200.196:50010 is added to blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW], ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]} size 0
2015-08-06 14:11:10,421 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock: blockMap updated: 10.240.187.182:50010 is added to blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW], ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]} size 0
2015-08-06 14:11:10,423 INFO  hdfs.StateChange (FSNamesystem.java:completeFile(3493)) - DIR* completeFile: /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0 is closed by DFSClient_NONMAPREDUCE_774922977_1
2015-08-06 14:11:13,623 INFO  hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW], ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]} for /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
2015-08-06 14:11:13,638 INFO  hdfs.StateChange (FSNamesystem.java:fsync(3975)) - BLOCK* fsync: /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617 for DFSClient_NONMAPREDUCE_722958591_1
2015-08-06 14:11:13,965 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock: blockMap updated: 10.240.2.235:50010 is added to blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW], ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]} size 90
2015-08-06 14:11:13,966 INFO  BlockStateChange (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock: blockMap updated: 10.240.164.0:50010 is added to blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW], ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]} size 90
2015-08-06 14:11:13,968 INFO  hdfs.StateChange (FSNamesystem.java:completeFile(3493)) - DIR* completeFile: /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617 is closed by DFSClient_NONMAPREDUCE_722958591_1
  
 ---------------
 HBase master DataNode log:
 ---------------
 2015-08-06 14:09:17,187 INFO  DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749044, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
2015-08-06 14:09:17,273 INFO  DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749049, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
2015-08-06 14:09:17,325 INFO  DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749051, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
2015-08-06 14:09:46,810 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /10.240.200.196:34789 dest: /10.240.200.196:50010
2015-08-06 14:09:46,843 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34789, dest: /10.240.200.196:50010, bytes: 6127, op: HDFS_WRITE, cliID: DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration: 29150527
2015-08-06 14:09:46,843 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-08-06 14:09:47,193 INFO  DataNode.clienttrace (DataXceiver.java:requestShortCircuitShm(468)) - cliID: DFSClient_NONMAPREDUCE_22636141_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: a70bde6b6e67f4a5394e209320b451f3, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
2015-08-06 14:09:47,211 INFO  DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073766159, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
2015-08-06 14:09:52,887 INFO  datanode.ShortCircuitRegistry (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block 1073766159_BP-369072949-10.240.200.196-1437998325049 has been invalidated.  Marking short-circuit slots as invalid: Slot(slotIdx=0, shm=RegisteredShm(a70bde6b6e67f4a5394e209320b451f3))
2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766159_25842 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159 for deletion
2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(295)) - Deleted BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
2015-08-06 14:09:53,562 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:35735 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:10:53,649 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:35826 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:11:02,088 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:46835, dest: /10.240.200.196:50010, bytes: 434, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration: 131662299497
2015-08-06 14:11:02,088 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:11:03,018 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /10.240.164.0:47039 dest: /10.240.200.196:50010
2015-08-06 14:11:03,042 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:47039, dest: /10.240.200.196:50010, bytes: 30845, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration: 13150343
2015-08-06 14:11:03,042 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:11:03,834 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /10.240.200.196:34941 dest: /10.240.200.196:50010
2015-08-06 14:11:03,917 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34941, dest: /10.240.200.196:50010, bytes: 47461, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration: 76492287
2015-08-06 14:11:03,917 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-08-06 14:11:04,887 INFO  datanode.ShortCircuitRegistry (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block 1073766154_BP-369072949-10.240.200.196-1437998325049 has been invalidated.  Marking short-circuit slots as invalid: Slot(slotIdx=2, shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
2015-08-06 14:11:04,887 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766154_25837 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154 for deletion
2015-08-06 14:11:04,894 INFO  datanode.ShortCircuitRegistry (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block 1073766155_BP-369072949-10.240.200.196-1437998325049 has been invalidated.  Marking short-circuit slots as invalid: Slot(slotIdx=1, shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766155_25838 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766155 for deletion
2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766156_25839 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766156 for deletion
2015-08-06 14:11:04,895 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(295)) - Deleted BP-369072949-10.240.200.196-1437998325049 blk_1073766154_25837 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154
 2015-08-06 14:11:07,887 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766157_25840 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157 for deletion
2015-08-06 14:11:07,889 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(295)) - Deleted BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
2015-08-06 14:11:10,241 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849 src: /10.240.187.182:60725 dest: /10.240.200.196:50010
2015-08-06 14:11:10,419 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:60725, dest: /10.240.200.196:50010, bytes: 1212340, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849, duration: 172298145
2015-08-06 14:11:10,419 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:11:53,594 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:35992 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:11:57,109 INFO  DataNode.clienttrace (DataXceiver.java:releaseShortCircuitFds(407)) - src: 127.0.0.1, dest: 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId: 0ee5b1d24e1d07dc72f681a4fbc06040, slotIdx: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
  
 ---------------
 HBase worker DataNode log:
 ---------------
 2015-08-06 14:09:41,378 INFO  datanode.VolumeScanner (VolumeScanner.java:markSuspectBlock(665)) - VolumeScanner(/hadoop/hdfs/data, DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01): Not scheduling suspect block BP-369072949-10.240.200.196-1437998325049:blk_1073749047_8295 for rescanning, because we rescanned it recently.
2015-08-06 14:09:46,812 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /10.240.200.196:48711 dest: /10.240.164.0:50010
2015-08-06 14:09:46,842 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48711, dest: /10.240.164.0:50010, bytes: 6127, op: HDFS_WRITE, cliID: DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration: 28706244
2015-08-06 14:09:46,842 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:09:47,033 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55267, dest: /10.240.164.0:50010, bytes: 426792, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841, duration: 56549200180
2015-08-06 14:09:47,033 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-08-06 14:09:50,052 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766159_25842 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159 for deletion
2015-08-06 14:09:50,053 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(295)) - Deleted BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
2015-08-06 14:09:52,456 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:44663 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:10:52,401 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:44709 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:11:02,052 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766158_25841 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158 for deletion
2015-08-06 14:11:02,053 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(295)) - Deleted BP-369072949-10.240.200.196-1437998325049 blk_1073766158_25841 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158
2015-08-06 14:11:02,089 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55265, dest: /10.240.164.0:50010, bytes: 434, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration: 131663017075
2015-08-06 14:11:02,089 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-08-06 14:11:02,992 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /10.240.164.0:55469 dest: /10.240.164.0:50010
2015-08-06 14:11:03,043 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55469, dest: /10.240.164.0:50010, bytes: 30845, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration: 23219007
2015-08-06 14:11:03,043 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-08-06 14:11:03,288 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844 src: /10.240.2.235:56643 dest: /10.240.164.0:50010
2015-08-06 14:11:03,319 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56643, dest: /10.240.164.0:50010, bytes: 9446, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_920026358_372, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844, duration: 29200871
2015-08-06 14:11:03,319 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:11:03,475 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845 src: /10.240.187.182:41291 dest: /10.240.164.0:50010
2015-08-06 14:11:03,501 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41291, dest: /10.240.164.0:50010, bytes: 34213, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-643601862_395, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845, duration: 24992990
2015-08-06 14:11:03,501 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:11:03,837 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /10.240.200.196:48863 dest: /10.240.164.0:50010
2015-08-06 14:11:03,916 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48863, dest: /10.240.164.0:50010, bytes: 47461, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration: 78295982
2015-08-06 14:11:03,916 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:11:05,052 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling blk_1073766157_25840 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157 for deletion
 2015-08-06 14:11:05,053 INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(295)) - Deleted BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
2015-08-06 14:11:10,083 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /10.240.187.182:41294 dest: /10.240.164.0:50010
2015-08-06 14:11:10,111 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848 src: /10.240.187.182:41296 dest: /10.240.164.0:50010
2015-08-06 14:11:10,119 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41296, dest: /10.240.164.0:50010, bytes: 122645, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848, duration: 7281253
2015-08-06 14:11:10,119 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:11:13,627 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850 src: /10.240.164.0:55473 dest: /10.240.164.0:50010
2015-08-06 14:11:13,965 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55473, dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_722958591_1, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850, duration: 332393414
2015-08-06 14:11:13,965 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-08-06 14:11:52,392 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:44757 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:12:10,514 INFO  datanode.DataNode (BlockReceiver.java:receiveBlock(888)) - Exception for BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:12:10,515 INFO  datanode.DataNode (BlockReceiver.java:run(1312)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847, type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
2015-08-06 14:12:10,515 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:12:10,515 INFO  datanode.DataNode (DataXceiver.java:writeBlock(837)) - opWriteBlock BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 received exception java.io.IOException: Premature EOF from inputStream
2015-08-06 14:12:10,515 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.240.187.182:41294 dst: /10.240.164.0:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:12:10,519 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /10.240.187.182:41319 dest: /10.240.164.0:50010
2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:recoverRbw(1322)) - Recover RBW replica BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
 2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:recoverRbw(1333)) - Recovering ReplicaBeingWritten, blk_1073766164_25847, RBW
  getNumBytes()     = 545
  getBytesOnDisk()  = 545
  getVisibleLength()= 545
  getVolume()       = /hadoop/hdfs/data/current
  getBlockFile()    = /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/rbw/blk_1073766164
  bytesAcked=545
  bytesOnDisk=545
2015-08-06 14:12:52,419 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:44786 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:12:56,277 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852 src: /10.240.2.235:56693 dest: /10.240.164.0:50010
2015-08-06 14:12:56,843 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56693, dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_241093651_1, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852, duration: 564768014
2015-08-06 14:12:56,843 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:13:10,579 INFO  datanode.DataNode (BlockReceiver.java:receiveBlock(888)) - Exception for BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010 remote=/10.240.187.182:41319]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:13:10,580 INFO  datanode.DataNode (BlockReceiver.java:run(1312)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851, type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
2015-08-06 14:13:10,580 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2015-08-06 14:13:10,580 INFO  datanode.DataNode (DataXceiver.java:writeBlock(837)) - opWriteBlock BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010 remote=/10.240.187.182:41319]
 2015-08-06 14:13:10,580 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.240.187.182:41319 dst: /10.240.164.0:50010
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010 remote=/10.240.187.182:41319]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:13:52,432 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:44818 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:14:52,399 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:44848 dst: /127.0.0.1:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
        at java.lang.Thread.run(Thread.java:745)
2015-08-06 14:15:14,227 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856 src: /10.240.187.182:41397 dest: /10.240.164.0:50010
2015-08-06 14:15:14,240 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41397, dest: /10.240.164.0:50010, bytes: 92789, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856, duration: 12340123
  
  
 Thank you for helping :)
 Adrià
  
  
 From: "Vladimir Rodionov" <vladrodionov@gmail.com>
 Sent: Thursday, August 6, 2015 8:07 PM
To: user@phoenix.apache.org, avila@datknosys.com
Subject: Re: RegionServers shutdown randomly   
 What do DN and NN log say? Do you run any other workload on the same cluster? What is your cluster configuration?  Max memory per RS, DN and other collocated processes?  
 -Vlad

   On Thu, Aug 6, 2015 at 8:42 AM, Adrià Vilà <avila@datknosys.com> wrote:

 Hello,
   
 HBase RegionServers fail once in a while:
  - it can be any regionserver, not always the same
  - it can happen when the whole cluster is idle (at least not executing any human-launched task)
  - it can happen at any time, not always the same
  
 The cluster versions:
  - Phoenix 4.4 (or 4.5)
  - HBase 1.1.1
  - Hadoop/HDFS 2.7.1
  - Zookeeper 3.4.6

 Some configs:
 -  ulimit -a
 core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 103227
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 103227
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
  - have increased default timeouts for: hbase rpc, zookeeper session, dfs socket, regionserver lease and client scanner.
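
 For reference, the kind of properties behind those timeouts would typically look like the snippet below. These keys and values are illustrative assumptions, not the poster's actual settings (and some names vary by version, e.g. `hbase.regionserver.lease.period` is the older name for the scanner lease):

```xml
<!-- hbase-site.xml: illustrative timeout overrides only -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value>
</property>
<!-- hdfs-site.xml: client-side socket read timeout -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value>
</property>
```

 Separately, note the `open files (-n) 1024` in the ulimit output above: that is well below what the HBase reference guide recommends for DataNode/RegionServer hosts, and low nofile limits are a known cause of DataNode xceiver/stream errors under load.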
  
 Next you can find the logs for the master, the regionserver that failed first, another failed and the datanode log for master and worker.

  
 The timing was approximately:
    14:05 start hbase
 14:11 w-0 down
 14:14 w-1 down
 14:15 stop hbase

  
  -------------
 hbase master log (m)
 -------------
 2015-08-06 14:11:13,640 ERROR [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices: Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a fatal error:
 ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
 Cause:
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
  
 --------------
 hbase regionserver log (w-0)
 --------------
 2015-08-06 14:11:13,611 INFO  [PriorityRpcServer.handler=0,queue=0,port=16020] regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
 2015-08-06 14:11:13,615 INFO  [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1] regionserver.HStore: Closed 0
 2015-08-06 14:11:13,616 FATAL [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
 2015-08-06 14:11:13,627 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
   "beans" : [ {
     "name" : "java.lang:type=Memory",
     "modelerType" : "sun.management.MemoryImpl",
     "Verbose" : true,
     "HeapMemoryUsage" : {
       "committed" : 2104754176,
       "init" : 2147483648,
       "max" : 2104754176,
       "used" : 262288688
     },
     "ObjectPendingFinalizationCount" : 0,
     "NonHeapMemoryUsage" : {
       "committed" : 137035776,
       "init" : 136773632,
       "max" : 184549376,
       "used" : 49168288
     },
     "ObjectName" : "java.lang:type=Memory"
   } ],
   "beans" : [ {
     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
     "modelerType" : "RegionServer,sub=IPC",
     "tag.Context" : "regionserver",
     "tag.Hostname" : "hdp-w-0"
   } ],
   "beans" : [ {
     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
     "modelerType" : "RegionServer,sub=Replication",
     "tag.Context" : "regionserver",
     "tag.Hostname" : "hdp-w-0"
   } ],
   "beans" : [ {
     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
     "modelerType" : "RegionServer,sub=Server",
     "tag.Context" : "regionserver",
     "tag.Hostname" : "hdp-w-0"
   } ]
 }
 2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing, request close of wal
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:11:13,640 WARN  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
 2015-08-06 14:11:13,641 ERROR [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:11:13,641 WARN  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
 2015-08-06 14:11:13,642 INFO  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 with entries=101, filesize=30.38 KB; new WAL /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
 2015-08-06 14:11:13,643 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891., still finishing close
 2015-08-06 14:11:13,643 INFO  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
 2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
 java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
         at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
         
 ------------
 hbase regionserver log (w-1)
 ------------
 2015-08-06 14:11:14,267 INFO  [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode expired, triggering replicatorRemoved event
 2015-08-06 14:12:08,203 INFO  [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
 2015-08-06 14:12:56,252 INFO  [PriorityRpcServer.handler=5,queue=1,port=16020] regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving to hdp-m.c.dks-hadoop.internal,16020,1438869954062
 2015-08-06 14:12:56,260 INFO  [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1] regionserver.HStore: Closed 0
 2015-08-06 14:12:56,261 FATAL [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing, request close of wal
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: ABORTING region server hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.SequenceRegionObserver]
 2015-08-06 14:12:56,281 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
   "beans" : [ {
     "name" : "java.lang:type=Memory",
     "modelerType" : "sun.management.MemoryImpl",
     "ObjectPendingFinalizationCount" : 0,
     "NonHeapMemoryUsage" : {
       "committed" : 137166848,
       "init" : 136773632,
       "max" : 184549376,
       "used" : 48667528
     },
     "HeapMemoryUsage" : {
       "committed" : 2104754176,
       "init" : 2147483648,
       "max" : 2104754176,
       "used" : 270075472
     },
     "Verbose" : true,
     "ObjectName" : "java.lang:type=Memory"
   } ],
   "beans" : [ {
     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
     "modelerType" : "RegionServer,sub=IPC",
     "tag.Context" : "regionserver",
     "tag.Hostname" : "hdp-w-1"
   } ],
   "beans" : [ {
     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
     "modelerType" : "RegionServer,sub=Replication",
     "tag.Context" : "regionserver",
     "tag.Hostname" : "hdp-w-1"
   } ],
   "beans" : [ {
     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
     "modelerType" : "RegionServer,sub=Server",
     "tag.Context" : "regionserver",
     "tag.Hostname" : "hdp-w-1"
   } ]
 }
 2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing, request close of wal
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:12:56,285 WARN  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Failed last sync but no outstanding unsync edits so falling through to close; java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
 2015-08-06 14:12:56,285 ERROR [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
 java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
 2015-08-06 14:12:56,285 WARN  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Riding over failed WAL close of hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359, cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
 2015-08-06 14:12:56,287 INFO  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 with entries=100, filesize=30.73 KB; new WAL /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
 2015-08-06 14:12:56,288 INFO  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller] wal.FSHLog: Archiving hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359 to hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
 2015-08-06 14:12:56,315 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0] regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing region SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5., still finishing close
 2015-08-06 14:12:56,315 INFO  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
 2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0] executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
 java.lang.RuntimeException: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
         at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are bad. Aborting...
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
  
 -------------
 m datanode log
 -------------
 2015-07-27 14:11:16,082 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
 2015-07-27 14:11:16,132 INFO  datanode.DataNode (DataXceiver.java:writeBlock(655)) - Receiving BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /10.240.200.196:56767 dest: /10.240.200.196:50010
 2015-07-27 14:11:16,155 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767, dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration: 6385289
 2015-07-27 14:11:16,155 INFO  datanode.DataNode (BlockReceiver.java:run(1348)) - PacketResponder: BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
 2015-07-27 14:11:16,267 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:60513 dst: /127.0.0.1:50010
 java.io.EOFException
         at java.io.DataInputStream.readShort(DataInputStream.java:315)
         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
         at java.lang.Thread.run(Thread.java:745)
 2015-07-27 14:11:16,405 INFO  datanode.DataNode (DataNode.java:transferBlock(1943)) - DatanodeRegistration(10.240.200.196:50010, datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0) Starting thread to transfer BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to 10.240.2.235:50010 10.240.164.0:50010
  
 -------------
 w-0 datanode log
 -------------
 2015-07-27 14:11:25,019 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown operation  src: /127.0.0.1:47993 dst: /127.0.0.1:50010
 java.io.EOFException
         at java.io.DataInputStream.readShort(DataInputStream.java:315)
         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
         at java.lang.Thread.run(Thread.java:745)
 2015-07-27 14:11:25,077 INFO  DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID: a5eea5a8-5112-46da-9f18-64274486c472, success: true

  
 -----------------------------
 Thank you in advance,
   
 Adrià 

   


