phoenix-user mailing list archives

From Sandeep Nemuri <nhsande...@gmail.com>
Subject Re: RegionServers shutdown randomly
Date Sun, 09 Aug 2015 07:53:53 GMT
Is HDFS operating normally?
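
In case it helps, a quick way to sanity-check HDFS from the NameNode host
(standard HDFS commands; what they report obviously depends on your cluster):

    hdfs dfsadmin -report                  # live/dead DataNodes and per-node capacity
    hdfs dfsadmin -safemode get            # confirm the NameNode is not stuck in safe mode
    hdfs fsck / -list-corruptfileblocks    # look for missing or corrupt blocks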

On Sun, Aug 9, 2015 at 9:12 AM, anil gupta <anilgupta84@gmail.com> wrote:

> 2015-08-06 14:11:13,640 ERROR
> [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
> Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
> fatal error:
> ABORTING region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905:
> Unrecoverable exception while closing region
> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
> still finishing close
> Cause:
> java.io.IOException: All datanodes DatanodeInfoWithStorage[
> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
> bad. Aborting...
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>
> Do you have a bad disk in your cluster? The above error looks like an HDFS
> problem. How stable is your HDFS?
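>
> A couple of illustrative checks for a failing disk on a DataNode host (the
> device name below is just an example, adjust it to your layout):
>
>     dmesg | grep -iE 'i/o error|sector'    # kernel-level disk errors
>     sudo smartctl -H /dev/sda              # SMART health for one device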
>
> On Fri, Aug 7, 2015 at 2:06 AM, Adrià Vilà <avila@datknosys.com> wrote:
>
>> I do run other workloads, but not while this error happened, because I was
>> testing it on purpose. I've noticed that the RegionServers do fail
>> randomly.
>>
>> NameNode heap: 4GB
>> DataNode heap: 1GB
>> NameNode threads: 100
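>>
>> (For context, a rough sketch of where those numbers usually live in an
>> HDP-style install; the file names and variables below are assumptions
>> about this cluster, and the values just mirror the figures above:)
>>
>>     # hadoop-env.sh
>>     export HADOOP_NAMENODE_OPTS="-Xmx4g ${HADOOP_NAMENODE_OPTS}"
>>     export HADOOP_DATANODE_OPTS="-Xmx1g ${HADOOP_DATANODE_OPTS}"
>>
>>     <!-- hdfs-site.xml -->
>>     <property>
>>       <name>dfs.namenode.handler.count</name>
>>       <value>100</value>
>>     </property>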
>>
>> HDFS-site:
>>     <property>
>>       <name>dfs.blocksize</name>
>>       <value>134217728</value>
>>     </property>
>>    <property>
>>       <name>dfs.datanode.du.reserved</name>
>>       <value>1073741824</value>
>>     </property>
>>
>>
>> HBase-site:
>>     <property>
>>       <name>hbase.client.keyvalue.maxsize</name>
>>       <value>1048576</value>
>>     </property>
>>     <property>
>>       <name>hbase.hregion.max.filesize</name>
>>       <value>10737418240</value>
>>     </property>
>>     <property>
>>       <name>hbase.hregion.memstore.block.multiplier</name>
>>       <value>4</value>
>>     </property>
>>     <property>
>>       <name>hbase.hregion.memstore.flush.size</name>
>>       <value>134217728</value>
>>     </property>
>>     <property>
>>       <name>hbase.regionserver.wal.codec</name>
>>
>> <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
>>     </property>
>>
>> Next I attach as many logs as I could find!
>>
>> ---------------
>> NameNode log:
>> ---------------
>> 2015-08-06 14:11:10,079 INFO  hdfs.StateChange
>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>> blk_1073766164_25847{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>> for
>> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta
>> 2015-08-06 14:11:10,095 INFO  hdfs.StateChange
>> (FSNamesystem.java:fsync(3975)) - BLOCK* fsync:
>> /apps/hbase/data/WALs/hdp-w-2.c.dks-hadoop.internal,16020,1438869941783/hdp-w-2.c.dks-hadoop.internal%2C16020%2C1438869941783..meta.1438870270071.meta
>> for DFSClient_NONMAPREDUCE_774922977_1
>> 2015-08-06 14:11:10,104 INFO  hdfs.StateChange
>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>> for
>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc
>> 2015-08-06 14:11:10,120 INFO  BlockStateChange
>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>> blockMap updated: 10.240.164.0:50010 is added to
>> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>> size 0
>> 2015-08-06 14:11:10,120 INFO  BlockStateChange
>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>> blockMap updated: 10.240.187.182:50010 is added to
>> blk_1073766165_25848{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>> ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW]]}
>> size 0
>> 2015-08-06 14:11:10,122 INFO  hdfs.StateChange
>> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/d8cb7421fb78489c97d7aa7767449acc
>> is closed by DFSClient_NONMAPREDUCE_774922977_1
>> 2015-08-06 14:11:10,226 INFO  hdfs.StateChange
>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
>> for
>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0
>> 2015-08-06 14:11:10,421 INFO  BlockStateChange
>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>> blockMap updated: 10.240.200.196:50010 is added to
>> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
>> size 0
>> 2015-08-06 14:11:10,421 INFO  BlockStateChange
>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>> blockMap updated: 10.240.187.182:50010 is added to
>> blk_1073766166_25849{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2:NORMAL:10.240.187.182:50010|RBW],
>> ReplicaUC[[DISK]DS-2fa04842-547c-417c-8aab-91eb4df999de:NORMAL:10.240.200.196:50010|RBW]]}
>> size 0
>> 2015-08-06 14:11:10,423 INFO  hdfs.StateChange
>> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
>> /apps/hbase/data/data/hbase/meta/1588230740/.tmp/e39d202af4494e5e9e7dd4b75a61a0a0
>> is closed by DFSClient_NONMAPREDUCE_774922977_1
>> 2015-08-06 14:11:13,623 INFO  hdfs.StateChange
>> (FSNamesystem.java:saveAllocatedBlock(3573)) - BLOCK* allocate
>> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
>> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
>> for
>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>> 2015-08-06 14:11:13,638 INFO  hdfs.StateChange
>> (FSNamesystem.java:fsync(3975)) - BLOCK* fsync:
>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>> for DFSClient_NONMAPREDUCE_722958591_1
>> 2015-08-06 14:11:13,965 INFO  BlockStateChange
>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>> blockMap updated: 10.240.2.235:50010 is added to
>> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
>> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
>> size 90
>> 2015-08-06 14:11:13,966 INFO  BlockStateChange
>> (BlockManager.java:logAddStoredBlock(2629)) - BLOCK* addStoredBlock:
>> blockMap updated: 10.240.164.0:50010 is added to
>> blk_1073766167_25850{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
>> primaryNodeIndex=-1,
>> replicas=[ReplicaUC[[DISK]DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01:NORMAL:10.240.164.0:50010|RBW],
>> ReplicaUC[[DISK]DS-605c96e7-f5dc-4a63-8f34-ea5865077c4b:NORMAL:10.240.2.235:50010|RBW]]}
>> size 90
>> 2015-08-06 14:11:13,968 INFO  hdfs.StateChange
>> (FSNamesystem.java:completeFile(3493)) - DIR* completeFile:
>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>> is closed by DFSClient_NONMAPREDUCE_722958591_1
>>
>> ---------------
>> HBase master DataNode log:
>> ---------------
>> 2015-08-06 14:09:17,187 INFO  DataNode.clienttrace
>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749044, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>> 2015-08-06 14:09:17,273 INFO  DataNode.clienttrace
>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749049, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>> 2015-08-06 14:09:17,325 INFO  DataNode.clienttrace
>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073749051, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>> 2015-08-06 14:09:46,810 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /
>> 10.240.200.196:34789 dest: /10.240.200.196:50010
>> 2015-08-06 14:09:46,843 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34789,
>> dest: /10.240.200.196:50010, bytes: 6127, op: HDFS_WRITE, cliID:
>> DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0,
>> srvID: 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration:
>> 29150527
>> 2015-08-06 14:09:46,843 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-08-06 14:09:47,193 INFO  DataNode.clienttrace
>> (DataXceiver.java:requestShortCircuitShm(468)) - cliID:
>> DFSClient_NONMAPREDUCE_22636141_1, src: 127.0.0.1, dest: 127.0.0.1, op:
>> REQUEST_SHORT_CIRCUIT_SHM, shmId: a70bde6b6e67f4a5394e209320b451f3, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>> 2015-08-06 14:09:47,211 INFO  DataNode.clienttrace
>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073766159, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>> 2015-08-06 14:09:52,887 INFO  datanode.ShortCircuitRegistry
>> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
>> 1073766159_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
>> Marking short-circuit slots as invalid: Slot(slotIdx=0,
>> shm=RegisteredShm(a70bde6b6e67f4a5394e209320b451f3))
>> 2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766159_25842 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>> for deletion
>> 2015-08-06 14:09:52,887 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>> BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>> 2015-08-06 14:09:53,562 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>> error processing unknown operation  src: /127.0.0.1:35735 dst: /
>> 127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:10:53,649 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>> error processing unknown operation  src: /127.0.0.1:35826 dst: /
>> 127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:11:02,088 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:46835,
>> dest: /10.240.200.196:50010, bytes: 434, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration:
>> 131662299497
>> 2015-08-06 14:11:02,088 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:11:03,018 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /
>> 10.240.164.0:47039 dest: /10.240.200.196:50010
>> 2015-08-06 14:11:03,042 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:47039,
>> dest: /10.240.200.196:50010, bytes: 30845, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration:
>> 13150343
>> 2015-08-06 14:11:03,042 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:11:03,834 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /
>> 10.240.200.196:34941 dest: /10.240.200.196:50010
>> 2015-08-06 14:11:03,917 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:34941,
>> dest: /10.240.200.196:50010, bytes: 47461, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration:
>> 76492287
>> 2015-08-06 14:11:03,917 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-08-06 14:11:04,887 INFO  datanode.ShortCircuitRegistry
>> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
>> 1073766154_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
>> Marking short-circuit slots as invalid: Slot(slotIdx=2,
>> shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
>> 2015-08-06 14:11:04,887 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766154_25837 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154
>> for deletion
>> 2015-08-06 14:11:04,894 INFO  datanode.ShortCircuitRegistry
>> (ShortCircuitRegistry.java:processBlockInvalidation(251)) - Block
>> 1073766155_BP-369072949-10.240.200.196-1437998325049 has been invalidated.
>> Marking short-circuit slots as invalid: Slot(slotIdx=1,
>> shm=RegisteredShm(15d8ae8eb51e4590b7e51f8c723d60eb))
>> 2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766155_25838 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766155
>> for deletion
>> 2015-08-06 14:11:04,894 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766156_25839 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766156
>> for deletion
>> 2015-08-06 14:11:04,895 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>> BP-369072949-10.240.200.196-1437998325049 blk_1073766154_25837 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766154
>> 2015-08-06 14:11:07,887 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766157_25840 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>> for deletion
>> 2015-08-06 14:11:07,889 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>> BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>> 2015-08-06 14:11:10,241 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849 src: /
>> 10.240.187.182:60725 dest: /10.240.200.196:50010
>> 2015-08-06 14:11:10,419 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:60725,
>> dest: /10.240.200.196:50010, bytes: 1212340, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849, duration:
>> 172298145
>> 2015-08-06 14:11:10,419 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766166_25849,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:11:53,594 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>> error processing unknown operation  src: /127.0.0.1:35992 dst: /
>> 127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:11:57,109 INFO  DataNode.clienttrace
>> (DataXceiver.java:releaseShortCircuitFds(407)) - src: 127.0.0.1, dest:
>> 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId:
>> 0ee5b1d24e1d07dc72f681a4fbc06040, slotIdx: 0, srvID:
>> 329bbe62-bcea-4a6d-8c97-e800631deb81, success: true
>>
>> ---------------
>> HBase worker DataNode log:
>> ---------------
>> 2015-08-06 14:09:41,378 INFO  datanode.VolumeScanner
>> (VolumeScanner.java:markSuspectBlock(665)) -
>> VolumeScanner(/hadoop/hdfs/data, DS-1c5fb71e-2e69-4b8d-8957-96db7b81ae01):
>> Not scheduling suspect block
>> BP-369072949-10.240.200.196-1437998325049:blk_1073749047_8295 for
>> rescanning, because we rescanned it recently.
>> 2015-08-06 14:09:46,812 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842 src: /
>> 10.240.200.196:48711 dest: /10.240.164.0:50010
>> 2015-08-06 14:09:46,842 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48711,
>> dest: /10.240.164.0:50010, bytes: 6127, op: HDFS_WRITE, cliID:
>> DFSClient_attempt_14388456554803_0001_r_000000_0_1296198547_30, offset: 0,
>> srvID: a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842, duration:
>> 28706244
>> 2015-08-06 14:09:46,842 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766159_25842,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:09:47,033 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55267,
>> dest: /10.240.164.0:50010, bytes: 426792, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841, duration:
>> 56549200180
>> 2015-08-06 14:09:47,033 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766158_25841,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-08-06 14:09:50,052 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766159_25842 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>> for deletion
>> 2015-08-06 14:09:50,053 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>> BP-369072949-10.240.200.196-1437998325049 blk_1073766159_25842 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766159
>> 2015-08-06 14:09:52,456 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>> operation  src: /127.0.0.1:44663 dst: /127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:10:52,401 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>> operation  src: /127.0.0.1:44709 dst: /127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:11:02,052 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766158_25841 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158
>> for deletion
>> 2015-08-06 14:11:02,053 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>> BP-369072949-10.240.200.196-1437998325049 blk_1073766158_25841 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766158
>> 2015-08-06 14:11:02,089 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55265,
>> dest: /10.240.164.0:50010, bytes: 434, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_-522245611_1, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840, duration:
>> 131663017075
>> 2015-08-06 14:11:02,089 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766157_25840,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-08-06 14:11:02,992 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843 src: /
>> 10.240.164.0:55469 dest: /10.240.164.0:50010
>> 2015-08-06 14:11:03,043 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55469,
>> dest: /10.240.164.0:50010, bytes: 30845, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_369501114_375, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843, duration:
>> 23219007
>> 2015-08-06 14:11:03,043 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766160_25843,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-08-06 14:11:03,288 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844 src: /
>> 10.240.2.235:56643 dest: /10.240.164.0:50010
>> 2015-08-06 14:11:03,319 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56643,
>> dest: /10.240.164.0:50010, bytes: 9446, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_920026358_372, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844, duration:
>> 29200871
>> 2015-08-06 14:11:03,319 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766161_25844,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:11:03,475 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845 src: /
>> 10.240.187.182:41291 dest: /10.240.164.0:50010
>> 2015-08-06 14:11:03,501 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41291,
>> dest: /10.240.164.0:50010, bytes: 34213, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_-643601862_395, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845, duration:
>> 24992990
>> 2015-08-06 14:11:03,501 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766162_25845,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:11:03,837 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846 src: /
>> 10.240.200.196:48863 dest: /10.240.164.0:50010
>> 2015-08-06 14:11:03,916 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:48863,
>> dest: /10.240.164.0:50010, bytes: 47461, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_-1443736010_402, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846, duration:
>> 78295982
>> 2015-08-06 14:11:03,916 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766163_25846,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:11:05,052 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:deleteAsync(217)) - Scheduling
>> blk_1073766157_25840 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>> for deletion
>> 2015-08-06 14:11:05,053 INFO  impl.FsDatasetAsyncDiskService
>> (FsDatasetAsyncDiskService.java:run(295)) - Deleted
>> BP-369072949-10.240.200.196-1437998325049 blk_1073766157_25840 file
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/finalized/subdir0/subdir95/blk_1073766157
>> 2015-08-06 14:11:10,083 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /
>> 10.240.187.182:41294 dest: /10.240.164.0:50010
>> 2015-08-06 14:11:10,111 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848 src: /
>> 10.240.187.182:41296 dest: /10.240.164.0:50010
>> 2015-08-06 14:11:10,119 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41296,
>> dest: /10.240.164.0:50010, bytes: 122645, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848, duration:
>> 7281253
>> 2015-08-06 14:11:10,119 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766165_25848,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:11:13,627 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850 src: /
>> 10.240.164.0:55473 dest: /10.240.164.0:50010
>> 2015-08-06 14:11:13,965 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.164.0:55473,
>> dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_722958591_1, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850, duration:
>> 332393414
>> 2015-08-06 14:11:13,965 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766167_25850,
>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>> 2015-08-06 14:11:52,392 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>> operation  src: /127.0.0.1:44757 dst: /127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:12:10,514 INFO  datanode.DataNode
>> (BlockReceiver.java:receiveBlock(888)) - Exception for
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
>> java.io.IOException: Premature EOF from inputStream
>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1312)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847,
>> type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
>> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:12:10,515 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(837)) - opWriteBlock
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 received
>> exception java.io.IOException: Premature EOF from inputStream
>> 2015-08-06 14:12:10,515 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing
>> WRITE_BLOCK operation  src: /10.240.187.182:41294 dst: /
>> 10.240.164.0:50010
>> java.io.IOException: Premature EOF from inputStream
>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:12:10,519 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847 src: /
>> 10.240.187.182:41319 dest: /10.240.164.0:50010
>> 2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl
>> (FsDatasetImpl.java:recoverRbw(1322)) - Recover RBW replica
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25847
>> 2015-08-06 14:12:10,520 INFO  impl.FsDatasetImpl
>> (FsDatasetImpl.java:recoverRbw(1333)) - Recovering ReplicaBeingWritten,
>> blk_1073766164_25847, RBW
>>   getNumBytes()     = 545
>>   getBytesOnDisk()  = 545
>>   getVisibleLength()= 545
>>   getVolume()       = /hadoop/hdfs/data/current
>>   getBlockFile()    =
>> /hadoop/hdfs/data/current/BP-369072949-10.240.200.196-1437998325049/current/rbw/blk_1073766164
>>   bytesAcked=545
>>   bytesOnDisk=545
>> 2015-08-06 14:12:52,419 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>> operation  src: /127.0.0.1:44786 dst: /127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:12:56,277 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852 src: /
>> 10.240.2.235:56693 dest: /10.240.164.0:50010
>> 2015-08-06 14:12:56,843 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.2.235:56693,
>> dest: /10.240.164.0:50010, bytes: 98, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_241093651_1, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852, duration:
>> 564768014
>> 2015-08-06 14:12:56,843 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766168_25852,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:13:10,579 INFO  datanode.DataNode
>> (BlockReceiver.java:receiveBlock(888)) - Exception for
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851
>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
>> remote=/10.240.187.182:41319]
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1312)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851,
>> type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
>> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
>> (BlockReceiver.java:run(1348)) - PacketResponder:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851,
>> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2015-08-06 14:13:10,580 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(837)) - opWriteBlock
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766164_25851 received
>> exception java.net.SocketTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
>> remote=/10.240.187.182:41319]
>> 2015-08-06 14:13:10,580 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing
>> WRITE_BLOCK operation  src: /10.240.187.182:41319 dst: /
>> 10.240.164.0:50010
>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10.240.164.0:50010
>> remote=/10.240.187.182:41319]
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:13:52,432 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>> operation  src: /127.0.0.1:44818 dst: /127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:14:52,399 ERROR datanode.DataNode
>> (DataXceiver.java:run(278)) -
>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>> operation  src: /127.0.0.1:44848 dst: /127.0.0.1:50010
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>         at java.lang.Thread.run(Thread.java:745)
>> 2015-08-06 14:15:14,227 INFO  datanode.DataNode
>> (DataXceiver.java:writeBlock(655)) - Receiving
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856 src: /
>> 10.240.187.182:41397 dest: /10.240.164.0:50010
>> 2015-08-06 14:15:14,240 INFO  DataNode.clienttrace
>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.187.182:41397,
>> dest: /10.240.164.0:50010, bytes: 92789, op: HDFS_WRITE, cliID:
>> DFSClient_NONMAPREDUCE_774922977_1, offset: 0, srvID:
>> a5eea5a8-5112-46da-9f18-64274486c472, blockid:
>> BP-369072949-10.240.200.196-1437998325049:blk_1073766172_25856, duration:
>> 12340123
>>
>>
>> Thank you for helping :)
>> Adri
>>
>>
>> *From*: "Vladimir Rodionov" <vladrodionov@gmail.com>
>> *Sent*: Thursday, August 6, 2015 20:07
>> *To*: user@phoenix.apache.org, avila@datknosys.com
>> *Subject*: Re: RegionServers shutdown randomly
>>
>> What do DN and NN log say? Do you run any other workload on the same
>> cluster? What is your cluster configuration?
>> Max memory per RS, DN and other collocated processes?
>>
>> -Vlad
>>
>> On Thu, Aug 6, 2015 at 8:42 AM, Adrià Vilà <avila@datknosys.com> wrote:
>>>
>>> Hello,
>>>
>>> HBase RegionServers fail once in a while:
>>>
>>> - it can be any regionserver, not always the same one
>>>
>>> - it can happen when the whole cluster is idle (at least not executing any
>>> human-launched task)
>>>
>>> - it can happen at any time, not always at the same time
>>>
>>>
>>> The cluster versions:
>>>
>>> - Phoenix 4.4 (or 4.5)
>>>
>>> - HBase 1.1.1
>>>
>>> - Hadoop/HDFS 2.7.1
>>>
>>> - Zookeeper 3.4.6
>>>
>>>
>>>
>>> Some configs:
>>> -  ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 103227
>>> max locked memory       (kbytes, -l) 64
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 1024
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 10240
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 103227
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>> - have increased the default timeouts for: hbase rpc, zookeeper session, dfs
>>> socket, regionserver lease and client scanner (see the sketch below).
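>>>
>>> For reference, the kinds of properties usually behind those timeouts (a
>>> sketch only: the property names are the standard ones, the values are
>>> illustrative and not this cluster's actual settings):
>>>
>>>     <!-- hbase-site.xml -->
>>>     <property>
>>>       <name>hbase.rpc.timeout</name>
>>>       <value>120000</value>
>>>     </property>
>>>     <property>
>>>       <name>zookeeper.session.timeout</name>
>>>       <value>120000</value>
>>>     </property>
>>>     <property>
>>>       <name>hbase.client.scanner.timeout.period</name>
>>>       <value>120000</value>
>>>     </property>
>>>
>>>     <!-- hdfs-site.xml -->
>>>     <property>
>>>       <name>dfs.client.socket-timeout</name>
>>>       <value>120000</value>
>>>     </property>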
>>>
>>> Next you can find the logs for the master, the regionserver that failed
>>> first, another one that failed, and the datanode logs for the master and a
>>> worker.
>>>
>>> The timing was approximately:
>>>
>>> 14:05 start hbase
>>> 14:11 w-0 down
>>> 14:14 w-1 down
>>> 14:15 stop hbase
>>>
>>>
>>> -------------
>>> hbase master log (m)
>>> -------------
>>> 2015-08-06 14:11:13,640 ERROR
>>> [PriorityRpcServer.handler=19,queue=1,port=16000] master.MasterRpcServices:
>>> Region server hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 reported a
>>> fatal error:
>>> ABORTING region server
>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
>>> while closing region
>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>> still finishing close
>>> Cause:
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>
>>> --------------
>>> hbase regionserver log (w-0)
>>> --------------
>>> 2015-08-06 14:11:13,611 INFO
>>>  [PriorityRpcServer.handler=0,queue=0,port=16020]
>>> regionserver.RSRpcServices: Close 888f017eb1c0557fbe7079b50626c891, moving
>>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>>> 2015-08-06 14:11:13,615 INFO
>>>  [StoreCloserThread-SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.-1]
>>> regionserver.HStore: Closed 0
>>> 2015-08-06 14:11:13,616 FATAL
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.append-pool1-t1]
>>> wal.FSHLog: Could not append. Requesting close of wal
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:11:13,617 ERROR [sync.4] wal.FSHLog: Error syncing,
>>> request close of wal
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: ABORTING region server
>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905: Unrecoverable exception
>>> while closing region
>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>> still finishing close
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:11:13,617 FATAL [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>>> org.apache.phoenix.hbase.index.Indexer,
>>> org.apache.phoenix.coprocessor.SequenceRegionObserver,
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl]
>>> 2015-08-06 14:11:13,627 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>>   "beans" : [ {
>>>     "name" : "java.lang:type=Memory",
>>>     "modelerType" : "sun.management.MemoryImpl",
>>>     "Verbose" : true,
>>>     "HeapMemoryUsage" : {
>>>       "committed" : 2104754176,
>>>       "init" : 2147483648,
>>>       "max" : 2104754176,
>>>       "used" : 262288688
>>>     },
>>>     "ObjectPendingFinalizationCount" : 0,
>>>     "NonHeapMemoryUsage" : {
>>>       "committed" : 137035776,
>>>       "init" : 136773632,
>>>       "max" : 184549376,
>>>       "used" : 49168288
>>>     },
>>>     "ObjectName" : "java.lang:type=Memory"
>>>   } ],
>>>   "beans" : [ {
>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>>     "modelerType" : "RegionServer,sub=IPC",
>>>     "tag.Context" : "regionserver",
>>>     "tag.Hostname" : "hdp-w-0"
>>>   } ],
>>>   "beans" : [ {
>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>>     "modelerType" : "RegionServer,sub=Replication",
>>>     "tag.Context" : "regionserver",
>>>     "tag.Hostname" : "hdp-w-0"
>>>   } ],
>>>   "beans" : [ {
>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>>     "modelerType" : "RegionServer,sub=Server",
>>>     "tag.Context" : "regionserver",
>>>     "tag.Hostname" : "hdp-w-0"
>>>   } ]
>>> }
>>> 2015-08-06 14:11:13,640 ERROR [sync.0] wal.FSHLog: Error syncing,
>>> request close of wal
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:11:13,640 WARN
>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>>> through to close; java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>> 2015-08-06 14:11:13,641 ERROR
>>> [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.ProtobufLogWriter: Got IOException while writing trailer
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:11:13,641 WARN
>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Riding over failed WAL close of
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576,
>>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>>> SYNCED SO SHOULD BE OK
>>> 2015-08-06 14:11:13,642 INFO
>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Rolled WAL
>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>> with entries=101, filesize=30.38 KB; new WAL
>>> /apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438870273617
>>> 2015-08-06 14:11:13,643 INFO  [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>>> region
>>> SYSTEM.SEQUENCE,]\x00\x00\x00,1438013446516.888f017eb1c0557fbe7079b50626c891.,
>>> still finishing close
>>> 2015-08-06 14:11:13,643 INFO
>>>  [regionserver/hdp-w-0.c.dks-hadoop.internal/10.240.164.0:16020.logRoller]
>>> wal.FSHLog: Archiving
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>> to
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-0.c.dks-hadoop.internal%2C16020%2C1438869946905.default.1438869949576
>>> 2015-08-06 14:11:13,643 ERROR [RS_CLOSE_REGION-hdp-w-0:16020-0]
>>> executor.EventHandler: Caught throwable while processing event
>>> M_RS_CLOSE_REGION
>>> java.lang.RuntimeException: java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>>         at
>>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>>         at
>>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>
>>> ------------
>>> hbase regionserver log (w-1)
>>> ------------
>>> 2015-08-06 14:11:14,267 INFO  [main-EventThread]
>>> replication.ReplicationTrackerZKImpl:
>>> /hbase-unsecure/rs/hdp-w-0.c.dks-hadoop.internal,16020,1438869946905 znode
>>> expired, triggering replicatorRemoved event
>>> 2015-08-06 14:12:08,203 INFO  [ReplicationExecutor-0]
>>> replication.ReplicationQueuesZKImpl: Atomically moving
>>> hdp-w-0.c.dks-hadoop.internal,16020,1438869946905's wals to my queue
>>> 2015-08-06 14:12:56,252 INFO
>>>  [PriorityRpcServer.handler=5,queue=1,port=16020]
>>> regionserver.RSRpcServices: Close 918ed7c6568e7500fb434f4268c5bbc5, moving
>>> to hdp-m.c.dks-hadoop.internal,16020,1438869954062
>>> 2015-08-06 14:12:56,260 INFO
>>>  [StoreCloserThread-SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.-1]
>>> regionserver.HStore: Closed 0
>>> 2015-08-06 14:12:56,261 FATAL
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.append-pool1-t1]
>>> wal.FSHLog: Could not append. Requesting close of wal
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:12:56,261 ERROR [sync.3] wal.FSHLog: Error syncing,
>>> request close of wal
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: ABORTING region server
>>> hdp-w-1.c.dks-hadoop.internal,16020,1438869946909: Unrecoverable exception
>>> while closing region
>>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>>> still finishing close
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:12:56,262 FATAL [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
>>> [org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
>>> org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
>>> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
>>> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint,
>>> org.apache.phoenix.coprocessor.ScanRegionObserver,
>>> org.apache.phoenix.hbase.index.Indexer,
>>> org.apache.phoenix.coprocessor.SequenceRegionObserver]
>>> 2015-08-06 14:12:56,281 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: Dump of metrics as JSON on abort: {
>>>   "beans" : [ {
>>>     "name" : "java.lang:type=Memory",
>>>     "modelerType" : "sun.management.MemoryImpl",
>>>     "ObjectPendingFinalizationCount" : 0,
>>>     "NonHeapMemoryUsage" : {
>>>       "committed" : 137166848,
>>>       "init" : 136773632,
>>>       "max" : 184549376,
>>>       "used" : 48667528
>>>     },
>>>     "HeapMemoryUsage" : {
>>>       "committed" : 2104754176,
>>>       "init" : 2147483648,
>>>       "max" : 2104754176,
>>>       "used" : 270075472
>>>     },
>>>     "Verbose" : true,
>>>     "ObjectName" : "java.lang:type=Memory"
>>>   } ],
>>>   "beans" : [ {
>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
>>>     "modelerType" : "RegionServer,sub=IPC",
>>>     "tag.Context" : "regionserver",
>>>     "tag.Hostname" : "hdp-w-1"
>>>   } ],
>>>   "beans" : [ {
>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
>>>     "modelerType" : "RegionServer,sub=Replication",
>>>     "tag.Context" : "regionserver",
>>>     "tag.Hostname" : "hdp-w-1"
>>>   } ],
>>>   "beans" : [ {
>>>     "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
>>>     "modelerType" : "RegionServer,sub=Server",
>>>     "tag.Context" : "regionserver",
>>>     "tag.Hostname" : "hdp-w-1"
>>>   } ]
>>> }
>>> 2015-08-06 14:12:56,284 ERROR [sync.4] wal.FSHLog: Error syncing,
>>> request close of wal
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:12:56,285 WARN
>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Failed last sync but no outstanding unsync edits so falling
>>> through to close; java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>> 2015-08-06 14:12:56,285 ERROR
>>> [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.ProtobufLogWriter: Got IOException while writing trailer
>>> java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>> 2015-08-06 14:12:56,285 WARN
>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Riding over failed WAL close of
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359,
>>> cause="All datanodes DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...", errors=1; THIS FILE WAS NOT CLOSED BUT ALL EDITS
>>> SYNCED SO SHOULD BE OK
>>> 2015-08-06 14:12:56,287 INFO
>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Rolled WAL
>>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>> with entries=100, filesize=30.73 KB; new WAL
>>> /apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438870376262
>>> 2015-08-06 14:12:56,288 INFO
>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020.logRoller]
>>> wal.FSHLog: Archiving
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/WALs/hdp-w-1.c.dks-hadoop.internal,16020,1438869946909/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>> to
>>> hdfs://hdp-m.c.dks-hadoop.internal:8020/apps/hbase/data/oldWALs/hdp-w-1.c.dks-hadoop.internal%2C16020%2C1438869946909.default.1438869950359
>>> 2015-08-06 14:12:56,315 INFO  [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> regionserver.HRegionServer: STOPPED: Unrecoverable exception while closing
>>> region
>>> SYSTEM.SEQUENCE,\x7F\x00\x00\x00,1438013446516.918ed7c6568e7500fb434f4268c5bbc5.,
>>> still finishing close
>>> 2015-08-06 14:12:56,315 INFO
>>>  [regionserver/hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020]
>>> regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
>>> 2015-08-06 14:12:56,315 ERROR [RS_CLOSE_REGION-hdp-w-1:16020-0]
>>> executor.EventHandler: Caught throwable while processing event
>>> M_RS_CLOSE_REGION
>>> java.lang.RuntimeException: java.io.IOException: All datanodes
>>> DatanodeInfoWithStorage[10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK]
>>> are bad. Aborting...
>>>         at
>>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:152)
>>>         at
>>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>> 10.240.187.182:50010,DS-8c63ac70-2f98-4084-91ee-a847b4f48ce2,DISK] are
>>> bad. Aborting...
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
>>>
>>> -------------
>>> m datanode log
>>> -------------
>>> 2015-07-27 14:11:16,082 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742677_1857,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-07-27 14:11:16,132 INFO  datanode.DataNode
>>> (DataXceiver.java:writeBlock(655)) - Receiving
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858 src: /
>>> 10.240.200.196:56767 dest: /10.240.200.196:50010
>>> 2015-07-27 14:11:16,155 INFO  DataNode.clienttrace
>>> (BlockReceiver.java:finalizeBlock(1375)) - src: /10.240.200.196:56767,
>>> dest: /10.240.200.196:50010, bytes: 117761, op: HDFS_WRITE, cliID:
>>> DFSClient_NONMAPREDUCE_177514816_1, offset: 0, srvID:
>>> 329bbe62-bcea-4a6d-8c97-e800631deb81, blockid:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858, duration:
>>> 6385289
>>> 2015-07-27 14:11:16,155 INFO  datanode.DataNode
>>> (BlockReceiver.java:run(1348)) - PacketResponder:
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742678_1858,
>>> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2015-07-27 14:11:16,267 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) - hdp-m.c.dks-hadoop.internal:50010:DataXceiver
>>> error processing unknown operation  src: /127.0.0.1:60513 dst: /
>>> 127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-07-27 14:11:16,405 INFO  datanode.DataNode
>>> (DataNode.java:transferBlock(1943)) - DatanodeRegistration(
>>> 10.240.200.196:50010,
>>> datanodeUuid=329bbe62-bcea-4a6d-8c97-e800631deb81, infoPort=50075,
>>> infoSecurePort=0, ipcPort=8010,
>>> storageInfo=lv=-56;cid=CID-1247f294-77a9-4605-b6d3-4c1398bb5db0;nsid=2032226938;c=0)
>>> Starting thread to transfer
>>> BP-369072949-10.240.200.196-1437998325049:blk_1073742649_1829 to
>>> 10.240.2.235:50010 10.240.164.0:50010
>>>
>>> -------------
>>> w-0 datanode log
>>> -------------
>>> 2015-07-27 14:11:25,019 ERROR datanode.DataNode
>>> (DataXceiver.java:run(278)) -
>>> hdp-w-0.c.dks-hadoop.internal:50010:DataXceiver error processing unknown
>>> operation  src: /127.0.0.1:47993 dst: /127.0.0.1:50010
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2015-07-27 14:11:25,077 INFO  DataNode.clienttrace
>>> (DataXceiver.java:requestShortCircuitFds(369)) - src: 127.0.0.1, dest:
>>> 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073742631, srvID:
>>> a5eea5a8-5112-46da-9f18-64274486c472, success: true
>>>
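Before digging further into HBase, it is worth confirming the overall health of HDFS at the time of the aborts. A minimal check, assuming shell access to a node with the HDFS client configured (commands shown for illustration, not taken from this thread):

    hdfs dfsadmin -report                      # live vs. dead datanodes, remaining capacity per node
    hdfs fsck / -files -blocks | tail -n 20    # summary of missing, corrupt or under-replicated blocks
    ulimit -n                                  # open-file limit (check as the user running the datanode)

Repeated pipeline failures that name the same datanode (10.240.187.182 in the traces above) typically show up here as a dead or nearly full node, or as xceiver/file-descriptor exhaustion in that datanode's own log.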
>>> -----------------------------
>>> Thank you in advance,
>>>
>>> Adrià
>>>
>>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
  Regards
  Sandeep Nemuri
