flume-user mailing list archives

From Denes Arvay <de...@cloudera.com>
Subject Re: A puzzling problem about the Flume Failover Sink Processor
Date Sat, 01 Apr 2017 13:18:06 GMT
Hi,

If you are using HDFS HA you don't have to do any special configuration on
the Flume side. One HDFS Sink is enough; just use the logical nameservice id
in the hdfs.path property instead of the host name.

See:
https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-HDFS-HA/td-p/29141
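As a minimal sketch of that setup, assuming the cluster's logical
nameservice is called "nameservice1" (a placeholder; use whatever
dfs.nameservices is set to in your hdfs-site.xml, and make sure that file is
on the Flume agent's classpath), a single sink could look like the snippet
below. The HDFS client then handles failover between the active and standby
NameNodes itself, so no sink group or failover processor is needed:

   # Single HDFS sink pointing at the HA nameservice id instead of a NameNode host.
   # "nameservice1" is a placeholder for the cluster's dfs.nameservices value.
   c1.sinks = s_cd
   c1.sinks.s_cd.type = hdfs
   c1.sinks.s_cd.channel = c_cd
   c1.sinks.s_cd.hdfs.path = hdfs://nameservice1/PATH
   c1.sinks.s_cd.hdfs.batchSize = 1000
   c1.sinks.s_cd.hdfs.useLocalTimeStamp = true
   c1.sinks.s_cd.hdfs.rollSize = 0
   c1.sinks.s_cd.hdfs.rollCount = 1000000
   c1.sinks.s_cd.hdfs.rollInterval = 600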

Regards,
Denes

On Sat, Apr 1, 2017 at 12:31 PM hui.pan@wbkit.com <hui.pan@wbkit.com> wrote:


I have a puzzling problem with the Flume Failover Sink Processor; the description is below.
  Our environment has HDFS HA configured, that is, we have 2 NameNodes, one
active (hostname: n1) and one standby (hostname: n2).
  We configure our flume.conf like this:
   # Flume agent config
   c1.sources = source-cd
   c1.channels = c_cd
   c1.sinks = s_cd1 s_cd2   # s_cd1's NameNode host is n1, s_cd2's is n2

   c1.sinkgroups = g1
   c1.sinkgroups.g1.sinks = s_cd1 s_cd2
   c1.sinkgroups.g1.processor.type = failover
   c1.sinkgroups.g1.processor.priority.s_cd1 = 5
   c1.sinkgroups.g1.processor.priority.s_cd2 = 10
   c1.sinkgroups.g1.processor.maxpenalty = 10000

   c1.sources.source-cd.type = org.apache.flume.source.taildir.TaildirSource
   c1.sources.source-cd.channels = c_cd

   c1.channels.c_cd.type = memory
   c1.channels.c_cd.capacity = 5000000
   c1.channels.c_cd.transactionCapacity = 1000

   c1.sinks.s_cd1.type = hdfs
   c1.sinks.s_cd1.hdfs.path = hdfs://192.168.1.31:8020/PATH
   c1.sinks.s_cd1.channel = c_cd
   c1.sinks.s_cd1.hdfs.batchSize = 1000
   c1.sinks.s_cd1.hdfs.useLocalTimeStamp = true
   c1.sinks.s_cd1.hdfs.rollSize = 0
   c1.sinks.s_cd1.hdfs.rollCount = 1000000
   c1.sinks.s_cd1.hdfs.rollInterval = 600

   c1.sinks.s_cd2.type = hdfs
   c1.sinks.s_cd2.hdfs.path = hdfs://192.168.1.32:8020/PATH
   c1.sinks.s_cd2.channel = c_cd
   c1.sinks.s_cd2.hdfs.batchSize = 1000
   c1.sinks.s_cd2.hdfs.useLocalTimeStamp = true
   c1.sinks.s_cd2.hdfs.rollSize = 0
   c1.sinks.s_cd2.hdfs.rollCount = 1000000
   c1.sinks.s_cd2.hdfs.rollInterval = 600

   In this flume.conf, the priority of s_cd2 (n2) is higher than that of
s_cd1 (n1), so data should be written to n2. Unfortunately, n2 went down
under a heavy load. According to Flume's failover mechanism, the remaining
data should then flow to n1, but that is not what happened. The log looks
like this:


2017-04-01 15:31:28,285 WARN org.apache.hadoop.hdfs.LeaseRenewer:
Failed to renew lease for [DFSClient_NONMAPREDUCE_-1862131888_34] for
525 seconds.  Will retry shortly ...
java.net.ConnectException: Call From archive.cloudera.com/192.168.1.31
 to slave01:8020 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1408)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:576)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:922)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
at org.apache.hadoop.ipc.Client.call(Client.java:1447)
... 16 more

 According to this log, n2 was down and meanwhile n1 did not receive any
data. So we recovered n2, and then the log looked like this:


 2017-03-31 21:16:35,783 WARN
org.apache.flume.sink.hdfs.HDFSEventSink: HDFS IO error
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
Operation category WRITE is not supported in state standby. Visit
https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1826)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1404)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:592)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:111)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:393)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

 So what can we do to make n1 receive the data when n2 is down? Is there
anything wrong with my configuration, or is the failover processor
unsuitable for this situation? Thanks a lot.


------------------------------
Hollis @ Wbkit
Apr 1, 2017
