flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Cause and solution for org.apache.flume.EventDeliveryException: NettyAvroRpcClient: Handshake timed out after 20000ms
Date Thu, 04 Oct 2012 03:15:24 GMT
Hi Soo,


The Handshake timeout is only there in place to work around
https://issues.apache.org/jira/browse/AVRO-1122. As a result of this
bug, Avro handshake can deadlock and thus cause the sink to hang. If
this is happening very rarely it is probably ok, but if it is
happening consistently, try setting the "connect-timeout" parameter
for the Avro Sink to a higher value (default is 20000 ms - this param
is in ms).


The second set of HBase Exceptions seems to be because HBase is unable
to communicate with the servers in question. Can you verify that your
cluster is healthy and region servers are not dead/machines not
offline? Also, you should probably use the AsyncHBaseSink which
provides the same guarantees but gives far better performance.


Regards,

Hari


On Wed, Oct 3, 2012 at 7:57 PM, Soo Kim <philipjkim@gmail.com> wrote:
>
> Hi,
>
> I'm using Flume NG (1.2.0), and having some trouble with timeout exception.
>
> I have 2 servers. A log file is appended at server A. And server B has HBase.
> So I installed Flume NG to server A as a source, and to server B as a sink.
>
> flume.conf for server A is:
> src_agent.sources = tail
> src_agent.channels = memoryChannel
> src_agent.sinks = avroSink
>
> src_agent.sources.tail.type = exec
> src_agent.sources.tail.command = tail -F /home/www/crash_report/log/crash.log
> src_agent.sources.tail.channels = memoryChannel
>
> src_agent.sinks.avroSink.channel = memoryChannel
> src_agent.sinks.avroSink.type = avro
> src_agent.sinks.avroSink.hostname = serverb.hostname.com
> src_agent.sinks.avroSink.port = 10001
>
> src_agent.channels.memoryChannel.type = memory
> src_agent.channels.memoryChannel.capacity = 10000
>
> And flume.conf for server B is:
> dst_agent.sources = avro
> dst_agent.channels = memoryChannel
> dst_agent.sinks = hbaseSink
>
> dst_agent.sources.avro.type = avro
> dst_agent.sources.avro.bind = 0.0.0.0
> dst_agent.sources.avro.port = 10001
> dst_agent.sources.avro.channels = memoryChannel
>
> dst_agent.sinks.hbaseSink.channel = memoryChannel
> dst_agent.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
> dst_agent.sinks.hbaseSink.table = crash_report_flume
> dst_agent.sinks.hbaseSink.columnFamily = f
> dst_agent.sinks.hbaseSink.column = foo
> dst_agent.sinks.hbaseSink.serializer =
> org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
> dst_agent.sinks.hbaseSink.serializer.payloadColumn = pcol
> dst_agent.sinks.hbaseSink.serializer.incrementColumn = icol
>
> dst_agent.channels.memoryChannel.type = memory
> dst_agent.channels.memoryChannel.capacity = 10000
>
> Flume startup script for server A is:
> $FLUME_HOME/bin/flume-ng agent --conf $FLUME_HOME/conf --conf-file
> $FLUME_HOME/conf/flume-hbase-src.conf --name src_agent
> -Dflume.root.logger=INFO,console 1>>flume.log 2>>flume.log &
>
> Script for server B:
> $FLUME_HOME/bin/flume-ng agent --conf $FLUME_HOME/conf --conf-file
> $FLUME_HOME/conf/flume-hbase-dst.conf --name dst_agent
> -Dflume.root.logger=INFO,console 1>>flume.log 2>>flume.log &
>
> All worked fine after startup. Logs were collected to HBase as
> expected. But after a couple of hours I found something went wrong.
>
> Here's the flume error logs in server A:
> 2012-10-02 20:28:59,641
> (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
> Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to send events
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient
> { host: serverb.hostname.com, port: 10001 }: Failed to send batch
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:309)
>         ... 3 more
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient
> { host: serverb.hostname.com, port: 10001 }: Handshake timed out after
> 20000ms
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280)
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
>         ... 4 more
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:119)
>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278)
>         ... 5 more
>
> There were lots of timeout exceptions, and the following exception
> occured sparsely:
> 2012-10-04 09:44:06,408
> (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
> Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to send events
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient { host:
> host: serverb.hostname.com, port: 10001 }: RPC connection error
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:117)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:93)
>         at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:507)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
>         at org.apache.flume.sink.AvroSink.createConnection(AvroSink.java:182)
>         at org.apache.flume.sink.AvroSink.verifyConnection(AvroSink.java:222)
>         at org.apache.flume.sink.AvroSink.process(AvroSink.java:282)
>         ... 3 more
> Caused by: java.io.IOException: Error connecting to host:
> serverb.hostname.com/192.168.1.2:10001
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:147)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:106)
>         ... 9 more
> Caused by: java.nio.channels.ClosedChannelException
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$1.operationComplete(NioClientSocketPipelineSink.java:157)
>         at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
>         at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367)
>         at org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:316)
>         at org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:351)
>         at org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:188)
>         at org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:146)
>         at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:592)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.close(NioClientSocketPipelineSink.java:415)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processConnectTimeout(NioClientSocketPipelineSink.java:379)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:299)
>         at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         ... 1 more
>
> I'd like to know what's the root cause for those errors, and how I can
> solve them.
>
> FYI here's error logs in server B:
> 2012-10-02 23:13:41,479
> (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
> org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:221)]
> Failed to commit transaction.Transaction rolled back.
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 100 actions: servers with issues: 192.168.1.3:60020,
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
>         at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
>         at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-10-02 23:13:41,480
> (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
> org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:228)]
> Failed to commit transaction.Transaction rolled back.
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 100 actions: servers with issues: 192.168.1.3:60020,
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
>         at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
>         at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-10-02 23:13:41,480
> (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
> Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to commit
> transaction.Transaction rolled back.
>         at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:230)
>         at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 100 actions: servers with issues: 192.168.1.3:60020,
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
>         at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
>         ... 4 more
>
> --
> Thanks,
> Soo Philip Kim
> Email : philipjkim@gmail.com
> Phone : +82-10-7222-6078

Mime
View raw message