flume-user mailing list archives

From Jagadish Bihani <jagadish.bih...@pubmatic.com>
Subject Flume netcat source related problems
Date Tue, 04 Sep 2012 10:50:27 GMT

I encountered a problem in my scenario with the netcat source. The setup is:
Host A: Netcat source - file channel - Avro sink
Host B: Avro source - file channel - HDFS sink
But to simplify it I have created a single agent with a "Netcat Source"
and a "file roll sink". It is:
Host A: Netcat source - file channel - File_roll sink
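
For reference, the original two-host topology described above could be wired up roughly as follows. This is only a sketch under stated assumptions: the agent names (agentA, agentB), the host names, the Avro port 4545, the 0.0.0.0 bind address and the HDFS path are hypothetical placeholders and are not taken from the actual setup.

```properties
# Host A (hypothetical agent name "agentA"): netcat source -> file channel -> avro sink
agentA.sources = netcatSource
agentA.channels = fileChannel
agentA.sinks = avroSink

agentA.sources.netcatSource.type = netcat
# Bind address is an assumption
agentA.sources.netcatSource.bind = 0.0.0.0
agentA.sources.netcatSource.port = 55355
agentA.sources.netcatSource.channels = fileChannel

agentA.channels.fileChannel.type = file

agentA.sinks.avroSink.type = avro
# Host name and port of Host B are assumptions
agentA.sinks.avroSink.hostname = hostB.example.com
agentA.sinks.avroSink.port = 4545
agentA.sinks.avroSink.channel = fileChannel

# Host B (hypothetical agent name "agentB"): avro source -> file channel -> HDFS sink
agentB.sources = avroSource
agentB.channels = fileChannel
agentB.sinks = hdfsSink

agentB.sources.avroSource.type = avro
agentB.sources.avroSource.bind = 0.0.0.0
# Must match the port used by the avro sink on Host A
agentB.sources.avroSource.port = 4545
agentB.sources.avroSource.channels = fileChannel

agentB.channels.fileChannel.type = file

agentB.sinks.hdfsSink.type = hdfs
# HDFS path is an assumption
agentB.sinks.hdfsSink.hdfs.path = hdfs://namenode.example.com/flume/events
agentB.sinks.hdfsSink.channel = fileChannel
```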

1. To simulate our production scenario, I have created a script which
runs for 15 sec and, in a while loop, writes requests to the netcat
source on a given port. For a large value of the sleep, events are
delivered correctly to the destination. But as I reduce the delay,
events are given to the source but they are not delivered to the
destination. E.g. I write 9108 records within 15 sec using the script
and only 1708 get delivered. And I don't get any exception. If it were
a flow-control related problem, I should have seen some exception in
the agent logs. But with a file channel and huge disk space, is it a
problem?

*Machine Configuration:*
RAM : 8 GB
JVM : 200 MB
CPU: 2.0 GHz Quad core processor

*Flume Agent Configuration*
adServerAgent.sources = netcatSource
adServerAgent.channels = fileChannel memoryChannel
adServerAgent.sinks = fileSink

# For each one of the sources, the type is defined
adServerAgent.sources.netcatSource.type = netcat
adServerAgent.sources.netcatSource.bind =
adServerAgent.sources.netcatSource.port = 55355

# The channel can be defined as follows.
adServerAgent.sources.netcatSource.channels = fileChannel
#adServerAgent.sources.netcatSource.channels = memoryChannel

# Each sink's type must be defined
adServerAgent.sinks.fileSink.type = file_roll
adServerAgent.sinks.fileSink.sink.directory = /root/flume/flume_sink

#Specify the channel the sink should use
#adServerAgent.sinks.fileSink.channel = memoryChannel
adServerAgent.sinks.fileSink.channel = fileChannel

adServerAgent.channels.memoryChannel.type = memory
adServerAgent.channels.memoryChannel.capacity = 100000
adServerAgent.channels.memoryChannel.transactionCapacity = 10000
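
The listing above names fileChannel but does not show a definition for it. A minimal sketch of what such a definition could look like, assuming Flume's standard file channel type; both directory paths are hypothetical placeholders and are not taken from the actual setup.

```properties
# Assumed definition for the file channel; both paths are hypothetical.
adServerAgent.channels.fileChannel.type = file
adServerAgent.channels.fileChannel.checkpointDir = /root/flume/file_channel/checkpoint
adServerAgent.channels.fileChannel.dataDirs = /root/flume/file_channel/data
```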


*Script  snippet being used:*
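    # $socket, $TIMEOUT and $NO_ELE_PER_ROW are set up earlier in the script.
    eval {
        local $SIG{ALRM} = sub { die "alarm\n"; };
        alarm $TIMEOUT;          # end the run after $TIMEOUT seconds
        my $i = 0;
        my $str = "";
        my $counter = 1;
        while (1) {              # loop condition assumed; the alarm ends the run
            # Build one tab-separated row of $NO_ELE_PER_ROW fields.
            $str = "";
            for ($i = 0; $i < $NO_ELE_PER_ROW; $i++) {
                $str .= $counter."\t";
            }
            #print $socket "$str\n";
            $socket->send($str."\n") or die "Didn't send";

            if ($? != 0) {
                print "Failed for $str \n";
            }
            print "$str\n";
            # counter increment and the per-request sleep (omitted here)
        }
        alarm 0;
    };
    if ($@) {
        # timeout handling (omitted here)
    }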

- The script itself works fine: with a very large delay, all events are
transmitted correctly.
- The same problem occurs with the memory channel too, but with lower values of

*Problem 2:*
-- With this setup I am getting very low throughput, i.e. I am able to
transfer only ~1 KB/sec of data to the destination file sink. Similar
performance was achieved using the HDFS sink.
-- I had tried increasing batch sizes in my original scenario without
much gain in throughput.
-- Using 'tail -F' as the source, I had seen almost 10 times better
throughput.
-- Is there any tunable parameter for the netcat source?

Please help me with the above 2 cases:
i) netcat source use cases
ii) typical expected Flume throughput with a file channel and a file/HDFS
sink on a single machine.

