flume-user mailing list archives

From 周梦想 <abloz...@gmail.com>
Subject flume tail source problem and performance
Date Tue, 29 Jan 2013 07:24:40 GMT
Hello,
1. I want to tail a log file and write it to HDFS. Below is my configuration:
config [ag1, tail("/home/zhouhh/game.log",startFromEnd=true),
agentDFOSink("hadoop48",35853) ;]
config [ag2, tail("/home/zhouhh/game.log",startFromEnd=true),
agentDFOSink("hadoop48",35853) ;]
config [co1, collectorSource( 35853 ),
  [collectorSink("hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),
   collectorSink("hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]


I found that if I restart the agent node, it resends the contents of game.log
to the collector. Is there a way to resume sending from where it left off?
Or do I have to record the position myself, or remove the logs manually,
whenever I restart the agent node?
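
The "record the position myself" option could look roughly like the minimal
Python sketch below. It is not a Flume feature: `read_new_lines`,
`commit_offset`, and the sidecar offset file are hypothetical names used only
to illustrate saving a byte offset and resuming from it after a restart.

```python
import os


def read_new_lines(log_path, offset_path):
    """Return (complete lines appended since the saved offset, new offset).

    Hypothetical helper: the offset is NOT saved here; call commit_offset()
    only after the lines have been delivered downstream.
    """
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    # If the log is now shorter than the saved offset (truncated or rotated),
    # start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Consume only complete lines; a trailing partial line is re-read next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def commit_offset(offset_path, offset):
    """Persist the offset atomically: write a temp file, then rename it."""
    tmp = offset_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, offset_path)
```

Committing only after the collector has accepted the lines means a restart
resends at most the uncommitted tail (at-least-once delivery) instead of the
whole file.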

2. I tested Flume's performance and found it a bit slow.
With the configuration above, throughput is only about 50 MB/minute.
I changed the configuration to the following:
ag1:tail("/home/zhouhh/game.log",startFromEnd=true)|batch(1000) gzip
agentDFOSink("hadoop48",35853);

config [co1, collectorSource( 35853 ),
  [collectorSink("hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),
   collectorSink("hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]
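
Conceptually, the `batch(1000)` and `gzip` decorators trade many small
messages for fewer, larger compressed ones. The Python sketch below only
illustrates that idea under that assumption; it is not Flume's implementation
or wire format, and `batch_and_compress` / `decompress_batch` are hypothetical
names.

```python
import gzip


def batch_and_compress(lines, batch_size=1000):
    """Yield one gzip-compressed payload per batch_size lines.

    Assumes individual lines contain no newline characters.
    """
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield gzip.compress("\n".join(batch).encode("utf-8"))
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield gzip.compress("\n".join(batch).encode("utf-8"))


def decompress_batch(payload):
    """Inverse of one batch: gunzip and split back into lines."""
    return gzip.decompress(payload).decode("utf-8").split("\n")
```

With 2,500 lines and `batch_size=1000`, this sends three payloads instead of
2,500 separate events, and repetitive log text compresses well.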

Sending a 300 MB log took about 3 minutes, i.e. about 100 MB/minute.

By contrast, copying the log from ag1 to co1 with scp runs at about 30 MB/second.
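
Putting both figures on the same scale (plain arithmetic on the numbers
reported above; `mb_per_minute` is just an illustrative helper):

```python
def mb_per_minute(megabytes, seconds):
    """Convert a transfer of `megabytes` in `seconds` to MB per minute."""
    return megabytes * 60 / seconds


flume_batched = mb_per_minute(300, 3 * 60)  # 300 MB in about 3 minutes
scp = mb_per_minute(30, 1)                  # about 30 MB per second
print(f"Flume (batch + gzip): {flume_batched:.0f} MB/min")
print(f"scp: {scp:.0f} MB/min, about {scp / flume_batched:.0f}x faster")
```

This gives 100 MB/min for Flume versus 1800 MB/min for scp, roughly an 18x gap.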

Does anyone have any ideas?

Thanks!

Andy
