Hello,
1. I want to tail a log source and write it to HDFS. Below is my configuration:
config [ag1, tail("/home/zhouhh/game.log",startFromEnd=true), agentDFOSink("hadoop48",35853) ;]
config [ag2, tail("/home/zhouhh/game.log",startFromEnd=true), agentDFOSink("hadoop48",35853) ;]
config [co1, collectorSource( 35853 ),  [collectorSink( "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink( "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]


I found that if I restart the agent node, it resends the entire content of game.log to the collector. Is there a way to resume sending only what hasn't been sent yet? Or do I have to keep a mark (offset) myself, or remove the already-sent logs manually, whenever I restart the agent node?
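For the "make a mark myself" workaround, one approach is to persist the byte offset you have already shipped in a side file, so a restart resumes from the mark instead of the beginning. This is only a minimal sketch of that idea, not a Flume feature; the paths and the `.offset` side file are my own assumptions:

```python
import os

LOG_FILE = "/tmp/game.log"            # stand-in for /home/zhouhh/game.log
OFFSET_FILE = LOG_FILE + ".offset"    # hypothetical side file holding the mark

def read_new_data():
    """Return only the bytes appended since the last call, surviving restarts."""
    # Load the last byte offset we recorded, defaulting to 0 on first run.
    try:
        with open(OFFSET_FILE) as f:
            offset = int(f.read().strip())
    except (OSError, ValueError):
        offset = 0
    with open(LOG_FILE, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size < offset:
            offset = 0                # file was truncated or rotated: start over
        f.seek(offset)
        data = f.read()               # everything appended since the mark
    # Persist the new mark so the next run (or a restart) skips what we read.
    with open(OFFSET_FILE, "w") as f:
        f.write(str(size))
    return data
```

Note this is not atomic: if the process dies between shipping the data and writing the mark, the same chunk can be sent twice, so downstream deduplication would still be needed for exactly-once delivery.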

2. I tested the performance of Flume and found it a bit slow.
With the configuration above, I only get about 50 MB/minute.
I changed the configuration to the following:
ag1:tail("/home/zhouhh/game.log",startFromEnd=true)|batch(1000) gzip agentDFOSink("hadoop48",35853); 

config [co1, collectorSource( 35853 ), [collectorSink( "hdfs://hadoop48:54310/user/flume/%y%m/%d","%{host}-",5000,raw),collectorSink( "hdfs://hadoop48:54310/user/flume/%y%m","%{host}-",10000,raw)]]

I sent a 300 MB log and it took about 3 minutes, so that's about 100 MB/minute.

By comparison, sending the same log from ag1 to co1 via scp runs at about 30 MB/second.
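Just to make the gap concrete, the numbers above work out as follows (using only the figures quoted in this mail):

```python
# Observed rates from the tests above, normalized to MB/minute.
flume_rate = 300 / 3        # 300 MB shipped in ~3 minutes with batch+gzip
scp_rate = 30 * 60          # scp ran at ~30 MB/second

print(flume_rate)                 # 100.0 MB/minute via Flume
print(scp_rate)                   # 1800 MB/minute via scp
print(scp_rate / flume_rate)      # scp is ~18x faster in this test
```

So even after batching and gzip, Flume is moving data at roughly 1/18 of the raw network rate scp achieves on the same path.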

Can anyone give me any ideas?

thanks!

Andy