flume-user mailing list archives

From "Cochran, David" <david.coch...@bsee.gov>
Subject Re: Dupes
Date Fri, 05 Apr 2013 14:22:02 GMT
Hi Israel,

I copied out the portions of my config that pertain to the server where I'm
seeing this bad behavior (and sanitized it a little). The full config is
about 1400 lines now, since I'm trying to stay with a single config and
distribute it to each server for consistency and to save headaches.

Dave



# define sources, channels, and sinks for each log file
node_usvsm01.sources = source21061 source21062 source21063
node_usvsm01.channels = channel21061 channel21062 channel21063
node_usvsm01.sinks = sink21061 sink21062 sink21063

# source file usvsm01 - smaccess.log
node_usvsm01.sources.source21061.type = exec
node_usvsm01.sources.source21061.command = tail -F /opt/siteminder/CA/log/smaccess.log
node_usvsm01.sources.source21061.channels = channel21061
# source file imsnolusvsm01 - smps.log
node_usvsm01.sources.source21062.type = exec
node_usvsm01.sources.source21062.command = tail -F /opt/siteminder/CA/log/smps.log
node_usvsm01.sources.source21062.channels = channel21062
# source file imsnolusvsm01 - smtracedefault.log
node_usvsm01.sources.source21063.type = exec
node_usvsm01.sources.source21063.command = tail -F /opt/siteminder/CA/log/smtracedefault.log
node_usvsm01.sources.source21063.channels = channel21063

node_usvsm01.channels.channel21061.type = memory
node_usvsm01.channels.channel21061.capacity = 100000
node_usvsm01.channels.channel21061.transactionCapacity = 1000
node_usvsm01.channels.channel21062.type = memory
node_usvsm01.channels.channel21062.capacity = 100000
node_usvsm01.channels.channel21062.transactionCapacity = 1000
node_usvsm01.channels.channel21063.type = memory
node_usvsm01.channels.channel21063.capacity = 100000
node_usvsm01.channels.channel21063.transactionCapacity = 1000

# send channels --> flume @ usvinf01
node_usvsm01.sinks.sink21061.type = avro
node_usvsm01.sinks.sink21061.channel = channel21061
node_usvsm01.sinks.sink21061.hostname = usinf01
node_usvsm01.sinks.sink21061.port = 21061
node_usvsm01.sinks.sink21062.type = avro
node_usvsm01.sinks.sink21062.channel = channel21062
node_usvsm01.sinks.sink21062.hostname = usinf01
node_usvsm01.sinks.sink21062.port = 21062
node_usvsm01.sinks.sink21063.type = avro
node_usvsm01.sinks.sink21063.channel = channel21063
node_usvsm01.sinks.sink21063.hostname = usinf01
node_usvsm01.sinks.sink21063.port = 21063


node102.sources = source21061 source21062 source21063
node102.channels = channel21061 channel21062 channel21063
node102.sinks = sink21061 sink21062 sink21063

#  - usvsm01 -
# source file usvsm01 - smaccess.log
node102.sources.source21061.type = avro
node102.sources.source21061.bind = 0.0.0.0
node102.sources.source21061.port = 21061
node102.sources.source21061.channels = channel21061
# source file usvsm01 - smps.log
node102.sources.source21062.type = avro
node102.sources.source21062.bind = 0.0.0.0
node102.sources.source21062.port = 21062
node102.sources.source21062.channels = channel21062
# source file usvsm01 - smtracedefault.log
node102.sources.source21063.type = avro
node102.sources.source21063.bind = 0.0.0.0
node102.sources.source21063.port = 21063
node102.sources.source21063.channels = channel21063

#  - usvsm01 -
node102.channels.channel21061.type = memory
node102.channels.channel21061.capacity = 100000
node102.channels.channel21061.transactionCapacity = 1000
node102.channels.channel21062.type = memory
node102.channels.channel21062.capacity = 100000
node102.channels.channel21062.transactionCapacity = 1000
node102.channels.channel21063.type = memory
node102.channels.channel21063.capacity = 100000
node102.channels.channel21063.transactionCapacity = 1000

# usvsm01 -
# source file usvsm01 - smaccess.log
node102.sinks.sink21061.type = FILE_ROLL
node102.sinks.sink21061.channel = channel21061
node102.sinks.sink21061.sink.directory = /flume_logs/usvsm01/siteminder/smaccess_log
node102.sinks.sink21061.sink.rollInterval = 86400
node102.sinks.sink21061.sink.serializer = TEXT
# source file usvsm01 - smps.log
node102.sinks.sink21062.type = FILE_ROLL
node102.sinks.sink21062.channel = channel21062
node102.sinks.sink21062.sink.directory = /flume_logs/usvsm01/siteminder/smps_log
node102.sinks.sink21062.sink.rollInterval = 86400
node102.sinks.sink21062.sink.serializer = TEXT
# source file usvsm01 - smtracedefault.log
node102.sinks.sink21063.type = FILE_ROLL
node102.sinks.sink21063.channel = channel21063
node102.sinks.sink21063.sink.directory = /flume_logs/usvsm01/siteminder/smtracedefault_log
node102.sinks.sink21063.sink.rollInterval = 86400
node102.sinks.sink21063.sink.serializer = TEXT





On Fri, Apr 5, 2013 at 9:00 AM, Israel Ekpo <israel@aicer.org> wrote:

> Hi Dave,
>
> Could you post your agents configuration file?
>
> Sometimes, small misconfigurations can result in unintended or
> undefined behavior.
>
>
>
> On Fri, Apr 5, 2013 at 9:52 AM, Cochran, David <david.cochran@bsee.gov> wrote:
>
>> I'm seeing a LOT of random dupes in some of my log files....
>>
>> This is pretty consistent in one file in particular. The log being
>> tail'ed averages ~20M per day, every day.  On the only sink (FILE_ROLL)
>> the resulting 24-hour log is 55M.  Some quick counts from grep'ing a
>> random time (e.g. 07:23) show the sink log with a dozen or so more lines
>> per minute with the same timestamp than the source has.
>>
>> This has been happening like clockwork every day for the last couple of
>> months, since I started using Flume on this box.
>>
>> I did check that there wasn't another source from this or another server
>> sending to the same port...and the entries of the log file look proper for
>> that app.
>>
>> The logs are not rolling at the same time on the source/sink, and I've not
>> yet taken the time to set up copies of each beginning and ending at the
>> same times and run a diff against them, but a preliminary 'eyeball diff'
>> just shows dupes.  I will note that on the source a line with the exact
>> same text may appear more than once, as the logging mechanism does not log
>> more precisely than hour/minute.
>>
>> All in all, dupes are better than drops, but is there anything in
>> particular I should look for to try to find the cause of and eliminate this?
>>
>>
>> Thanks in advance for any thoughts,
>> Dave
>>
>>
>>
>>
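
The per-minute count comparison described in the quoted message (grep'ing a
time such as 07:23 in both the source log and the sink log) could be
automated along the following lines. This is only a minimal sketch: the
function names are hypothetical, and it assumes every log line carries an
HH:MM timestamp somewhere in its text, which this thread does not confirm
for the SiteMinder logs. Because the source can legitimately repeat a line
within a minute, the sketch compares line counts per minute rather than
looking for identical lines.

```python
import re
from collections import Counter

# Assumption: each line contains an HH:MM timestamp; the real log layout is
# not shown in this thread, so the pattern may need adjusting.
MINUTE_RE = re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b")


def count_per_minute(lines):
    """Return a Counter mapping 'HH:MM' to the number of lines carrying it."""
    counts = Counter()
    for line in lines:
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


def surplus_per_minute(source_lines, sink_lines):
    """Return {minute: extra lines} for minutes where the sink has more lines."""
    source = count_per_minute(source_lines)
    sink = count_per_minute(sink_lines)
    return {
        minute: sink[minute] - source[minute]
        for minute in sorted(sink)
        if sink[minute] > source[minute]
    }
```

To use it, open the tailed source file and the matching FILE_ROLL output and
pass both file objects to surplus_per_minute. Since the two logs do not roll
at the same time, only minutes fully covered by both files give a
meaningful comparison.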
