flume-user mailing list archives

From Ossi <los...@gmail.com>
Subject Reliability of Flume (autoE2EChain)
Date Wed, 30 Nov 2011 07:25:27 GMT

I'm new to the list and I hope that some of you can help me. :)

We are testing Flume in a fully distributed, isolated configuration:
1 master server (server5)
1 collector (server6)
2 agents (server2 and server3)

Both agent servers have 8 logical nodes collecting Apache httpd logs.
There are 2 Apache instances running, and we want to collect
http and https separately, each split into access and error logs.

Suddenly Flume stopped writing some, but not all, of the files from one of
the servers to HDFS.
First it stopped with aa_error_log... (it wrote that for only a few moments),
and later, after running fine for several hours, it stopped writing
aa_access_logs.

There aren't any error messages in the master, collector or agent logs. From
the agent's point of view it seemed that it had been delivering those files
all the time (I'm not sure how to read those logs).
It seems like the collector just suddenly stopped delivering those files to
HDFS.

It seems that the collector was somehow in bad shape, since its Jetty didn't
function too well either:
it opened http://localhost:35862/, but stalled while trying to get the
flumeagent.jsp file.

After a restart of the collector (the next day) it continued writing files to
HDFS, but it missed all the files from the past 8 hours. The web interface
also worked fine again.

Unfortunately we don't have any logs available, since we lost them due to
bug https://issues.cloudera.org/browse/FLUME-631.

So, does anybody have any idea what could have caused this, or do we need to
wait and see if it happens again?

Log collection was configured like this (for both aa and bb) using "flume
shell -c server5 -s flume-aa.txt":

cat flume-aa.txt
exec map server3 aa-agent-http-fe-1
exec map server3 aa-agent-http-fe-2
exec map server3 aa-agent-https-fe-1
exec map server3 aa-agent-https-fe-2
exec map server3 aa-agent-http-error-fe-1
exec map server3 aa-agent-http-error-fe-2
exec map server3 aa-agent-https-error-fe-1
exec map server3 aa-agent-https-error-fe-2

exec map server6 aa-collector-http-fe
exec map server6 aa-collector-https-fe
exec map server6 aa-collector-http-error-fe
exec map server6 aa-collector-https-error-fe

exec config aa-agent-http-fe-1 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-http-fe-2 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-2/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain

exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource

exec config aa-agent-https-fe-1 aa-flow-https-fe 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-https-fe-2 aa-flow-https-fe 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain

exec config aa-collector-https-fe aa-flow-https-fe autoCollectorSource

exec config aa-agent-http-error-fe-1 aa-flow-http-error-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-http-error-fe-2 aa-flow-http-error-fe 'tailDir("/logs/aa/httpd-fe-2/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-collector-http-error-fe aa-flow-http-error-fe

exec config aa-agent-https-error-fe-1 aa-flow-https-error-fe 'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-agent-https-error-fe-2 aa-flow-https-error-fe 'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-collector-https-error-fe aa-flow-https-error-fe

waitForNodesActive 0 aa-agent-http-fe-1 aa-agent-http-fe-2 aa-agent-https-fe-1 aa-agent-https-fe-2 aa-agent-http-error-fe-1 aa-agent-http-error-fe-2 aa-agent-https-error-fe-1 aa-agent-https-error-fe-2 aa-collector-http-fe aa-collector-https-fe aa-collector-http-error-fe aa-collector-https-error-fe

exec refreshAll
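
Two things in the script above can be sanity-checked outside Flume. The
following is only a rough Python sketch, not part of the actual setup. It
assumes every "exec config" line has the form "exec config <node> <flow>
<source and/or sink>", and it does not know whether tailDir applies its
pattern to the whole file name or to any part of it, so it shows both
behaviours. The function names and the sample file names are made up for
illustration; the config lines are copied from the listing with wrapped
lines rejoined.

```python
import re
import shlex

# A few lines copied from flume-aa.txt above (wrapped lines rejoined).
SCRIPT = r"""
exec config aa-agent-http-fe-1 aa-flow-http-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource
exec config aa-agent-http-error-fe-1 aa-flow-http-error-fe 'tailDir("/logs/aa/httpd-fe-1/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$", true)' autoE2EChain
exec config aa-collector-http-error-fe aa-flow-http-error-fe
"""


def config_lines_without_source(script):
    """Return node names of 'exec config' lines with nothing after the flow.

    Assumption: the token after the node name is always a flow name,
    as it is in the listing above.
    """
    missing = []
    for line in script.splitlines():
        tokens = shlex.split(line)  # keeps the single-quoted tailDir(...) as one token
        if len(tokens) < 3 or tokens[:2] != ["exec", "config"]:
            continue
        # tokens: exec, config, <node>, <flow>, then source and/or sink
        if len(tokens) < 5:
            missing.append(tokens[2])
    return missing


# The http access pattern as the regex engine sees it (\\d in the config is \d).
HTTP_ACCESS = r"aa_access_log-\d{4}-\d{2}-\d{2}$"


def matching_modes(pattern, name):
    """Show how a file name fares under whole-name and match-anywhere semantics."""
    return {
        "whole_name": re.fullmatch(pattern, name) is not None,
        "anywhere": re.search(pattern, name) is not None,
    }


if __name__ == "__main__":
    print("no source/sink given:", config_lines_without_source(SCRIPT))
    # Hypothetical file names, only for illustration.
    for name in ("aa_access_log-2011-11-29", "ssl-aa_access_log-2011-11-29"):
        print(name, matching_modes(HTTP_ACCESS, name))
```

For the sample lines, the first check reports aa-collector-http-error-fe; in
the listing above, aa-collector-https-error-fe has the same shape. That may
simply be how the lines were pasted into this mail. The second check shows
that the http access pattern would also match the ssl- file names if the
pattern were applied anywhere in the name rather than to the whole name.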
