flume-user mailing list archives

From Jay Stricks <...@wapolabs.com>
Subject Distributed Deployment Questions
Date Fri, 02 Mar 2012 22:45:06 GMT
Hey folks,

I'm looking for some advice on a couple of issues I'm having. My setup is
Flume v0.9.4-cdh3u2, single master, six collectors (three flows, all
autoCollectorSource), ~80 agents (three flows, autoE2E).

1. I have begun to have collectors fail with "ERROR connector.DirectDriver:
Exiting driver logicalNode <node_name> in error state ThriftEventSource |
Collector because null", which looks very similar to the issue addressed in
FLUME-757 (https://issues.apache.org/jira/browse/FLUME-757).  Any
update/advice on how to address this? Is it an issue of limiting the size
of the files being transmitted to the collectors, or the frequency of
transmission? This never happened on 093, and it's a little concerning to
see after upgrading.

2. I'm constantly getting "WARN httpclient.RestS3Service: Response
- Unexpected response code 404, expected 200", even though the data is
being written to S3. I know this has been brought up before, but is there
any advice on how to tell when it's a valid error?

3. My agents are on machines that are launched and terminated somewhat
frequently due to maintenance, etc.  I have the user data scripts set up so
that each agent server, upon being launched, starts a Flume shell, connects
to the master, and executes its own configuration commands.  Often, my
master will fail when too many agent configurations are being submitted.
The number of threads grows exponentially at these times, and then the master fails.
I'm curious if anyone else experiences this over-concurrency problem, or
how you would recommend avoiding it. Any ideas for how to have the master
'notice' a new agent and execute its configuration itself, which seems like
it would be an effective rate limiter, so to speak?

Thanks a ton for the help!

Jay S.
