flume-user mailing list archives

From Frank Grimes <frankgrime...@yahoo.com>
Subject Collector node failing with java.net.SocketException: Too many open files
Date Thu, 26 Jan 2012 15:51:26 GMT
Hi All,

We are using flume-0.9.5 (specifically, http://svn.apache.org/repos/asf/incubator/flume/trunk@1179275)
and occasionally our Collector node accumulates too many open TCP connections and starts madly
logging the following errors:

WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
       at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
       at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
       at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
Caused by: java.net.SocketException: Too many open files
       at java.net.PlainSocketImpl.socketAccept(Native Method)
       at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
       at java.net.ServerSocket.implAccept(ServerSocket.java:462)
       at java.net.ServerSocket.accept(ServerSocket.java:430)
       at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
       ... 2 more

This quickly fills up the disk as the log file grows to multiple gigabytes in size.

After some investigation, it appears that although each Agent node shows a single open connection
to the Collector, the Collector node holds a large number of zombie TCP connections
back to the Agent nodes.
For example, "lsof -n | grep PORT" on an Agent node shows 1 established connection.
However, the Collector node shows hundreds of established connections on that same port, which
don't seem to correspond to any connections I can find on the Agent node.
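
To make the comparison above easier to repeat, here is a rough sketch of how the Collector-side count could be broken down per Agent. It assumes the input is the text output of something like "lsof -n -P -iTCP" (the -P flag keeps ports numeric), where the last two columns look like "local:port->remote:port (STATE)". The function name count_established is made up for illustration and is not part of Flume or lsof.

```python
from collections import Counter


def count_established(lsof_lines, port):
    """Count ESTABLISHED TCP connections whose local port is `port`,
    grouped by remote host. Hypothetical helper, not a Flume API."""
    counts = Counter()
    for line in lsof_lines:
        fields = line.split()
        # Assumed layout: "... TCP local:port->remote:port (ESTABLISHED)"
        if len(fields) < 2 or fields[-1] != "(ESTABLISHED)":
            continue
        if "->" not in fields[-2]:
            continue
        local, remote = fields[-2].split("->", 1)
        # rsplit so that addresses containing ":" still split at the port
        local_port = local.rsplit(":", 1)[1]
        remote_host = remote.rsplit(":", 1)[0]
        if local_port == str(port):
            counts[remote_host] += 1
    return counts
```

Feeding it the lsof output captured on the Collector, with PORT being the Collector's listening port, should give one counter entry per Agent address. An Agent whose count keeps growing while its own side still shows a single connection would be consistent with the leak described here.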

So we're concluding that the Collector node is somehow leaking connections.

Has anyone seen this kind of thing before?

Could this be related to https://issues.apache.org/jira/browse/FLUME-857?
Or could this be a Thrift bug that could be avoided by switching to Avro sources/sinks?

Any hints/tips are most welcome.

Thanks,

Frank Grimes