flume-user mailing list archives

From Zijad Purkovic <zijadpurko...@gmail.com>
Subject Re: Collector node failing with java.net.SocketException: Too many open files
Date Thu, 26 Jan 2012 16:57:57 GMT
Hi Frank,

Can you show the output of ulimit -n from your collector node?
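
In case it helps, here is a rough sketch of how that could be checked, along with the number of descriptors the process actually holds. It assumes a Linux host with /proc mounted. COLLECTOR_PID is a placeholder for the collector JVM's PID (not something taken from your setup), and count_fds is just a helper name made up for this sketch.

```shell
#!/bin/sh
# Count the open file descriptors of a process by listing /proc/<pid>/fd (Linux only).
# Prints 0 if the PID does not exist or its fd directory is not readable.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Soft and hard limits on open files for the current shell.
ulimit -n
ulimit -Hn

# COLLECTOR_PID is a placeholder; it falls back to this shell's own PID
# so that the sketch runs as-is.
pid="${COLLECTOR_PID:-$$}"

# Limits that apply to the running process, which can differ from the shell's.
grep 'Max open files' "/proc/$pid/limits"

# Number of descriptors the process currently holds.
count_fds "$pid"
```

If the descriptor count sits close to the "Max open files" value, the limit is being hit; raising it would only postpone the errors if connections are in fact being leaked.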

On Thu, Jan 26, 2012 at 4:51 PM, Frank Grimes <frankgrimes97@yahoo.com> wrote:
> Hi All,
>
> We are using flume-0.9.5
> (specifically, http://svn.apache.org/repos/asf/incubator/flume/trunk@1179275)
> and occasionally our Collector node accumulates too many open TCP
> connections and starts madly logging the following errors:
>
> WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>        at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>        at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
> Caused by: java.net.SocketException: Too many open files
>        at java.net.PlainSocketImpl.socketAccept(Native Method)
>        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>        at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>        at java.net.ServerSocket.accept(ServerSocket.java:430)
>        at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>        ... 2 more
>
>
> This quickly fills up the disk as the log file grows to multiple gigabytes
> in size.
>
> After some investigation, it appears that although each Agent node shows a
> single open connection to the Collector, the Collector node has a bunch of
> zombie TCP connections open back to the Agent nodes.
> For example, "lsof -n | grep PORT" on the Agent node shows 1 established
> connection, whereas the Collector node shows hundreds of established
> connections on that same port that don't correspond to any connections I
> can find on the Agent node.
>
> So we're concluding that the Collector node is somehow leaking connections.
>
> Has anyone seen this kind of thing before?
>
> Could this be related to https://issues.apache.org/jira/browse/FLUME-857?
> Or could this be a Thrift bug that could be avoided by switching to Avro
> sources/sinks?
>
> Any hints/tips are most welcome.
>
> Thanks,
>
> Frank Grimes



-- 
Zijad Purković
Dobrovoljnih davalaca krvi 3/19, Zavidovići
061/ 690 - 241
