flume-user mailing list archives

From Frank Grimes <frankgrime...@yahoo.com>
Subject Re: Collector node failing with java.net.SocketException: Too many open files
Date Thu, 26 Jan 2012 17:04:34 GMT
It's 1024, but we really shouldn't need to raise that value; doing so would only delay the
failure.
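
To make the lsof comparison from the quoted message below repeatable, here is a
minimal Python sketch that counts ESTABLISHED connections per remote host on
the Collector port. It assumes the usual lsof layout in which the last column
looks like "local:port->remote:port (STATE)", and that lsof is run with -P so
ports are numeric. The function name, the sample lines, and port 35853 are
made up for illustration; substitute the real PORT and real lsof output.

```python
import re
from collections import Counter

# Matches the NAME column lsof prints for a connected TCP socket, e.g.
# "10.0.0.9:35853->10.0.0.5:40112 (ESTABLISHED)".
# Assumption: numeric ports (lsof -n -P); listening sockets have no "->".
_TCP_NAME = re.compile(r"(\S+):(\w+)->(\S+):(\w+) \((\w+)\)\s*$")


def established_by_peer(lsof_lines, local_port):
    """Count ESTABLISHED TCP connections on local_port, grouped by remote host."""
    counts = Counter()
    for line in lsof_lines:
        match = _TCP_NAME.search(line)
        if match is None:
            continue  # not a connected TCP socket (file, pipe, LISTEN, ...)
        _local_host, lport, remote_host, _rport, state = match.groups()
        if lport == str(local_port) and state == "ESTABLISHED":
            counts[remote_host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real Collector.
    sample = [
        "java 4242 flume 101u IPv4 0x01 0t0 TCP 10.0.0.9:35853->10.0.0.5:40112 (ESTABLISHED)",
        "java 4242 flume 102u IPv4 0x02 0t0 TCP 10.0.0.9:35853->10.0.0.5:40377 (ESTABLISHED)",
        "java 4242 flume 103u IPv4 0x03 0t0 TCP 10.0.0.9:35853->10.0.0.6:51200 (ESTABLISHED)",
        "java 4242 flume 12u IPv4 0x05 0t0 TCP *:35853 (LISTEN)",
    ]
    for host, n in established_by_peer(sample, 35853).most_common():
        print(host, n)
```

If a single Agent host shows a count that keeps growing on the Collector while
that Agent itself reports one connection, that would be consistent with the
leak described below, and would also show why a higher ulimit only postpones
the error.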


On 2012-01-26, at 11:57 AM, Zijad Purkovic wrote:

> Hi Frank,
> 
> Can you show output of ulimit -n from your collector node?
> 
> On Thu, Jan 26, 2012 at 4:51 PM, Frank Grimes <frankgrimes97@yahoo.com> wrote:
>> Hi All,
>> 
>> We are using flume-0.9.5
>> (specifically, http://svn.apache.org/repos/asf/incubator/flume/trunk@1179275)
>> and occasionally our Collector node accumulates too many open TCP
>> connections and starts madly logging the following errors:
>> 
>> WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>>        at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>>        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>        at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
>> Caused by: java.net.SocketException: Too many open files
>>        at java.net.PlainSocketImpl.socketAccept(Native Method)
>>        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>>        at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>>        at java.net.ServerSocket.accept(ServerSocket.java:430)
>>        at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>>        ... 2 more
>> 
>> 
>> This quickly fills up the disk as the log file grows to multiple gigabytes
>> in size.
>> 
>> After some investigation, it appears that although each Agent node shows a
>> single open connection to the Collector, the Collector node holds many
>> stale ("zombie") TCP connections back to the Agent nodes.
>> For example, "lsof -n | grep PORT" on an Agent node shows 1 established
>> connection, while the Collector node shows hundreds of established
>> connections on that same port that do not correspond to any connection I
>> can find on the Agent node.
>> 
>> So we're concluding that the Collector node is somehow leaking connections.
>> 
>> Has anyone seen this kind of thing before?
>> 
>> Could this be related to https://issues.apache.org/jira/browse/FLUME-857?
>> Or could this be a Thrift bug that could be avoided by switching to Avro
>> sources/sinks?
>> 
>> Any hints/tips are most welcome.
>> 
>> Thanks,
>> 
>> Frank Grimes
> 
> 
> 
> -- 
> Zijad Purković
> Dobrovoljnih davalaca krvi 3/19, Zavidovići
> 061/ 690 - 241

