flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Flume 1.3.0 - NFS + File Channel Performance
Date Tue, 18 Dec 2012 16:43:06 GMT
We'd need those thread dumps to confirm, but I bet that FLUME-1609
results in an NFS call on each operation on the channel.

If that is true, that would explain why it works well on local disk.
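
One rough way to check the idea that a per-operation filesystem call is cheap on local disk but expensive over NFS is to time such a call against both directories. The sketch below is not Flume code. The class name FsCallTimer is hypothetical, and File.getUsableSpace() is only an assumed stand-in for "some filesystem call made on every channel operation"; nothing in this thread says which call FLUME-1609 actually makes.

```java
import java.io.File;

// Hypothetical micro-benchmark, not part of Flume.
// Assumption: File.getUsableSpace() stands in for whatever filesystem call
// the File Channel might make on every put/take.
class FsCallTimer {

    /** Returns the average wall-clock nanoseconds per getUsableSpace() call on dir. */
    static long averageNanos(File dir, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Asks the filesystem for usable space; on a network mount this
            // may require a round trip to the server.
            dir.getUsableSpace();
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // Pass one or more directories (for example an NFS-backed directory
        // and a local one). Defaults to the JVM temp directory.
        String[] dirs = args.length > 0
                ? args
                : new String[] { System.getProperty("java.io.tmpdir") };
        for (String d : dirs) {
            long avg = averageNanos(new File(d), 10000);
            System.out.println(d + ": " + (avg / 1000.0) + " us per call");
        }
    }
}
```

Running it with the NFS-backed channel directory and a local directory as arguments gives a per-call cost for each. A large gap would be consistent with the hypothesis above, but it would not replace the thread dumps.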

Brock

On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <brock@cloudera.com> wrote:
> Hi,
>
> Hmm, yes in general performance is not going to be great over NFS, but
> there haven't been any FC changes that stick out here.
>
> Could you take 10 thread dumps of the agent running the file channel
> and 10 thread dumps of the agent sending data to the agent with the
> file channel? (You can send them to me directly, since the list
> won't take attachments.)
>
> Are there any patterns, like it works for 40 seconds then times out
> and then works for 39 seconds, etc?
>
> Brock
>
> On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf
> <Rudolf.Rakos@morganstanley.com> wrote:
>> Hi,
>>
>>
>>
>> We’ve run into a strange problem regarding NFS and File Channel performance
>> while evaluating the new version of Flume.
>>
>> We had no issues with the previous version (1.2.0).
>>
>>
>>
>> Our configuration looks like this:
>>
>> ·         Node1:
>> (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel ->
>> Avro Sink (-> Node2)
>>
>> ·         Node2:
>> (Node1s ->) Avro Source -> File Channel -> Custom Sink
>>
>>
>>
>> Both the checkpoint and the data directories of the File Channels are on NFS
>> shares. We use the same share for checkpoint and data directories, but
>> different shares for each Node. Unfortunately it is not an option for us to
>> use local directories.
>>
>> The events are about 1 KB each, and the batch sizes are as follows:
>>
>> ·         Avro RPC Clients: 1000
>>
>> ·         Custom Sources: 2000
>>
>> ·         Avro Sink: 5000
>>
>> ·         Custom Sink: 10000
>>
>>
>>
>> We are seeing very slow File Channel performance compared to the
>> previous version, and a high number of timeouts (almost always) in the Avro
>> RPC Clients and the Avro Sink.
>>
>> Something like this:
>>
>> ·         2012-12-18 15:43:31,828 [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN org.apache.flume.sink.AvroSink - Failed to send event batch
>> org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Failed to send batch
>>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>         ***
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) [flume-ng-core-1.3.0.jar:1.3.0]
>>         at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Handshake timed out after 20000ms
>>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>         ... 5 common frames omitted
>> Caused by: java.util.concurrent.TimeoutException: null
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228) ~[na:1.6.0_31]
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:91) ~[na:1.6.0_31]
>>         at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>         ... 6 common frames omitted
>>
>> (I had to remove some details, sorry for that.)
>>
>>
>>
>> We managed to narrow down the root cause of the issue to the File Channel,
>> because:
>>
>> ·         Everything works fine if we switch to the Memory Channel or to the
>> Old File Channel (1.2.0).
>>
>> ·         Everything works fine if we use local directories.
>>
>> We’ve tested this on multiple different PCs (both Windows and Linux).
>>
>>
>>
>> I spent the day debugging and profiling, but I could not find anything worth
>> mentioning (no excessive CPU usage, no threads waiting too long, etc.).
>> The only problem is that File Channel takes and puts take far longer than
>> they did with the previous version.
>>
>>
>>
>>
>>
>> Could someone please try the File Channel on an NFS share?
>>
>> Does anyone have similar issues?
>>
>>
>>
>> Thank you for your help.
>>
>>
>>
>> Regards,
>>
>> Rudolf
>>
>>
>>
>> Rudolf Rakos
>> Morgan Stanley | ISG Technology
>> Lechner Odon fasor 8 | Floor 06
>> Budapest, 1095
>> Phone: +36 1 881-4011
>> Rudolf.Rakos@morganstanley.com
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
