flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: HDFS Sink performance
Date Wed, 15 Jul 2015 20:41:44 GMT
Roshan - how about posting that on the Flume wiki?


Thanks,
Hari

On Wed, Jul 15, 2015 at 1:07 PM, Roshan Naik <roshan@hortonworks.com> wrote:

> Lohit,
> You may want to search the mailing list for 'Flume perf measurements'.
> You should find the recent measurements I posted.
> -roshan
>
> From: lohit <lohit.vijayarenu@gmail.com>
> Reply-To: "user@flume.apache.org" <user@flume.apache.org>
> Date: Wednesday, July 15, 2015 11:19 AM
> To: "user@flume.apache.org" <user@flume.apache.org>
> Subject: Re: HDFS Sink performance
>
> Thanks for the reply, Hari. Multiple sinks make sense, but this would
> also mean there are a lot more files on HDFS. I will try multiple sinks and
> see how fast this can go.
> Given that a single HDFS stream can do much higher throughput, maybe there
> is a way to have a thread pool for SinkRunner-PollingRunner-DefaultSinkProcessor
> instead of a single thread per sink.
>
> 2015-07-15 11:11 GMT-07:00 Hari Shreedharan <hshreedharan@cloudera.com>:
>
>> Hi Lohit,
>>
>> HDFS sinks (in fact, most sinks) are single-threaded by design. This is
>> meant to make writing the sinks easier, but all channels can handle
>> multiple sinks reading from them. So to improve throughput, you
>> configure several sinks that read off the same channel. Make sure,
>> though, that each sink writes to files with different HDFS paths or
>> different file prefixes (otherwise the HDFS client API will complain about leases).
>>
>>
>> Thanks,
>> Hari
>>
>> On Wed, Jul 15, 2015 at 9:10 AM, lohit <lohit.vijayarenu@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Does anyone have numbers they can share on HDFS sink
>>> performance? In our testing, a single sink writing to HDFS
>>> (CompressedStream) and reading from a MemoryChannel can only do about 35000
>>> events per second (each event is about 1K in size). After compression this
>>> turns out to be a ~10MB/s write stream to an HDFS file, which is pretty low. Our
>>> configuration looks like this:
>>>
>>> agent.sinks.hdfsSink.type = hdfs
>>> agent.sinks.hdfsSink.channel = memoryChannel
>>> agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>> agent.sinks.hdfsSink.hdfs.codeC = lzo
>>> agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>> agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>> agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>> agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>>> agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>> agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>>
>>> agent.channels.memoryChannel.type = memory
>>> agent.channels.memoryChannel.capacity = 3000000
>>> agent.channels.memoryChannel.transactionCapacity = 10000
>>>
>>>  --
>>> Have a Nice Day!
>>> Lohit
>>>
>>
>>
>
>
>  --
> Have a Nice Day!
> Lohit
>
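
For readers following the thread, the multiple-sink setup Hari describes might look roughly like the fragment below. This is only a sketch, not a tested configuration: the sink names hdfsSink1 and hdfsSink2 and the filePrefix values are made up for illustration, while the agent name, channel name, and HDFS path are taken from Lohit's original config. The remaining per-sink settings from that config (codeC, fileType, roll settings, batchSize) would presumably be repeated for each sink.

```properties
# Sketch only: sink names and filePrefix values are illustrative assumptions.
agent.channels = memoryChannel
agent.sinks = hdfsSink1 hdfsSink2

# Both sinks read from the same channel; each sink runs on its own thread.
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = memoryChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
# A distinct prefix per sink keeps the sinks from writing to the same file.
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
```

As Lohit notes, the trade-off is more files on HDFS, since each sink rolls its own files independently.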
