flume-user mailing list archives

From lohit <lohit.vijayar...@gmail.com>
Subject Re: HDFS Sink performance
Date Thu, 23 Jul 2015 16:27:38 GMT
The majority of our messages need not be persisted to disk, so we are
interested in MemoryChannel.
There has been a gradual performance degradation from 1.3.1 -> 1.4.0 ->
1.6.0.
See the graph below, where I have a constant stream of messages (blue
line). While this is happening, I swap different versions of Flume for the
agent.
The orange line shows messages dropped (a flat line means data is being
streamed to HDFS), and I have marked the flat lines with the different versions.



2015-07-22 19:48 GMT-07:00 Roshan Naik <roshan@hortonworks.com>:

>
>  My guess is that most of you will probably use the File channel in
> production with the HDFS sink? In that scenario, the common observation seems
> to be that the File channel becomes the primary bottleneck. Going by
> Robert's observations, File channel throughput also seems to have dropped
> since v1.3.
>
>  Robert, can you confirm how many data dirs were used for your readings
> with FCh?
>
>  -roshan
>
>
>
>   From: lohit <lohit.vijayarenu@gmail.com>
> Reply-To: "user@flume.apache.org" <user@flume.apache.org>
> Date: Wednesday, July 22, 2015 3:01 PM
> To: "user@flume.apache.org" <user@flume.apache.org>
>
> Subject: Re: HDFS Sink performance
>
>   Thanks for sharing these numbers, Robert. Curious, I ran the same
> experiment.
> Flume 1.3.1 has higher throughput than 1.6.0 (I was able to get a
> sustained 60MB/s with Flume 1.3.1).
> There was no config or setup change; just changing the Flume version shows this
> difference. We should probably look at the change set between 1.3.1 and 1.5 to
> see if there were any obvious changes.
>
> 2015-07-22 14:00 GMT-07:00 Robert B Hamilton <robert.hamilton@gm.com>:
>
>> Here is a comparison between versions 1.3, 1.5, and 1.6.
>> I would estimate that error bars are plus or minus 15%.
>>
>> All parameters are identical; between runs, all I change is the version
>> of Flume.
>> Lohit’s numbers are fairly consistent with this, because if we double the
>> sinks from my 4 to his 8 and assuming linear scalability we would expect to
>> get somewhere close to 30-40MB/s.
>>
>> It looks like the drop off is more pronounced for the larger event size.
>> This is of concern to us because we are looking at this for a high volume
>> feed with message sizes up to 80 kB.
>>
>> ------------------------------------------------------------
>> HDFS x4 sink, Memory channel
>> ------------------------------------------------------------
>> Payload (kB)    v1.3 (MB/s)    v1.5 (MB/s)    v1.6 (MB/s)
>> ------------    -----------    -----------    -----------
>> 1               27             17             20
>> 25              56             15             15
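
The linear-scaling estimate in this message can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the helper name `scale_linearly` is hypothetical, and linear scaling with sink count is the assumption stated in the message, not a measured result. The input rates are the v1.6 figures from the table.

```python
# Back-of-the-envelope check of the linear-scaling estimate.
# Assumption (taken from the message, not measured): throughput scales
# linearly with the number of HDFS sinks.

def scale_linearly(rate_mb_s, sinks_measured, sinks_target):
    """Project a measured rate to another sink count, assuming linear scaling."""
    return rate_mb_s * sinks_target / sinks_measured

# v1.6 memory-channel rates measured with 4 sinks (MB/s), keyed by payload in kB.
v16_rates_4_sinks = {1: 20, 25: 15}

# Projected rates if the sink count were doubled from 4 to 8.
projected = {
    payload_kb: scale_linearly(rate, sinks_measured=4, sinks_target=8)
    for payload_kb, rate in v16_rates_4_sinks.items()
}

for payload_kb, rate in projected.items():
    print(f"{payload_kb} kB payload: about {rate:.0f} MB/s with 8 sinks")
```

Under that assumption the projection lands in the 30-40MB/s range quoted in the message; real sinks may well scale sub-linearly.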
>>
>>
>>
>> From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
>> Sent: Wednesday, July 22, 2015 1:27 PM
>> To: user@flume.apache.org
>> Subject: Re: HDFS Sink performance
>>
>> That is a bit disconcerting. Are you using the same HDFS setup and same
>> config for both tests? Would it be possible for you to take a look at Flume
>> 1.6.0? Such drops in performance should be taken care of.
>>
>>
>>
>> Thanks,
>> Hari
>>
>> On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton <
>> robert.hamilton@gm.com> wrote:
>> My mailer totally scrambled the numbers, probably by inserting special
>> characters.
>> Sorry, here are the actual results....
>>
>> All rates in MB/s
>> Payload in KB
>>
>> Flume 1.3.1
>> Payload    Rate memch    Rate FCh
>> 25         34            29
>> 25         31            27.6
>> 25         50            23.3
>> 25         46.5          27.2
>> 50         31.3          23.8
>> 50         37.4          31.3
>> 50         32.3          31.8
>> 80         30.5          25.8
>> 80         46.2          25.2
>> 80         39.1          25.8
>> 80         56.5          25.1
>>
>> Flume 1.5
>> Payload    Rate memch    Rate FCh
>> 25         18.7          15.6
>> 50         18.3          17.3
>> 80         18.4          15.6
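
To compare the two versions at a glance, the runs listed above can be averaged per channel type. The following is a rough sketch only: the helper `summarize` and the dictionary layout are made up for illustration, and a plain mean ignores the run-to-run variance that the author cautions about.

```python
from statistics import mean

# Rates in MB/s copied from the tables above, as (memory channel, file channel)
# pairs. All payload sizes are pooled together for a coarse comparison.
runs = {
    "1.3.1": [(34, 29), (31, 27.6), (50, 23.3), (46.5, 27.2),
              (31.3, 23.8), (37.4, 31.3), (32.3, 31.8),
              (30.5, 25.8), (46.2, 25.2), (39.1, 25.8), (56.5, 25.1)],
    "1.5": [(18.7, 15.6), (18.3, 17.3), (18.4, 15.6)],
}

def summarize(pairs):
    """Return (mean memory-channel rate, mean file-channel rate) in MB/s."""
    memch = mean(m for m, _ in pairs)
    filech = mean(f for _, f in pairs)
    return memch, filech

summary = {version: summarize(pairs) for version, pairs in runs.items()}

for version, (memch, filech) in summary.items():
    print(f"Flume {version}: memch {memch:.1f} MB/s, filech {filech:.1f} MB/s")

# Fraction of the 1.3.1 throughput that 1.5 retains, per channel type.
memch_ratio = summary["1.5"][0] / summary["1.3.1"][0]
filech_ratio = summary["1.5"][1] / summary["1.3.1"][1]
print(f"1.5 vs 1.3.1: memch {memch_ratio:.0%}, filech {filech_ratio:.0%}")
```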
>>
>> -----Original Message-----
>> From: Robert B Hamilton [mailto:robert.hamilton@gm.com]
>> Sent: Wednesday, July 22, 2015 11:00 AM
>>  To: user@flume.apache.org
>> Subject: RE: HDFS Sink performance
>>
>>  I only see that kind of throughput for event sizes of 25kB to 50kB or
>> larger.
>>
>> These particular tests are done on flume version 1.3.1.
>> But because you asked,  I thought to do a few quick runs on 1.5.0.1 and
>> added those results below.  The results are significantly different for 1.5
>> and I wonder if this is a cause for concern.
>>
>> None of this has been peer reviewed so it should be considered as
>> tentative.
>>
>> As to the HDD, here is the result of a quick and dirty dd test.
>>
>>   dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
>>    104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
>>
>>
>> Source data: each record consists of random ASCII strings of constant
>> length (25k, 50k, or 80k depending on the run).
>> Source: spooldir
>> Channel: file channel with a single dataDir, or memory channel.
>> Sink: four HDFS sinks, SequenceFile, Text, batch size=10, rollInterval=20
>> seconds.
>>
>> Batch size was kept small because of memory channel capacity. Increasing
>> batch size for file channel did not improve performance so I kept it at 10.
>>
>> Here I have numbers for some runs where the payload is varied among
>> 25K, 50K, and 80K. I include memory channel for comparison.
>>
>> Multiple runs were performed for each event size. As you can see, the
>> throughput can vary from run to run because these particular measurements
>> were done in an environment that is not tightly controlled.  Think of them
>> as "in situ" measurements :)
>>
>> Flume 1.3.1 memory channel and file channel
>> -------------------------------------------------------
>> Payload    Rate (memch)    Rate (filech)
>> (kB)       (MB/s)          (MB/s)
>> -------------------------------------------------------
>> 25         34              29
>> 25         31              27.6
>> 25         50              23.3
>> 25         46.5            27.2
>> 50         31.2            23.8
>> 50         37.4            31.3
>> 50         32.3            31.8
>> 80         30.5            25.8
>> 80         46.2            25.2
>> 80         39.1            25.8
>> 80         56.5            25.1
>>
>>
>> Flume 1.5 File Channel and Memory Channel
>> ---------------------------------------------------
>> Event size  Rate (memch)  Rate (filech)
>> (kB)        (MB/s)        (MB/s)
>> ---------------------------------------------------
>> 25          18.7          15.6
>> 50          18.3          17.3
>> 80          18.4          15.6
>>
>> -----Original Message-----
>>  From: Roshan Naik [mailto:roshan@hortonworks.com]
>> Sent: Friday, July 17, 2015 6:21 PM
>> To: user@flume.apache.org
>> Subject: Re: HDFS Sink performance
>>
>> I updated the Flume wiki with my measurements. I also added a section with
>> Hive sink measurements.
>>
>>
>> https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>>
>>
>> @Robert:
>>   What sort of HDD are you using?
>>   What is the event size?
>>   Which version of Flume?
>>
>> -roshan
>>
>>
>>
>>
>> On 7/17/15 12:51 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:
>>
>> >Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
>> >sinks per agent, and with a file channel with a single dataDir.
>> >
>> >
>> >From: lohit [mailto:lohit.vijayarenu@gmail.com]
>> >Sent: Wednesday, July 15, 2015 11:11 AM
>> >To: user@flume.apache.org
>>  >Subject: HDFS Sink performance
>> >
>> >Hello,
>> >
>> >Does anyone have numbers they can share on HDFS sink
>> >performance? In our testing, a single sink writing to HDFS
>> >(CompressedStream) and reading from MemoryChannel can only do about
>> >35000 events per second (each event is about 1K in size). After
>> >compression this turns out to be a ~10MB/s write stream to the HDFS file,
>> >which is pretty low. Our configuration looks like this:
>> >
>> >agent.sinks.hdfsSink.type = hdfs
>> >agent.sinks.hdfsSink.channel = memoryChannel
>> >agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>> >agent.sinks.hdfsSink.hdfs.codeC = lzo
>> >agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>> >agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>> >agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>> >agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>> >agent.sinks.hdfsSink.hdfs.rollCount = 0
>> >agent.sinks.hdfsSink.hdfs.batchSize = 10000
>> >agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>> >
>> >agent.channels.memoryChannel.type = memory
>> >
>> >agent.channels.memoryChannel.capacity = 3000000
>> >agent.channels.memoryChannel.transactionCapacity = 10000
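
The figures quoted in this message can be related with some rough arithmetic. The sketch below assumes that "1K" means 1024 bytes and that "MB" means 10**6 bytes; neither is stated in the message, and the function name is hypothetical. It estimates the uncompressed event rate and the compression ratio implied by the ~10MB/s write stream.

```python
# Rough arithmetic for the figures quoted in this message.
# Assumptions (not stated in the message): "1K" = 1024 bytes, "MB" = 10**6 bytes.

EVENTS_PER_SEC = 35_000        # reported single-sink event rate
EVENT_SIZE_BYTES = 1024        # "about 1K" per event (assumed to be 1024 bytes)
COMPRESSED_MB_PER_SEC = 10     # approximate post-compression write rate

def uncompressed_mb_per_sec(events_per_sec, event_size_bytes):
    """Uncompressed payload rate in MB/s (MB = 10**6 bytes)."""
    return events_per_sec * event_size_bytes / 1e6

raw_rate = uncompressed_mb_per_sec(EVENTS_PER_SEC, EVENT_SIZE_BYTES)
implied_ratio = raw_rate / COMPRESSED_MB_PER_SEC

print(f"uncompressed: about {raw_rate:.1f} MB/s")
print(f"implied compression ratio: about {implied_ratio:.1f}x")
```

Since both inputs are approximate, the result is an order-of-magnitude check rather than a measurement.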
>> >
>> >--
>> >Have a Nice Day!
>> >Lohit
>> >
>> >
>> >Nothing in this message is intended to constitute an electronic
>> >signature unless a specific statement to the contrary is included in
>> >this message.
>> >
>> >Confidentiality Note: This message is intended only for the person or
>> >entity to which it is addressed. It may contain confidential and/or
>> >privileged material. Any review, transmission, dissemination or other
>> >use, or taking of any action in reliance upon this message by persons
>> >or entities other than the intended recipient is prohibited and may be
>> >unlawful. If you received this message in error, please contact the
>> >sender and delete it from your computer.
>>
>>
>
>
>
>  --
> Have a Nice Day!
> Lohit
>



-- 
Have a Nice Day!
Lohit
