flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: HDFS Sink performance
Date Thu, 23 Jul 2015 19:16:24 GMT
Robert: Are you saying that memory channel performance with the null sink
also exhibits the same degradation?

A side note: the Spillable Memory channel has a faster-performing in-memory
queue than the Memory channel (and spilling to disk can be disabled), but
unfortunately its metrics publishing has an issue that is hard to fix.
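The following is a minimal sketch of the setup described above: a Spillable Memory channel with spilling to disk turned off. The agent name `agent`, the channel name `spillChannel`, and the capacity value are hypothetical. The Flume user guide describes `overflowCapacity = 0` as the way to make this channel act purely in memory, and it also marks this channel as experimental. Check the guide for your Flume version before relying on it.

```properties
# Hypothetical agent/channel names; capacity chosen only for illustration.
agent.channels = spillChannel
agent.channels.spillChannel.type = SPILLABLEMEMORY
# Maximum number of events held in the in-memory queue.
agent.channels.spillChannel.memoryCapacity = 3000000
# 0 disables overflow to disk, so the channel behaves as a pure in-memory channel.
agent.channels.spillChannel.overflowCapacity = 0
```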
-roshan


On 7/23/15 12:00 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:

>I now believe that Roshan is correct that the channel may be the place to
>look.
>
>With tests using null sinks I had found that the channel was not much of
>a factor with 1.3, but now that I have checked 1.5 and 1.6 with null
>sinks, they still show the same pattern of performance degradation.
>Interestingly, I see similar performance hits with both the file channel
>AND the memory channel. Looking forward to Johny's findings.
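The null-sink runs above take HDFS out of the measurement by draining the channel with Flume's `null` sink, which discards events. Below is a sketch of such a harness. It assumes a spooldir source, and the agent/component names and directory path are hypothetical. To compare the two channels with the same setup, switch the channel `type` between `memory` and `file`.

```properties
# Hypothetical names and paths, for illustration only.
agent.sources = spool
agent.channels = ch
agent.sinks = discard

# Spooling-directory source reading the test files.
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /tmp/flume-spool
agent.sources.spool.channels = ch

# Channel under test; change the type to "file" to measure the file channel.
agent.channels.ch.type = memory
agent.channels.ch.capacity = 100000
agent.channels.ch.transactionCapacity = 1000

# Null sink: takes events off the channel and discards them, so HDFS is not involved.
agent.sinks.discard.type = null
agent.sinks.discard.channel = ch
agent.sinks.discard.batchSize = 1000
```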
>
>
>From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
>Sent: Thursday, July 23, 2015 12:33 PM
>To: user@flume.apache.org
>Subject: Re: HDFS Sink performance
>
>This is interesting. I believe Johny is actually looking into this
>performance issue.
>
>
>
>Thanks,
>Hari
>
>On Thu, Jul 23, 2015 at 9:27 AM, lohit <lohit.vijayarenu@gmail.com> wrote:
>The majority of our messages need not be persisted to disk, so we are
>interested in MemoryChannel.
>There has been a gradual performance degradation from 1.3.1 -> 1.4.0 ->
>1.6.0.
>See the graph below, where I have a constant stream of messages (blue
>line). While this is happening, I swap in different Flume versions for
>the agent.
>The orange line shows messages dropped (the flat line is when data is
>streamed to HDFS), and I have marked the flat lines with the versions.
>
>
>
>2015-07-22 19:48 GMT-07:00 Roshan Naik <roshan@hortonworks.com>:
>
>My guess is that most of you will probably use the File channel with the
>HDFS sink in production? In that scenario, the common observation seems
>to be that the File channel becomes the primary bottleneck. Going by
>Robert's observations, its throughput also seems to have dropped since
>v1.3.
>
>Robert, can you confirm how many data dirs were used for your File
>channel readings?
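For reference, the File channel's `dataDirs` property takes a comma-separated list of directories. The Flume user guide notes that using multiple directories on separate disks can improve File channel performance. Below is a sketch with hypothetical mount points and channel name.

```properties
# Hypothetical channel name and mount points.
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
# Comma-separated list; ideally each directory sits on a different physical disk.
agent.channels.fileChannel.dataDirs = /disk2/flume/data,/disk3/flume/data,/disk4/flume/data
```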
>
>-roshan
>
>
>
>From: lohit <lohit.vijayarenu@gmail.com>
>Reply-To: "user@flume.apache.org" <user@flume.apache.org>
>Date: Wednesday, July 22, 2015 3:01 PM
>To: "user@flume.apache.org" <user@flume.apache.org>
>
>Subject: Re: HDFS Sink performance
>
>Thanks for sharing these numbers, Robert. Out of curiosity, I ran the
>same experiment.
>Flume 1.3.1 has higher throughput than 1.6.0 (I was able to get a
>sustained 60 MB/s with Flume 1.3.1).
>With no config or setup change, just changing the Flume version shows
>this difference. We should probably look at the change set between 1.3.1
>and 1.5 to see if there were any obvious changes.
>
>2015-07-22 14:00 GMT-07:00 Robert B Hamilton <robert.hamilton@gm.com>:
>Here is a comparison between versions 1.3, 1.5, and 1.6.
>I would estimate that error bars are plus or minus 15%.
>
>All parameters are identical; between runs, all I change is the version
>of Flume.
>Lohit's numbers are fairly consistent with this: if we double the sinks
>from my 4 to his 8 and assume linear scalability, we would expect to get
>somewhere close to 30-40 MB/s.
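The 30-40 MB/s estimate comes from straight linear extrapolation. Let $T_n$ be the aggregate throughput with $n$ HDFS sinks. Assuming perfect linear scaling, as stated above, and taking the roughly 15-20 MB/s of the v1.5/v1.6 rows in Robert's 4-sink table:

```latex
T_8 \approx \frac{8}{4}\,T_4 = 2\,T_4,
\qquad
T_4 \in [15,\ 20]\ \text{MB/s}
\;\Rightarrow\;
T_8 \in [30,\ 40]\ \text{MB/s}
```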
>
>It looks like the drop-off is more pronounced for larger event sizes.
>This is of concern to us because we are looking at this for a
>high-volume feed with message sizes up to 80 kB.
>
>------------------------------------------
>HDFS sink x4, Memory channel (rates in MB/s)
>------------------------------------------
>Payload (kB)   v1.3   v1.5   v1.6
>------------   ----   ----   ----
>1                27     17     20
>25               56     15     15
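The following puts "more pronounced for the larger event size" in numbers. It uses the table above, with $T_v$ as the throughput of version $v$, and computes the v1.6 throughput as a fraction of v1.3:

```latex
\left.\frac{T_{1.6}}{T_{1.3}}\right|_{1\,\text{kB}} = \frac{20}{27} \approx 0.74\ \ (\approx 26\%\ \text{drop}),
\qquad
\left.\frac{T_{1.6}}{T_{1.3}}\right|_{25\,\text{kB}} = \frac{15}{56} \approx 0.27\ \ (\approx 73\%\ \text{drop})
```

The 25 kB drop is far larger than the estimated plus or minus 15% error bars.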
>
>
>
>From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
>Sent: Wednesday, July 22, 2015 1:27 PM
>To: user@flume.apache.org
>Subject: Re: HDFS Sink performance
>
>That is a bit disconcerting. Are you using the same HDFS setup and same
>config for both tests? Would it be possible for you to take a look at
>Flume 1.6.0? Such drops in performance should be taken care of.
>
>
>
>Thanks,
>Hari
>
>On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton
><robert.hamilton@gm.com> wrote:
>My mailer totally scrambled the numbers, probably by inserting special
>characters.
>Sorry, here are the actual results....
>
>All rates in MB/s
>Payload in kB
>
>Flume 1.3.1
>Payload   Rate memch   Rate Fch
>25        34           29
>25        31           27.6
>25        50           23.3
>25        46.5         27.2
>50        31.3         23.8
>50        37.4         31.3
>50        32.3         31.8
>80        30.5         25.8
>80        46.2         25.2
>80        39.1         25.8
>80        56.5         25.1
>
>Flume 1.5
>Payload   Rate memch   Rate Fch
>25        18.7         15.6
>50        18.3         17.3
>80        18.4         15.6
>
>-----Original Message-----
>From: Robert B Hamilton [mailto:robert.hamilton@gm.com]
>Sent: Wednesday, July 22, 2015 11:00 AM
>To: user@flume.apache.org
>Subject: RE: HDFS Sink performance
>
>I only see that kind of throughput for event sizes of 25 kB to 50 kB or
>larger.
>
>These particular tests were done on Flume version 1.3.1.
>But because you asked, I did a few quick runs on 1.5.0.1 and added those
>results below. The results are significantly different for 1.5, and I
>wonder if this is a cause for concern.
>
>None of this has been peer reviewed so it should be considered as
>tentative.
>
>As for the HDD, here is the result of a quick-and-dirty dd test.
>
>  dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
>   104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
>
>
>Source data: each record consists of random ASCII strings of constant
>length (25 kB, 50 kB, or 80 kB, depending on the run).
>Source: spooldir
>Channel: file channel with a single dataDir, or memory channel.
>Sinks: four HDFS sinks, SequenceFile, Text, batch size = 10,
>rollInterval = 20 seconds.
>
>Batch size was kept small because of memory channel capacity. Increasing
>the batch size for the file channel did not improve performance, so I
>kept it at 10.
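Here is a sketch of the test topology listed above. A spooldir source feeds a memory channel, and four HDFS sinks drain it, writing SequenceFiles in Text format with batchSize 10 and a 20-second roll. Several details are assumptions and do not come from the actual test config:

- the agent/component names, paths, and channel capacities
- the per-sink `hdfs.filePrefix` values
- disabling size/count rolling
- one newline-delimited record per line in the spooled files

Raising `deserializer.maxLineLength` matters here because the spooldir source's default line deserializer truncates events at 2048 characters.

```properties
# Hypothetical names and paths; values not given in the thread are assumptions.
agent.sources = spool
agent.channels = mem
agent.sinks = h1 h2 h3 h4

# Spooling-directory source; assumes one newline-terminated record per line.
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /data/flume-spool
agent.sources.spool.channels = mem
# Default limit is 2048 characters; raise it above the 80 kB record length.
agent.sources.spool.deserializer.maxLineLength = 100000

# Memory channel; capacity kept modest because events are up to 80 kB each.
agent.channels.mem.type = memory
agent.channels.mem.capacity = 1000
agent.channels.mem.transactionCapacity = 100

# Four HDFS sinks draining the same channel in parallel; distinct
# filePrefix values keep each sink's output files apart. Rolling is
# time-based only (20 s); rollSize/rollCount = 0 disables the other triggers.
agent.sinks.h1.type = hdfs
agent.sinks.h1.channel = mem
agent.sinks.h1.hdfs.path = /flume/perftest
agent.sinks.h1.hdfs.filePrefix = h1
agent.sinks.h1.hdfs.fileType = SequenceFile
agent.sinks.h1.hdfs.writeFormat = Text
agent.sinks.h1.hdfs.batchSize = 10
agent.sinks.h1.hdfs.rollInterval = 20
agent.sinks.h1.hdfs.rollSize = 0
agent.sinks.h1.hdfs.rollCount = 0

agent.sinks.h2.type = hdfs
agent.sinks.h2.channel = mem
agent.sinks.h2.hdfs.path = /flume/perftest
agent.sinks.h2.hdfs.filePrefix = h2
agent.sinks.h2.hdfs.fileType = SequenceFile
agent.sinks.h2.hdfs.writeFormat = Text
agent.sinks.h2.hdfs.batchSize = 10
agent.sinks.h2.hdfs.rollInterval = 20
agent.sinks.h2.hdfs.rollSize = 0
agent.sinks.h2.hdfs.rollCount = 0

agent.sinks.h3.type = hdfs
agent.sinks.h3.channel = mem
agent.sinks.h3.hdfs.path = /flume/perftest
agent.sinks.h3.hdfs.filePrefix = h3
agent.sinks.h3.hdfs.fileType = SequenceFile
agent.sinks.h3.hdfs.writeFormat = Text
agent.sinks.h3.hdfs.batchSize = 10
agent.sinks.h3.hdfs.rollInterval = 20
agent.sinks.h3.hdfs.rollSize = 0
agent.sinks.h3.hdfs.rollCount = 0

agent.sinks.h4.type = hdfs
agent.sinks.h4.channel = mem
agent.sinks.h4.hdfs.path = /flume/perftest
agent.sinks.h4.hdfs.filePrefix = h4
agent.sinks.h4.hdfs.fileType = SequenceFile
agent.sinks.h4.hdfs.writeFormat = Text
agent.sinks.h4.hdfs.batchSize = 10
agent.sinks.h4.hdfs.rollInterval = 20
agent.sinks.h4.hdfs.rollSize = 0
agent.sinks.h4.hdfs.rollCount = 0
```

For the file channel runs, replace the `mem` channel with a `file` channel that has a single `dataDirs` entry.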
>
>Here are numbers for some runs where the payload is varied across
>25 kB, 50 kB, and 80 kB. I include the memory channel for comparison.
>
>Multiple runs were performed for each event size. As you can see, the
>throughput can vary from run to run, because these particular
>measurements were done in an environment that is not tightly controlled.
>Think of them as "in situ" measurements :)
>
>Flume 1.3.1 memory channel and file channel
>-------------------------------------------------------
>Payload   Rate memch   Rate filech
>(kB)      (MB/s)       (MB/s)
>-------------------------------------------------------
>25        34           29
>25        31           27.6
>25        50           23.3
>25        46.5         27.2
>50        31.2         23.8
>50        37.4         31.3
>50        32.3         31.8
>80        30.5         25.8
>80        46.2         25.2
>80        39.1         25.8
>80        56.5         25.1
>
>
>Flume 1.5 file channel and memory channel
>---------------------------------------------------
>Event size   Rate memch   Rate filech
>(kB)         (MB/s)       (MB/s)
>---------------------------------------------------
>25           18.7         15.6
>50           18.3         17.3
>80           18.4         15.6
>
>-----Original Message-----
>From: Roshan Naik [mailto:roshan@hortonworks.com]
>Sent: Friday, July 17, 2015 6:21 PM
>To: user@flume.apache.org
>Subject: Re: HDFS Sink performance
>
>I updated the Flume wiki with my measurements and also added a section
>with Hive sink measurements.
>
>https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>
>
>@Robert:
>  What sort of HDD are you using?
>  What is the event size?
>  Which version of Flume?
>
>-roshan
>
>
>
>
>On 7/17/15 12:51 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:
>
>>Our testing has shown up to 60 MB/s to HDFS if we use up to 8 or 10
>>sinks per agent, with a file channel that has a single dataDir.
>>
>>
>>From: lohit [mailto:lohit.vijayarenu@gmail.com]
>>Sent: Wednesday, July 15, 2015 11:11 AM
>>To: user@flume.apache.org
>>Subject: HDFS Sink performance
>>
>>Hello,
>>
>>Does anyone have numbers they can share on HDFS sink performance?
>>From our testing, a single sink writing to HDFS (CompressedStream) and
>>reading from a MemoryChannel can only do about 35000 events per second
>>(each event is about 1K in size). After compression this turns out to
>>be a ~10 MB/s write stream to the HDFS file, which is pretty low. Our
>>configuration looks like this:
>>
>>agent.sinks.hdfsSink.type = hdfs
>>agent.sinks.hdfsSink.channel = memoryChannel
>>agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>agent.sinks.hdfsSink.hdfs.codeC = lzo
>>agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>agent.sinks.hdfsSink.hdfs.rollCount = 0
>>agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>
>>agent.channels.memoryChannel.type = memory
>>
>>agent.channels.memoryChannel.capacity = 3000000
>>agent.channels.memoryChannel.transactionCapacity = 10000
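Lohit's figures also imply how much data the single sink moves before compression, and roughly what ratio LZO achieves. The calculation below approximates 1 KB as 1000 bytes:

```latex
35\,000\ \tfrac{\text{events}}{\text{s}} \times 1\ \tfrac{\text{KB}}{\text{event}}
\approx 35\ \text{MB/s uncompressed},
\qquad
\frac{35\ \text{MB/s}}{10\ \text{MB/s}} \approx 3.5{:}1\ \text{compression}
```

So the ~10 MB/s written to HDFS corresponds to roughly 35 MB/s of raw events through one sink. That raw rate is the figure to compare with the uncompressed per-agent rates elsewhere in the thread.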
>>
>>--
>>Have a Nice Day!
>>Lohit
>>
>>
>>Nothing in this message is intended to constitute an electronic
>>signature unless a specific statement to the contrary is included in
>>this message.
>>
>>Confidentiality Note: This message is intended only for the person or
>>entity to which it is addressed. It may contain confidential and/or
>>privileged material. Any review, transmission, dissemination or other
>>use, or taking of any action in reliance upon this message by persons
>>or entities other than the intended recipient is prohibited and may be
>>unlawful. If you received this message in error, please contact the
>>sender and delete it from your computer.
>

