flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: HDFS Sink performance
Date Fri, 24 Jul 2015 20:27:44 GMT
I am inclined to believe that this is a Spool Dir source issue rather than
a channel issue - which is considerably better (a regression in one
source is better than an issue in the channels, which would affect
the entire framework)


Thanks,
Hari

On Fri, Jul 24, 2015 at 12:15 PM, Robert B Hamilton <robert.hamilton@gm.com>
wrote:

> I WAS saying just that, Roshan, but I was wrong!
>
> I was having an issue with the spooldir source which was spoiling the
> results for versions 1.5 and 1.6.  When I switch to an exec source there
> is not so much of an issue, and the hdfs sink/file channel performance of
> 1.5 and 1.6 is not measurably worse than 1.3 for the 25K and larger event
> sizes after all.  Well, I did say these were quick measurements... I will
> update the list after running more careful tests...
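>
> (For reference, the exec source I switched to is set up along these lines;
> the names and the tail command below are placeholders rather than my exact
> test config:)
>
> agent.sources.execSrc.type = exec
> agent.sources.execSrc.command = tail -F /path/to/generated/input.log
> agent.sources.execSrc.channels = fileChannel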
>
> -----Original Message-----
> From: Roshan Naik [mailto:roshan@hortonworks.com]
> Sent: Thursday, July 23, 2015 2:16 PM
> To: user@flume.apache.org
> Subject: Re: HDFS Sink performance
>
> Robert: Are you saying that the MemCh perf with the Null sink also exhibits
> the same perf degradation?
>
> A side note: The Spillable channel has a faster-performing memory channel
> (and spilling to disk can be disabled), but unfortunately there is an issue
> with its metrics publishing which is kind of hard to fix.
> -roshan
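>
> (For reference, a Spillable Memory Channel of that kind can be declared
> roughly as below; the names and capacities are placeholders. Per the user
> guide, setting overflowCapacity = 0 should make it behave as a pure
> in-memory channel with no spilling:)
>
> agent.channels.spillCh.type = SPILLABLEMEMORY
> agent.channels.spillCh.memoryCapacity = 100000
> agent.channels.spillCh.transactionCapacity = 10000
> # 0 is intended to disable overflow to disk entirely
> agent.channels.spillCh.overflowCapacity = 0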
>
>
> On 7/23/15 12:00 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:
>
> >I now believe that Roshan is correct that the channel may be the place
> >to look.
> >
> >With tests using null sinks I had found that the channel was not much
> >of a factor with 1.3, but now that I check 1.5 and 1.6 with null sinks,
> >they still show the same pattern of performance degradation.  The
> >interesting thing is that I find similar performance hits both when
> >using file channel AND when using memory channel.  Looking forward to
> >Johny's findings.
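> >
> >(For anyone wanting to repeat the null-sink runs: the channel-only
> >benchmark is essentially the sketch below; component names are
> >placeholders and the source is whatever feeds the test data:)
> >
> >agent.sinks.nullSink.type = null
> >agent.sinks.nullSink.channel = memoryChannel
> ># drain the channel in chunks; the sink just discards events
> >agent.sinks.nullSink.batchSize = 100
> >agent.channels.memoryChannel.type = memory
> >agent.channels.memoryChannel.capacity = 1000000
> >agent.channels.memoryChannel.transactionCapacity = 1000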
> >
> >
> >From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
> >Sent: Thursday, July 23, 2015 12:33 PM
> >To: user@flume.apache.org
> >Subject: Re: HDFS Sink performance
> >
> >This is interesting. I believe Johny is actually looking into this
> >performance issue.
> >
> >
> >
> >Thanks,
> >Hari
> >
> >On Thu, Jul 23, 2015 at 9:27 AM, lohit <lohit.vijayarenu@gmail.com>
> wrote:
> >The majority of messages need not be persisted to disk for us, so we are
> >interested in MemoryChannel.
> >There has been a gradual performance degradation from 1.3.1 -> 1.4.0 ->
> >1.6.0.
> >See the graph below, where I have a constant stream of messages (blue
> >line). While this is happening I swap in different versions of Flume for
> >the agent.
> >The orange line shows messages dropped (the flat line is when data is
> >streamed to HDFS), and I have marked the flat lines with the different
> >versions.
> >
> >
> >
> >2015-07-22 19:48 GMT-07:00 Roshan Naik <roshan@hortonworks.com>:
> >
> >My guess is that most of you will probably use the File channel in
> >production with the HDFS sink? In that scenario the common observation
> >seems to be that the File channel becomes the primary bottleneck. Going
> >by Robert's observations, its throughput too seems to have dropped since
> >v1.3.
> >
> >Robert, can you confirm how many data dirs were used for your readings
> >with FCh?
> >
> >-roshan
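> >
> >(For reference, the file channel takes a comma-separated list of dataDirs,
> >so spreading it across disks looks roughly like this; the paths and names
> >below are placeholders:)
> >
> >agent.channels.fileChannel.type = file
> >agent.channels.fileChannel.checkpointDir = /flume/checkpoint
> ># one directory per physical disk to spread the write load
> >agent.channels.fileChannel.dataDirs = /data1/flume,/data2/flume,/data3/flume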
> >
> >
> >
> >From: lohit <lohit.vijayarenu@gmail.com>
> >Reply-To: "user@flume.apache.org" <user@flume.apache.org>
> >Date: Wednesday, July 22, 2015 3:01 PM
> >To: "user@flume.apache.org" <user@flume.apache.org>
> >
> >Subject: Re: HDFS Sink performance
> >
> >Thanks for sharing these numbers, Robert. Curious, I ran the same
> >experiment.
> >Flume 1.3.1 has higher throughput than 1.6.0 (I was able to get
> >sustained 60MB/s with Flume 1.3.1). With no config or setup change, just
> >swapping the flume version shows this difference. We should probably
> >look at the change set between 1.3.1 and 1.5 to see if there were any
> >obvious changes.
> >
> >2015-07-22 14:00 GMT-07:00 Robert B Hamilton <robert.hamilton@gm.com>:
> >Here is a comparison between versions 1.3, 1.5, and 1.6.
> >I would estimate that error bars are plus or minus 15%.
> >
> >All parameters are identical, as between runs all I change is the
> >version of flume.
> >Lohit's numbers are fairly consistent with this, because if we double
> >the sinks from my 4 to his 8 and assume linear scalability, we would
> >expect to get somewhere close to 30-40MB/s.
> >
> >It looks like the drop off is more pronounced for the larger event size.
> >This is of concern to us because we are looking at this for a high
> >volume feed with message sizes up to 80 kB.
> >
> >------------------------------------------
> >HDFSx4 sink, Memory channel (rates in MB/s)
> >------------------------------------------
> >Payload (kB)   v1.3   v1.5   v1.6
> >------------   ----   ----   ----
> >1                27     17     20
> >25               56     15     15
> >
> >
> >
> >From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
> >Sent: Wednesday, July 22, 2015 1:27 PM
> >To: user@flume.apache.org
> >Subject: Re: HDFS Sink performance
> >
> >That is a bit disconcerting. Are you using the same HDFS setup and same
> >config for both tests? Would it be possible for you to take a look at
> >Flume 1.6.0? Such drops in performance should be taken care of.
> >
> >
> >
> >Thanks,
> >Hari
> >
> >On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton
> ><robert.hamilton@gm.com> wrote:
> >My mailer totally scrambled the numbers, probably by inserting special
> >characters.
> >Sorry, here are the actual results....
> >
> >All rates in MB/s
> >Payload in KB
> >
> >Flume 1.3.1
> >Payload   Rate memch   Rate FCh
> >25        34           29
> >25        31           27.6
> >25        50           23.3
> >25        46.5         27.2
> >50        31.3         23.8
> >50        37.4         31.3
> >50        32.3         31.8
> >80        30.5         25.8
> >80        46.2         25.2
> >80        39.1         25.8
> >80        56.5         25.1
> >
> >Flume 1.5
> >Payload   Rate memch   Rate FCh
> >25        18.7         15.6
> >50        18.3         17.3
> >80        18.4         15.6
> >
> >-----Original Message-----
> >From: Robert B Hamilton [mailto:robert.hamilton@gm.com]
> >Sent: Wednesday, July 22, 2015 11:00 AM
> >To: user@flume.apache.org
> >Subject: RE: HDFS Sink performance
> >
> > I only see that kind of throughput for event sizes of 25kB to 50kB or
> >larger.
> >
> >These particular tests are done on flume version 1.3.1.
> >But because you asked, I did a few quick runs on 1.5.0.1 and
> >added those results below.  The results are significantly different for
> >1.5 and I wonder if this is a cause for concern.
> >
> >None of this has been peer reviewed, so it should be considered
> >tentative.
> >
> >As to the HDD, here is the result of a quick and dirty dd test.
> >
> >  dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
> >  104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
> >
> >
> >Source data: each record consists of random ASCII strings of constant
> >length (25k, 50k, or 80k depending on the run).
> >Source: spooldir
> >Channel: file channel with a single dataDir, or memory channel.
> >Sink: four HDFS sinks, SequenceFile, Text, batch size=10, rollInterval=20
> >seconds.
> >
> >Batch size was kept small because of the memory channel capacity.
> >Increasing the batch size for the file channel did not improve
> >performance, so I kept it at 10.
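> >
> >(Putting that together, the test topology is roughly the sketch below,
> >with one of the four HDFS sinks shown; names, paths and values are
> >placeholders, and the remaining sinks are configured the same way against
> >the same channel:)
> >
> >agent.sources.spoolSrc.type = spooldir
> >agent.sources.spoolSrc.spoolDir = /data/spool
> >agent.sources.spoolSrc.channels = ch1
> >
> ># file channel with a single dataDir (swap in type = memory for the
> ># memory-channel runs)
> >agent.channels.ch1.type = file
> >agent.channels.ch1.checkpointDir = /data/flume/checkpoint
> >agent.channels.ch1.dataDirs = /data/flume/data
> >
> >agent.sinks.hdfs1.type = hdfs
> >agent.sinks.hdfs1.channel = ch1
> >agent.sinks.hdfs1.hdfs.path = /tmp/perf-test
> >agent.sinks.hdfs1.hdfs.fileType = SequenceFile
> >agent.sinks.hdfs1.hdfs.writeFormat = Text
> >agent.sinks.hdfs1.hdfs.batchSize = 10
> >agent.sinks.hdfs1.hdfs.rollInterval = 20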
> >
> >Here I have numbers for some runs where the payload is varied over
> >25K, 50K, and 80K. I include the memory channel for comparison.
> >
> >Multiple runs were performed for each event size. As you can see, the
> >throughput can vary from run to run because these particular
> >measurements were done in an environment that is not tightly
> >controlled.  Think of them as "in situ" measurements :)
> >
> >Flume 1.3.1 memory channel and file channel
> >-------------------------------------------------------
> >Payload   Rate memch   Rate filech
> >(kB)      (MB/s)       (MB/s)
> >-------------------------------------------------------
> >25        34           29
> >25        31           27.6
> >25        50           23.3
> >25        46.5         27.2
> >50        31.2         23.8
> >50        37.4         31.3
> >50        32.3         31.8
> >80        30.5         25.8
> >80        46.2         25.2
> >80        39.1         25.8
> >80        56.5         25.1
> >
> >
> >Flume 1.5 File Channel and Memory Channel
> >---------------------------------------------------
> >Event size   Rate memch   Rate filech
> >(KB)         (MB/s)       (MB/s)
> >---------------------------------------------------
> >25           18.7         15.6
> >50           18.3         17.3
> >80           18.4         15.6
> >
> >-----Original Message-----
> >From: Roshan Naik [mailto:roshan@hortonworks.com]
> >Sent: Friday, July 17, 2015 6:21 PM
> >To: user@flume.apache.org
> >Subject: Re: HDFS Sink performance
> >
> >I updated the Flume wiki with my measurements. I also added a section
> >with Hive sink measurements.
> >
> >https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
> >
> >
> >@Robert:
> >  What sort of HDD are you using?
> >  What is the event size?
> >  Which version of Flume?
> >
> >-roshan
> >
> >
> >
> >
> >On 7/17/15 12:51 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:
> >
> >>Our testing has shown up to 60MB/s to HDFS if we use 8 or 10
> >>sinks per agent, with a file channel using a single dataDir.
> >>
> >>
> >>From: lohit [mailto:lohit.vijayarenu@gmail.com]
> >>Sent: Wednesday, July 15, 2015 11:11 AM
> >>To: user@flume.apache.org
> >>Subject: HDFS Sink performance
> >>
> >>Hello,
> >>
> >>Does anyone have some numbers which they can share around HDFS sink
> >>performance? From our testing, a single sink writing to HDFS
> >>(CompressedStream) and reading from MemoryChannel can only do about
> >>35,000 events per second (each event is about 1K in size). That is
> >>roughly 35MB/s of raw data, and after compression it turns out to be a
> >>~10MB/s write stream to the HDFS file, which is pretty low. Our
> >>configuration looks like this:
> >>
> >>agent.sinks.hdfsSink.type = hdfs
> >>agent.sinks.hdfsSink.channel = memoryChannel
> >>agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
> >>agent.sinks.hdfsSink.hdfs.codeC = lzo
> >>agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
> >>agent.sinks.hdfsSink.hdfs.writeFormat = Writable
> >>agent.sinks.hdfsSink.hdfs.rollInterval = 3600
> >>agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
> >>agent.sinks.hdfsSink.hdfs.rollCount = 0
> >>agent.sinks.hdfsSink.hdfs.batchSize = 10000
> >>agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
> >>
> >>agent.channels.memoryChannel.type = memory
> >>
> >>agent.channels.memoryChannel.capacity = 3000000
> >>agent.channels.memoryChannel.transactionCapacity = 10000
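> >>
> >>(As a rough sizing note: at full capacity that is about 3,000,000 events
> >>of ~1K each, i.e. on the order of 3GB queued in memory, so the agent heap
> >>and the memory channel's byteCapacity, which is derived from the JVM heap
> >>by default, have to be sized with that in mind.)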
> >>
> >>--
> >>Have a Nice Day!
> >>Lohit
> >
> >--
> >Have a Nice Day!
> >Lohit
> >
> >
> >
> >
> >--
> >Have a Nice Day!
> >Lohit
> >
