flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: HDFS Sink performance
Date Wed, 22 Jul 2015 18:27:03 GMT
That is a bit disconcerting. Are you using the same HDFS setup and same
config for both tests? Would it be possible for you to take a look at Flume
1.6.0? Such drops in performance should be taken care of.


Thanks,
Hari

On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton <robert.hamilton@gm.com>
wrote:

> My mailer totally scrambled the numbers, probably by inserting special
> characters.
> Sorry, here are the actual results....
>
> All rates in MB/s
> Payload in KB
>
> Flume 1.3.1
> Payload   Rate memch   Rate filech
> 25        34           29
> 25        31           27.6
> 25        50           23.3
> 25        46.5         27.2
> 50        31.3         23.8
> 50        37.4         31.3
> 50        32.3         31.8
> 80        30.5         25.8
> 80        46.2         25.2
> 80        39.1         25.8
> 80        56.5         25.1
>
> Flume 1.5
> Payload   Rate memch   Rate filech
> 25        18.7         15.6
> 50        18.3         17.3
> 80        18.4         15.6
>
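To put a number on the drop between versions, here is a minimal Python sketch that averages the rates from the two tables above and reports the relative change. The function names are illustrative only, and with just three runs on 1.5 the result should be read as tentative, as the original poster says.

```python
from statistics import mean

# Rates in MB/s copied from the tables above:
# (payload in KB, memory channel rate, file channel rate).
FLUME_1_3_1 = [
    (25, 34.0, 29.0), (25, 31.0, 27.6), (25, 50.0, 23.3), (25, 46.5, 27.2),
    (50, 31.3, 23.8), (50, 37.4, 31.3), (50, 32.3, 31.8),
    (80, 30.5, 25.8), (80, 46.2, 25.2), (80, 39.1, 25.8), (80, 56.5, 25.1),
]
FLUME_1_5 = [(25, 18.7, 15.6), (50, 18.3, 17.3), (80, 18.4, 15.6)]


def channel_means(rows):
    """Return (mean memory channel rate, mean file channel rate) in MB/s."""
    return mean(r[1] for r in rows), mean(r[2] for r in rows)


def percent_drop(old, new):
    """Relative decrease from old to new, in percent."""
    return 100.0 * (old - new) / old


mem_old, file_old = channel_means(FLUME_1_3_1)
mem_new, file_new = channel_means(FLUME_1_5)

print(f"memory channel: {mem_old:.1f} -> {mem_new:.1f} MB/s "
      f"({percent_drop(mem_old, mem_new):.0f}% lower)")
print(f"file channel:   {file_old:.1f} -> {file_new:.1f} MB/s "
      f"({percent_drop(file_old, file_new):.0f}% lower)")
```

On these figures the memory channel average is roughly halved and the file channel average falls by about 40%. The 1.3.1 memory channel runs vary widely (30.5 to 56.5 MB/s), so the averages hide a lot of run-to-run noise.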
> -----Original Message-----
> From: Robert B Hamilton [mailto:robert.hamilton@gm.com]
> Sent: Wednesday, July 22, 2015 11:00 AM
> To: user@flume.apache.org
> Subject: RE: HDFS Sink performance
>
>  I only see that kind of throughput for event sizes of 25kB to 50kB or
> larger.
>
> These particular tests are done on flume version 1.3.1.
> But because you asked,  I thought to do a few quick runs on 1.5.0.1 and
> added those results below.  The results are significantly different for 1.5
> and I wonder if this is a cause for concern.
>
> None of this has been peer reviewed so it should be considered as
> tentative.
>
> As to the HDD, here is the result of a quick and dirty dd test.
>
>   dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
>    104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
>
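As a rough sanity check, the dd figure can be reproduced from the byte count and elapsed time, then set against the file channel rates in this message. This is only a sketch: dd here measures one synchronous sequential write, which is not the same access pattern as a Flume file channel feeding HDFS sinks, so the ratio is indicative at best.

```python
# Values taken from the dd output above.
DD_BYTES = 104_857_600
DD_SECONDS = 0.685646

# Highest file channel rate reported in this message for Flume 1.3.1 (MB/s).
BEST_FILE_CHANNEL_MB_S = 31.8


def mb_per_sec(num_bytes, seconds):
    """Throughput in decimal megabytes per second, the unit dd reports."""
    return num_bytes / seconds / 1_000_000


disk = mb_per_sec(DD_BYTES, DD_SECONDS)
share = BEST_FILE_CHANNEL_MB_S / disk

print(f"dd sequential write: {disk:.0f} MB/s")
print(f"best file channel run is {share:.0%} of that")
```

Even the best file channel run uses only about a fifth of what dd measured on this disk.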
>
> Source data: each record consists of random ascii strings of constant
> length (25k,50k,or 80k depending on the run).
> Source: spooldir
> Channel: file channel single dataDir, or memory channel.
> Sink: four HDFS, SequenceFile, Text, Batch size=10, rollInterval=20
> seconds.
>
> Batch size was kept small because of memory channel capacity. Increasing
> batch size for file channel did not improve performance so I kept it at 10.
>
> Here I have numbers for some runs where the payload is varied from
> 25K,50K, and 80K. I include memory channel for comparison.
>
> Multiple runs were performed for each event size. As you can see the
> throughput can vary from run to run because these particular measurements
> were done on an environment that is not tightly controlled.  Think of them
> as "in situ" measurements :)
>
> Flume 1.3.1 memory channel and file channel
> ---------------------------------------------
> Payload   Rate memch   Rate filech
> (kB)      (MB/s)       (MB/s)
> ---------------------------------------------
> 25        34           29
> 25        31           27.6
> 25        50           23.3
> 25        46.5         27.2
> 50        31.2         23.8
> 50        37.4         31.3
> 50        32.3         31.8
> 80        30.5         25.8
> 80        46.2         25.2
> 80        39.1         25.8
> 80        56.5         25.1
>
>
> Flume 1.5 memory channel and file channel
> ---------------------------------------------
> Payload   Rate memch   Rate filech
> (kB)      (MB/s)       (MB/s)
> ---------------------------------------------
> 25        18.7         15.6
> 50        18.3         17.3
> 80        18.4         15.6
>
> -----Original Message-----
> From: Roshan Naik [mailto:roshan@hortonworks.com]
> Sent: Friday, July 17, 2015 6:21 PM
> To: user@flume.apache.org
> Subject: Re: HDFS Sink performance
>
> I updated the Flume wiki with my measurements. Also added a section with
> Hive sink measurements.
>
> https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>
>
> @Robert:
>   What sort of HDD are you using?
>   What is the event size?
>   Which version of Flume?
>
> -roshan
>
>
>
>
> On 7/17/15 12:51 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:
>
> >Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
> >sinks per agent, and with a file channel with a single dataDir.
> >
> >
> >From: lohit [mailto:lohit.vijayarenu@gmail.com]
> >Sent: Wednesday, July 15, 2015 11:11 AM
> >To: user@flume.apache.org
> >Subject: HDFS Sink performance
> >
> >Hello,
> >
> >Does anyone have numbers they can share on HDFS sink performance? In
> >our testing, a single sink writing to HDFS (CompressedStream) and
> >reading from a MemoryChannel can only do about 35000 events per second
> >(each event is about 1K in size). After compression this comes to a
> >~10MB/s write stream to the HDFS file, which is pretty low. Our
> >configuration looks like this:
> >
> >agent.sinks.hdfsSink.type = hdfs
> >agent.sinks.hdfsSink.channel = memoryChannel
> >agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
> >agent.sinks.hdfsSink.hdfs.codeC = lzo
> >agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
> >agent.sinks.hdfsSink.hdfs.writeFormat = Writable
> >agent.sinks.hdfsSink.hdfs.rollInterval = 3600
> >agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
> >agent.sinks.hdfsSink.hdfs.rollCount = 0
> >agent.sinks.hdfsSink.hdfs.batchSize = 10000
> >agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
> >
> >agent.channels.memoryChannel.type = memory
> >
> >agent.channels.memoryChannel.capacity = 3000000
> >agent.channels.memoryChannel.transactionCapacity = 10000
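For comparison with the MB/s figures elsewhere in this thread, the event rate quoted above can be converted to an uncompressed byte rate. The sketch below assumes "about 1K" means 1000 bytes per event, which is an assumption; the implied compression ratio follows from the ~10MB/s figure in the message.

```python
# Figures from the message above. Treating 1K as 1000 bytes is an assumption.
EVENTS_PER_SEC = 35_000
EVENT_BYTES = 1_000
COMPRESSED_MB_PER_SEC = 10  # "~10MB/s" reaching HDFS after LZO compression


def raw_mb_per_sec(events_per_sec, event_bytes):
    """Uncompressed throughput through the sink, in decimal MB/s."""
    return events_per_sec * event_bytes / 1_000_000


raw = raw_mb_per_sec(EVENTS_PER_SEC, EVENT_BYTES)
ratio = raw / COMPRESSED_MB_PER_SEC

print(f"uncompressed input: {raw:.0f} MB/s, "
      f"implied compression ratio: {ratio:.1f}x")
# prints "uncompressed input: 35 MB/s, implied compression ratio: 3.5x"
```

Measured before compression, the single sink is moving about 35 MB/s of event data, so the ~10MB/s seen on the HDFS side reflects the LZO ratio as much as the sink's own speed.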
> >
> >--
> >Have a Nice Day!
> >Lohit
> >
> >
> >Nothing in this message is intended to constitute an electronic
> >signature unless a specific statement to the contrary is included in
> >this message.
> >
> >Confidentiality Note: This message is intended only for the person or
> >entity to which it is addressed. It may contain confidential and/or
> >privileged material. Any review, transmission, dissemination or other
> >use, or taking of any action in reliance upon this message by persons
> >or entities other than the intended recipient is prohibited and may be
> >unlawful. If you received this message in error, please contact the
> >sender and delete it from your computer.