flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: HDFS Sink performance
Date Fri, 17 Jul 2015 02:10:04 GMT
JVM heap was perhaps 8GB or so. The amount of actively used memory tends to remain stable, so increasing
it by much won't improve perf.
I did not measure the compression ratio.

Good to know you are seeing similar numbers.

-roshan


From: lohit <lohit.vijayarenu@gmail.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Wednesday, July 15, 2015 10:43 PM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: HDFS Sink performance

Thanks for the information, Roshan. I was able to find your email.
From your experiment, the best you could get was 538K messages for a single agent, which you
mentioned was roughly 250MB/s. Do you know what the compression ratio was? Also, how much memory
did you give the agent?
These numbers are similar to what we are seeing. With 2 sinks we see about 50K events/s (1K messages),
so ~50MB/s.

2015-07-15 13:45 GMT-07:00 Roshan Naik <roshan@hortonworks.com>:
Yes.. My bad.. Been meaning to do it... will try to do it this week.
-roshan

From: Hari Shreedharan <hshreedharan@cloudera.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Wednesday, July 15, 2015 1:41 PM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: HDFS Sink performance

Roshan - how about posting that on the Flume wiki?


Thanks,
Hari

On Wed, Jul 15, 2015 at 1:07 PM, Roshan Naik <roshan@hortonworks.com>
wrote:
Lohit,
You may want to search the mailing list for 'Flume perf measurements'. You should find the
recent measurements I posted.
-roshan

From: lohit <lohit.vijayarenu@gmail.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Wednesday, July 15, 2015 11:19 AM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: HDFS Sink performance

Thanks for the reply, Hari. Multiple sinks make sense, but this would also mean there are a lot
more files on HDFS. I will try multiple sinks and see how fast this can go.
Given that a single HDFS stream can sustain much higher throughput, maybe there is a way to have a thread pool
for SinkRunner-PollingRunner-DefaultSinkProcessor instead of a single thread per sink.

2015-07-15 11:11 GMT-07:00 Hari Shreedharan <hshreedharan@cloudera.com>:
Hi Lohit,

HDFS sinks (in fact, most sinks) are single-threaded by design. This is meant to make writing
the sinks easier, but all channels can handle multiple sinks reading from them. So to improve
throughput, you basically configure several sinks which read off the same channel. Make
sure, though, that each sink writes to files with different HDFS paths or different file prefixes
(otherwise the HDFS client API will complain about leases).
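
A minimal sketch of that setup, assuming the agent and channel names from the configuration quoted further down in this thread. The sink names hdfsSink1 and hdfsSink2 and the prefixes sink1 and sink2 are hypothetical, chosen only for illustration. Both sinks share one channel and one HDFS path, and differ only in hdfs.filePrefix so that they never open the same file:

```properties
# Sketch only: two HDFS sinks draining the same memory channel.
# hdfsSink1/hdfsSink2 and the sink1/sink2 prefixes are hypothetical names.
agent.sinks = hdfsSink1 hdfsSink2

agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = memoryChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2
```

The remaining settings (codec, file type, roll and batch parameters) would presumably be repeated for each sink. Each sink runs on its own thread and rolls its own files, so the number of files on HDFS grows with the number of sinks.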


Thanks,
Hari

On Wed, Jul 15, 2015 at 9:10 AM, lohit <lohit.vijayarenu@gmail.com>
wrote:
Hello,

Does anyone have numbers they can share on HDFS sink performance? In our
testing, a single sink writing to HDFS (CompressedStream) and reading from a MemoryChannel
can only do about 35000 events per second (each event is about 1K in size). After compression
this turns out to be a ~10MB/s write stream to the HDFS file, which is pretty low. Our configuration
looks like this:

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink.hdfs.codeC = lzo
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.writeFormat = Writable
agent.sinks.hdfsSink.hdfs.rollInterval = 3600
agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 10000
agent.sinks.hdfsSink.hdfs.txnEventMax = 10000

agent.channels.memoryChannel.type = memory

agent.channels.memoryChannel.capacity = 3000000
agent.channels.memoryChannel.transactionCapacity = 10000

--
Have a Nice Day!
Lohit




