flume-user mailing list archives

From R P <hadoo...@outlook.com>
Subject Flume HDFS sink memory requirement.
Date Tue, 09 Feb 2016 19:17:12 GMT
Hello All,

Hope you are all having a great time. Thanks for reading my question; I appreciate any suggestions or replies.

I am evaluating Flume for writing to HDFS. We receive sparse data that will be bucketed into
thousands of different logs. Because this data arrives sporadically throughout the day, we run
into the HDFS small-files problem.

One way to address this problem is to use file size as the only condition for closing a file,
via hdfs.rollSize. Since we might have thousands of files open for hours, I have the following questions:

1. Will Flume keep thousands of files open until the hdfs.rollSize condition is met?

2. How much memory does the HDFS sink use when thousands of files are open at the same time?

3. Is the memory used for the HDFS sink's event buffer equal to the amount of data written to HDFS?
For example, if the thousands of files to be written total 500 GB, will the Flume sink need 500 GB of memory?
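For reference, a size-only roll policy of the kind described above might be configured roughly as
follows. This is a sketch with placeholder values: the agent, sink, and channel names (a1, k1, c1),
the "logname" header used for bucketing, the path, and the 128 MB threshold are all hypothetical.
Setting hdfs.rollInterval and hdfs.rollCount to 0 disables time-based and count-based rolling,
leaving hdfs.rollSize (in bytes) as the only roll trigger among these settings.

```properties
# Hypothetical names: agent a1, sink k1, channel c1 (channel assumed to be defined elsewhere)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# %{logname} is a hypothetical event header used to bucket events into separate files
a1.sinks.k1.hdfs.path = /flume/logs/%{logname}
# Roll (close) a file only once it reaches 128 MB (value in bytes: 128 * 1024 * 1024)
a1.sinks.k1.hdfs.rollSize = 134217728
# 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval = 0
# 0 disables event-count-based rolling
a1.sinks.k1.hdfs.rollCount = 0
```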

Thanks again for your input.

