flume-user mailing list archives

From Raj Kumar <rajkumartheone...@gmail.com>
Subject Appending data into HDFS Sink!
Date Mon, 19 Jan 2015 13:29:10 GMT
Hello guys!

I'm new to Flume and this group, so please be patient with me :-)

I have a Flume agent that streams data into an HDFS sink (appending to the
same file), which I can "hdfs dfs -cat" and read from HDFS. However, when I
run a MapReduce job on that file (the open .tmp file), it only picks up the
first batch that was flushed into HDFS (batchSize = 100). The rest is not
picked up, although I can still cat and see it. When I run the MapReduce job
after the file is rolled (closed), it picks up all the data.
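For reference, here is a sketch of the HDFS sink section of my config (the
agent, sink, and channel names and the path are placeholders, not my exact
setup):

```properties
# Illustrative names only -- agent1/sink1/ch1 and the path are placeholders
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.batchSize = 100
# Roll only on a time interval; 0 disables size- and count-based rolling
agent1.sinks.sink1.hdfs.rollInterval = 3600
agent1.sinks.sink1.hdfs.rollSize = 0
agent1.sinks.sink1.hdfs.rollCount = 0
```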

Do you know why the MR job fails to find the rest of the batches even though
they exist?

Best regards,

