I'm new to Flume and this group, so please be patient with me :-)
I have a Flume agent that streams data into an HDFS sink (appending to the same file), which I can "hdfs dfs -cat" and see in HDFS. However, when I run a MapReduce job on that file (.tmp), it only picks up the first batch that was flushed (batchSize = 100) into HDFS. The rest is not picked up, even though I can cat the file and see it. When I run the MapReduce job after the file is rolled (closed), it picks up all the data.
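For reference, here is a sketch of the sink configuration I'm describing; the agent, channel, and path names are placeholders, and only the HDFS sink settings matter here:

```properties
# Placeholder agent/channel names -- only the hdfs.* settings are relevant
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memChannel
agent.sinks.hdfsSink.hdfs.path = /flume/events
agent.sinks.hdfsSink.hdfs.fileType = DataStream
# Events are flushed to HDFS in batches of 100
agent.sinks.hdfsSink.hdfs.batchSize = 100
# Roll settings control when the .tmp file is closed and renamed
agent.sinks.hdfsSink.hdfs.rollInterval = 3600
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0
```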
Do you know why the MR job fails to find the remaining batches even though the data exists?