Thanks for your feedback. We have components that do data ingestion, effectively what Flume sources do. Those components collect data and dump it into files on local boxes; the data files are then loaded into HDFS via 'hadoop fs -put' by a simple script. We want to build a resilient, long-lived service in Java to load those files into HDFS, and that is how I came across Flume.

I understand that Flume manages its transactions as events, not physical files. Is it possible to map files to logical events, and thus achieve atomic writes?
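For concreteness, here is a rough sketch of what I mean, using Flume's EventBuilder API; the file path and header name are placeholders, and note the whole file body would be buffered in memory:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flume.Event;
    import org.apache.flume.event.EventBuilder;

    public class FileToEvent {
        // Wrap one data file as a single logical Flume event. The entire
        // file body is held in memory, so this only suits modest file sizes.
        static Event fromFile(String path) throws IOException {
            byte[] body = Files.readAllBytes(Paths.get(path));
            Map<String, String> headers = new HashMap<String, String>();
            // Carry the origin file name as a header so a sink could act on it.
            headers.put("file", path);
            return EventBuilder.withBody(body, headers);
        }
    }

--

On Tue, Apr 30, 2013 at 10:25 PM, Roshan Naik <firstname.lastname@example.org> wrote: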
Are you sure you want to write directly to HDFS from the app that is generating the data? In production, apps like web servers often do not have direct access to HDFS. I am not sure the HDFS sink guarantees 'either fully written successfully or failed totally without any partial file blocks written', since each transaction does not translate into a separate file; so I think there could be some partially written transactions in case of a transaction abort.

This level of all-or-none support at the file level is planned for what is currently referred to as the HCatalog sink: https://issues.apache.org/jira/browse/FLUME-1734

-roshan
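To make that concrete, file boundaries in the HDFS sink are driven by roll settings like the ones below (agent, channel, and sink names are placeholders), not by transaction boundaries, so one file can span many transactions:

    # hdfs.batchSize controls events per flush/transaction, while the
    # hdfs.roll* settings control when a new HDFS file is opened.
    agent1.sinks.k1.type = hdfs
    agent1.sinks.k1.channel = c1
    agent1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
    agent1.sinks.k1.hdfs.batchSize = 1000
    # Roll after 10000 events, 128 MB, or 300 seconds, whichever comes first.
    agent1.sinks.k1.hdfs.rollCount = 10000
    agent1.sinks.k1.hdfs.rollSize = 134217728
    agent1.sinks.k1.hdfs.rollInterval = 300

On Tue, Apr 30, 2013 at 6:48 PM, Connor Woodson <email@example.com> wrote: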
If you just want to write data to HDFS then Flume might not be the best thing to use; however, there is a Flume Embedded Agent that will embed Flume into your application. I don't believe it works with the HDFS sink yet, but some tinkering can likely make it work.
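Roughly, using the embedded agent looks like the sketch below (based on the embedded-agent API in the Flume developer guide; the agent name, hostname, port, and payload are placeholders). The embedded agent ships with an Avro sink, so it forwards events to a standalone Flume agent, which would then do the HDFS write:

    import java.nio.charset.Charset;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flume.agent.embedded.EmbeddedAgent;
    import org.apache.flume.event.EventBuilder;

    public class EmbeddedAgentSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder config: an in-memory channel feeding an Avro sink
            // that forwards to a separate Flume agent (hostname/port made up).
            Map<String, String> properties = new HashMap<String, String>();
            properties.put("channel.type", "memory");
            properties.put("channel.capacity", "200");
            properties.put("sinks", "sink1");
            properties.put("sink1.type", "avro");
            properties.put("sink1.hostname", "collector.example.com");
            properties.put("sink1.port", "4545");
            properties.put("processor.type", "default");

            EmbeddedAgent agent = new EmbeddedAgent("myagent");
            agent.configure(properties);
            agent.start();
            try {
                // Hand an event to the embedded agent's channel.
                agent.put(EventBuilder.withBody("some data", Charset.forName("UTF-8")));
            } finally {
                agent.stop();
            }
        }
    }

The standalone agent on the receiving end would pair an Avro source with the HDFS sink.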