flume-user mailing list archives

From Gintautas Sulskus <gintautas.suls...@gmail.com>
Subject Use case for Flume
Date Tue, 05 Sep 2017 12:00:18 GMT

I have a question regarding Flume's suitability for a particular use case.

Task: There is a constant incoming stream of links that point to files.
Those files need to be fetched and stored in HDFS.

Desired implementation:

1. Each link to a file is stored in Kafka queue Q1.
2. Flume A1.source monitors Q1 for new links.
3. Upon retrieving a link from Q1, A1.source fetches the file. The file
is eventually stored in HDFS by A1.sink.

My concern is that A1.source seems overloaded. It would have to perform
two activities: (1) periodically poll queue Q1 for new links to files, and
(2) fetch those files.

What do you think? Is there a cleaner way to achieve this, e.g. by using an
interceptor to fetch files? Would this be appropriate?
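
To make the interceptor variant concrete, here is a rough sketch of how the
agent configuration might look. It is only an illustration under several
assumptions: the broker address, topic name, consumer group, HDFS path and
the com.example.FileFetchInterceptor class are placeholders I made up, and
the interceptor itself (which would replace the link in the event body with
the fetched file contents) would still have to be written.

```properties
# Sketch only: names marked "placeholder" are hypothetical, not tested values.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source: reads the links from Q1.
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# placeholder broker address
a1.sources.r1.kafka.bootstrap.servers = kafka-host:9092
# placeholder topic name standing in for Q1
a1.sources.r1.kafka.topics = Q1
# placeholder consumer group
a1.sources.r1.kafka.consumer.group.id = flume-link-fetcher
a1.sources.r1.channels = c1

# Custom interceptor (hypothetical class, not part of Flume) that would
# download the file behind each link and put its bytes into the event body.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.example.FileFetchInterceptor$Builder

# In-memory channel between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# HDFS sink: writes the event bodies as-is.
a1.sinks.k1.type = hdfs
# placeholder HDFS path
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/fetched-files
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
```

With this layout the source would only poll Q1, and the fetching would be
isolated in the interceptor.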