Thanks Hari. With a Spool Dir Source, I could have the remote Flume agents write events to a remote directory, run rsync locally to sync a local directory with the remote one, and have the local Flume agent pick up events from the local directory.

But this way I am breaking the Flume pipeline by putting rsync in the middle. I don't know how this will affect Flume features such as reliability and scalability.
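
For concreteness, the setup I have in mind would look roughly like the sketch below. The agent names (online, local), the directories, and the HDFS path are placeholders I made up, and the source on the online agent is left out because it would stay as it is today. Treat this as an untested outline, not a working config.

```properties
# --- Online (remote) agent: write events to files instead of an Avro sink ---
# Source definition omitted; it stays whatever it is today.
online.channels = ch1
online.sinks = roll

online.channels.ch1.type = file

# File Roll Sink: writes events to files in a directory (placeholder path)
online.sinks.roll.type = file_roll
online.sinks.roll.channel = ch1
online.sinks.roll.sink.directory = /var/flume/outbox
online.sinks.roll.sink.rollInterval = 60

# --- Local (cluster) agent: read the rsync'ed files and push them to HDFS ---
local.sources = spool
local.channels = ch1
local.sinks = hdfs1

# Spooling Directory Source: reads files dropped into spoolDir (placeholder path)
local.sources.spool.type = spooldir
local.sources.spool.spoolDir = /var/flume/inbox
local.sources.spool.channels = ch1

local.channels.ch1.type = file

local.sinks.hdfs1.type = hdfs
local.sinks.hdfs1.channel = ch1
local.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events
```

rsync, run from the local cluster, would copy /var/flume/outbox on the online host into /var/flume/inbox. As far as I understand, the Spooling Directory Source expects files to be complete and unchanged once they appear in the spool directory, so the sync would have to skip the file the File Roll Sink is still writing to.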

-Majid


On Tuesday, December 2, 2014, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
Not sure how that would be possible. You could use a Spool Dir Source if you want to write the data to files and then read it from there.

Thanks,
Hari


On Tue, Nov 25, 2014 at 11:00 AM, Majid Alfifi <majid.alfifi@gmail.com> wrote:

I have a typical Flume pipeline that collects logs from online servers, aggregates them, and pushes them down to HDFS. The typical configuration is to open a port on the local cluster to which the online Flume agent can send Avro events.

Is it possible to have a Flume agent on the local cluster "pull" events from the online agent, so that no port needs to be opened on the local cluster?

Best Regards,
Majid