flume-user mailing list archives

From Sadananda Hegde <saduhe...@gmail.com>
Subject Re: picking up new files in Flume NG
Date Wed, 17 Oct 2012 23:51:35 GMT
That's awesome, Patrick! Thank you so much. That would tremendously help us.

We are currently using Flume NG 1.2.0. Will we be able to use spooldir on
that version, or do we have to upgrade to the latest version?
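(For reference, a minimal sketch of the kind of agent configuration the
spooling directory source is meant to enable; the agent, channel, and
directory names here are hypothetical, and the property names follow the
FLUME-1425 patch. For the record, FLUME-1425 ended up shipping in Flume
1.3.0, so a 1.2.0 install would need an upgrade or the patch applied.)

  # Hypothetical agent "a1": spool a local directory into HDFS.
  a1.sources = spool
  a1.channels = ch
  a1.sinks = hdfs-out

  # Spooling directory source: ingests each complete file dropped into
  # spoolDir, then marks it done (by default it renames the file with a
  # .COMPLETED suffix so it is never re-read).
  a1.sources.spool.type = spooldir
  a1.sources.spool.spoolDir = /var/app/xml-out
  a1.sources.spool.channels = ch

  # Durable channel so queued events survive an agent restart.
  a1.channels.ch.type = file

  # HDFS sink on the hadoop cluster side.
  a1.sinks.hdfs-out.type = hdfs
  a1.sinks.hdfs-out.hdfs.path = hdfs://namenode/flume/xml-events
  a1.sinks.hdfs-out.channel = ch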

Thanks,
Sadu

On Tue, Oct 16, 2012 at 11:37 PM, Patrick Wendell <pwendell@gmail.com> wrote:

> Hey Sadu, your use case is exactly what I'm writing this for. I'm
> hoping this patch will get committed within a few days; we're on the
> last rev of reviews.
>
> - Patrick
>
> On Tue, Oct 16, 2012 at 10:47 AM, Brock Noland <brock@cloudera.com> wrote:
> > Correct, it's only available in that patch; from the RB it looks like
> > it's not too far off from being committed.
> >
> > Brock
> >
> > On Tue, Oct 16, 2012 at 12:00 PM, Sadananda Hegde <saduhegde@gmail.com>
> wrote:
> >> Yes, it is very similar.
> >>
> >> The spool directory will keep getting new files. We need to scan
> >> through the directory, send the data in the existing files to HDFS,
> >> clean up the files (delete/move/rename, etc.), and scan for new files
> >> again. The Spooldir source is not available yet, right?
> >>
> >> Thanks,
> >> Sadu
> >>
> >>
> >> On Tue, Oct 16, 2012 at 10:11 AM, Brock Noland <brock@cloudera.com> wrote:
> >>>
> >>> Sounds like https://issues.apache.org/jira/browse/FLUME-1425 ?
> >>>
> >>> Brock
> >>>
> >>> On Mon, Oct 15, 2012 at 11:37 PM, Sadananda Hegde <saduhegde@gmail.com> wrote:
> >>> > Hello,
> >>> >
> >>> > I have a scenario wherein the client application is continuously
> >>> > pushing xml messages. Actually, the application is writing these
> >>> > messages to files (new files; same directory). So we will keep
> >>> > getting new files throughout the day. I am trying to configure
> >>> > Flume agents on these application servers (4 of them) to pick up
> >>> > the new data and transfer it to HDFS on a hadoop cluster. How
> >>> > should I configure my source to pick up new files (and exclude
> >>> > the files that have been processed already)? I don't think Exec
> >>> > source with tail -F will work in this scenario, because data is
> >>> > not getting added to existing files; rather, new files get created.
> >>> >
> >>> > Thank you very much for your time and support.
> >>> >
> >>> > Sadu
> >>>
> >>>
> >>>
> >>> --
> >>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
> >>
> >>
> >
> >
> >
> > --
> > Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>
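(For context on the tail -F limitation discussed in the quoted thread,
here is a minimal sketch of the exec-source configuration it rules out;
agent and path names are hypothetical. tail -F follows one fixed path, so
data arriving in newly created files is never picked up:)

  a1.sources = r1
  a1.channels = ch
  # Exec source runs an external command and reads events from its stdout.
  a1.sources.r1.type = exec
  # tail -F tracks a single file; new files appearing in the same
  # directory are never read, which is why it cannot cover this use case.
  a1.sources.r1.command = tail -F /var/app/xml-out/current.xml
  a1.sources.r1.channels = ch

The exec source also has no way to mark a file as processed, which is
exactly the bookkeeping the spooling directory source adds.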
