flume-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: beginner's question -- file source configuration
Date Mon, 09 Mar 2015 03:26:06 GMT
As stated in the docs, you'll need the timestamp in the event header
for the HDFS sink to place events in the correct directory
automatically.
This can be done with the timestamp interceptor.
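For reference, a minimal sketch of the interceptor config (the agent and
source names here are made up, not from any real deployment):

```
# Hypothetical agent name "client" and source name "r1" -- adapt to
# your own config. This attaches a timestamp interceptor to the
# spooling-directory source, so every event gets a "timestamp" header
# as it enters the channel.
client.sources.r1.type = spooldir
client.sources.r1.spoolDir = /var/log/flume-spool
client.sources.r1.interceptors = ts
client.sources.r1.interceptors.ts.type = timestamp
```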

You can see an example here:
https://github.com/hadooparchitecturebook/hadoop-arch-book/tree/master/ch09-clickstream/Flume

This example uses a two-tier architecture (i.e. one Flume agent
collecting logs from web servers and another writing to HDFS).
However, you can see how in client.conf the spooling-directory source
is configured with the timestamp interceptor, and how in collector.conf
the HDFS sink's target directory is parameterized with the timestamp.
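Roughly, the collector side then looks something like this (agent and
sink names, and the HDFS path, are illustrative only, not copied from
the linked example):

```
# Hypothetical agent name "collector" and sink name "k1".
# The %Y/%m/%d/%H escapes are expanded by the HDFS sink from the
# "timestamp" event header, which the timestamp interceptor sets --
# without that header the sink cannot resolve the path and will fail.
collector.sinks.k1.type = hdfs
collector.sinks.k1.hdfs.path = hdfs://namenode/flume/logs/%Y/%m/%d/%H
collector.sinks.k1.hdfs.fileType = DataStream
```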

Gwen


On Sun, Mar 8, 2015 at 7:56 PM, Lin Ma <linlma@gmail.com> wrote:
> Thanks Ashish,
>
> One further question on the HDFS sink. If I configure the destination
> directory on HDFS with a Year/Month/Day/Hour pattern, will Flume
> automatically put each event it receives into the related directory,
> and create new directories as time passes? Or do I have to set some
> key/value headers on the event so the HDFS sink can recognize the
> event time and put it into the appropriate time-based folder?
>
> regards,
> Lin
>
> On Sun, Mar 8, 2015 at 6:32 PM, Ashish <paliwalashish@gmail.com> wrote:
>>
>> Your understanding is correct :)
>>
>> On Mon, Mar 9, 2015 at 6:54 AM, Lin Ma <linlma@gmail.com> wrote:
>> > Thanks Ashish,
>> >
>> > Following your guidance, I found the instructions below, about
>> > which I have further questions to confirm with you. It seems the
>> > files must be closed and never touched again for Flume to process
>> > them correctly, so I'm not sure whether this is good practice:
>> > (1) let the application write log files in the existing way, e.g.
>> > rotating hourly or every 5 minutes, and (2) close and move the
>> > files to another directory that serves as the input for the Flume
>> > agent's spooling-directory source?
>> >
>> > “This source will watch the specified directory for new files, and will
>> > parse events out of new files as they appear. ”
>> >
>> > "
>> >
>> > If a file is written to after being placed into the spooling directory,
>> > Flume will print an error to its log file and stop processing.
>> > If a file name is reused at a later time, Flume will print an error to
>> > its
>> > log file and stop processing.
>> >
>> > "
>> >
>> > regards,
>> > Lin
>> >
>> > On Sun, Mar 8, 2015 at 12:23 AM, Ashish <paliwalashish@gmail.com> wrote:
>> >>
>> >> Please look at following
>> >> Spooling Directory Source
>> >> [http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source]
>> >> and
>> >> HDFS Sink (http://flume.apache.org/FlumeUserGuide.html#hdfs-sink)
>> >>
>> Spooling Directory Source needs immutable files, meaning files must
>> not be written to once they are being consumed. In short, your
>> application cannot write to the file being read by Flume.
>>
>> The log format is not an issue, as long as you don't need it to be
>> interpreted by Flume components. Since it's a log, I'm assuming one
>> log entry per line, with a line separator at the end of each line.
>> >>
>> You can also look at the Exec source
>> (http://flume.apache.org/FlumeUserGuide.html#exec-source) for tailing
>> a file that is still being written by the application. The
>> documentation at these links covers the details.
>> >>
>> >> HTH !
>> >>
>> >>
>> >> On Sun, Mar 8, 2015 at 12:32 PM, Lin Ma <linlma@gmail.com> wrote:
>> >> > Hi Flume masters,
>> >> >
>> >> > I want to install Flume on a box, consume a local log file as
>> >> > the source, and send it to a remote HDFS sink. The log format is
>> >> > private, plain text (not Avro or JSON).
>> >> >
>> >> > I am reading the Flume guide and its many advanced source
>> >> > configurations; for a plain local log file source, are there any
>> >> > reference samples? Also, I'm not sure whether Flume can consume
>> >> > a local file while the application is still writing to it.
>> >> > Thanks.
>> >> >
>> >> > regards,
>> >> > Lin
>> >>
>> >>
>> >>
>> >> --
>> >> thanks
>> >> ashish
>> >>
>> >> Blog: http://www.ashishpaliwal.com/blog
>> >> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>> >
>> >
>>
>>
>>
>> --
>> thanks
>> ashish
>>
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>
>
