flume-user mailing list archives

From Prasad Mujumdar <pras...@cloudera.com>
Subject Re: Sending log files to HDFS
Date Fri, 02 Mar 2012 20:13:15 GMT
Hi Som,

   Please see my comments inline.

thanks
Prasad

On Thu, Mar 1, 2012 at 3:54 AM, shekhar sharma <shekhar2581@gmail.com> wrote:

> Hello,
> I am doing a POC in which I am trying to use Flume to send log files to
> HDFS. I have done basic testing of sending the contents of a file,
> but I have a few doubts.
> I am using Flume-728 and hadoop-0.20.2 in pseudo-distributed mode.
>
> 1) Is there any way to stop the renaming of files on the HDFS side?
> I mean, when the Flume agent writes data to HDFS, it writes the contents
> to a file named FlumeData.#
> Moreover, when I extract the data appended to the HDFS file, it contains
> various Java class names, and the contents are not in the proper format. Is
> that because the agent picks up a single line of the file as an event, writes
> it to the channel, and finally writes the data in sequence file format?
>
The HDFS sink has a built-in file roller, i.e., it periodically closes the
current file and opens a new one. By default the file name is FlumeData.# and
you can configure the prefix by setting hdfs.filePrefix = <someFormat>
You can also switch from sequence files to text files by setting
hdfs.fileType = DataStream
Not sure if that answers your question about "renaming files" ...
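
As a rough sketch, the two settings above applied to the agent2 HDFS sink
from your config might look like the fragment below. The prefix value
passwd-log is only a placeholder made up for illustration; the property names
are the ones mentioned above.

```properties
# HDFS sink on agent2: name rolled files with a custom prefix instead of FlumeData
# (passwd-log is a hypothetical value, pick whatever suits your data)
agent2.sinks.log.hdfs.filePrefix = passwd-log
# Write the event bodies as plain text instead of a sequence file
agent2.sinks.log.hdfs.fileType = DataStream
```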


> 2) Until now I was operating on individual files. If the files are placed
> in a directory, how do I send them to HDFS? And if new files arrive, how
> would the Flume agent come to know that a new file has arrived?
>
> My setup is something like this. I have configured the agents as follows:
>
> agent1.properties
>
> # Sources
> agent1.sources.log.type = exec
> agent1.sources.log.command = /usr/bin/tail -F /etc/passwd
> agent1.sources.log.channels = log
>
> # Channels
> agent1.channels.log.type = memory
>
> # Sinks
> agent1.sinks.log.type = avro
> agent1.sinks.log.hostname = namenode
> agent1.sinks.log.port = 41414
> agent1.sinks.log.batch-size = 10
> agent1.sinks.log.runner.type = polling
> agent1.sinks.log.runner.polling.interval = 5
> agent1.sinks.log.channel = log
>
> # Load everything
> agent1.sources = log
> agent1.sinks = log
> agent1.channels = log
>
> agent2.properties
>
> # Sources
> agent2.sources.log.type = avro
> agent2.sources.log.bind = 0.0.0.0
> agent2.sources.log.port = 41414
> agent2.sources.log.channels = log
>
> # Channels
> agent2.channels.log.type = memory
>
> # Sinks
> agent2.sinks.log.type = hdfs
> agent2.sinks.log.hdfs.path = hdfs://namenode:54310/usr/hadoop/
> agent2.sinks.log.hdfs.batchsize = 10
> agent2.sinks.log.runner.type = polling
> agent2.sinks.log.runner.polling.interval = 10
> agent2.sinks.log.channel = log
>
> # Load everything
> agent2.sources = log
> agent2.sinks = log
> agent2.channels = log
>
> Now, after starting agent2 and then running agent1, I can see that
> files are generated in HDFS, but they also contain some junk values.
>
I guess in the agent1 case you are seeing a sequence file; try text
files as mentioned above. agent2 needs an avro client that reads a file
(flume-ng avro-client -F <file>). Currently the client takes only a single
file at a time. If you need to send multiple files, you can perhaps script
it. Feel free to log a JIRA for the avro client to support a directory.

An important thing to understand is that Flume is a data collection and
aggregation framework, not a file transfer tool. Flume's unit of data flow
is an event, which is created by the client sending data to the Flume agent.
In your test case, the avro-client reads the file line by line, so each
event is a single line of the file. The Flume agent doesn't know about
file boundaries. The data written to HDFS has its own directory
bucketing and file rotation.

Does your use case really need to transfer files as-is?
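
If you do go the scripting route for multiple files, a minimal sketch could
look like the one below. It assumes the avro client accepts -H and -p for the
target host and port alongside -F (please verify against the flume-ng help
output of your build). The function name send_dir and the FLUME_NG variable
are made up for illustration and are not part of Flume.

```shell
#!/usr/bin/env bash
# Sketch: run the Flume avro client once per regular file in a directory.
# FLUME_NG and send_dir are illustrative names, not part of Flume itself.

# Path to the flume-ng launcher; assumed to be on PATH unless overridden.
FLUME_NG="${FLUME_NG:-flume-ng}"

# send_dir DIR HOST PORT
# Sends every regular file directly inside DIR to the avro source at HOST:PORT.
# Files are visited in shell glob order; subdirectories are skipped.
# Stops and returns non-zero as soon as one file fails to send.
send_dir() {
  local dir="$1" host="$2" port="$3" file
  for file in "$dir"/*; do
    # Skip subdirectories and the literal pattern left by an empty directory.
    [ -f "$file" ] || continue
    echo "sending $file"
    # Assumption: -H/-p select the avro source, -F names the file to read.
    "$FLUME_NG" avro-client -H "$host" -p "$port" -F "$file" || return 1
  done
}
```

For example, send_dir /var/log/myapp namenode 41414 would push each file in
that (hypothetical) directory to the avro source agent2 listens on. Each line
still arrives as a separate event, so file boundaries are not preserved on
the HDFS side.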


> Regards,
> Som
