flume-user mailing list archives

From shekhar sharma <shekhar2...@gmail.com>
Subject Sending log files to HDFS
Date Thu, 01 Mar 2012 11:54:36 GMT
I am doing a POC in which, using Flume, I am trying to send log files to
HDFS. I have done the basic testing of sending the contents of a file,
but I have a few doubts.
I am using Flume-728 and hadoop-0.20.2 in pseudo-distributed mode.

1) Is there any way to stop the renaming of files on the HDFS side?
I mean, when the Flume agent writes data to HDFS, it writes the contents
to a file named FlumeData.#.
Moreover, when I extract the data appended to the HDFS file, it contains
various Java class names and the contents are not in a proper format. Is that
because the agent picks up a single line of the file as an event, writes it
to the channel, and finally writes the data out in sequence file format?
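(As an aside on the FlumeData.# naming: the HDFS sink builds file names from a
configurable prefix plus a serial number, so the prefix can be changed even
though the counter suffix remains. A minimal sketch, assuming a Flume NG build
that supports the hdfs.filePrefix property; the prefix value is made up for
illustration:

agent2.sinks.log.hdfs.filePrefix = passwd-log
)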

2) Till now I was operating on individual files. Now, if the files are placed
in a directory, how do I accomplish sending those files to HDFS? And if
new files keep arriving, how would the Flume agent come to know that a new
file has arrived?
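(On watching a directory: later Flume NG releases added a spooling directory
source that ingests files dropped into a watched directory and marks each one
as completed once it is fully consumed. A sketch, assuming a build that
includes the spooldir source type; the directory path is made up for
illustration:

agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/log/incoming
agent1.sources.spool.channels = log
)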

My setup is something like this: I have configured two agents as follows:


# Sources

agent1.sources.log.type = exec

agent1.sources.log.command = /usr/bin/tail -F /etc/passwd

agent1.sources.log.channels = log

# Channels

agent1.channels.log.type = memory

# Sinks

agent1.sinks.log.type = avro

agent1.sinks.log.hostname = namenode

agent1.sinks.log.port = 41414

agent1.sinks.log.batch-size = 10

agent1.sinks.log.runner.type = polling

agent1.sinks.log.runner.polling.interval = 5

agent1.sinks.log.channel = log

# Load everything

agent1.sources = log

agent1.sinks = log

agent1.channels = log


# Sources

agent2.sources.log.type = avro

agent2.sources.log.bind =

agent2.sources.log.port = 41414

agent2.sources.log.channels = log

# Channels

agent2.channels.log.type = memory

# Sinks

agent2.sinks.log.type = hdfs

agent2.sinks.log.hdfs.path = hdfs://namenode:54310/usr/hadoop/

agent2.sinks.log.hdfs.batchsize = 10

agent2.sinks.log.runner.type = polling

agent2.sinks.log.runner.polling.interval = 10

agent2.sinks.log.channel = log

# Load everything

agent2.sources = log

agent2.sinks = log

agent2.channels = log

Now, after starting agent2 and then running agent1, I can see that files
are generated under HDFS, but they contain some junk values as well.
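(The junk values are most likely the SequenceFile container itself: by default
the HDFS sink writes Hadoop SequenceFiles, so a raw cat of the file shows the
Writable class names and binary framing around each event rather than corrupt
data. Two hedged options, assuming a Flume NG build where these apply. Either
view the file through the SequenceFile-aware reader; the path below is only an
example:

hadoop fs -text hdfs://namenode:54310/usr/hadoop/FlumeData.1234567890

or configure the sink to write plain text instead of a SequenceFile:

agent2.sinks.log.hdfs.fileType = DataStream
agent2.sinks.log.hdfs.writeFormat = Text
)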

