flume-user mailing list archives

From Paul Chavez <pcha...@ntent.com>
Subject RE: Simple- Just copying plain files into the cluster (hdfs) using flume - possible?
Date Tue, 03 Feb 2015 00:11:13 GMT
Flume doesn't really address this use case. Even the spooling directory source will decompose
the file into individual events (one event per line, by default).
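
For illustration, a minimal single-agent sketch (agent name, directories, and the NameNode URI are all hypothetical) showing where that per-line splitting is configured:

# Hypothetical agent "a1": spooldir source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Watch a local directory; ingested files are renamed with a .COMPLETED suffix
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/spool/flume-in
a1.sources.r1.channels = c1
# The default deserializer is LINE: each line of an input file becomes one event
a1.sources.r1.deserializer = LINE

a1.channels.c1.type = memory

# Write events back out as plain text in HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/incoming
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

(Recent Flume releases also document a BLOB deserializer that reads a whole file as a single event, but it buffers the entire file in memory, so it is only practical for small files; worth verifying against your Flume version.)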


From: Bob Metelsky [mailto:bob.metelsky@gmail.com]
Sent: Monday, February 02, 2015 3:49 PM
To: user@flume.apache.org
Subject: Re: Simple- Just copying plain files into the cluster (hdfs) using flume - possible?

Steve - I appreciate your time on this...

Yes, I want to use Flume to copy .xml or .whatever files from a server outside the cluster
to HDFS. That server does have Flume installed on it.

I'd like the same behavior as "spooling directory", but from a remote machine --> to HDFS.

So, from all my reading, Flume looks like it's designed completely for streaming "live" logs
and program outputs...

It doesn't seem to be known for being a file watcher that grabs files as they show up, then
ships them and writes them to HDFS.

Or can it?

OK, I can see fragmentation being an issue with individual "small" files, but doesn't the "spooling
directory" behavior face the same issue?

I've done quite a bit of reading, but one can easily get into the weeds :) - all I need to
do is this simple task.

Thanks
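
For reference, the usual pattern for "spooling directory, but from a remote machine" is a two-tier topology: one agent on the remote host reads the spool directory and forwards events over Avro RPC to a second agent on the cluster side, which writes to HDFS. A rough sketch, with all agent names, hostnames, ports, and paths hypothetical:

# Tier 1 - agent on the remote server (outside the cluster)
remote.sources = spool
remote.channels = ch
remote.sinks = fwd

remote.sources.spool.type = spooldir
remote.sources.spool.spoolDir = /data/xml-out
remote.sources.spool.channels = ch

# File channel survives agent restarts without losing events
remote.channels.ch.type = file

# Forward events over Avro RPC to the cluster-side agent
remote.sinks.fwd.type = avro
remote.sinks.fwd.hostname = edge-node.example.com
remote.sinks.fwd.port = 4141
remote.sinks.fwd.channel = ch

# Tier 2 - agent on a node that can reach HDFS
edge.sources = in
edge.channels = ch
edge.sinks = out

edge.sources.in.type = avro
edge.sources.in.bind = 0.0.0.0
edge.sources.in.port = 4141
edge.sources.in.channels = ch

edge.channels.ch.type = file

edge.sinks.out.type = hdfs
edge.sinks.out.channel = ch
edge.sinks.out.hdfs.path = hdfs://namenode/flume/xml
edge.sinks.out.hdfs.fileType = DataStream
edge.sinks.out.hdfs.writeFormat = Text

Note that Avro here is only the transport between the two agents; per the reply above, the files still arrive in HDFS as events, not as byte-for-byte copies of the originals.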



On Mon, Feb 2, 2015 at 5:17 PM, Steve Morin <steve.morin@gmail.com> wrote:
So you want 1 to 1 replication of the logs to HDFS?

As a footnote, people usually don't do this because the log files are often too small (think
fragmentation), which causes performance problems on Hadoop.
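
If you do go this route, one common mitigation is tuning the HDFS sink's roll settings so many small inputs are aggregated into fewer, larger files. A sketch, reusing the hypothetical sink "k1" from the config earlier in the thread (the values are illustrative):

# Start a new HDFS file every 5 minutes...
a1.sinks.k1.hdfs.rollInterval = 300
# ...or once roughly 128 MB has been written, whichever comes first
a1.sinks.k1.hdfs.rollSize = 134217728
# 0 disables rolling based on number of events
a1.sinks.k1.hdfs.rollCount = 0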

On Feb 2, 2015, at 13:30, Bob Metelsky <bob.metelsky@gmail.com> wrote:
Hi, I have a simple requirement.

On server1 (NOT in the cluster, but with Flume installed),
I have a process that constantly generates XML files in a known directory.

I need to transfer them to server2 (IN the Hadoop cluster)
and into HDFS as XML files.

From what I'm reading, Avro, Thrift RPC, et al. are designed for other uses.

Is there a way to have Flume just copy over plain files? txt, xml...
I'm thinking there should be, but I can't find it.

The closest I see is the "spooling directory" source, but that seems to assume the files are
already inside the cluster.

Can Flume do this? Is there an example? I've read the Flume documentation and nothing is jumping
out.

Thanks!
