flume-user mailing list archives

From Jeff Lord <jl...@cloudera.com>
Subject Re: Simple- Just copying plain files into the cluster (hdfs) using flume - possible?
Date Tue, 03 Feb 2015 00:42:01 GMT
Bob,

You may want to have a look at Apache NiFi.

http://ingest.tips/2014/12/22/getting-started-with-apache-nifi/

Regards,

Jeff

On Mon, Feb 2, 2015 at 3:49 PM, Bob Metelsky <bob.metelsky@gmail.com> wrote:

> Steve - I appreciate your time on this...
>
> Yes, I want to use flume to copy .xml or .whatever files from a server
> outside the cluster to hdfs. That server does have flume installed on it.
>
> I'd like the same behavior as "spooling directory" but from a remote
> machine --> to hdfs
>
> So, from all my reading, flume looks like it is completely designed for
> streaming "live" logs and program outputs...
>
> It doesn't seem to be known for being a file watcher that grabs files as
> they show up, then ships and writes them to hdfs
>
> Or can it?
>
> OK, I can see the fragmentation concern with individual "small" files, but
> doesn't "spool directory behaviour" face the same issue?
>
> I've done quite a bit of reading but one can easily get into the weeds :)
> - All I need to do is this simple task.
>
> Thanks
>
>
>
> On Mon, Feb 2, 2015 at 5:17 PM, Steve Morin <steve.morin@gmail.com> wrote:
>
>> So you want 1 to 1 replication of the logs to HDFS?
>>
>> As a footnote, people usually don't do this because the log files are
>> often too small (think fragmentation), which causes performance problems
>> when used on Hadoop.
>>
>> On Feb 2, 2015, at 13:30, Bob Metelsky <bob.metelsky@gmail.com> wrote:
>>
>> Hi, I have a simple requirement:
>>
>> on server1 (NOT in the cluster, but has flume installed)
>> I have a process that constantly generates xml files in a known directory
>>
>> I need to transfer them to server2 (IN the hadoop cluster)
>> and into hdfs as xml files
>>
>> From what I'm reading, avro, thrift rpc, et al. are designed for other
>> uses
>>
>> Is there a way to have flume just copy over plain files? txt, xml...
>> I'm thinking there should be, but I can't find it
>>
>> The closest I see is the "spooling directory", but that seems to assume
>> the files are already inside the cluster.
>>
>> Can flume do this? Is there an example? I've read the flume documentation
>> and nothing is jumping out
>>
>> Thanks!
>>
>>
>
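
For reference, the setup asked about in the thread (a spooling directory on
server1, with files ending up in hdfs via server2) could be sketched as two
Flume agents joined by an Avro hop. This is a minimal, untested sketch and
not a configuration posted by anyone in this thread. The agent names, the
directory paths, the port 4545, and the hostname "server2" are placeholders.
It also assumes the BlobDeserializer class (shipped in Flume's morphline Solr
sink module) is on the classpath of the server1 agent; the default LINE
deserializer would instead split each file into one event per line.

```properties
# --- agent1: runs on server1 (outside the cluster) ---
agent1.sources = spool
agent1.channels = ch1
agent1.sinks = avroOut

# Watch a directory for completed files; files must not change once placed here.
agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/data/xml-out
agent1.sources.spool.channels = ch1
# Read each file as a single event instead of one event per line.
# Each file is buffered in memory, so this only suits reasonably small files.
agent1.sources.spool.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Durable channel so events survive an agent restart.
agent1.channels.ch1.type = file

# Forward events to the agent running inside the cluster.
agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.hostname = server2
agent1.sinks.avroOut.port = 4545
agent1.sinks.avroOut.channel = ch1

# --- agent2: runs on server2 (inside the cluster) ---
agent2.sources = avroIn
agent2.channels = ch2
agent2.sinks = toHdfs

# Receive events sent by agent1's Avro sink.
agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 0.0.0.0
agent2.sources.avroIn.port = 4545
agent2.sources.avroIn.channels = ch2

agent2.channels.ch2.type = file

# Write raw event bodies to hdfs, rolling to a new file after every event
# so that each source file becomes one hdfs file.
agent2.sinks.toHdfs.type = hdfs
agent2.sinks.toHdfs.channel = ch2
agent2.sinks.toHdfs.hdfs.path = /data/incoming/xml
agent2.sinks.toHdfs.hdfs.fileType = DataStream
agent2.sinks.toHdfs.hdfs.rollCount = 1
agent2.sinks.toHdfs.hdfs.rollInterval = 0
agent2.sinks.toHdfs.hdfs.rollSize = 0
```

With these settings the hdfs sink generates its own output file names rather
than reusing the originals, and one hdfs file per small input file runs into
the fragmentation concern Steve raises above. Raising hdfs.rollCount or
rolling by size would trade exact 1-to-1 copies for fewer, larger files.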
