flume-user mailing list archives

From Jeff Lord <jl...@cloudera.com>
Subject Re: Simple- Just copying plain files into the cluster (hdfs) using flume - possible?
Date Tue, 03 Feb 2015 02:14:56 GMT
I would be curious to hear what you think...

On Mon, Feb 2, 2015 at 6:10 PM, Bob Metelsky <bob.metelsky@gmail.com> wrote:

> Jeff - very cool... Installed it, looks great. I'll have to play with it.
> I'm afraid this may not be mature enough to use in the enterprise yet.
> Possibly it can handle my requirement; maybe I'm wrong. I'll have to play
> around.
>
> Thanks
>
> [image: Inline image 1]
>
> On Mon, Feb 2, 2015 at 7:42 PM, Jeff Lord <jlord@cloudera.com> wrote:
>
>> Bob,
>>
>> You may want to have a look at Apache NiFi.
>>
>> http://ingest.tips/2014/12/22/getting-started-with-apache-nifi/
>>
>> Regards,
>>
>> Jeff
>>
>> On Mon, Feb 2, 2015 at 3:49 PM, Bob Metelsky <bob.metelsky@gmail.com>
>> wrote:
>>
>>> Steve - I appreciate your time on this...
>>>
>>> Yes, I want to use flume to copy .xml or .whatever files from a server
>>> outside the cluster to hdfs. That server does have flume installed on it.
>>>
>>> I'd like the same behavior as "spooling directory" but from a remote
>>> machine --> to hdfs.
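>>>
>>> Something like the below is what I'm picturing (an untested sketch; the
>>> hostnames, ports, and paths are all placeholders): a spooldir source and
>>> avro sink on the remote box, feeding an avro source and hdfs sink on a
>>> machine that can write to the cluster.
>>>
>>> # --- agent on server1 (outside the cluster) ---
>>> a1.sources = src1
>>> a1.channels = ch1
>>> a1.sinks = snk1
>>> # watch the drop directory for completed files
>>> a1.sources.src1.type = spooldir
>>> a1.sources.src1.spoolDir = /var/data/xml-out
>>> a1.sources.src1.channels = ch1
>>> # file channel so events survive an agent restart
>>> a1.channels.ch1.type = file
>>> a1.channels.ch1.checkpointDir = /var/flume/checkpoint
>>> a1.channels.ch1.dataDirs = /var/flume/data
>>> # ship events to the collector over avro
>>> a1.sinks.snk1.type = avro
>>> a1.sinks.snk1.hostname = server2.example.com
>>> a1.sinks.snk1.port = 4141
>>> a1.sinks.snk1.channel = ch1
>>>
>>> # --- agent on server2 (in the cluster) ---
>>> a2.sources = src2
>>> a2.channels = ch2
>>> a2.sinks = snk2
>>> a2.sources.src2.type = avro
>>> a2.sources.src2.bind = 0.0.0.0
>>> a2.sources.src2.port = 4141
>>> a2.sources.src2.channels = ch2
>>> a2.channels.ch2.type = file
>>> a2.channels.ch2.checkpointDir = /var/flume/checkpoint
>>> a2.channels.ch2.dataDirs = /var/flume/data
>>> # write plain text into hdfs, bucketed by day
>>> a2.sinks.snk2.type = hdfs
>>> a2.sinks.snk2.hdfs.path = hdfs://namenode/flume/xml/%Y-%m-%d
>>> a2.sinks.snk2.hdfs.fileType = DataStream
>>> a2.sinks.snk2.hdfs.writeFormat = Text
>>> a2.sinks.snk2.hdfs.useLocalTimeStamp = true
>>> a2.sinks.snk2.channel = ch2
>>>
>>> One caveat I gather from the docs: flume treats each line as an event, so
>>> this streams file contents rather than copying files 1:1, and the original
>>> file boundaries are not preserved on the hdfs side.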
>>>
>>> So, from all my reading, flume looks like it's designed entirely for
>>> streaming "live" logs and program output...
>>>
>>> It doesn't seem to be known for being a file watcher that grabs files as
>>> they show up, then ships them and writes them to hdfs.
>>>
>>> Or can it?
>>>
>>> OK, I can see fragmentation being a problem with individual "small"
>>> files, but doesn't the "spooling directory" behaviour face the same issue?
>>>
>>> I've done quite a bit of reading but one can easily get into the weeds
>>> :) - All I need to do is this simple task.
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Mon, Feb 2, 2015 at 5:17 PM, Steve Morin <steve.morin@gmail.com>
>>> wrote:
>>>
>>>> So you want 1 to 1 replication of the logs to HDFS?
>>>>
>>>> As a footnote, people usually don't do this, because the log files are
>>>> often too small (think fragmentation), which causes performance problems
>>>> on Hadoop.
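>>>>
>>>> If you do go down that road, the hdfs sink's roll settings can at least
>>>> batch small inputs into larger output files. A minimal sketch (the agent
>>>> and sink names are placeholders):
>>>>
>>>> # roll output files at ~128 MB; disable count- and time-based rolling
>>>> agent.sinks.hdfsSink.hdfs.rollSize = 134217728
>>>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>>>> agent.sinks.hdfsSink.hdfs.rollInterval = 0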
>>>>
>>>> On Feb 2, 2015, at 13:30, Bob Metelsky <bob.metelsky@gmail.com> wrote:
>>>>
>>>> Hi, I have a simple requirement:
>>>>
>>>> On server1 (NOT in the cluster, but with flume installed) I have a
>>>> process that constantly generates xml files in a known directory.
>>>>
>>>> I need to transfer them to server2 (IN the hadoop cluster)
>>>> and into hdfs as xml files
>>>>
>>>> From what I'm reading, avro, thrift rpc, et al. are designed for other
>>>> uses.
>>>>
>>>> Is there a way to have flume just copy over plain files? txt, xml...
>>>> I'm thinking there should be, but I can't find it.
>>>>
>>>> The closest I see is the "spooling directory" source, but that seems to
>>>> assume the files are already inside the cluster.
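>>>>
>>>> The closest thing I can find in the docs for keeping content as plain
>>>> text is the hdfs sink's fileType setting, something like this (the sink
>>>> name here is just for illustration):
>>>>
>>>> # keep event bodies as plain text (the sink's default is SequenceFile)
>>>> agent.sinks.toHdfs.hdfs.fileType = DataStream
>>>> agent.sinks.toHdfs.hdfs.writeFormat = Text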
>>>>
>>>> Can flume do this? Is there an example? I've read the flume
>>>> documentation and nothing is jumping out.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>
>>
>
