flume-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: Import files from a directory on remote machine
Date Thu, 17 Apr 2014 18:06:32 GMT
Hmm... yeah... I felt a bit uncomfortable with the 'tail -F' solution.  But
then going back to your original suggestion:

'Perhaps you could use rsync to copy the files somewhere that you have
write access to?':  This will work with files that have been populated
completely and will no longer change, correct?  What about the file that is
currently getting written to?  Is there some sort of 'file watching'
mechanism equivalent to 'tail -F' in Flume?
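
For the completed files, I'm picturing something like this (paths are made
up, reusing the ones from earlier in this thread), pulling only the rolled
files and skipping the one still being written to:

rsync -av --exclude='xyz.log' \
    username@machinename:/var/log/logdir/ /home/user1/shares/logs/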


On Thu, Apr 17, 2014 at 10:11 AM, Jeff Lord <jlord@cloudera.com> wrote:

> Using the exec source with a tail -f is not considered a production
> solution.
> It mainly exists for testing purposes.
>
>
> On Thu, Apr 17, 2014 at 7:03 AM, Laurance George <
> laurance.w.george@gmail.com> wrote:
>
>> If you can NFS-mount that directory onto your local machine running
>> Flume, it sounds like what you've listed out would work well.
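>>
>> For example, something along these lines (mount point borrowed from your
>> config below, otherwise just a guess):
>>
>> mount -t nfs machinename:/var/log/logdir /home/user1/shares/logs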
>>
>>
>> On Thu, Apr 17, 2014 at 2:54 AM, Something Something <
>> mailinglists19@gmail.com> wrote:
>>
>>> If I am going to 'rsync' a file from a remote host & copy it to HDFS via
>>> Flume, then why use Flume?  I can rsync & then just do a 'hadoop fs -put',
>>> no?  I must be missing something.  I guess the only benefit of using Flume
>>> is that I can add Interceptors if I want to.  Current requirements don't
>>> need that.  We just want to copy data as is.
>>>
>>> Here's the real use case:  An application is writing to an xyz.log file.
>>> Once this file gets over a certain size it gets rolled over to xyz1.log &
>>> so on.  Kinda like Log4j.  What we really want is: as soon as a line gets
>>> written to xyz.log, it should go to HDFS via Flume.
>>>
>>> Can I do something like this?
>>>
>>> 1)  Share the log directory under Linux.
>>> 2)  Use
>>> test1.sources.mylog.type = exec
>>> test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log
>>>
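>>> Fleshed out a bit more (channel & sink names are placeholders, and the
>>> HDFS path is made up):
>>>
>>> test1.sources = mylog
>>> test1.channels = c1
>>> test1.sinks = k1
>>> test1.channels.c1.type = memory
>>> test1.sources.mylog.channels = c1
>>> test1.sinks.k1.type = hdfs
>>> test1.sinks.k1.channel = c1
>>> test1.sinks.k1.hdfs.path = hdfs://namenode/flume/logs/
>>>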
>>> I believe this will work, but is this the right way?  Thanks for your
>>> help.
>>>
>>> On Wed, Apr 16, 2014 at 5:51 PM, Laurance George <
>>> laurance.w.george@gmail.com> wrote:
>>>
>>>> Agreed with Jeff.  Rsync + cron (if it needs to be regular) is
>>>> probably your best bet to ingest files from a remote machine that you only
>>>> have read access to.  But then again you're sorta stepping outside the
>>>> use case of Flume at some level here, as rsync is now basically a part of
>>>> your Flume topology.  However, if you just need to back-fill old log data
>>>> then this is perfect!  In fact, it's what I do myself.
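>>>>
>>>> A crontab entry for that could look roughly like this (schedule and
>>>> destination directory are just examples):
>>>>
>>>> */5 * * * * rsync -av username@machinename:/var/log/logdir/ /var/flume/spool/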
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 8:46 PM, Jeff Lord <jlord@cloudera.com> wrote:
>>>>
>>>>> The spooling directory source runs as part of the agent.
>>>>> The source also needs write access to the files as it renames them
>>>>> upon completion of ingest. Perhaps you could use rsync to copy the files
>>>>> somewhere that you have write access to?
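>>>>>
>>>>> Roughly like this (agent name and spool directory are just
>>>>> placeholders):
>>>>>
>>>>> a1.sources.r1.type = spooldir
>>>>> a1.sources.r1.spoolDir = /var/flume/spool
>>>>> a1.sources.r1.channels = c1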
>>>>>
>>>>>
>>>>> On Wed, Apr 16, 2014 at 5:26 PM, Something Something <
>>>>> mailinglists19@gmail.com> wrote:
>>>>>
>>>>>> Thanks Jeff.  This is useful.  Can the spoolDir be on a different
>>>>>> machine?  We may have to set up a different process to copy files into
>>>>>> 'spoolDir', right?  Note:  We have 'read only' access to these files.
>>>>>> Any recommendations about this?
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 16, 2014 at 5:16 PM, Jeff Lord <jlord@cloudera.com> wrote:
>>>>>>
>>>>>>> http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 16, 2014 at 5:14 PM, Something Something <
>>>>>>> mailinglists19@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Needless to say I am a newbie to Flume, but I've got a basic flow
>>>>>>>> working in which I am importing a log file from my Linux box to
>>>>>>>> HDFS.  I am using
>>>>>>>>
>>>>>>>> a1.sources.r1.command = tail -F /var/log/xyz.log
>>>>>>>>
>>>>>>>> which is working like a stream of messages.  This is good!
>>>>>>>>
>>>>>>>> Now what I want to do is copy log files from a directory on a
>>>>>>>> remote machine on a regular basis.  For example:
>>>>>>>>
>>>>>>>> username@machinename:/var/log/logdir/<multiple files>
>>>>>>>>
>>>>>>>> One way to do it is to simply 'scp' files from the remote directory
>>>>>>>> into my box on a regular basis, but what's the best way to do this
>>>>>>>> in Flume?  Please let me know.
>>>>>>>>
>>>>>>>> Thanks for the help.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Laurance George
>>>>
>>>
>>>
>>
>>
>> --
>> Laurance George
>>
>
>
