flume-user mailing list archives

From ed <edor...@gmail.com>
Subject Re: Transferring another server using flume
Date Fri, 31 Jan 2014 04:18:34 GMT
Hi Burak,

Unfortunately I don't have any experience with Scribe, so I can't provide any
advice there. I briefly checked out its GitHub site, and it did not look
like there is much (if any) activity on that project at this point. I think
all of the Flume sources use a push model (rather than pull), so it'll be
tough if you can't install any software or scripts on the remote servers
you're trying to collect from to push data to Flume. Can you configure the
remote servers to write their logs to some sort of shared storage? Other
than that, I'm not sure whether there are other rsync-like programs you
could use to pull the logs and get them into a Flume source like the
Spooling Directory Source. Maybe someone on the mailing list with more
Linux tool experience will have some suggestions on rsync alternatives
that might work for you.
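A minimal Python sketch of the last step described above: handing log files that have already been pulled to the main server over to the Spooling Directory Source. The Flume documentation says a file must not be written to after it is placed in the spooling directory and a file name must not be reused, so the sketch copies into a staging directory first and then renames into the spool directory. The function name, the directory layout, and the check for Flume's default ".COMPLETED" suffix are illustrative assumptions, not part of the thread.

```python
import os
import shutil
from pathlib import Path

# Flume's default suffix for processed files (assumed unchanged in your config).
COMPLETED_SUFFIX = ".COMPLETED"


def hand_off_to_spool(pulled_file: Path, staging_dir: Path, spool_dir: Path) -> Path:
    """Place a fully pulled log file into a Flume spooling directory (hypothetical helper).

    The file is copied into staging_dir first and then renamed into spool_dir,
    so Flume never sees a half-written file. os.replace is atomic when both
    directories are on the same filesystem.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    spool_dir.mkdir(parents=True, exist_ok=True)

    target = spool_dir / pulled_file.name
    done = spool_dir / (pulled_file.name + COMPLETED_SUFFIX)
    if target.exists() or done.exists():
        # Refuse to reuse a name that is pending or was already ingested.
        raise FileExistsError(f"{pulled_file.name} was already handed to Flume")

    staged = staging_dir / pulled_file.name
    shutil.copy2(pulled_file, staged)  # the slow copy happens outside spool_dir
    os.replace(staged, target)         # the file appears in spool_dir all at once
    return target
```

If the same file names can come from several remote servers, adding a unique identifier such as the host name or a timestamp to each name before the hand-off would avoid collisions.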



On Thu, Jan 30, 2014 at 9:34 AM, burakkk <burak.isikli@gmail.com> wrote:

> Hi Ed,
> Syslog isn't available on the remote machines, and we want to avoid
> installing any application or library on them as far as possible. I have
> to pull data from the remote servers without depending on anything running
> remotely.
> The problem with rsync is that so many small files are generated on the
> remote servers that rsync gets stuck at some point. It doesn't fail; it
> just waits without doing anything. So the problem is with getting the
> files from the remote servers.
> After a brief review of Flume, I think using Scribe + Flume may solve my
> problem. What do you think?
> Thanks
> Best regards...
> On Thu, Jan 30, 2014 at 1:58 AM, ed <edorsey@gmail.com> wrote:
>> Hi Burak,
>> Do the machines with the logs on them have syslog available (e.g.,
>> rsyslog for RedHat/CentOS)? Can the remote servers do any kind of push,
>> or do you have to pull data from them? If you have a syslog daemon
>> available on the remote servers, I would try configuring it to send the
>> logs to the Flume multiport syslog TCP source.
>> Regarding pulling data from the remote servers, what part of rsync is
>> causing issues (assuming you're using rsync to pull data)? Is the problem
>> with rsync itself, i.e. getting the files from the remote servers, or is
>> it getting the files into HDFS once you've pulled them to the main server?
>> If the problem is getting the files into HDFS, you could try using the
>> Spooling Directory Source and point it at the directory on your main
>> server where you are aggregating the logs via rsync.
>> Best,
>> Ed
>> On Wed, Jan 29, 2014 at 11:24 PM, burakkk <burak.isikli@gmail.com> wrote:
>>> Hi folks,
>>> I have a question about flume-ng. There are several different machines
>>> generating logs. The log files are small (around 4-5 MB per file). I want
>>> to collect these files from the remote servers into a specific directory
>>> on my main server and then put them into HDFS. I can't install any kind
>>> of application on these remote servers, so I can't use the Avro and
>>> Thrift sources.
>>> For now I use rsync to sync files between the two machines and then put
>>> them into HDFS using HDFS file commands such as hdfs dfs -put. But there
>>> are some issues with rsync.
>>> To solve this problem, what kind of source should I use, and how can I
>>> do that?
>>> Thanks
>>> Best Regards...
> --
> *BURAK ISIKLI* | http://burakisikli.wordpress.com
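Burak's original question asks which source to use. Ed's suggestion (point the Spooling Directory Source at the directory where rsync aggregates the logs) could be wired to HDFS with a single Flume agent. The small Python helper below is a hypothetical convenience that renders a minimal agent configuration: a spooldir source, a file channel, and an HDFS sink writing plain text (DataStream) instead of the default SequenceFile. The agent name, component names, and paths are placeholders, and a real deployment would probably tune more properties (roll size and interval, for example).

```python
def render_spooldir_to_hdfs_config(agent: str, spool_dir: str, hdfs_path: str) -> str:
    """Render a minimal Flume agent config: spooldir source -> file channel -> HDFS sink."""
    props = [
        (f"{agent}.sources", "spool1"),
        (f"{agent}.channels", "ch1"),
        (f"{agent}.sinks", "hdfs1"),
        # Source: ingests complete files that are dropped into spool_dir.
        (f"{agent}.sources.spool1.type", "spooldir"),
        (f"{agent}.sources.spool1.spoolDir", spool_dir),
        (f"{agent}.sources.spool1.channels", "ch1"),
        # Channel: durable file channel using its default checkpoint/data dirs.
        (f"{agent}.channels.ch1.type", "file"),
        # Sink: writes events to HDFS as plain text rather than SequenceFiles.
        (f"{agent}.sinks.hdfs1.type", "hdfs"),
        (f"{agent}.sinks.hdfs1.channel", "ch1"),
        (f"{agent}.sinks.hdfs1.hdfs.path", hdfs_path),
        (f"{agent}.sinks.hdfs1.hdfs.fileType", "DataStream"),
    ]
    return "\n".join(f"{key} = {value}" for key, value in props) + "\n"


if __name__ == "__main__":
    # Placeholder paths; adjust to your main server and cluster.
    print(render_spooldir_to_hdfs_config(
        "agent1", "/data/flume/spool", "hdfs://namenode:8020/flume/logs"))
```

The output could be saved as, say, agent1.conf and the agent started with flume-ng agent --conf conf --conf-file agent1.conf --name agent1.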

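Burak reports that rsync stalls, rather than fails, when the remote servers produce very many small files. One mitigation worth trying, offered as an assumption rather than a diagnosis, is to transfer the files in smaller batches using rsync's --files-from option and to set --timeout so that a stalled transfer exits instead of hanging. The Python sketch below assumes you already have the list of remote file names (relative to the remote directory) and that rsync over SSH works as it does in the current setup; the function names and the batch size are hypothetical.

```python
import os
import subprocess
import tempfile
from typing import Iterator, List, Sequence


def batches(paths: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def rsync_command(remote: str, dest: str, list_file: str, io_timeout: int) -> List[str]:
    """Build an rsync invocation that copies only the paths listed in list_file.

    --files-from: paths in list_file are relative to the `remote` directory.
    --timeout:    exit if no data is transferred for io_timeout seconds,
                  instead of waiting indefinitely.
    """
    return [
        "rsync", "-a",
        f"--timeout={io_timeout}",
        f"--files-from={list_file}",
        remote, dest,
    ]


def pull_in_batches(remote: str, dest: str, paths: Sequence[str],
                    size: int = 500, io_timeout: int = 60) -> List[int]:
    """Run one rsync per batch and return each batch's exit code (hypothetical helper)."""
    codes = []
    for batch in batches(paths, size):
        # Write this batch's relative paths to a temporary list file.
        with tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False) as f:
            f.write("\n".join(batch) + "\n")
            list_file = f.name
        try:
            result = subprocess.run(rsync_command(remote, dest, list_file, io_timeout))
            codes.append(result.returncode)
        finally:
            os.unlink(list_file)  # remove the list file even if subprocess.run raises
    return codes
```

rsync exits with code 30 when the I/O timeout fires, so a caller could retry only the batches that returned 30 instead of restarting the whole sync.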