flume-user mailing list archives

From SaravanaKumar TR <saran0081...@gmail.com>
Subject Re: Need suggestion on reliable source for log processing
Date Mon, 27 Oct 2014 11:09:56 GMT
Yes, I understand the concerns with this use case.

If so, we need to configure failover in this scenario. Can we have it at
the channel level or the sink level?

Does Flume support configuring failover in case the channel fills up?
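
For context, the failover described in the Flume user guide is configured at
the sink level, through a sink group with a failover sink processor. Below is
a minimal sketch, assuming an agent named a1 whose sinks k1 and k2 are already
defined (all of these names are hypothetical). It covers a failing sink; it
does not by itself address a channel that fills up.

```properties
# Hypothetical agent a1 with two sinks, k1 and k2, defined elsewhere.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# The sink with the larger priority value is tried first.
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper limit, in milliseconds, on the back-off applied to a failed sink.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```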



On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <avila@devlogic.eu> wrote:

> Hi,
>
> In fact, this is not the problem with Flume.
>
> No solution will function reliably for your use case, simply because all
> of them will have to do some sort of tail -f or streaming on a file and if
> they can't keep up with it (they mostly don't in high speed entry points),
> they will drop some entries.
> Please, be kind to yourself and plan for failures - if you need to restart
> Flume or any other solution then you'll face dropped entries that you'll
> not be able to re-ingest easily as in most cases you won't know which ones
> you've dropped.
>
>
> Regards,
> Ahmed
>
> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <saran0081986@gmail.com> wrote:
>
>> Thanks for the comments, Ahmed.
>>
>> So from your comments, I gather that Flume doesn't have any reliable
>> source option for my use case.
>>
>> If Flume can't provide it, can you suggest any other log collector
>> solutions that I could consider for moving real-time data to HDFS?
>>
>>
>>
>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>
>>> Hi,
>>>
>>> Then, you're out of luck in my opinion, as there is no way other than
>>> tail -f.
>>> The problem with tail -f is that tail will not wait for the source/channel
>>> to keep up with it. If the channel is full, it will back off to the source
>>> and then the source will just stop ingesting.
>>>
>>> It is possible to hack around this by sending the tail -f output into
>>> another file and then custom-rotating that duplicate file.
>>> But I wouldn't recommend that approach.
>>>
>>> Just a side note: if you're running a Java application (Tomcat or
>>> similar), you can create multiple output files via the log4j.properties
>>> configuration without the application itself knowing anything about it.
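>>>
>>> A minimal sketch of that side note, assuming log4j 1.x; the appender
>>> names ("app", "copy") and all paths are hypothetical. The second
>>> appender writes the same events to a separate file that rolls on a
>>> per-minute DatePattern:
>>>
>>> ```properties
>>> # Route every event at INFO and above to both appenders.
>>> log4j.rootLogger=INFO, app, copy
>>>
>>> # Existing application log, rolled once a day.
>>> log4j.appender.app=org.apache.log4j.DailyRollingFileAppender
>>> log4j.appender.app.File=/var/log/app/app.log
>>> log4j.appender.app.DatePattern='.'yyyy-MM-dd
>>> log4j.appender.app.layout=org.apache.log4j.PatternLayout
>>> log4j.appender.app.layout.ConversionPattern=%d %-5p %c - %m%n
>>>
>>> # Second copy of the same events, rolled on a per-minute pattern.
>>> log4j.appender.copy=org.apache.log4j.DailyRollingFileAppender
>>> log4j.appender.copy.File=/var/log/app/copy/app.log
>>> log4j.appender.copy.DatePattern='.'yyyy-MM-dd-HH-mm
>>> log4j.appender.copy.layout=org.apache.log4j.PatternLayout
>>> log4j.appender.copy.layout.ConversionPattern=%d %-5p %c - %m%n
>>> ```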
>>>
>>> Regards,
>>> Ahmed
>>>
>>>
>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>> saran0081986@gmail.com> wrote:
>>>
>>>> Ahmed,
>>>>
>>>> In my case, the application renames the existing file to
>>>> <logfile>.yesterdaydate and creates a new file named <logfile> at
>>>> 00:00.
>>>>
>>>> I can't change the log rotation policy of the application for now. So I
>>>> guess I should rule out the option of using the spooling directory source
>>>> in my case.
>>>>
>>>> Can you suggest any options other than the spooling directory source?
>>>>
>>>> Thanks,
>>>>
>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> It all depends on how log rotation is done and how the application
>>>>> producing the log file handles log rotation.
>>>>> Most applications just reopen the log file when they receive a
>>>>> signal. For example, nginx reopens its log files when it receives the
>>>>> USR1 signal, but it doesn't stop the process. Some applications might
>>>>> restart as a result.
>>>>>
>>>>> If the application just reopens the log file, then you can change your
>>>>> log rotation policy to be per minute.
>>>>> The logrotate daemon won't satisfy that requirement, so you'll have
>>>>> to set up a cron job to do it.
>>>>> You would then keep the finished logs and the live log in separate
>>>>> locations, so the spooling directory source doesn't freak out about the
>>>>> active log file being appended to.
>>>>>
>>>>> Anyway, the spooling directory source is the way to go, as it will leave
>>>>> log files in place, just renamed.
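>>>>>
>>>>> A minimal sketch of such a source, assuming an agent named a1, a
>>>>> channel c1 defined elsewhere, and a hypothetical directory that
>>>>> receives only finished (rotated) log files:
>>>>>
>>>>> ```properties
>>>>> # Hypothetical agent a1; channel c1 is assumed to be defined elsewhere.
>>>>> a1.sources = r1
>>>>> a1.sources.r1.type = spooldir
>>>>> # Directory holding finished logs only, never the live log file.
>>>>> a1.sources.r1.spoolDir = /var/log/app/finished
>>>>> # Fully ingested files stay in place and are renamed with this suffix.
>>>>> a1.sources.r1.fileSuffix = .COMPLETED
>>>>> a1.sources.r1.deletePolicy = never
>>>>> a1.sources.r1.channels = c1
>>>>> ```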
>>>>>
>>>>> Regards,
>>>>> Ahmed
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>> saran0081986@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am using Apache Flume 1.5.0. Here is a quick explanation of my setup.
>>>>>>
>>>>>> Source: exec, running tail -F on a log file.
>>>>>>
>>>>>> Channel: file channel
>>>>>>
>>>>>> Sink: HDFS
>>>>>>
>>>>>> Use case: to move real-time data from the log file to HDFS.
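>>>>>>
>>>>>> A minimal sketch of a setup like the one described above. The agent
>>>>>> name (a1), component names, and every path are assumptions for
>>>>>> illustration, not the actual configuration:
>>>>>>
>>>>>> ```properties
>>>>>> # Hypothetical agent a1 with one source, one channel, one sink.
>>>>>> a1.sources = r1
>>>>>> a1.channels = c1
>>>>>> a1.sinks = k1
>>>>>>
>>>>>> # Exec source that follows the live log file.
>>>>>> a1.sources.r1.type = exec
>>>>>> a1.sources.r1.command = tail -F /var/log/app/app.log
>>>>>> a1.sources.r1.channels = c1
>>>>>>
>>>>>> # File channel that persists events on local disk.
>>>>>> a1.channels.c1.type = file
>>>>>> a1.channels.c1.checkpointDir = /var/flume/checkpoint
>>>>>> a1.channels.c1.dataDirs = /var/flume/data
>>>>>>
>>>>>> # HDFS sink that writes events as plain text.
>>>>>> a1.sinks.k1.type = hdfs
>>>>>> a1.sinks.k1.hdfs.path = /flume/events
>>>>>> a1.sinks.k1.hdfs.fileType = DataStream
>>>>>> a1.sinks.k1.channel = c1
>>>>>> ```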
>>>>>>
>>>>>>
>>>>>> It appears that exec is not a reliable source, as we may lose data
>>>>>> if the channel or source is down.
>>>>>>
>>>>>>
>>>>>> So I tried the other option, the "spooling directory source", which is
>>>>>> described as a reliable source. But here I have a single log file that
>>>>>> data gets appended to, so I don't see a way to move the file to the
>>>>>> spool directory.
>>>>>>
>>>>>>
>>>>>> Can anyone suggest another reliable source option for the case where
>>>>>> data is appended to a log file and log file rotation happens only at
>>>>>> the end of the day?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Saravana
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> This e-mail and any attachment is for authorised use by the intended
>>>>> recipient(s) only. This email contains confidential information. It should
>>>>> not be copied, disclosed to, retained or used by, any party other than the
>>>>> intended recipient. Any unauthorised distribution, dissemination or copying
>>>>> of this E-mail or its attachments, and/or any use of any information
>>>>> contained in them, is strictly prohibited and may be illegal. If you are
>>>>> not an intended recipient then please promptly delete this e-mail and any
>>>>> attachment and all copies and inform the sender directly via email. Any
>>>>> emails that you send to us may be monitored by systems or persons other
>>>>> than the named communicant for the purposes of ascertaining whether the
>>>>> communication complies with the law and company policies.
>>>>
>>>>
>>>
>>>
>>
>
>
