flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: Restarts without data loss
Date Mon, 09 Jul 2012 07:51:00 GMT
Hari: you mean multiple disks, not multiple folders? Running off a 
single disk the performance is unfortunately not "reasonably good".

The reality of most companies hoping to aggregate logs is that a lot of 
machines generating the logs have a single set of RAIDed disks, and that 
using multiple disks is not an option. Please do keep this in mind when 
running tests, and don't test only the "best case scenario". After all, 
Flume is going to be cohabiting on servers that were built for their 
primary purpose, not for Flume.

In our case, what we had hoped to do on our log sources, and are 
currently doing with scribed (which has its own issues, hence wanting to 
move):

- Run agents on all our log generating servers, using a channel that can 
retain data in case of network issues communicating with the collector 
layer.
  - Current setup is a scribed buffer store with network store as 
primary, file as secondary.
  - Intended setup with flume was a file channel connected to an avro 
sink. With only a single disk available, it is extremely slow. JDBC 
channel is also extremely slow, and MemoryChannel will fill up and start 
refusing puts as soon as a network issue comes up.
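
For reference, the intended setup could be sketched as an agent 
configuration roughly like the one below. This is only a sketch: the 
agent and component names (a1, r1, c1, k1), the log path, the 
directories and the collector host/port are placeholders, and the exec 
source is just one assumed way of reading the logs. The property names 
follow the Flume 1.x user guide and should be checked against the 
version in use.

```properties
# Placeholder names throughout: a1, r1, c1, k1, paths, host and port
# are illustrative assumptions, not values from this thread.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail an application log (assumed; any source would do here).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# File channel: events are persisted to disk, so they survive agent
# restarts and are retained while the collector is unreachable.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

# Avro sink: forwards events to the collector layer.
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4141
a1.sinks.k1.channel = c1
```

dataDirs takes a comma-separated list of directories. On a machine with 
a single disk, any extra directories would still sit on that same disk.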

I think this is a very common use case and one that is likely holding up 
adoption until we solve it (at least it is for us).

On 07/09/2012 04:07 PM, Hari Shreedharan wrote:
> Senthil,
>
> Have you tried using it recently, with multiple data folders etc. In 
> recent tests, we have seen reasonably good performance. Of course, the 
> performance of MemoryChannel would be much better, since it is 
> in-memory :-). You should try to use the FileChannel as much as you 
> can, else there is a risk of losing data.
>
> Thanks
> Hari
>
> -- 
> Hari Shreedharan
>
> On Monday, July 9, 2012 at 12:01 AM, Senthilvel Rangaswamy wrote:
>
>> We do use persistent channel when there is overflow. Using 
>> FileChannel for regular operations
>> is too slow for us.
>>
>> On Sun, Jul 8, 2012 at 11:58 PM, Brock Noland <brock@cloudera.com 
>> <mailto:brock@cloudera.com>> wrote:
>>> I am guessing you are aware, but you could use a persistent channel 
>>> such as file channel.
>>>
>>> -- 
>>> Brock Noland
>>> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
>>>
>>> On Monday, July 9, 2012 at 7:18 AM, Senthilvel Rangaswamy wrote:
>>>
>>>> We are using Flume 1.2.0 with memory channel. When we rollout new 
>>>> configs/decorators
>>>> we may need to restart flume at which point any events in memory 
>>>> channel is gone. Any
>>>> ways to avoid this ?
>>>>
>>>> Thanks,
>>>> -- 
>>>> ..Senthil
>>>>
>>>> "If there's anything more important than my ego around, I want it
>>>>  caught and shot now."
>>>>            - Douglas Adams.
>>>>
>>>
>>
>>
>>
>> -- 
>> ..Senthil
>>
>> "If there's anything more important than my ego around, I want it
>>  caught and shot now."
>>                                                     - Douglas Adams.
>>
>


