flume-user mailing list archives

From Simone Roselli <simonerosell...@gmail.com>
Subject Re: Flume-ng 1.6 reliable setup
Date Wed, 21 Oct 2015 09:52:03 GMT
In case of a disk crash, the Flume host is removed from the backend pool and
stops receiving events, so that won't be a problem.

I've found a nice solution (nice for our setup): a Spooldir source configured
on the same channel as the Kafka sink. This means that each event file placed
in the spooling directory is instantly pushed to the Kafka sink.

So, to recap: the File Roll sink writes every event that cannot reach the main
Kafka cluster to a directory (/failover); a script periodically checks for
events in that directory and, if any are present, moves them to
/spool_directory.
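
For reference, here is roughly what the Spooldir part looks like (a minimal
sketch; the agent/channel names are placeholders, and the Kafka sink plus the
failover File Roll sink are assumed to be configured as in the earlier
messages quoted below):

# Spooldir source attached to the same channel as the Kafka sink:
# anything moved into /spool_directory is re-ingested and pushed to Kafka.
# (the main source that normally feeds ch1 is omitted here)
agent1.sources = spool1
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /spool_directory
agent1.sources.spool1.channels = ch1

and the periodic mover is essentially just:

# skip empty files and anything File Roll may still be writing to
find /failover -type f -size +0 -mmin +1 -exec mv {} /spool_directory/ \;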


Thanks anyway for the support

On Mon, Oct 19, 2015 at 3:14 PM, Gonzalo Herreros <gherreros@gmail.com>
wrote:

> I see. Maybe you need more Kafka nodes and fewer Flume agents (I have the
> same number of each).
>
> None of the solutions you mention will survive a disk crash.
> I would rather rely on Kafka to guarantee no message loss.
>
> Gonzalo
>
> On 19 October 2015 at 13:39, Simone Roselli <simoneroselli78@gmail.com>
> wrote:
>
>> Hi,
>>
>> ... because a Kafka channel would lead me to the same problem, no?
>>
>> I have 200 nodes, each one with a Flume-ng agent on board. I cannot lose a
>> single event.
>>
>> With a memory/file channel, if Kafka is down/broken/buggy, I could still
>> take care of events (Spillable memory, File Roll, other sinks...). With a
>> Kafka channel (another, separate Kafka cluster) I would rely exclusively on
>> that Kafka cluster, which was my initial non-ideal situation when using it
>> as a sink.
>>
>>
>> Thanks
>> Simone
>>
>>
>>
>>
>>
>> On Mon, Oct 19, 2015 at 11:28 AM, Gonzalo Herreros <gherreros@gmail.com>
>> wrote:
>>
>>> Why don't you use a Kafka channel?
>>> It would be simpler and it would meet your initial requirement of fault
>>> tolerance in the channel.
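>>>
>>> For example, a minimal sketch (Flume 1.6 Kafka channel properties; the
>>> broker and ZooKeeper addresses are placeholders):
>>>
>>> agent1.channels.ch1.type = org.apache.flume.channel.kafka.KafkaChannel
>>> agent1.channels.ch1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
>>> agent1.channels.ch1.zookeeperConnect = zk1:2181,zk2:2181,zk3:2181
>>> agent1.channels.ch1.topic = flume-channel
>>> agent1.channels.ch1.parseAsFlumeEvent = true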
>>>
>>> Regards,
>>> Gonzalo
>>>
>>> On 19 October 2015 at 10:23, Simone Roselli <simoneroselli78@gmail.com>
>>> wrote:
>>>
>>>> However,
>>>>
>>>> since the arrival order on Kafka (the main sink) is not a particular
>>>> problem for me, my current solution would be:
>>>>
>>>>  * memory channel
>>>>  * sinkgroup with 2 sinks:
>>>>    ** Kafka
>>>>    ** File_roll (writes events to the '/data/x' directory in case Kafka
>>>> is down)
>>>>  * periodically check for files in '/data/x' and, if any are present,
>>>> re-push them to Kafka
>>>>
>>>> I still don't know whether it is possible to re-push File Roll files to
>>>> Kafka using bin/flume-ng; a sketch of the sink-group part is below.
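>>>>
>>>> A sketch of that sink group (component names and broker addresses are
>>>> placeholders, not our actual config):
>>>>
>>>> agent1.channels = ch1
>>>> agent1.sinks = kafka1 roll1
>>>> agent1.sinkgroups = g1
>>>>
>>>> agent1.channels.ch1.type = memory
>>>>
>>>> agent1.sinks.kafka1.type = org.apache.flume.sink.kafka.KafkaSink
>>>> agent1.sinks.kafka1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
>>>> agent1.sinks.kafka1.topic = events
>>>> agent1.sinks.kafka1.channel = ch1
>>>>
>>>> agent1.sinks.roll1.type = file_roll
>>>> agent1.sinks.roll1.sink.directory = /data/x
>>>> agent1.sinks.roll1.channel = ch1
>>>>
>>>> # failover processor: Kafka has the higher priority; File_roll only takes
>>>> # over while the Kafka sink is failing
>>>> agent1.sinkgroups.g1.sinks = kafka1 roll1
>>>> agent1.sinkgroups.g1.processor.type = failover
>>>> agent1.sinkgroups.g1.processor.priority.kafka1 = 10
>>>> agent1.sinkgroups.g1.processor.priority.roll1 = 5
>>>> agent1.sinkgroups.g1.processor.maxpenalty = 10000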
>>>>
>>>> Any hints would be appreciated.
>>>>
>>>> Many thanks
>>>>
>>>> On Fri, Oct 16, 2015 at 4:32 PM, Simone Roselli <
>>>> simoneroselli78@gmail.com> wrote:
>>>>
>>>>> Hi Phil,
>>>>>
>>>>> thanks for your reply.
>>>>>
>>>>> Yes, with a file channel configured, CPU usage goes up to 80-90%.
>>>>>
>>>>> My settings:
>>>>> # Channel configuration
>>>>> agent1.channels.ch1.type = file
>>>>> agent1.channels.ch1.checkpointDir = /opt/flume-ng/chekpoint
>>>>> agent1.channels.ch1.dataDirs = /opt/flume-ng/data
>>>>> agent1.channels.ch1.capacity = 1000000
>>>>> agent1.channels.ch1.transactionCapacity = 10000
>>>>>
>>>>> # flume-env.sh
>>>>> export JAVA_OPTS="-Xms512m -Xmx2048m"
>>>>>
>>>>> # top
>>>>> 22079 flume-ng  20   0 6924752 785536  17132 S  83.7%  2.4   3:53.19 java
>>>>>
>>>>> Do you have any tuning suggestions for the GC?
>>>>>
>>>>> Thanks
>>>>> Simone
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 15, 2015 at 7:59 PM, Phil Scala <
>>>>> Phil.Scala@globalrelay.net> wrote:
>>>>>
>>>>>> Hi Simone
>>>>>>
>>>>>>
>>>>>>
>>>>>> I wonder why you're seeing 90% CPU use when you use a file channel; I
>>>>>> would expect high disk I/O. As a counterpoint, on a single server I have
>>>>>> 4 spool dir sources, each going to a separate file channel, also on an
>>>>>> SSD-based server. I do not see any significant CPU or even disk I/O
>>>>>> utilization. I am pushing about 10 million events per day across all 4
>>>>>> sources, and it has been running reliably for 2 years now.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would always use a file channel; any memory channel runs the risk of
>>>>>> data loss if the node were to fail. I would be more worried about the
>>>>>> local node failing, seeing that a 3-node Kafka cluster would have to lose
>>>>>> 2 nodes before it loses quorum.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Not sure what your data source is; if you can add more Flume nodes, of
>>>>>> course that would help.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Have you given it ample heap space? Maybe GCs are causing the high CPU?
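>>>>>>
>>>>>> If it is GC, one quick way to confirm (assuming a Java 7/8 JVM) is to
>>>>>> turn on GC logging in flume-env.sh, for example:
>>>>>>
>>>>>> # fixed heap plus GC logging; heap size and log path are only examples
>>>>>> export JAVA_OPTS="-Xms2048m -Xmx2048m -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/flume-ng/gc.log"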
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Simone Roselli [mailto:simoneroselli78@gmail.com]
>>>>>> *Sent:* Friday, October 09, 2015 12:33 AM
>>>>>> *To:* user@flume.apache.org
>>>>>> *Subject:* Flume-ng 1.6 reliable setup
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm currently planning to migrate from Flume 0.9 to Flume-ng 1.6, but
>>>>>> I'm having trouble finding a reliable setup for it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> My sink is a 3-node Kafka cluster. I must avoid *losing events in case
>>>>>> the main sink is down*, broken, or unreachable for a while.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In Flume 0.9, I use a memory channel with the *store on failure* feature,
>>>>>> which starts writing events to the local disk if the target sink is not
>>>>>> available.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In Flume-ng 1.6 the same behaviour would be accomplished by setting up a
>>>>>> *Spillable memory channel*, but the problem with this solution is stated
>>>>>> at the end of the channel's description: "*This channel is currently
>>>>>> experimental and not recommended for use in production.*"
>>>>>>
>>>>>>
>>>>>>
>>>>>> In Flume-ng 1.6, it's possible to set up a pool of *Failover sinks*.
>>>>>> So, I was thinking of configuring a *File Roll* sink as the secondary in
>>>>>> case the primary is down. However, once the primary sink comes back
>>>>>> online, the data placed on the secondary sink (local disk) won't be
>>>>>> automatically pushed to the primary one.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Another option would be setting up a *file channel*: write each event to
>>>>>> disk and then sink it. Aside from the fact that I don't love the idea of
>>>>>> continuously writing/deleting every single event on an SSD, this setup
>>>>>> takes 90% of the CPU. The exact same configuration with a memory channel
>>>>>> takes 3%.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Other solutions to evaluate?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Simone
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
