flume-user mailing list archives

From Dani Rayan <dani.ra...@gmail.com>
Subject Re: Cassandra Sink using Hector
Date Tue, 18 Oct 2011 07:00:40 GMT
Hi Kamal,

Flume is designed to scale horizontally: add more boxes and run more
collector daemons, and it should be able to handle 2000 messages per second.
You can configure a failover chain to avoid loss of events.
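As a sketch, a failover chain in the classic Flume (0.9.x) master config syntax might look like the following. The node names, file path, port, and HDFS URL are placeholders for your own setup (and you would substitute your custom Cassandra sink for collectorSink):

```
# agent1 ships to collector1, failing over to collector2 if it is down.
agent1     : tail("/var/log/app.log") | agentE2EChain("collector1:35853", "collector2:35853");
collector1 : collectorSource(35853)   | collectorSink("hdfs://namenode/flume/", "evt-");
collector2 : collectorSource(35853)   | collectorSink("hdfs://namenode/flume/", "evt-");
```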

IMHO, the downside of a multi-threaded approach is lack of manageability.
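To make that concrete, here is a minimal sketch of the hand-rolled multi-threaded append pattern. `writeEvent` is a hypothetical stand-in for the real Hector mutator call; the point is that a fire-and-forget submit() silently drops failed events, so you must hold every Future and check it before acking:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadedAppendSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    // Stand-in for the real Hector insert; fails on demand for the demo.
    static void writeEvent(String event) {
        if (event.contains("bad")) {
            throw new RuntimeException("write failed: " + event);
        }
    }

    // Submit a batch in parallel, then check every Future before acking.
    public boolean appendAll(List<String> events) {
        List<Future<?>> futures = new ArrayList<>();
        for (String event : events) {
            futures.add(pool.submit(() -> writeEvent(event)));
        }
        boolean allOk = true;
        for (Future<?> f : futures) {
            try {
                f.get();          // surfaces worker exceptions here
            } catch (Exception ex) {
                allOk = false;    // do NOT ack; let the agent retry
            }
        }
        return allOk;
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        ThreadedAppendSketch sink = new ThreadedAppendSketch();
        System.out.println(sink.appendAll(List.of("a", "b")));      // true
        System.out.println(sink.appendAll(List.of("a", "bad-x")));  // false
        sink.shutdown();
    }
}
```

Even with the Future checks, this is exactly the manageability burden mentioned above: error handling, backpressure, and shutdown all become your problem instead of the framework's.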

On Mon, Oct 17, 2011 at 9:58 PM, Kamal Bahadur <mailtokamal@gmail.com> wrote:

> Hi Dani,
> Thanks for the reply. I am using E2E reliability mode. If I spawn a new
> thread for each append call, I am not sure the acks will be handled
> properly. I might lose an event if the child thread ends up in an exception.
> Do you have any suggestions for my use case? With the current setup, I am able to
> write only 500 events per second. The expected event rate is over 2000 per
> second. I tried increasing the number of collectors and it seems to help.
> Is this my only option?
> Thanks,
> Kamal
> On Mon, Oct 17, 2011 at 4:42 PM, Dani Rayan <dani.rayan@gmail.com> wrote:
>> Hey Kamal,
>> You are correct. The append method would not spawn new threads by itself.
>> However, you can still override it.
>> On Mon, Oct 17, 2011 at 1:58 PM, Kamal Bahadur <mailtokamal@gmail.com> wrote:
>>> Hi,
>>> I have written a sink for writing data into Cassandra using the Hector API. It
>>> looks like Hector does a great job of connection pooling and load balancing.
>>> As soon as I start the collector, I can see 16 connections being established
>>> between the collector and Cassandra. I am not sure if Flume is taking advantage
>>> of those connections in the pool. I am assuming that the collector's append
>>> method is not multi-threaded and therefore only one connection is being used
>>> at any point in time. Can someone confirm this?
>>> Thanks,
>>> Kamal
>> --
>> -Dani Abel Rayan

-Dani Abel Rayan
