flume-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Flume NG and S3
Date Tue, 01 Jul 2014 08:05:18 GMT
If you depend heavily on the AWS stack, then instead of Kafka you can
look at AWS Kinesis; from there on, good integration is available
with AWS S3 or any other service you want to dump data into.




On Tue, Jul 1, 2014 at 1:33 PM, Asim Zafir <asim.zafir@gmail.com> wrote:

> Kafka's framework is designed for scalable read I/O rather than for a massive
> write event push coming to a centralized storage such as HDFS.
>
> Not sure how Flume's Avro sink to S3 would turn out for the entire Flume
> pipeline. I suspect it will be fatal to rely on a memory channel, and even
> if you have a file channel on the Flume agents/collectors, it is very likely
> to cause buffering on the channel.
>
>
>
>
> On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <gulyasm@dmlab.hu> wrote:
>
>> Please see my comments inline.
>>
>> YIMEN YIMGA Gael wrote:
>> > Could you please share the link to the article you read?
>> https://gist.github.com/crowdmatt/5256881 and the last comment.
>>
>> Sharninder wrote
>> > No reason to not use flume except for the fact that S3, since it's over
>> > the wire, will be a lot slower than a local hdfs cluster, in which case you
>> > need a big enough channel to hold events not yet processed out of the sink.
>> > If you have a fast enough pipe, you can very well use flume for this sort
>> > of use-case.
>> I plan to aggregate 5-15GB of data with the file channel, as I want to flush
>> to S3 every hour on every node. As far as I know Flume can gzip it, so
>> the size would be about 500MB-1.5GB.
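
The sizing above can be sanity-checked with a rough sketch. It assumes a
fixed 10:1 gzip ratio (implied by the 5-15GB -> 500MB-1.5GB figures, not
measured) and an hourly flush window; the function names are hypothetical
helpers for illustration, not part of Flume.

```python
# Rough sizing sketch; names are hypothetical, not Flume APIs.

def compressed_size_gb(raw_gb: float, ratio: float = 10.0) -> float:
    """Estimated size after gzip, assuming a fixed compression ratio."""
    return raw_gb / ratio


def required_mbit_per_s(size_gb: float, window_s: int = 3600) -> float:
    """Sustained upload rate needed to ship size_gb within window_s seconds."""
    megabits = size_gb * 1000 * 8  # decimal GB -> megabits
    return megabits / window_s


if __name__ == "__main__":
    # Hourly raw volumes per node, taken from the message above.
    for raw_gb in (5, 15):
        gz_gb = compressed_size_gb(raw_gb)
        rate = required_mbit_per_s(gz_gb)
        print(f"{raw_gb} GB raw -> {gz_gb:.1f} GB gzipped, "
              f"{rate:.1f} Mbit/s sustained per node")
```

If the uplink to S3 cannot sustain roughly that rate, events would pile up
in the channel, which is the buffering concern raised earlier in the thread.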
>>
>> Thanks for the feedback, I will write if I have any results.
>>
>> Mate Gulyas
>>
>> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharninder@gmail.com> wrote:
>> > No reason to not use flume except for the fact that S3, since it's over
>> > the wire, will be a lot slower than a local hdfs cluster, in which case you
>> > need a big enough channel to hold events not yet processed out of the sink.
>> > If you have a fast enough pipe, you can very well use flume for this sort
>> > of use-case.
>> >
>> > The reason the author might have moved to kafka, and I'm just speculating
>> > here, is that kafka provides him better buffering support for exactly the
>> > case I've written above.
>> >
>> > HTH
>> > Sharninder
>> >
>> >
>> >
>> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <gulyasm@dmlab.hu> wrote:
>> >>
>> >> Hi!
>> >>
>> >> I would like to use flume to aggregate and send logs to an S3 bucket.
>> >> I did some research, but the last article I found on the topic was
>> >> more than a year old and the author abandoned Flume for Kafka. My
>> >> other concern is that most of the articles were written for Flume OG,
>> >> not NG.
>> >> Is there any reason why I should not use flume to sink messages to S3?
>> >>
>> >>
>> >> Thanks in advance.
>> >>
>> >> Mate Gulyas
>> >> Lead Developer at Dmlab
>> >
>> >
>>
>
>


-- 
Nitin Pawar
