flume-user mailing list archives

From Máté Gulyás <guly...@dmlab.hu>
Subject Re: Flume NG and S3
Date Tue, 01 Jul 2014 08:42:08 GMT
Our current stack has Flume with a socket source and HDFS sink. We are
moving to AWS, and keeping Flume would be a great time saver. Kinesis
looks good, but if I can use Flume I would stick with it. Because of S3
PUT pricing, we have to aggregate events, and Flume does that with the
file channel.

Mate Gulyas

On Tue, Jul 1, 2014 at 10:05 AM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
> If you are heavily dependent on the AWS stack, then instead of Kafka you can
> look at AWS Kinesis; from there on, good integration is available to
> S3 or any other AWS service you want to dump data into.
>
>
>
>
> On Tue, Jul 1, 2014 at 1:33 PM, Asim Zafir <asim.zafir@gmail.com> wrote:
>>
>> Kafka's framework is designed more for scalable read I/O than for massive
>> write-event pushes to centralized storage such as HDFS.
>>
>> Not sure how Flume's Avro sink to S3 would turn out for the entire Flume
>> pipeline. I suspect it would be fatal to rely on a memory channel, and even
>> if you have a file channel on the Flume agents/collectors, it is very likely
>> to cause buffering on the channel.
>>
>>
>>
>>
>> On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <gulyasm@dmlab.hu> wrote:
>>>
>>> Please see my comments inline.
>>>
>>> YIMEN YIMGA Gael wrote:
>>> > Could you please communicate the link of the article you read please ?
>>> https://gist.github.com/crowdmatt/5256881 and the last comment.
>>>
>>> Sharninder wrote
>>> > No reason not to use Flume, except for the fact that S3, since it's over
>>> > the wire, will be a lot slower than a local HDFS cluster, in which case
>>> > you need a big enough channel to hold events not yet processed out of
>>> > the sink. If you have a fast enough pipe, you can very well use Flume
>>> > for this sort of use-case.
>>> I plan to aggregate 5-15 GB of data with the file channel, as I want to
>>> flush to S3 every hour on every node. As far as I know, Flume can gzip
>>> it, so the size would be about 500 MB-1.5 GB.
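
For readers searching the archives, a minimal sketch of the kind of agent being described: file-channel buffering with an hourly, gzip-compressed roll to S3 via the HDFS sink. The bucket, directories, port, and the netcat source are placeholders standing in for the poster's real socket source and credentials, not his actual configuration:

```properties
# Sketch of a Flume NG agent: socket-style source -> file channel ->
# HDFS sink writing gzip-compressed files to S3, rolled once per hour.
agent.sources = sock
agent.channels = fc
agent.sinks = s3

# Placeholder source; the real deployment uses its own socket source.
agent.sources.sock.type = netcat
agent.sources.sock.bind = 0.0.0.0
agent.sources.sock.port = 44444
agent.sources.sock.channels = fc

# Durable file channel to buffer events between flushes.
agent.channels.fc.type = file
agent.channels.fc.checkpointDir = /var/flume/checkpoint
agent.channels.fc.dataDirs = /var/flume/data

# HDFS sink pointed at S3 (credentials and bucket are placeholders).
agent.sinks.s3.type = hdfs
agent.sinks.s3.channel = fc
agent.sinks.s3.hdfs.path = s3n://ACCESS_KEY:SECRET_KEY@my-bucket/logs/%Y-%m-%d
agent.sinks.s3.hdfs.fileType = CompressedStream
agent.sinks.s3.hdfs.codeC = gzip
# Roll once per hour; disable size- and count-based rolling.
agent.sinks.s3.hdfs.rollInterval = 3600
agent.sinks.s3.hdfs.rollSize = 0
agent.sinks.s3.hdfs.rollCount = 0
```

Rolling on interval alone (with rollSize and rollCount zeroed) is what keeps the PUT count at roughly one object per node per hour.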
>>>
>>> Thanks for the feedback. I will write if I have any results.
>>>
>>> Mate Gulyas
>>>
>>> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharninder@gmail.com> wrote:
>>> > No reason not to use Flume, except for the fact that S3, since it's over
>>> > the wire, will be a lot slower than a local HDFS cluster, in which case
>>> > you need a big enough channel to hold events not yet processed out of
>>> > the sink. If you have a fast enough pipe, you can very well use Flume
>>> > for this sort of use-case.
>>> >
>>> > The reason the author might have moved to Kafka, and I'm just
>>> > speculating here, is that Kafka gives him better buffering support for
>>> > exactly the case I've described above.
>>> >
>>> > HTH
>>> > Sharninder
>>> >
>>> >
>>> >
>>> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <gulyasm@dmlab.hu> wrote:
>>> >>
>>> >> Hi!
>>> >>
>>> >> I would like to use Flume to aggregate and send logs to an S3 bucket.
>>> >> I did some research, but the last article I found on the topic was
>>> >> more than a year old, and its author had abandoned Flume for Kafka. My
>>> >> other concern is that most of the articles were written for Flume OG,
>>> >> not NG.
>>> >> Is there any reason why I should not use Flume to sink messages to S3?
>>> >>
>>> >>
>>> >> Thanks in advance.
>>> >>
>>> >> Mate Gulyas
>>> >> Lead Developer at Dmlab
>>> >
>>> >
>>
>>
>
>
>
> --
> Nitin Pawar
