flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: AWS S3 flume source
Date Fri, 01 Aug 2014 18:19:05 GMT
+1 on an S3 Source. I would gladly review.

Jonathan Natkins wrote:
>
> Hey Pawel,
>
> My intention is to start working on it, but I don't know exactly how
> long it will take, and I'm not a committer, so time estimates would
> have to be taken with a grain of salt regardless. If this is something
> that you need urgently, it may be better to start building something
> yourself rather than wait for me.
>
> That said, as mentioned in the other thread, dynamic configuration can
> be done by refreshing the configuration files across the set of Flume
> agents. It's certainly not as great as having a single place to change
> it (e.g. ZooKeeper), but it's a way to get the job done.
>
> Thanks,
> Natty
>
>
> On Fri, Aug 1, 2014 at 1:33 AM, Paweł <prog88@gmail.com> wrote:
>
> Hi,
> Jonathan, how should we interpret your last e-mail? You opened a
> JIRA issue and want to start implementing this? Do you have any
> estimate of how long it will take?
>
> I think the biggest challenge here is to have dynamic
> configuration of Flume. It doesn't seem to be part of the FLUME-2437
> issue. Am I right?
>
> > Would you need to be able to pull files from multiple S3
> directories with the same source?
>
> I think we don't need to track multiple S3 buckets with a single
> source. I just imagine an approach where each S3 source can be
> added or deleted on demand and attached to any Channel. I'm only
> worried about this dynamic configuration. I'll open a new thread
> about this. It seems we have two totally separate things:
> * build S3 source
> * make flume configurable dynamically
>
> --
> Paweł
>
>
> 2014-08-01 9:51 GMT+02:00 Otis Gospodnetic <otis.gospodnetic@gmail.com>:
>
> Hi,
>
> On Fri, Aug 1, 2014 at 4:52 AM, Jonathan Natkins
> <natty@streamsets.com> wrote:
>
> Hey all,
>
> I created a JIRA for this:
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Thanks! Should Fix Version be set to the next Flume release
> version?
>
> I thought I'd start working on one myself, which can
> hopefully be contributed back. I'm curious: do you have
> particular requirements? Based on the emails in this
> thread, it sounds like the original goal was to have
> something that's like a SpoolDirectorySource that just
> picks up new files from S3. Is that accurate?
>
>
> Yes, I think so. We need to be able to:
> * fetch data (logs, to pull them into Logsene
> <http://sematext.com/logsene/>) from S3 periodically (e.g.
> every 1 min, every 5 min, etc.)
> * fetch data from multiple S3 buckets
> * associate an S3 bucket with a user/token/key
> * dynamically (i.e. without editing/writing config files
> stored on disk) add new S3 buckets from which data should be fetched
> * dynamically (i.e. without editing/writing config files
> stored on disk) stop fetching data from some S3 buckets
>
>
> Would you need to be able to pull files from multiple S3
> directories with the same source?
>
>
> I think the above addresses this question.
>
> Thanks,
> Natty
>
>
> Thanks!
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Thu, Jul 31, 2014 at 4:58 PM, Otis Gospodnetic
> <otis.gospodnetic@gmail.com> wrote:
>
> +1 for seeing S3Source, starting with a JIRA issue.
>
> But being able to dynamically add/remove S3 buckets
> from which to pull data seems important.
>
> Any suggestions for how to approach that?
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Thu, Jul 31, 2014 at 9:14 PM, Hari Shreedharan
> <hshreedharan@cloudera.com> wrote:
>
> Please go ahead and file a jira. If you are
> willing to submit a patch, you can post it on the
> jira.
>
> Viral Bajaria wrote:
>>
>>
>> I have a similar use case that cropped up
>> yesterday. I saw the archive
>> and found that there was a recommendation to
>> build it as Sharninder
>> suggested.
>>
>> For now, I went down the route of writing a
>> python script which
>> downloads from S3 and puts the files in a
>> directory which is
>> configured to be picked up via a spooldir.
>>
>> I would prefer to get a direct S3 source, and
>> maybe we could
>> collaborate on it and open-source it. Let me know
>> if you prefer that
>> and we can work directly on it by creating a JIRA.
>>
>> Thanks,
>> Viral
>>
>>
>>
>> On Thu, Jul 31, 2014 at 10:26 AM, Hari Shreedharan
>> <hshreedharan@cloudera.com> wrote:
>>
>> In both cases, Sharninder is right :)
>>
>> Sharninder wrote:
>>>
>>>
>>>
>>> As far as I know, there is no (open source)
>>> implementation of an S3
>>> source, so yes, you'll have to implement
>>> your own. You'll have to
>>> implement a Pollable source and the dev
>>> documentation has an outline
>>> that you can use. You can also look at the
>>> existing ExecSource and
>>> work your way up.
>>>
>>> As far as I know, there is no way to
>>> configure flume without
>>> using the
>>> configuration file.
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 7:57 PM, Paweł
>>> <prog88@gmail.com> wrote:
>>>
>>> Hi,
>>> I'm wondering if Flume is able to read
>>> directly from S3.
>>>
>>> I'll describe my case. I have log files
>>> stored in AWS S3. I have
>>> to periodically fetch new S3 objects and
>>> read log lines from them.
>>> Then the log lines (events) are
>>> processed in Flume's standard way
>>> (as with other sources).
>>>
>>> *1) Is there any way to fetch S3 objects,
>>> or do I have to write
>>> my own
>>> Source?*
>>>
>>>
>>> There is also a second case. I want the
>>> Flume configuration to be
>>> dynamic. Flume sources can change over
>>> time. New AWS keys and S3
>>> buckets can be added or deleted.
>>>
>>> *2) Is there any other way to configure
>>> Flume than a static
>>> configuration file?*
>>>
>>> --
>>> Paweł Róg
>>>
>>
>>
>
>
>
>
>
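
For reference, the interim workaround Viral describes in the thread (a script
that downloads from S3 into a directory picked up by a spooldir source) could
look roughly like the sketch below. It is not Viral's actual script, which was
never posted. The function names (spool_name, sync_new_objects,
boto3_adapters), the staging directory, and the in-memory "seen" set are all
assumptions made for illustration. The boto3 adapter assumes boto3 is
installed and AWS credentials are configured. Files are downloaded into a
staging directory first and then moved into the spool directory, because the
spooling directory source expects files to be complete and unchanged once
they appear there.

```python
import os
from typing import Callable, Iterable, List, Set
from urllib.parse import quote


def spool_name(key: str) -> str:
    """Map an S3 key to a flat file name; percent-encoding keeps names unique."""
    return quote(key, safe="")


def sync_new_objects(
    list_keys: Callable[[], Iterable[str]],
    download: Callable[[str, str], None],
    staging_dir: str,
    spool_dir: str,
    seen: Set[str],
) -> List[str]:
    """Download every key not in `seen` and move it into spool_dir.

    Returns the keys that were newly spooled, in sorted order.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(spool_dir, exist_ok=True)
    new_keys = []
    for key in sorted(list_keys()):
        if key in seen:
            continue
        staged = os.path.join(staging_dir, spool_name(key))
        download(key, staged)
        # Move the finished file in one step so the spool directory never
        # holds a partial download. staging_dir and spool_dir are assumed
        # to be on the same filesystem.
        os.replace(staged, os.path.join(spool_dir, spool_name(key)))
        seen.add(key)
        new_keys.append(key)
    return new_keys


def boto3_adapters(bucket: str, prefix: str = ""):
    """Build list/download callables for one bucket (requires boto3)."""
    import boto3  # imported here so the rest of the sketch runs without it

    client = boto3.client("s3")

    def list_keys():
        paginator = client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                yield obj["Key"]

    def download(key: str, dest: str) -> None:
        client.download_file(bucket, key, dest)

    return list_keys, download
```

A periodic fetch (every 1 or 5 minutes, as Otis asks for) would call
sync_new_objects in a loop with a sleep between rounds, once per bucket. In
this sketch the "seen" set lives only in memory, so a real script would need
to persist it across restarts to avoid placing a file with an already used
name into the spool directory.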
