flume-user mailing list archives

From Paweł <pro...@gmail.com>
Subject Re: AWS S3 flume source
Date Fri, 01 Aug 2014 18:31:15 GMT
Hi,
Thanks for the explanation, Jonathan. I think I will also start working on it.
When you have a patch (even a draft), I'd be glad if you could attach it to
the JIRA. I'll do the same.
What do you think?

--
Paweł Róg

2014-08-01 20:19 GMT+02:00 Hari Shreedharan <hshreedharan@cloudera.com>:

> +1 on an S3 Source. I would gladly review.
>
> Jonathan Natkins wrote:
>
>
> Hey Pawel,
>
> My intention is to start working on it, but I don't know exactly how
> long it will take, and I'm not a committer, so time estimates would
> have to be taken with a grain of salt regardless. If this is something
> that you need urgently, it may be better to start building something
> yourself rather than wait for me.
>
> That said, as mentioned in the other thread, dynamic configuration can
> be done by refreshing the configuration files across the set of Flume
> agents. It's certainly not as great as having a single place to change
> it (e.g. ZooKeeper), but it's a way to get the job done.
>
> Thanks,
> Natty
>
>
> On Fri, Aug 1, 2014 at 1:33 AM, Paweł <prog88@gmail.com> wrote:
>
>     Hi,
>     Jonathan, how should we interpret your last e-mail? You opened a
>     JIRA issue and want to start implementing this? Do you have any
>     estimate of how long it will take?
>
>     I think the biggest challenge here is to have dynamic
>     configuration of Flume. It doesn't seem to be part of the
>     FLUME-2437 issue. Am I right?
>
>     > Would you need to be able to pull files from multiple S3
>     > directories with the same source?
>
>     I think we don't need to track multiple S3 buckets with a single
>     source. I just imagine an approach where each S3 source can be
>     added or deleted on demand and attached to any Channel. My only
>     concern is this dynamic configuration. I'll open a new thread
>     about this. It seems we have two totally separate things:
>     * build an S3 source
>     * make Flume configurable dynamically
>
>     --
>     Paweł
>
>
>     2014-08-01 9:51 GMT+02:00 Otis Gospodnetic <otis.gospodnetic@gmail.com>:
>
>
>         Hi,
>
>         On Fri, Aug 1, 2014 at 4:52 AM, Jonathan Natkins
>         <natty@streamsets.com> wrote:
>
>             Hey all,
>
>             I created a JIRA for this:
>             https://issues.apache.org/jira/browse/FLUME-2437
>
>
>         Thanks!  Should Fix Version be set to the next Flume release
>         version?
>
>             I thought I'd start working on one myself, which can
>             hopefully be contributed back. I'm curious: do you have
>             particular requirements? Based on the emails in this
>             thread, it sounds like the original goal was to have
>             something that's like a SpoolDirectorySource that just
>             picks up new files from S3. Is that accurate?
>
>
>         Yes, I think so.  We need to be able to:
>         * fetch data (logs, to pull them into Logsene
>         <http://sematext.com/logsene/>) from S3 periodically (e.g.
>         every 1 min, every 5 min, etc.)
>         * fetch data from multiple S3 buckets
>         * associate an S3 bucket with a user/token/key
>         * dynamically (i.e. without editing/writing config files
>         stored on disk) add new S3 buckets from which data should be fetched
>         * dynamically (i.e. without editing/writing config files
>         stored on disk) stop fetching data from some S3 buckets
>
>
>             Would you need to be able to pull files from multiple S3
>             directories with the same source?
>
>
>         I think the above addresses this question.
>
>             Thanks,
>             Natty
>
>
>         Thanks!
>
>         Otis
>         --
>         Performance Monitoring * Log Analytics * Search Analytics
>         Solr & Elasticsearch Support * http://sematext.com/
>
>
>
>             On Thu, Jul 31, 2014 at 4:58 PM, Otis Gospodnetic
>             <otis.gospodnetic@gmail.com> wrote:
>
>                 +1 for seeing S3Source, starting with a JIRA issue.
>
>                 But being able to dynamically add/remove S3 buckets
>                 from which to pull data seems important.
>
>                 Any suggestions for how to approach that?
>
>                 Otis
>
>
>                 On Thu, Jul 31, 2014 at 9:14 PM, Hari Shreedharan
>                 <hshreedharan@cloudera.com> wrote:
>
>                     Please go ahead and file a jira. If you are
>                     willing to submit a patch, you can post it on the
>                     jira.
>
>                     Viral Bajaria wrote:
>
>
>
>                     I have a similar use case that cropped up yesterday. I saw
>                     the archive and found that there was a recommendation to
>                     build it as Sharninder suggested.
>
>                     For now, I went down the route of writing a Python script
>                     which downloads from S3 and puts the files in a directory
>                     which is configured to be picked up via a spooldir.
>
>                     I would prefer to get a direct S3 source, and maybe we
>                     could collaborate on it and open-source it. Let me know if
>                     you prefer that and we can work directly on it by creating
>                     a JIRA.
>
>                     Thanks,
>                     Viral
>
>
>
>                     On Thu, Jul 31, 2014 at 10:26 AM, Hari Shreedharan
>                     <hshreedharan@cloudera.com> wrote:
>
>                         In both cases, Sharninder is right :)
>
>                         Sharninder wrote:
>
>
>
>
>                         As far as I know, there is no (open source)
>                         implementation of an S3 source, so yes, you'll have to
>                         implement your own. You'll have to implement a Pollable
>                         source and the dev documentation has an outline that you
>                         can use. You can also look at the existing ExecSource
>                         and work your way up.
>
>                         As far as I know, there is no way to configure Flume
>                         without using the configuration file.
>
>
>
>                         On Thu, Jul 31, 2014 at 7:57 PM, Paweł
>                         <prog88@gmail.com> wrote:
>
>                             Hi,
>                             I'm wondering if Flume is able to read directly
>                             from S3.
>
>                             I'll describe my case. I have log files stored in
>                             AWS S3. I have to periodically fetch new S3 objects
>                             and read log lines from them. Then the log lines
>                             (events) are processed in Flume's standard way
>                             (as with other sources).
>
>                             *1) Is there any way to fetch S3 objects, or do I
>                             have to write my own Source?*
>
>                             There is also a second case. I want to have a
>                             dynamic Flume configuration. Flume sources can
>                             change over time. A new AWS key and S3 bucket can
>                             be added or deleted.
>
>                             *2) Is there any other way to configure Flume than
>                             by a static configuration file?*
>
>                             --
>                             Paweł Róg
>
>
>
>
>
>
>
>
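The workaround Viral describes above (a Python script that downloads from S3 into a directory picked up by a spooldir source) might look roughly like the sketch below. This is not Viral's script, which does not appear in this thread. The function names, the staging directory, and the in-memory `seen` set are all assumptions made for illustration. The S3 calls are passed in as callbacks so the copy logic can be tried without AWS access; `make_boto3_callbacks` shows one possible wiring using boto3, a library that postdates this thread.

```python
import os
from typing import Callable, Iterable, List, Set, Tuple


def sync_to_spool_dir(
    list_keys: Callable[[], Iterable[str]],
    download: Callable[[str, str], None],
    spool_dir: str,
    staging_dir: str,
    seen: Set[str],
) -> List[str]:
    """Copy each not-yet-seen object into spool_dir once; return the new keys."""
    delivered = []
    for key in list_keys():
        if key in seen:
            continue
        # Flatten the key so nested prefixes become a single file name.
        file_name = key.replace("/", "_")
        staged = os.path.join(staging_dir, file_name)
        # Download outside the spooling directory first, so a half-written
        # file is never visible to the agent.
        download(key, staged)
        # Move the finished file into place in one step.
        os.replace(staged, os.path.join(spool_dir, file_name))
        seen.add(key)
        delivered.append(key)
    return delivered


def make_boto3_callbacks(
    bucket: str, prefix: str = ""
) -> Tuple[Callable[[], Iterable[str]], Callable[[str, str], None]]:
    """Hypothetical wiring to S3 via boto3 (imported lazily; needs credentials)."""
    import boto3  # assumption: boto3 is installed and configured

    s3 = boto3.client("s3")

    def list_keys() -> Iterable[str]:
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                yield obj["Key"]

    def download(key: str, dest: str) -> None:
        s3.download_file(bucket, key, dest)

    return list_keys, download
```

The staging directory has to be on the same filesystem as the spooling directory for `os.replace` to work as a single rename. A real script would also need to persist `seen` between runs and run on a schedule (for example from cron), which this sketch leaves out.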
>
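Natty's suggestion above, refreshing the configuration files across the Flume agents, could be approximated by regenerating the sources part of an agent's properties file whenever the set of buckets changes. The sketch below is only an illustration under assumptions: the helper names, the agent name `a1`, the channel name `c1`, and the one-spooling-directory-per-bucket layout are all hypothetical, and it renders only the sources section. The property keys (`type = spooldir`, `spoolDir`, `channels`) are the standard ones for the Spooling Directory Source.

```python
import os
import tempfile
from typing import Dict


def render_sources(agent: str, channel: str, spool_dirs: Dict[str, str]) -> str:
    """Render one spooldir source per entry in spool_dirs (name -> directory)."""
    names = sorted(spool_dirs)
    lines = [f"{agent}.sources = {' '.join(names)}"]
    for name in names:
        prefix = f"{agent}.sources.{name}"
        lines += [
            f"{prefix}.type = spooldir",
            f"{prefix}.spoolDir = {spool_dirs[name]}",
            f"{prefix}.channels = {channel}",
        ]
    return "\n".join(lines) + "\n"


def write_atomically(path: str, text: str) -> None:
    """Write text to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # The rename means a reader never sees a partially written file.
    os.replace(tmp_path, path)
```

A caller would concatenate the static channel and sink definitions with the output of `render_sources` and pass the result to `write_atomically`, then distribute the file to each agent. As far as I know, an agent started from a properties file checks that file for changes periodically and reloads it, but that behaviour should be confirmed against the Flume version in use.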
