flume-user mailing list archives

From Shuang <shu...@open42.com>
Subject Re: Use metadata field for output bucketing
Date Thu, 17 Nov 2011 08:38:25 GMT
Just to let people know, I have successfully set up my Flume flow with
'autoCollectorSource |
collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/", "log-%{host}-",
3600000)'

while in my Python program, I have the following:

    # Build a Thrift event; the "fields" map carries the metadata
    # that %{source} in the sink path refers to.
    event_msg = self.flume_connection.ThriftFlumeEvent(
        timestamp=int(timestamp * 1000),      # milliseconds
        body=content,
        nanos=int(timestamp * 1000000000),    # nanoseconds
        host=config.HOSTNAME,
        fields={'source': traffic_source})
    self.flume_connection.appendFlumeEventMsg(event_msg)

It works perfectly: my request logs are now stored in separate
directories by traffic_source.
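For anyone wondering how the bucketing path gets materialized, here is a minimal Python sketch of the escape expansion. This is an illustration only, not Flume's actual implementation: the expand_path helper, the sample field values, and the timestamp are all made up for the example. The idea is that %{key} escapes are filled from the event's fields map, and the date escapes come from the event timestamp:

```python
import time

def expand_path(template, event_fields, epoch_ms):
    """Illustrative stand-in for Flume's escape expansion:
    %{key} is replaced from the event's metadata fields, then
    strftime-style codes (%Y, %m, %d) use the event timestamp."""
    out = template
    # Substitute %{key} metadata references first.
    for key, value in event_fields.items():
        out = out.replace("%%{%s}" % key, str(value))
    # Then let strftime handle the date escapes.
    return time.strftime(out, time.gmtime(epoch_ms / 1000.0))

fields = {"source": "organic", "host": "web01"}
ts_ms = 1321519105000  # 2011-11-17 08:38:25 GMT
print(expand_path("s3n://mybucket/flume/%{source}/%Y-%m-%d/", fields, ts_ms))
# s3n://mybucket/flume/organic/2011-11-17/
print(expand_path("log-%{host}-", fields, ts_ms))
# log-web01-
```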

Shuang

On Tue, Nov 8, 2011 at 11:09 AM, Shuang <shuang@open42.com> wrote:

> According to Flume User Guide 8.2.3/8.2.3 (it's a little inconsistent),
> "collectorSink" is translated in the following way:
>
> collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/","log-%{host}-",
> 3600000)
> ==>
> collector(3600000) {
> escapedCustomDfs("s3n://mybucket/flume/%{source}/%Y-%m-%d/",
> "log-%{host}-%{rolltag}") }
>
> So I assume either collectorSink or your solution would work, unless
> Flume's translation mechanism doesn't escape the DFS path properly.
>
> Thanks a lot for your help.
>
> Shuang
>
>
> On Tue, Nov 8, 2011 at 10:20 AM, Mingjie Lai <mjlai09@gmail.com> wrote:
>
>>
>> As I said, you need to use escapedFormatDfs. For your case, it should be:
>>
>> collector(3600000){ escapedFormatDfs("s3n://mybucket/flume/%{source}/%Y-%m-%d",
>> "log-%{host}-") }
>>
>> According to the user guide, escapedFormatDfs will help to escape the
>> %{source} string.
>>
>> ... The hdfspath can use escape sequences documented to bucket data as
>> documented in the Output Bucketing section...
>>
>> But I haven't tried S3 as a sink; it should work, though. Can you give
>> it a try?
>>
>>
>>
>> On 11/08/2011 12:44 AM, Shuang wrote:
>>
>>> Thanks, Mingjie. In my case, I already have the "source" field in event
>>> metadata; does that mean I can do the following directly?
>>> collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/",
>>> "log-%{host}-", 3600000)
>>>
>>> basically, use %{source} to refer to that metadata field in the S3 path?
>>>
>>> Shuang
>>>
>>> On Fri, Nov 4, 2011 at 5:02 PM, Mingjie Lai <mjlai09@gmail.com
>>> <mailto:mjlai09@gmail.com>> wrote:
>>>
>>>
>>>    You can try escapedFormatDfs. Here is an example:
>>>
>>>    $ bin/flume node_nowatch -n f1 -c 'f1: text("/tmp/aa.txt") |
>>>    { value("src", 123) =>
>>>    collector(2000){ escapedFormatDfs("file:///tmp", "aaaa-%{src}") } };'
>>>
>>>
>>>    $ ls /tmp
>>>    aaaa-123
>>>    ...
>>>
>>>    Should also work for s3.
>>>
>>>    Mingjie
>>>
>>>
>>>    On 11/04/2011 03:17 PM, Shuang wrote:
>>>
>>>        Hi, guys,
>>>           After reading the Flume User Guide, I thought this was
>>>        possible, but would like to confirm with you guys. Currently, I
>>>        have collectors configured as:
>>>        collectorSink("s3n://mybucket/flume/%Y-%m-%d/", "log-%{host}-",
>>>
>>>        3600000),
>>>        and I have a field called "source" in Flume event's metadata
>>>        table, and
>>>        would like to use in the collectorSink path, something like this:
>>>        collectorSink("s3n://mybucket/flume/%{metadata['source']}/%Y-%m-%d/",
>>>        "log-%{host}-",
>>>
>>>        3600000),
>>>
>>>        I wonder what the right syntax is to refer to a field in the
>>>        metadata table. I searched the user guide and couldn't find any
>>>        example. Also, I'd like to point out that this is similar to how
>>>        Scribe's message category is used.
>>>
>>>        Shuang
>>>
>>>
>>>
>
