flume-user mailing list archives

From Shuang <shu...@open42.com>
Subject Re: Use metadata field for output bucketing
Date Tue, 08 Nov 2011 19:09:09 GMT
According to the Flume User Guide, 8.2.3/8.2.3 (the numbering is a little
inconsistent), "collectorSink" is translated in the following way:
collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/","log-%{host}-",
3600000)
==>
collector(3600000) {
escapedCustomDfs("s3n://mybucket/flume/%{source}/%Y-%m-%d/",
"log-%{host}-%{rolltag}") }

So I assume either collectorSink or your solution would work, unless
Flume's translation mechanism doesn't escape the DFS path properly.
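
As a rough illustration of the expansion expected from that translated sink,
here is a minimal Python sketch. It is not Flume code and not Flume's
implementation: the function name expand_escapes, the attribute values, and
the rolltag value are all made up for the example, and only the %{name},
%Y, %m and %d escapes are handled.

```python
import re
from datetime import datetime, timezone

# Matches either %{name} (an event attribute) or one of %Y, %m, %d.
_ESCAPE = re.compile(r"%\{(\w+)\}|%([Ymd])")


def expand_escapes(template, attrs, when):
    """Expand %{name} from attrs and %Y/%m/%d from the timestamp `when`.

    Hypothetical helper for illustration only. A single regex pass is used
    so that substituted attribute values are never expanded a second time.
    """
    def replace(match):
        name, date_code = match.groups()
        if name is not None:
            # Raises KeyError if the event has no such attribute.
            return str(attrs[name])
        return when.strftime("%" + date_code)

    return _ESCAPE.sub(replace, template)


# Assumed event metadata; the rolltag format here is invented.
event_attrs = {"source": "web", "host": "node1", "rolltag": "roll-0001"}
event_time = datetime(2011, 11, 8, 19, 9, 9, tzinfo=timezone.utc)

directory = expand_escapes(
    "s3n://mybucket/flume/%{source}/%Y-%m-%d/", event_attrs, event_time
)
filename = expand_escapes("log-%{host}-%{rolltag}", event_attrs, event_time)
print(directory + filename)
```

With these assumed values, each distinct "source" attribute would land in its
own directory under the bucket, which is the bucketing behaviour being asked
about.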

Thanks a lot for your help.

Shuang

On Tue, Nov 8, 2011 at 10:20 AM, Mingjie Lai <mjlai09@gmail.com> wrote:

>
> As I said, you need to use escapedFormatDfs. For your case, it should be:
>
> collector(3600000){ escapedFormatDfs("s3n://mybucket/flume/%{source}/%Y-%m-%d",
> "log-%{host}-")}
>
> According to the user guide, escapedFormatDfs will help to escape the
> %{source} string.
>
> ... The hdfspath can use escape sequences documented to bucket data as
> documented in the Output Bucketing section...
>
> I haven't tried S3 as a sink, but it should work. Can you give it a try?
>
>
>
> On 11/08/2011 12:44 AM, Shuang wrote:
>
>> Thanks, Mingjie. In my case, I already have the "source" field in event
>> metadata, does that mean I can do the following directly?
>> collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/\",
>> \"log-%{host}-\", 3600000)
>>
>> basically, use %{source} to refer to that metadata field in the S3 path?
>>
>> Shuang
>>
>> On Fri, Nov 4, 2011 at 5:02 PM, Mingjie Lai <mjlai09@gmail.com> wrote:
>>
>>
>>    You can try escapedFormatDfs. Here is an example:
>>
>>    $ bin/flume node_nowatch -n f1 -c 'f1: text("/tmp/aa.txt") |
>>    value("src", 123)
>>    collector(2000){escapedFormatDfs("file:///tmp", "aaaa-%{src}")};'
>>
>>
>>    $ ls /tmp
>>    aaaa-123
>>    ...
>>
>>    Should also work for s3.
>>
>>    Mingjie
>>
>>
>>    On 11/04/2011 03:17 PM, Shuang wrote:
>>
>>        Hi, guys,
>>           After reading the Flume User Guide, I think this is possible, but
>>        I would like to confirm with you guys. Currently, I have collectors
>>        configured as:
>>        collectorSink("s3n://mybucket/flume/%Y-%m-%d/\",\"log-%{host}-\",
>>        3600000),
>>        and I have a field called "source" in the Flume event's metadata
>>        table that I would like to use in the collectorSink path, something
>>        like this:
>>        collectorSink("s3n://mybucket/flume/%{metadata['source']}/%Y-%m-%d/\",\"log-%{host}-\",
>>        3600000),
>>
>>        I wonder what the right syntax is to refer to a field in the metadata
>>        table. I searched the user guide and couldn't find any example.
>>        Also, I'd like to point out that this is similar to how Scribe's
>>        message category is used.
>>
>>        Shuang
>>
>>
>>
