Just to let people know, I have successfully set up my Flume flow with:

autoCollectorSource | collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/", "log-%{host}-", 3600000)
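To make the bucketing concrete, with hypothetical values (traffic_source "organic", host "web01", an event written on 2011-11-08), the escape sequences expand into keys like:

s3n://mybucket/flume/organic/2011-11-08/log-web01-<rolltag>

where the collector appends a unique rolltag and rolls the file every 3600000 ms.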

while in my Python program, I have the following:

    # "timestamp" here is a float of epoch seconds; the Thrift i64
    # fields need integers, hence the int() conversions.
    event_msg = self.flume_connection.ThriftFlumeEvent(
        timestamp=int(timestamp * 1000),        # milliseconds
        body=content,
        nanos=int(timestamp * 1000000000),      # nanoseconds
        host=config.HOSTNAME,
        fields={'source': traffic_source})      # picked up as %{source} by the sink
    self.flume_connection.appendFlumeEventMsg(event_msg)
and it works perfectly: my request logs are now stored in separate directories by traffic_source.
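For anyone wiring this up from scratch, here is a minimal sketch of what the flume_connection wrapper might look like. The append() call and the ThriftFlumeEvent struct come from Flume's flume.thrift; the generated package name ("flume"), the framed transport, and port 35853 are assumptions about your build and deployment, so adjust them to match yours:

    # Minimal sketch of a Thrift client for a Flume thriftSource.
    # Assumptions: the bindings generated from flume.thrift import as
    # "flume", and the source listens on localhost:35853.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from flume import ThriftFlumeEventServer
    from flume.ttypes import ThriftFlumeEvent

    class FlumeConnection(object):
        # Re-export the event struct so callers can write
        # flume_connection.ThriftFlumeEvent(...) as in the snippet above.
        ThriftFlumeEvent = ThriftFlumeEvent

        def __init__(self, host='localhost', port=35853):
            socket = TSocket.TSocket(host, port)
            self._transport = TTransport.TFramedTransport(socket)
            protocol = TBinaryProtocol.TBinaryProtocol(self._transport)
            self._client = ThriftFlumeEventServer.Client(protocol)
            self._transport.open()

        def appendFlumeEventMsg(self, event):
            # flume.thrift declares append(evt) on ThriftFlumeEventServer.
            self._client.append(event)

        def close(self):
            self._transport.close()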

Shuang

On Tue, Nov 8, 2011 at 11:09 AM, Shuang <shuang@open42.com> wrote:
According to Flume User Guide 8.2.3/8.2.3 (it's a little inconsistent), "collectorSink" is translated in the following way:

collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/","log-%{host}-", 3600000)
==>
collector(3600000) { escapedCustomDfs("s3n://mybucket/flume/%{source}/%Y-%m-%d/", "log-%{host}-%{rolltag}") }

So I assume either collectorSink or your solution would work, unless Flume's translation mechanism doesn't escape the DFS path properly.

Thanks a lot for your help.

Shuang


On Tue, Nov 8, 2011 at 10:20 AM, Mingjie Lai <mjlai09@gmail.com> wrote:

As I said, you need to use escapedFormatDfs. For your case, it should be:

collector(3600000) { escapedFormatDfs("s3n://mybucket/flume/%{source}/%Y-%m-%d", "log-%{host}-") }

According to the user guide, escapedFormatDfs escapes the %{source} sequence.

... The hdfspath can use escape sequences to bucket data as documented in the Output Bucketing section ...

But I haven't tried S3 as a sink; it should work, though. Can you give it a try?



On 11/08/2011 12:44 AM, Shuang wrote:
Thanks, Mingjie. In my case, I already have the "source" field in the event
metadata; does that mean I can do the following directly?

collectorSink("s3n://mybucket/flume/%{source}/%Y-%m-%d/", "log-%{host}-", 3600000)

basically, use %{source} to refer to that metadata field in the S3 path?

Shuang

On Fri, Nov 4, 2011 at 5:02 PM, Mingjie Lai <mjlai09@gmail.com> wrote:


   You can try escapedFormatDfs. Here is an example:

   $ bin/flume node_nowatch -n f1 -c 'f1: text("/tmp/aa.txt") |
   { value("src", 123) => collector(2000) { escapedFormatDfs("file:///tmp", "aaaa-%{src}") } };'


   $ ls /tmp
   aaaa-123
   ...
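   (The value("src", 123) decorator injects a "src" attribute into each
   event's metadata, and escapedFormatDfs substitutes it into the
   "aaaa-%{src}" filename, which is why the file shows up as aaaa-123.)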

   It should also work for S3.

   Mingjie


   On 11/04/2011 03:17 PM, Shuang wrote:

       Hi, guys,
          After reading the Flume User Guide, I thought this was
       possible, but would like to confirm with you guys. Currently,
       I have collectors configured as:

       collectorSink("s3n://mybucket/flume/%Y-%m-%d/", "log-%{host}-", 3600000)

       and I have a field called "source" in the Flume event's metadata
       table, which I would like to use in the collectorSink path,
       something like this:

       collectorSink("s3n://mybucket/flume/%{metadata['source']}/%Y-%m-%d/", "log-%{host}-", 3600000)

       I wonder what the right syntax is to refer to a field in the
       metadata table. I searched the user guide and couldn't find any
       example. I'd also like to point out that this is similar to how
       Scribe's message category is used.

       Shuang