flume-user mailing list archives

From Srinivasan Subramanian <ssrini_va...@hotmail.com>
Subject RE: Output bucketing based on custom data / logic
Date Wed, 30 Nov 2011 08:18:44 GMT

Thanks, Shuang. I delimited the fields in the generated message body with a ~ and then did this:
Agent sink: { split("~", 1, "customer") => agentSink("localhost", 35853) }
Collector sink: collectorSink("file:///var/log/collected/%Y-%m-%d/", "%{customer}-", 3600000)
This bucketed the files properly. I will still evaluate the log4j appender for Flume, since it
gives me a way to include the metadata directly without delimiting the message body.
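
For anyone reading along, the effect of the split step and the %{customer} escape can be mimicked
outside Flume. The Python sketch below is not Flume code. The function names, the sample event
body, and the zero-based field index are assumptions made for the example, so check the Flume
User Guide for the decorator's actual indexing.

```python
from datetime import datetime, timezone

def split_field(body: str, delimiter: str, index: int) -> str:
    """Return the index-th delimited field of body (zero-based here, an assumption)."""
    return body.split(delimiter)[index]

def bucket_path(body: str, timestamp: datetime) -> str:
    """Build a per-day directory plus a per-customer file prefix for one event."""
    customer = split_field(body, "~", 1)
    day_dir = timestamp.strftime("/var/log/collected/%Y-%m-%d/")
    return f"{day_dir}{customer}-"

# Hypothetical event body laid out as: timestamp~customer~message
example = bucket_path(
    "2011-11-30 08:18:44~ABC~Some message",
    datetime(2011, 11, 30, 8, 18, 44, tzinfo=timezone.utc),
)
print(example)
```

The point of the sketch is only that the bucket is derived per event from the body, so the
"customer" value is dynamic rather than fixed in the configuration.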

From: shuang@open42.com
Date: Tue, 29 Nov 2011 17:56:35 -0800
Subject: Re: Output bucketing based on custom data / logic
To: flume-user@incubator.apache.org

I don't actually use regex. I log into Flume directly using Thrift, so I add a customized
metadata field in my application. You will have to look up the regex approach in the Flume User
Guide. One example I was able to find is: https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/baa3a5f3790453a6?pli=1

Let us know how it works out for you.


On Tue, Nov 29, 2011 at 5:40 PM, Srinivasan Subramanian <ssrini_vasan@hotmail.com> wrote:

I am not too familiar with this. Can you please provide some more details? I am familiar with
bucketing on the collector. On the agent, how do I specify the regex to introduce the metadata?



-----Original Message-----
From: Shuang <shuang@open42.com>
Date: Tue, 29 Nov 2011 18:45:57
To: <flume-user@incubator.apache.org>
Subject: Re: Output bucketing based on custom data / logic

You can use the regex decorator to parse out "ABC" and put it in a metadata field, for example
one named "customer". Then, in your collectorSink, use %{customer} in the destination path.
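
To make the idea concrete, here is a rough Python sketch of the two steps: pull the customer
name out of the body with a regular expression, then substitute it into a path template. It
only illustrates the logic and is not Flume configuration. The pattern, the helper names, and
the "unknown" fallback for a missing attribute are assumptions for the example.

```python
import re

# Assumed message layout: "<customer> : <text>", e.g. "ABC : Some message".
CUSTOMER_RE = re.compile(r"^(\w+)\s*:")

def tag_event(body: str) -> dict:
    """Return event attributes, with a 'customer' entry when the body matches."""
    match = CUSTOMER_RE.match(body)
    return {"customer": match.group(1)} if match else {}

def expand(template: str, attrs: dict) -> str:
    """Replace each %{name} escape with the matching attribute value."""
    # Falling back to "unknown" is a choice made for this sketch only.
    return re.sub(r"%\{(\w+)\}", lambda m: attrs.get(m.group(1), "unknown"), template)

attrs = tag_event("ABC : Some message")
path = expand("/var/log/%{customer}", attrs)
print(path)
```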


On Tue, Nov 29, 2011 at 6:10 AM, Srinivasan Subramanian <ssrini_vasan@hotmail.com> wrote:

I just got Flume installed on CentOS and the basic setup is working great. I have a Master,
a collector and a few agents.

One additional requirement I have concerns output bucketing. Based on data that is present
in the log message being sent to Flume, the data has to be slotted into different
buckets. For example, assume that the log files are to be separated into different folders based
on a customer name that is present in the log message. How would one go about doing this?

Let's say the log message sent to Flume is "ABC : Some message". The collector needs to put
this into a folder like /var/log/ABC. How can this be achieved?

Alternatively, can I use metadata? As far as I could make out, metadata is static, not
dynamic as I would need it to be.

Thanks for all the help.


