flume-user mailing list archives

From Seshu V <ses...@gmail.com>
Subject Re: Writing to HDFS from multiple HDFS agents (separate machines)
Date Fri, 15 Mar 2013 21:20:02 GMT
I could differentiate different sources using this config by creating
separate directories by hostname:

agent.sources.syslogsrc.interceptors = ts
agent.sources.syslogsrc.interceptors.ts.type = timestamp
agent.sinks.hdfsSink.hdfs.path =

However, I have a related question.  Two different products are sending
their logs to one source, and I am collecting them via syslog.  Is there a
way to differentiate the two products' logs coming from a single source in
flume?  Ideally I would like a sub-directory at the sink like
'/flumetest/%{host}/<product_name>/%y-%m-%d'.  How can I do this?
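
One approach that might work here (a sketch, not tested: it assumes each
log line begins with a recognizable product name such as product_a or
product_b, and the interceptor/sink names match the config above) is
Flume's regex_extractor interceptor, which copies regex capture groups into
event headers that the HDFS sink path can then reference:

agent.sources.syslogsrc.interceptors = ts product
agent.sources.syslogsrc.interceptors.ts.type = timestamp
agent.sources.syslogsrc.interceptors.product.type = regex_extractor
agent.sources.syslogsrc.interceptors.product.regex = ^(product_a|product_b)
agent.sources.syslogsrc.interceptors.product.serializers = s1
agent.sources.syslogsrc.interceptors.product.serializers.s1.name = product
agent.sinks.hdfsSink.hdfs.path = /flumetest/%{host}/%{product}/%y-%m-%d

With this, the captured group lands in the 'product' header and is
available as %{product} in the sink path; events that don't match the regex
would simply have no 'product' header, so the pattern should cover every
expected product name.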

- Seshu

On Thu, Mar 14, 2013 at 5:00 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello sir,
>     One idea could be to create sub-directories named after the machines'
> hostnames, in case you are getting data from multiple sources. You can
> then easily tell which data belongs to which machine.
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
> On Fri, Mar 15, 2013 at 3:24 AM, Gary Malouf <malouf.gary@gmail.com>wrote:
>> Hi guys,
>> I'm new to flume (and to hdfs, for that matter), using the version
>> packaged with
>> CDH4 (1.3.0) and was wondering how others are maintaining different file
>> names being written to per HDFS sink.
>> My initial thought is to create a separate sub-directory in hdfs for each
>> sink - though I feel like the better way is to somehow prefix each file
>> with a unique sink id.  Are there any patterns that others are following
>> for this?
>> -Gary
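
For Gary's original file-naming question, one option worth checking is the
HDFS sink's hdfs.filePrefix property, which prefixes every file that sink
writes; giving each sink a distinct prefix keeps files distinguishable even
when sinks share a directory (the agent and sink names below are
placeholders, not from the thread):

agent.sinks.hdfsSink.hdfs.path = /flumetest/%{host}/%y-%m-%d
agent.sinks.hdfsSink.hdfs.filePrefix = sink1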
