flume-user mailing list archives

From Paul Chavez <pcha...@verticalsearchworks.com>
Subject Re: Writing to HDFS from multiple HDFS agents (separate machines)
Date Fri, 15 Mar 2013 03:30:14 GMT
It just depends on what you want to do with the header. In the case I presented, the header
is set by the agent running the HDFS sink, which seemed to align with your use case. If you
need to know the originating host, just have an interceptor on the originating host set a
different header; the %{} notation allows you to substitute an arbitrary header for the
token, as long as it exists, of course.
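A minimal sketch of how that %{} substitution behaves (the real logic lives inside Flume's HDFS sink path bucketing; the class and method names here are illustrative, not Flume API):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of %{header} token expansion as used in hdfs.path.
// Hypothetical helper, not part of the Flume API.
public class PathExpander {
    private static final Pattern TOKEN = Pattern.compile("%\\{([^}]+)\\}");

    // Replace each %{name} token with the matching event header value.
    public static String expand(String template, Map<String, String> headers) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = headers.get(m.group(1));
            // The header must exist on the event; here we leave the
            // token untouched if it does not.
            m.appendReplacement(sb, Matcher.quoteReplacement(
                value != null ? value : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

So a path like `/flume/events/%{hostname}` resolves per event, which is why it matters which agent actually stamped the `hostname` header.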


On Mar 14, 2013, at 7:31 PM, "Gary Malouf" <malouf.gary@gmail.com> wrote:

Paul, I interpreted the host property as identifying the host that an event originates
from, rather than the host of the sink that writes the event to HDFS. Is my understanding
correct?

What happens if I am using the NettyAvroRpcClient to feed events from a different server,
round-robin, to two HDFS-writing agents? Should I then NOT set the host property on the
client side and rely on the interceptor instead?
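For that setup, one approach is to put a host interceptor on the Avro source of each HDFS-writing agent, so every event is stamped sink-side and the client sets nothing. A config sketch (agent, source, and sink names are hypothetical):

```properties
# Each HDFS-writing agent stamps incoming events with its own hostname,
# so the round-robin client does not need to set the header itself.
agent1.sources.avroSrc.interceptors = hostInt
agent1.sources.avroSrc.interceptors.hostInt.type = host
agent1.sources.avroSrc.interceptors.hostInt.useIP = false
agent1.sources.avroSrc.interceptors.hostInt.hostHeader = sinkHost

# The sink then buckets output by the header the interceptor just set.
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%{sinkHost}
```

If you instead want the *originating* host, set the header on the client (or on an interceptor at the originating agent) under a different name and reference that name in the %{} token.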

On Thu, Mar 14, 2013 at 6:34 PM, Gary Malouf <malouf.gary@gmail.com> wrote:
To be clear, I am referring to segregating data from different Flume sinks, as opposed to
the original source of the event. Having said that, it sounds like your approach is the
way to go.

On Thu, Mar 14, 2013 at 5:54 PM, Gary Malouf <malouf.gary@gmail.com> wrote:
Hi guys,

I'm new to Flume (and HDFS, for that matter). I'm using the version packaged with CDH4
(1.3.0) and was wondering how others maintain distinct file names per HDFS sink.

My initial thought is to create a separate sub-directory in HDFS for each sink, though I
feel like the better way is to somehow prefix each file with a unique sink id. Are there
any patterns that others follow for this?
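Both ideas can be expressed directly in the sink config. A sketch, with hypothetical agent/sink names and a made-up namenode path:

```properties
# Option 1: a separate sub-directory per sink.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/agent1

# Option 2: shared directory, but a unique file prefix per sink
# (hdfs.filePrefix defaults to "FlumeData").
agent2.sinks.hdfsSink.type = hdfs
agent2.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
agent2.sinks.hdfsSink.hdfs.filePrefix = agent2
```

The file-prefix route keeps all events under one directory, which can be more convenient for downstream MapReduce jobs that take a single input path.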

