flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Re: Flow in Flume, could it make better?
Date Tue, 19 Aug 2014 07:11:28 GMT
Yeah, I think that it's what I'm doing.
How about:

                           channel1 -> sink1 (hdfs raw data)
Agent1 src --> replicate + interceptor1
                           channel2 --> sink2 avro --> agent2 src avro --> multiplexing + interceptor2 --> sink3
                                                                                                       --> sink4

Could it be possible to apply interceptor1 just to channel1? I know that
interceptors apply at the source level. Interceptor1 doesn't modify the
data too much; I could feed channel2 with those little transformations,
but ideally I would rather not. So, if I want to do it, it looks like I'd
have to create another level with more channels, etc. Something like this:

                           channel1 -> *sink1 avro -> src1 avro + interceptor1 -> channel -> sink1 (hdfs raw data)*
Agent1 src --> replicate
                           channel2 --> sink2 avro --> agent2 src avro --> multiplexing + interceptor2 --> sink3
                                                                                                       --> sink4

The point is that the flow continues from sink4 into another structure
similar to all of the above, so that means 8 channels in total. I don't
know if it's possible to simplify this.
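For reference, the two-agent level above might be sketched in Flume properties files roughly as follows. All agent, channel, sink, and interceptor names here are hypothetical, as are the paths, hostname, and port; the interceptor builder classes stand in for whatever Interceptor1/Interceptor2 actually are.

```properties
# Agent 1: spooling source replicated to two channels.
# Note: interceptors attach at the source, so interceptor1
# (hypothetical builder class) applies to BOTH channels.
agent1.sources = src1
agent1.channels = ch1 ch2
agent1.sinks = hdfsSink avroSink

agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /var/spool/flume
agent1.sources.src1.channels = ch1 ch2
agent1.sources.src1.selector.type = replicating
agent1.sources.src1.interceptors = i1
agent1.sources.src1.interceptors.i1.type = com.example.Interceptor1$Builder

agent1.channels.ch1.type = memory
agent1.channels.ch2.type = memory

# ch1 -> raw history in HDFS
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = ch1
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/raw

# ch2 -> Avro hop to agent2
agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.channel = ch2
agent1.sinks.avroSink.hostname = agent2-host
agent1.sinks.avroSink.port = 4141

# Agent 2: Avro source with a multiplexing selector driven by a
# header ("route", hypothetical) that interceptor2 sets per event.
agent2.sources = avroSrc
agent2.channels = ch3 ch4
agent2.sinks = sink3 sink4

agent2.sources.avroSrc.type = avro
agent2.sources.avroSrc.bind = 0.0.0.0
agent2.sources.avroSrc.port = 4141
agent2.sources.avroSrc.channels = ch3 ch4
agent2.sources.avroSrc.interceptors = i2
agent2.sources.avroSrc.interceptors.i2.type = com.example.Interceptor2$Builder
agent2.sources.avroSrc.selector.type = multiplexing
agent2.sources.avroSrc.selector.header = route
agent2.sources.avroSrc.selector.mapping.path2 = ch3
agent2.sources.avroSrc.selector.mapping.path3 = ch4
```

Because the selector is configured on the source, this is also why interceptor1 cannot be bound to a single channel: the replicating selector fans out events only after the interceptor chain has run.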


2014-08-19 0:09 GMT+02:00 terrey shih <terreyshih@gmail.com>:

> something like this
>
>                          channel 1 -> sink 1 (raw event sink)
> agent 1 src -> replicate
>                          channel 2 -> sink 2 -> agent 2 src -> multiplexer -> sink 3
>                                                                            -> sink 4
>
>
> In fact, I tried not having agent 2 and directly connecting sink 2 to src
> 2, but I was not able to, due to an RPCClient exception.
>
> I am just going to try to have 2 agents.
>
> terrey
>
>
> On Mon, Aug 18, 2014 at 3:06 PM, terrey shih <terreyshih@gmail.com> wrote:
>
>> Well, I am actually doing similar things to you. I also need to feed
>> that data to different sinks: one is just raw data, and the others are
>> HBase sinks using the multiplexer.
>>
>>
>>                         channel 1 -> sink 1 (raw event sink)
>> agent 1 src -> replicate
>>                         channel 2 -> sink 2 -> agent 2 src -> multiplexer
>>
>>
>>
>>
>> On Mon, Aug 18, 2014 at 1:35 PM, Guillermo Ortiz <konstt2000@gmail.com>
>> wrote:
>>
>>> In my test, everything is in the same VM. Later, I'll have another flow
>>> which just spools or tails a file and sends it through Avro to another
>>> source on my system.
>>>
>>> Do I really need that replicating step? I think I have too many
>>> channels, and that means too many resources and too much configuration.
>>>
>>>
>>> 2014-08-18 19:51 GMT+02:00 terrey shih <terreyshih@gmail.com>:
>>>
>>> Hi,
>>>>
>>>> Are your two sources (spooling) and the Avro source (from sink 2) in
>>>> two different JVMs/machines?
>>>>
>>>> thx
>>>>
>>>>
>>>> On Mon, Aug 18, 2014 at 9:53 AM, Guillermo Ortiz <konstt2000@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have built a flow with Flume and I don't know if it's the right way
>>>>> to do it, or if there is something better. I am spooling a directory
>>>>> and need those data in three different paths in HDFS with different
>>>>> formats, so I have created two interceptors.
>>>>>
>>>>> Source(Spooling) + Replication + Interceptor1 --> C1 and C2
>>>>> C1 --> Sink1 to HDFS Path1 (it's like a historic store)
>>>>> C2 --> Sink2 to Avro --> Source Avro + Multiplexing + Interceptor2 --> C3 and C4
>>>>> C3 --> Sink3 to HDFS Path2
>>>>> C4 --> Sink4 to HDFS Path3
>>>>>
>>>>> Interceptor1 doesn't do much with the data; it just saves them as
>>>>> they are, like storing a history of the original data.
>>>>>
>>>>> Interceptor2 configures a selector and a header. It processes the data
>>>>> and sets the selector to redirect to Sink3 or Sink4. But this
>>>>> interceptor changes the original data.
>>>>>
>>>>> I tried to do the whole process without replicating data, but I could
>>>>> not. Now it seems like too many steps just because I want to store the
>>>>> original data in HDFS as a history.
>>>>>
>>>>
>>>>
>>>
>>
>
