flume-user mailing list archives

From Sutanu Das <sd2...@att.com>
Subject regex_extractor NOT replacing the HDFS path variable
Date Thu, 18 Feb 2016 01:06:18 GMT
Hi Hari/Community,

We are trying to fill in part of the HDFS path using the regex_extractor interceptor, but
the variable is not being replaced in the path used by the HDFS Sink.

We want the HDFS Sink path to be /prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H.....
where %{host} is the header set by an interceptor of type = regex_extractor with regex = .*host=(ale-\d+-\w+.attwifi.com).*

We believe the regex works because we checked in Python that a similar pattern matches the
source data output:

>>> pattern = re.compile("host=(\w+-\d+-\w+.attwifi.com)\s.*")
>>> pattern.match(s)
<_sre.SRE_Match object at 0x7f8ca5cb4f30>
>>> s
'host=ale-1-sa.attwifi.com seq=478237182 timestamp=1455754889 op=1 topic_seq=540549 lic_info=10
topic=station sta_eth_mac=60:f8:1d:95:74:79 username=Javiers-phone role=centerwifi bssid=40:e3:d6:b0:02:52
device_type=iPhone sta_ip_address=192.168.21.14 hashed_sta_eth_mac=928ebc57036a2df7909c70ea5fce35774687835f
hashed_sta_ip_address=8c76d83c5afb6aa1ca814d8902943a42a58d0a23 vlan=0 ht=0 ap_name=BoA-AP564'
>>>
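
The pattern tested above is not character-for-character the one in the Flume config, and the transcript shows only that a match object came back, not what was captured. A minimal sketch of a closer check, assuming Python's `re.search` behaves like the interceptor's matching for this simple expression, and using a shortened copy of the sample line (the names `FLUME_REGEX`, `SAMPLE` and `extract_host` are made up for illustration):

```python
import re

# The regex exactly as it appears in the Flume config (raw string, so
# Python itself does not interpret the backslashes).
FLUME_REGEX = r".*host=(ale-\d+-\w+.attwifi.com).*"

# Shortened copy of the sample event shown above.
SAMPLE = "host=ale-1-sa.attwifi.com seq=478237182 timestamp=1455754889 op=1 topic_seq=540549"


def extract_host(line, regex=FLUME_REGEX):
    """Return the first capture group, or None when the line does not match."""
    match = re.search(regex, line)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_host(SAMPLE))
```

If this prints the expected host name, the expression itself is fine and the problem is more likely in how the config file delivers it to Flume.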


Is my config incorrect, or do we need to write a custom interceptor for this?


Here is my Flume config:

multi-ale2-station.sources = source1
multi-ale2-station.channels = channel1
multi-ale2-station.sinks =  sink1

# Define the sources
multi-ale2-station.sources.source1.type = exec
multi-ale2-station.sources.source1.command =  /usr/local/bin/multi_ale2.py -f /etc/flume/ale_station_conf/m_s.cfg
multi-ale2-station.sources.source1.channels = channel1


# Define the channels
multi-ale2-station.channels.channel1.type = memory
multi-ale2-station.channels.channel1.capacity = 10000000
multi-ale2-station.channels.channel1.transactionCapacity = 10000000


# Define the interceptors
multi-ale2-station.sources.source1.interceptors = i1
multi-ale2-station.sources.source1.interceptors.i1.type = regex_extractor
multi-ale2-station.sources.source1.interceptors.i1.regex = .*host=(ale-\d+-\w+.attwifi.com).*
multi-ale2-station.sources.source1.interceptors.i1.serializers = s1
multi-ale2-station.sources.source1.interceptors.i1.serializers.type = default
multi-ale2-station.sources.source1.interceptors.i1.serializers.s1.name = host


# Define the HDFS sink
multi-ale2-station.sinks.sink1.type = hdfs
multi-ale2-station.sinks.sink1.channel = channel1
multi-ale2-station.sinks.sink1.hdfs.path = /prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H
multi-ale2-station.sinks.sink1.hdfs.fileType = DataStream
multi-ale2-station.sinks.sink1.hdfs.writeFormat = Text
multi-ale2-station.sinks.sink1.hdfs.filePrefix = Sutanu_regex_ALE_2_Station_topic
multi-ale2-station.sinks.sink1.hdfs.useLocalTimeStamp = true
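
One thing worth ruling out is backslash handling in the config file. The sketch below assumes the file is loaded with `java.util.Properties`, which silently drops a backslash placed before a character that is not a recognised escape, so `\d` would reach the interceptor as `d`. It also assumes that a `%{host}` token with no matching header expands to an empty string. Both are assumptions about Flume's behaviour, not something confirmed here, and the helpers `load_property_value`, `extract_headers` and `expand_headers` are hypothetical simplified models, not Flume code.

```python
import re

# Regex as typed in the config above (single backslashes).
AS_WRITTEN = r".*host=(ale-\d+-\w+.attwifi.com).*"
# Same regex with every backslash doubled.
DOUBLED = r".*host=(ale-\\d+-\\w+.attwifi.com).*"

SAMPLE = "host=ale-1-sa.attwifi.com seq=478237182 timestamp=1455754889 op=1"
HDFS_PATH = "/prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H"


def load_property_value(raw):
    """Rough model of properties-file loading: a backslash is consumed and
    the character after it is kept. Ignores \\t, \\n, \\uXXXX and line
    continuations, which do not occur in this regex."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def extract_headers(loaded_regex, line):
    """Model of the interceptor: set the 'host' header only on a match."""
    match = re.search(loaded_regex, line)
    return {"host": match.group(1)} if match else {}


def expand_headers(path, headers):
    """Model of %{name} substitution; a missing header is assumed to become
    an empty string. Time escapes such as %Y are left untouched."""
    return re.sub(r"%\{([^}]+)\}", lambda m: headers.get(m.group(1), ""), path)


if __name__ == "__main__":
    for label, raw in (("as written", AS_WRITTEN), ("doubled", DOUBLED)):
        loaded = load_property_value(raw)
        headers = extract_headers(loaded, SAMPLE)
        print(label, "->", loaded, "->", expand_headers(HDFS_PATH, headers))
```

Under these assumptions the regex as written loses its `\d` and `\w`, never matches, and leaves an empty path segment, while the doubled form yields the host directory. Separately, the Flume user guide's regex_extractor example spells the serializer keys as `serializers.s1.type` and `serializers.s1.name`, so the `serializers.type` line may simply be ignored.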
