flume-user mailing list archives

From iain wright <iainw...@gmail.com>
Subject Re: regex_extractor NOT replacing the HDFS path variable
Date Thu, 18 Feb 2016 01:38:48 GMT
Config looks sane,

Are events being written to
/prod/hadoop/smallsite/flume_ingest_ale2//%Y/%m/%d/%H?

A couple things that may be worth trying if you haven't yet:

- Try host=(ale-\d+-\w+.attwifi.com) instead of
  .*host=(ale-\d+-\w+.attwifi.com).*
- Try hostname or another header instead of host, since host is a header
used by the host interceptor
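
To check the first suggestion outside Flume, here is a minimal Python sketch. It assumes the interceptor searches the event body for the pattern rather than requiring the whole line to match, so re.search stands in for it; Python's re and Java's regex engine agree on this simple pattern but are not identical in general. The sample line is shortened from the one in the original post, extract_host is a hypothetical helper, and the escaped-dot variant is an addition that is not from the thread.

```python
import re

# Sample event body, shortened from the line quoted in the original post.
line = "host=ale-1-sa.attwifi.com seq=478237182 timestamp=1455754889 op=1"

# Pattern as configured in the interceptor, and the shorter form suggested above.
configured = re.compile(r".*host=(ale-\d+-\w+.attwifi.com).*")
simplified = re.compile(r"host=(ale-\d+-\w+.attwifi.com)")
# Variant with the dots escaped so they match only a literal "."
# (an addition for illustration, not from the thread).
escaped = re.compile(r"host=(ale-\d+-\w+\.attwifi\.com)")


def extract_host(pattern, text):
    """Return capture group 1 of the first match found anywhere in text, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


for name, pat in [("configured", configured),
                  ("simplified", simplified),
                  ("escaped", escaped)]:
    print(name, "->", extract_host(pat, line))
```

If all three print ale-1-sa.attwifi.com, the pattern itself is probably not the reason %{host} comes out empty, which makes the header name the next thing to check.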


-- 
Iain Wright


On Wed, Feb 17, 2016 at 5:06 PM, Sutanu Das <sd2302@att.com> wrote:

> Hi Hari/Community,
>
>
>
> We are trying to replace the hdfs path with the regex_extractor interceptor
> but apparently the variable is not getting replaced in the HDFS path in the
> HDFS Sink.
>
>
>
> We are trying to set the HDFS path of the HDFS Sink to
> /prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H….. where
> %{host} is filled in by an interceptor of type = regex_extractor with
> regex = .*host=(ale-\d+-\w+.attwifi.com).*
>
>
>
> We know the regex works b/c we checked in python that the source data
> output has the regex match
>
>
>
> >>> pattern = re.compile("host=(\w+-\d+-\w+.attwifi.com)\s.*")
>
> >>> pattern.match(s)
>
> <_sre.SRE_Match object at 0x7f8ca5cb4f30>
>
> >>> s
>
> 'host=ale-1-sa.attwifi.com seq=478237182
> timestamp=1455754889 op=1 topic_seq=540549 lic_info=10 topic=station
> sta_eth_mac=60:f8:1d:95:74:79 username=Javiers-phone role=centerwifi
> bssid=40:e3:d6:b0:02:52 device_type=iPhone sta_ip_address=192.168.21.14
> hashed_sta_eth_mac=928ebc57036a2df7909c70ea5fce35774687835f
> hashed_sta_ip_address=8c76d83c5afb6aa1ca814d8902943a42a58d0a23 vlan=0 ht=0
> ap_name=BoA-AP564'
>
> >>>
>
>
>
>
>
> Is my config incorrect or do we need to write a custom interceptor on this?
>
>
>
>
>
> Here is my Flume config:
>
>
>
> multi-ale2-station.sources = source1
>
> multi-ale2-station.channels = channel1
>
> multi-ale2-station.sinks =  sink1
>
>
>
> # Define the sources
>
> multi-ale2-station.sources.source1.type = exec
>
> multi-ale2-station.sources.source1.command = /usr/local/bin/multi_ale2.py -f /etc/flume/ale_station_conf/m_s.cfg
>
> multi-ale2-station.sources.source1.channels = channel1
>
>
>
>
>
> # Define the channels
>
> multi-ale2-station.channels.channel1.type = memory
>
> multi-ale2-station.channels.channel1.capacity = 10000000
>
> multi-ale2-station.channels.channel1.transactionCapacity = 10000000
>
>
>
>
>
> # Define the interceptors
>
> multi-ale2-station.sources.source1.interceptors = i1
>
> multi-ale2-station.sources.source1.interceptors.i1.type = regex_extractor
>
> multi-ale2-station.sources.source1.interceptors.i1.regex = .*host=(ale-\d+-\w+.attwifi.com).*
>
> multi-ale2-station.sources.source1.interceptors.i1.serializers = s1
>
> multi-ale2-station.sources.source1.interceptors.i1.serializers.type = default
>
> multi-ale2-station.sources.source1.interceptors.i1.serializers.s1.name = host
>
>
>
>
>
> # Define a logging sink
>
> multi-ale2-station.sinks.sink1.type = hdfs
>
> multi-ale2-station.sinks.sink1.channel = channel1
>
> multi-ale2-station.sinks.sink1.hdfs.path = /prod/hadoop/smallsite/flume_ingest_ale2/%{host}/%Y/%m/%d/%H
>
> multi-ale2-station.sinks.sink1.hdfs.fileType = DataStream
>
> multi-ale2-station.sinks.sink1.hdfs.writeFormat = Text
>
> multi-ale2-station.sinks.sink1.hdfs.filePrefix = Sutanu_regex_ALE_2_Station_topic
>
> multi-ale2-station.sinks.sink1.hdfs.useLocalTimeStamp = true
>
