flume-user mailing list archives

From Justin Workman <justinjwork...@gmail.com>
Subject Re: hdfs.idleTime
Date Fri, 13 Jan 2017 21:14:32 GMT
I'll try debug again. The output/regex seems to be fine, but I never see a call to close/rename
the last file in each directory until Flume shuts down or restarts.

I would expect to see this call when the idleTimeout value is reached. 
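
One thing that may be worth ruling out in the quoted config: the hdfs.idleTimeout and hdfs.useLocalTimeStamp lines are prefixed agent1.sinks.hdfsSink, while the only sink declared in agent1.sinks is fpssHdfsSink. If Flume only applies properties to declared sinks (an assumption worth verifying against the user guide), those two settings would never reach fpssHdfsSink. Below is a rough Python sketch for spotting that kind of mismatch; the function name undeclared_sink_keys and the trimmed sample are made up for illustration and are not part of Flume.

```python
def undeclared_sink_keys(properties_text, agent="agent1"):
    """Return sink property keys whose sink name is not listed in '<agent>.sinks'."""
    declared = set()
    keys = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not "key = value".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == f"{agent}.sinks":
            # The declaration line is a space-separated list of sink names.
            declared.update(value.split())
        else:
            keys.append(key)
    prefix = f"{agent}.sinks."
    return [
        key for key in keys
        if key.startswith(prefix)
        and key[len(prefix):].split(".", 1)[0] not in declared
    ]


# Trimmed-down sample taken from the config quoted in this thread.
sample = """
agent1.sinks = fpssHdfsSink
agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
"""

for key in undeclared_sink_keys(sample):
    print("not attached to a declared sink:", key)
```

Pointing the same function at the contents of the real flume-conf.properties, rather than the trimmed sample, should show whether any other keys have the same problem.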

Sent from my iPhone

> On Jan 13, 2017, at 2:05 PM, iain wright <iainwrig@gmail.com> wrote:
> 
> Might be worth trying the debug output (I forget the exact sink name) to just log the headers
> being attached to events after the interceptor, to validate that the regex is working correctly
> and for all events.
> 
> I set up this exact config at a previous company, so I know it works.
> 
> I also remember needing to escape the regex in an odd way due to how Java was loading/parsing
> the config.
> 
> Best,
> Iain
> 
> Sent from my iPhone
> 
>> On Jan 13, 2017, at 12:00 PM, Justin Workman <justinjworkman@gmail.com> wrote:
>> 
>> Absolutely, see below. Just to reiterate, when using the timestamp interceptor values
>> to build the output path based on the timestamp in the Flume header, things roll correctly. The
>> files also roll just fine based on file size. However, when using the regex_extractor interceptor
>> to get the actual event's timestamp to use in the output path, the last file in each directory
>> is never renamed/closed until Flume is restarted.
>> 
>> 
>> flume-conf.properties
>> agent1.sources  = fpssKafkaTopic
>> agent1.channels = fpssHdfsFileChannel
>> agent1.sinks = fpssHdfsSink
>> 
>> agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
>> agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
>> agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized 
>> agent1.sources.fpssKafkaTopic.groupId =  flume-first-pass-stream-sessionized
>> agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
>> agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
>> agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
>> agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
>> agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
>> agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
>> agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
>> agent1.sources.fpssKafkaTopic.interceptors.i2.useIP= false
>> agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
>> agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
>> agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
>> agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
>> 
>> agent1.channels.fpssHdfsFileChannel.type = file
>> agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
>> agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
>> 
>> agent1.sinks.fpssHdfsSink.type = hdfs
>> agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
>> agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
>> agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
>> agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
>> agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
>> agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0 
>> agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
>> ## Account for compression. See flume-2128
>> ## My calculation: 512 * 1024 * 1024 * 2.75
>> agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
>> # Close file if idle more than 300 seconds
>> agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
>> agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
>> agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
>> agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
>> agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
>> agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
>> agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
>> agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
>> agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
>> 
>> HDFS Output Since Midnight (Notice the last file is never closed/renamed)
>>  hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
>> 17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> Found 7 items
>> -rw-r--r--   3 b2c_runtime hadoop  513710580 2017-01-13 00:09 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815397.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  514439844 2017-01-13 00:18 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815398.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  515125962 2017-01-13 00:28 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815399.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  513010837 2017-01-13 00:38 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815400.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  511315467 2017-01-13 00:49 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815401.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  508420966 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815402.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    2503353 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815403.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  509116221 2017-01-13 01:10 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415705.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  507800675 2017-01-13 01:21 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415706.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  504432110 2017-01-13 01:32 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415707.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501932914 2017-01-13 01:42 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415708.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  498136257 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415709.snappy
>> -rw-r--r--   3 b2c_runtime hadoop      60539 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415710.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  500879399 2017-01-13 02:11 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016017.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501827071 2017-01-13 02:21 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016018.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501489101 2017-01-13 02:32 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016019.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501527838 2017-01-13 02:43 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016020.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  499393977 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016021.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    1282327 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016022.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  501033294 2017-01-13 03:10 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615579.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  500933906 2017-01-13 03:20 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615580.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  505869233 2017-01-13 03:31 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615581.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  502910608 2017-01-13 03:41 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615582.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  499561080 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615583.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    3616826 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615584.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  502243204 2017-01-13 04:11 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215893.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  508966498 2017-01-13 04:22 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215894.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  510972236 2017-01-13 04:34 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215895.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  513225577 2017-01-13 04:46 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215896.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  512743679 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215897.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    3888775 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215898.snappy.tmp
>> Found 7 items
>> -rw-r--r--   3 b2c_runtime hadoop  515832251 2017-01-13 05:11 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811983.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  518077964 2017-01-13 05:20 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811984.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519490676 2017-01-13 05:29 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811985.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519105563 2017-01-13 05:37 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811986.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  518672209 2017-01-13 05:46 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811987.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520019853 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811988.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    1574211 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811989.snappy.tmp
>> Found 9 items
>> -rw-r--r--   3 b2c_runtime hadoop  521428204 2017-01-13 06:07 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413743.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519885769 2017-01-13 06:15 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413744.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519050891 2017-01-13 06:21 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413745.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520691322 2017-01-13 06:29 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413746.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520902319 2017-01-13 06:36 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413747.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520831873 2017-01-13 06:42 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413748.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519785647 2017-01-13 06:49 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413749.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520590143 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413750.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    4621367 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413751.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  522623760 2017-01-13 07:06 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015214.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523065112 2017-01-13 07:12 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015215.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523445533 2017-01-13 07:18 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015216.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523084945 2017-01-13 07:24 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015217.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524283976 2017-01-13 07:30 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015218.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523923379 2017-01-13 07:36 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015219.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523910723 2017-01-13 07:42 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015220.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524266095 2017-01-13 07:47 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015221.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523002505 2017-01-13 07:53 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015222.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520706211 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015223.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    8051588 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015224.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  520528155 2017-01-13 08:05 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618433.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  521761390 2017-01-13 08:11 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618434.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  522548272 2017-01-13 08:16 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618435.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  522616117 2017-01-13 08:22 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618436.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525953759 2017-01-13 08:28 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618437.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524475009 2017-01-13 08:34 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618438.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523995339 2017-01-13 08:40 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618439.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524188832 2017-01-13 08:47 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618440.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525303001 2017-01-13 08:53 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618441.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525606532 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618442.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    4486982 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618443.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  525207364 2017-01-13 09:06 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216987.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  526105891 2017-01-13 09:12 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216988.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  526426735 2017-01-13 09:18 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216989.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525298099 2017-01-13 09:24 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216990.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525282945 2017-01-13 09:30 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216991.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523921005 2017-01-13 09:36 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216992.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524827705 2017-01-13 09:42 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216993.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524203463 2017-01-13 09:47 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216994.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524678485 2017-01-13 09:53 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216995.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524598220 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216996.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    3877959 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216997.snappy.tmp
>> Found 10 items
>> -rw-r--r--   3 b2c_runtime hadoop  523000460 2017-01-13 10:06 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813831.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523455154 2017-01-13 10:12 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813832.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525465618 2017-01-13 10:18 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813833.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524630955 2017-01-13 10:24 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813834.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  527780298 2017-01-13 10:30 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813835.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  526565562 2017-01-13 10:37 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813836.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524936336 2017-01-13 10:43 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813837.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524565610 2017-01-13 10:49 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813838.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524276950 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813839.snappy
>> -rw-r--r--   3 b2c_runtime hadoop     654810 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  524174553 2017-01-13 11:06 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415712.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524127864 2017-01-13 11:12 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415713.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524778919 2017-01-13 11:18 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415714.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524851182 2017-01-13 11:24 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415715.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525156750 2017-01-13 11:30 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415716.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525334538 2017-01-13 11:35 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415717.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  527346578 2017-01-13 11:41 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415718.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525592734 2017-01-13 11:47 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415719.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525502291 2017-01-13 11:53 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415720.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523135186 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415721.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    9967141 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415722.snappy.tmp
>> Found 7 items
>> -rw-r--r--   3 b2c_runtime hadoop  520881970 2017-01-13 12:05 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016849.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  522340745 2017-01-13 12:11 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016850.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524156495 2017-01-13 12:17 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016851.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523482390 2017-01-13 12:23 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016852.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524096591 2017-01-13 12:29 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016853.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523184628 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016854.snappy
>> -rw-r--r--   3 b2c_runtime hadoop   10981218 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016855.snappy.tmp
>> 
>> HDFS stat on one of the files (keep in mind the output bucket is based on event time,
>> which is MDT/MST, vs. the stat date, which is GMT)
>>  hadoop fs -stat "%y %n"  /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
>> 17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
>> 
>> Thanks
>> Justin
>> 
>>> On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <denes@cloudera.com> wrote:
>>> Hi Justin,
>>> 
>>> Could you please share your config file with us?
>>> 
>>> Thanks,
>>> Denes
>>> 
>>> 
>>>> On Thu, Jan 12, 2017, 20:20 Justin Workman <justinjworkman@gmail.com> wrote:
>>>> Sorry for cross-posting to user and dev. I have recently set up a Flume configuration
>>>> where we are using the regex_extractor interceptor to parse the actual event date from the
>>>> record flowing through the Flume source, then using that date to build the HDFS sink bucket
>>>> path. However, it appears that the hdfs.idleTimeout value is not honored in this configuration.
>>>> It does work when using the timestamp interceptor to build the output path.
>>>> 
>>>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
>>>> never closed or renamed until I restart or shut down Flume. Our Flume is configured to roll
>>>> based on size or output path, and the files rename/close/roll fine based on size. However,
>>>> the last file in each output path is always left with the .tmp extension until we restart
>>>> Flume. I would expect the file to be closed and renamed if no records are written
>>>> to it before the idleTimeout is reached.
>>>> 
>>>> Could I be missing something, or is this a known bug with the regex_extractor
>>>> interceptor?
>>>> 
>>>> Thanks
>>>> Justin
>> 
