I'm running Flume 1.5.0 with this configuration:
 
flume_test.sources = sr1
flume_test.channels = ch1
flume_test.sinks = sk1
 
#avro source
flume_test.sources.sr1.type = avro
flume_test.sources.sr1.channels = ch1
flume_test.sources.sr1.bind = 10.92.211.22
flume_test.sources.sr1.port = 55000
flume_test.sources.sr1.ssl = true
flume_test.sources.sr1.keystore = /nas/used_by_hadoop/hadoop-kn-p2/rdd/hadoop_keystore.jks
flume_test.sources.sr1.keystore-password = *****
flume_test.sources.sr1.compression-type = gzip
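
For reference, a client pushing events into this source needs matching SSL and compression settings on its side. A minimal sketch of an Avro RPC client properties file (the truststore path is a placeholder, and deflate is the compression type documented for the RPC client):

client.type = default
hosts = h1
hosts.h1 = 10.92.211.22:55000
ssl = true
truststore = /path/to/client_truststore.jks
truststore-password = *****
compression-type = deflate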
 
#custom interceptor
flume_test.sources.sr1.interceptors = i1
flume_test.sources.sr1.interceptors.i1.type = com.vm.rdd.TimeBodyInterceptor$Builder
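
The %Y-%m-%d escapes in the HDFS sink path below rely on a timestamp header being present on every event, which I assume this custom interceptor provides. For comparison, chaining in the built-in timestamp interceptor would look roughly like this (a sketch, not part of the running config):

flume_test.sources.sr1.interceptors = i1 i2
flume_test.sources.sr1.interceptors.i2.type = timestamp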
 
#memory channel
#file channel
flume_test.channels.ch1.type = file
flume_test.channels.ch1.checkpointDir = /hadoop/user/flume/channels/flumeTest/checkpoint
flume_test.channels.ch1.dataDirs = /hadoop/user/flume/channels/flumeTest/data
flume_test.channels.ch1.capacity = 100000000
flume_test.channels.ch1.transactionCapacity = 10000
flume_test.channels.ch1.maxFileSize = 10000000
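
For scale: capacity and transactionCapacity are counted in events, while maxFileSize is bytes per data file, so this channel starts a new log file roughly every 10 MB written and can only reclaim old files after a checkpoint shows they hold no unconsumed events. Two related knobs, shown here with their documented defaults rather than values from this setup:

# checkpoint every 30s (default); old data files are only removed after a checkpoint
flume_test.channels.ch1.checkpointInterval = 30000
# stop accepting writes when the data dir drops below ~500 MB free (default)
flume_test.channels.ch1.minimumRequiredSpace = 524288000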
 
#HDFS sink
flume_test.sinks.sk1.channel = ch1
flume_test.sinks.sk1.type = hdfs
#dynamic path
flume_test.sinks.sk1.hdfs.path = hdfs:///landing/data/flumeTest/%Y-%m-%d
flume_test.sinks.sk1.hdfs.inUsePrefix = _
flume_test.sinks.sk1.hdfs.codeC = gzip
# roll the file once it reaches this size in bytes
flume_test.sinks.sk1.hdfs.rollSize = 2560000000
# roll the file every 2 minutes if it has not reached rollSize
flume_test.sinks.sk1.hdfs.rollInterval = 120
# close the file once it has been idle for 15s
flume_test.sinks.sk1.hdfs.idleTimeout = 15
flume_test.sinks.sk1.hdfs.rollCount = 0
flume_test.sinks.sk1.hdfs.batchSize = 100
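
As configured, whichever trigger fires first closes the current HDFS file: rollSize (bytes), rollInterval (seconds) or idleTimeout (seconds), with rollCount = 0 disabling count-based rolling. One caveat: the escape sequences in hdfs.path need a timestamp header on each event, so if the interceptor ever failed to set one, the documented fallback is to stamp events with the agent's own clock (a sketch, not in the running config):

flume_test.sinks.sk1.hdfs.useLocalTimeStamp = true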
 
When I look at the data dir, there are many files:
 
-rw-r--r-- 1 rdd rdd    0 Jan 13 12:06 in_use.lock
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-1
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-2
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-3
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-4
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-5
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-6
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-7
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-8
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-9
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-10
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-11
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-12
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-13.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-1.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-2.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-3.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-4.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-5.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-6.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-7.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-8.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-9.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-10.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-11.meta
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-12.meta
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-13
-rw-r--r-- 1 rdd rdd 1.0M Jan 13 12:09 log-14
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:09 log-14.meta
-rw-r--r-- 1 rdd rdd    0 Jan 13 12:15 log-15
-rw-r--r-- 1 rdd rdd   47 Jan 13 12:15 log-15.meta
 
This list can grow to hundreds of log files, all accessed at around the same time. Looking at the agent's log, the data files appear to be duplicates of one another, since the channel's background worker touches every one of them at the same time, for example:
 
2015-01-13 12:09:52,451 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1005)] Updated checkpoint for file: /hadoop/user/flume/channels/flumeTest/data/log-14 position: 120431 logWriteOrderID: 1421150977639
2015-01-13 12:09:52,451 (Log-BackgroundWorker-ch1) [DEBUG - org.apache.flume.channel.file.Log.removeOldLogs(Log.java:1067)] Files currently in use: [14]
2015-01-13 12:09:52,451 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-1
2015-01-13 12:09:52,456 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-2
2015-01-13 12:09:52,461 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-3
2015-01-13 12:09:52,467 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-4
2015-01-13 12:09:52,472 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-5
2015-01-13 12:09:52,477 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-6
2015-01-13 12:09:52,482 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-7
2015-01-13 12:09:52,487 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-8
2015-01-13 12:09:52,492 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-9
2015-01-13 12:09:52,497 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-10
2015-01-13 12:09:52,503 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-11
2015-01-13 12:09:52,508 (Log-BackgroundWorker-ch1) [INFO - org.apache.flume.channel.file.LogFile$RandomReader.close(LogFile.java:504)] Closing RandomReader /hadoop/user/flume/channels/flumeTest/data/log-12

Does anyone know why there are so many active files at one time? Is this expected behaviour?
 
Regards,
 
Guy Needham | Data Discovery
Virgin Media   | Technology and Transformation | Data
Bartley Wood Business Park, Hook, Hampshire RG27 9UP
D 01256 75 3362
I welcome VSRE emails. Learn more at http://vsre.info/
 
 

