flume-user mailing list archives

From Roberto Coluccio <roberto.coluc...@eng.it>
Subject Understand JMS source + HDFS sink batch management
Date Wed, 16 Nov 2016 10:35:58 GMT
Hello folks,

I'm testing a Flume agent with the following topology:

*JMS source* (Tibco implementation) -> *memory channel* -> *hdfs sink*

The *JMS source* has:

  * my_agent.sources.my_source.batchSize = 100

The *memory channel* has:

  * my_agent.channels.my_channel.capacity = 100

The *HDFS sink* has:

  * my_agent.sinks.my_sink.hdfs.batchSize = 100
  * my_agent.sinks.my_sink.hdfs.rollCount = 0
  * my_agent.sinks.my_sink.hdfs.rollInterval = 0
  * my_agent.sinks.my_sink.hdfs.idleTimeout = 0
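For reference, the settings listed above can be assembled into a single agent properties file. This is a sketch under stated assumptions: the component `type` lines, the channel bindings, the JNDI connection settings for the Tibco provider, the queue name and the `hdfs.path` value are not part of the original setup and are marked ASSUMED. Only the `batchSize`, `capacity`, `rollCount`, `rollInterval` and `idleTimeout` values come from the configuration quoted in this message.

```properties
# Component names as used in the message
my_agent.sources = my_source
my_agent.channels = my_channel
my_agent.sinks = my_sink

# JMS source (Tibco implementation)
# ASSUMED: type/binding lines and all JNDI values below are illustrative
my_agent.sources.my_source.type = jms
my_agent.sources.my_source.channels = my_channel
my_agent.sources.my_source.initialContextFactory = com.tibco.tibjms.naming.TibjmsInitialContextFactory
my_agent.sources.my_source.providerURL = tcp://ems-host:7222
my_agent.sources.my_source.connectionFactory = QueueConnectionFactory
my_agent.sources.my_source.destinationName = my_queue
my_agent.sources.my_source.destinationType = QUEUE
my_agent.sources.my_source.batchSize = 100

# Memory channel
my_agent.channels.my_channel.type = memory
my_agent.channels.my_channel.capacity = 100

# HDFS sink
# ASSUMED: type/binding lines and the target directory
my_agent.sinks.my_sink.type = hdfs
my_agent.sinks.my_sink.channel = my_channel
my_agent.sinks.my_sink.hdfs.path = /flume/my_agent
# Values from the original configuration
my_agent.sinks.my_sink.hdfs.batchSize = 100
my_agent.sinks.my_sink.hdfs.rollCount = 0
my_agent.sinks.my_sink.hdfs.rollInterval = 0
my_agent.sinks.my_sink.hdfs.idleTimeout = 0
```

Such a file would typically be passed to the agent with `flume-ng agent --conf-file <file> --name my_agent`.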

I don't understand how and why new files on HDFS are created and closed.
When I:

 1. launch the agent (with the JMS queue empty), and
 2. push a new text message onto the JMS queue,

a new file is created by the HDFS sink but not yet closed (as I expect).
BUT, when I

 3. push another text message onto the JMS queue,

then, regardless of how long I wait before performing step 3, the HDFS
sink closes the previously opened file and opens a new one for the new
message consumed from the queue and passed through the channel.

As a result, each file always contains exactly one message. I was
expecting that number to be 100, according to the configuration
mentioned above.

Any hints?

Best regards,

