flume-user mailing list archives

From mahendran m <mahendra...@hotmail.com>
Subject RE: Shutdowning HDFS server leads to flume agent shutdown
Date Fri, 07 Nov 2014 09:58:09 GMT
Hi Needham,
Thanks for your response.
If that is the case, then I am facing data loss. For example:
I sent 5129 events to Flume, and I configured the agent to roll a file only when it reaches
1 MB (about 4890 events). I have one completely rolled log file of size 1 MB. Another
file was still being written when I stopped HDFS, which also stopped Flume. If I open that
second log file, it does not contain the events I sent; it contains the text below:
java.io.IOException: Got error for OP_READ_BLOCK, self=/, remote=,
for file /, for pool BP-1861801959-
block 1073743344_2532  

I then started the agent again. It read the checkpoint directory and moved the missed events to
HDFS, but it moved only the last two events (5128 and 5129). Events 4891 to 5127 were completely
missing. Why is this happening, and how can I prevent data loss in this case?
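
One possible mitigation, sketched here as an assumption rather than a confirmed fix for this
report: the configuration quoted further down uses a memory channel, which holds events only
in the agent's RAM, so anything still buffered is gone when the agent stops. Flume's file
channel persists events to local disk instead. The directory paths below are hypothetical
placeholders and the capacities are illustrative.

```properties
# Durable file channel in place of the memory channel (paths are hypothetical)
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
a1.channels.c1.capacity = 1000000
# Should be at least as large as a1.sinks.k1.hdfs.batchSize (1000 in this thread)
a1.channels.c1.transactionCapacity = 1000
```

A file channel only protects events that the sink has not yet committed. Events already
written to an HDFS file that was still open when HDFS stopped may need separate handling.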

From: Guy.Needham@virginmedia.co.uk
To: user@flume.apache.org; flume-user@incubator.apache.org
Subject: RE: Shutdowning HDFS server leads to flume agent shutdown
Date: Fri, 7 Nov 2014 09:36:33 +0000

Hi Mahendran,
Yes, that is expected behaviour. I suspect that if you look in the logs for this agent, it
will have thrown an exception when you shut down HDFS, as it depends
on a compatible HDFS being available.


Guy Needham | Data Discovery

Virgin Media | Enterprise Data, Design & Management

Bartley Wood Business Park, Hook, Hampshire RG27 9UP

D 01256 75 3362 

From: mahendran m [mailto:mahendranec@hotmail.com]
Sent: 07 November 2014 09:33
To: flume-user@incubator.apache.org
Subject: Shutdowning HDFS server leads to flume agent shutdown

Hi All,
I am new to Apache Flume. I have configured a Thrift source to send logs to HDFS. My config
is below:

# list sources, sinks and channels in the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# thrift source properties
a1.sources.r1.type = thrift
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = DecoderInterceptor.CustomInterceptor$Builder

#HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.fileSuffix= .txt
a1.sinks.k1.hdfs.rollSize = 1048576
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.batchSize = 1000
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.callTimeout = 60000
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flumeChannel100/Thrift

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000000
a1.channels.c1.transactionCapacity = 1000
a1.channels.c1.byteCapacityBufferPercentage = 10
a1.channels.c1.byteCapacity = 5368709120

# define the flow
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

When I started HDFS and the Flume service and generated logs from a C# application, my logs
were moved to HDFS and everything was OK up to that point. But when I stopped the HDFS service,
the Flume agent itself stopped. Is this the default behaviour, or has something gone wrong?


