flume-user mailing list archives

From Ed Judge <ejud...@gmail.com>
Subject Re: HDFS IO error
Date Fri, 31 Oct 2014 01:14:52 GMT
I have been using 1.5 all along. I end up with a 0 length file which is a little concerning.
Not to mention that the timeout is adding 10 seconds to the overall transfer. Is this normal
or is there something I can do to prevent the timeout?

Thanks,
Ed. 


> On Oct 30, 2014, at 5:58 PM, Asim Zafir <asim.zafir@gmail.com> wrote:
> 
> Ed, 
> 
> Are you saying you resolved the problem with 1.5.0 or you still have an issue?
> 
> Thanks, 
> 
> Asim Zafir.
> 
>> On Thu, Oct 30, 2014 at 1:47 PM, Ed Judge <ejudgie@gmail.com> wrote:
>> Thanks for the replies.  We are using 1.5.0.
>> My observation is that Flume retries automatically (without my intervention) and
>> that no data is lost.
>> The impact is a) a delay of 10 seconds due to the timeout and b) a zero length file.
>> 
>> -Ed
>> 
>>> On Oct 30, 2014, at 3:46 PM, Asim Zafir <asim.zafir@gmail.com> wrote:
>>> 
>>> Please check whether your sinks, i.e. the HDFS data nodes that were receiving the
>>> writes, have any bad blocks. Second, I think you should also set the HDFS roll
>>> interval or size to a higher value. The reason this problem happens is that the
>>> Flume sink is not able to write to a data pipeline that was initially presented by
>>> HDFS. The solution in this case should be for HDFS to initialize a new pipeline and
>>> present it to Flume. The current hack is to restart the Flume process, which then
>>> initializes a new HDFS pipeline, enabling the sink to push backlogged events. There
>>> is a fix for this incorporated in Flume 1.5 (I haven't tested it yet), but if you
>>> are on anything older, the only way to make this work is to restart the Flume
>>> process.
>>> 
>>>> On Oct 30, 2014 11:54 AM, "Ed Judge" <ejudgie@gmail.com> wrote:
>>>> I am running into the following problem.
>>>> 
>>>> 30 Oct 2014 18:43:26,375 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
>>>> java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
>>>> 	at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
>>>> 	at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
>>>> 	at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
>>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
>>>> 	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>> 	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.util.concurrent.TimeoutException
>>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:201)
>>>> 	at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
>>>> 	... 6 more
>>>> 30 Oct 2014 18:43:27,717 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
>>>> 30 Oct 2014 18:43:46,971 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
>>>> 
>>>> 
>>>> The following is my configuration.  The source is just a script running a
>>>> curl command and downloading files from S3.
>>>> 
>>>> 
>>>> # Name the components on this agent
>>>> a1.sources = r1
>>>> a1.sinks = k1
>>>> a1.channels = c1
>>>> 
>>>> # Configure the source: STACK_S3
>>>> a1.sources.r1.type = exec
>>>> a1.sources.r1.command = ./conf/FlumeAgent.1.sh 
>>>> a1.sources.r1.channels = c1
>>>> 
>>>> # Use a channel which buffers events in memory
>>>> a1.channels.c1.type = memory
>>>> a1.channels.c1.capacity = 1000000
>>>> a1.channels.c1.transactionCapacity = 100
>>>> 
>>>> # Describe the sink
>>>> a1.sinks.k1.type = hdfs
>>>> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm 
>>>> a1.sinks.k1.hdfs.filePrefix = dm-1-20 
>>>> a1.sinks.k1.hdfs.fileSuffix = .ds
>>>> a1.sinks.k1.hdfs.rollInterval = 0
>>>> a1.sinks.k1.hdfs.rollSize = 0
>>>> a1.sinks.k1.hdfs.rollCount = 0
>>>> a1.sinks.k1.hdfs.fileType = DataStream
>>>> a1.sinks.k1.serializer = TEXT
>>>> a1.sinks.k1.channel = c1
>>>> a1.sinks.k1.hdfs.minBlockReplicas = 1
>>>> a1.sinks.k1.hdfs.batchSize = 10
>>>> 
>>>> 
>>>> I had the HDFS batch size at the default (100) but this issue was still happening.
>>>> Does anyone know what parameters I should change to make this error go away?
>>>> No data is lost but I end up with a 0 byte file.
>>>> 
>>>> Thanks,
>>>> Ed
> 
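
On the question of which parameters might be changed: the "Callable timed out after 10000 ms" in the trace comes from BucketWriter.open, and 10000 ms is the default of the HDFS sink's hdfs.callTimeout property. The fragment below is only a sketch of two settings that could be tried against the configuration quoted above. The values 30000 and 300 are illustrative assumptions, not tested against this setup. Raising the timeout does not explain why the open call is slow and may only mask the underlying cause.

```
# Sketch only: values are illustrative assumptions, not a verified fix.

# hdfs.callTimeout is the time (in ms) allowed for HDFS operations such as
# open, write, flush and close. The default is 10000, which matches the trace.
a1.sinks.k1.hdfs.callTimeout = 30000

# With rollInterval, rollSize and rollCount all set to 0, the sink never rolls
# a file on its own. A time-based roll (in seconds) is one option to try.
a1.sinks.k1.hdfs.rollInterval = 300
```

These lines would replace or extend the matching a1.sinks.k1.hdfs.* entries in the agent configuration. Whether they remove the zero-length .tmp file left behind by the failed open is not something this thread confirms.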
