flume-user mailing list archives

From Denes Arvay <de...@cloudera.com>
Subject Re: Deadlock between roll timer and PollingRunner threads
Date Wed, 08 Feb 2017 14:28:38 GMT
Hi,

Yes, it seems to be a bug; I also bumped into it.
What appears to happen is that the conf file poller detects a change in the
config file and tries to stop the components while, at the same time, the
HDFS sink tries to roll a file.
It should be solved by https://issues.apache.org/jira/browse/FLUME-2973
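For context, the thread dump below shows a classic lock-ordering inversion: the roll timer holds the BucketWriter monitor and waits for the sink's lock, while the PollingRunner holds the sink's lock and waits for the BucketWriter, forming a cycle. A minimal, self-contained Java sketch of the safe pattern (all names here are illustrative, not Flume's actual classes) is:

```java
public class LockOrderDemo {
    // Stand-ins for the two monitors in the dump: the HDFSEventSink's
    // java.lang.Object lock and the BucketWriter's own monitor.
    private static final Object sinkLock = new Object();
    private static final Object writerLock = new Object();

    // Both threads acquire the locks in the SAME order, so no cycle can
    // form. In the reported deadlock, one thread effectively took
    // writerLock -> sinkLock while the other took sinkLock -> writerLock.
    static void safeClose(String who) {
        synchronized (sinkLock) {
            synchronized (writerLock) {
                System.out.println(who + " closed the writer");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread rollTimer = new Thread(() -> safeClose("roll-timer"));
        Thread pollingRunner = new Thread(() -> safeClose("polling-runner"));
        rollTimer.start();
        pollingRunner.start();
        rollTimer.join();
        pollingRunner.join();
        System.out.println("both threads finished");
    }
}
```

With a consistent acquisition order, the worst case is one thread briefly blocking the other; a cycle (and hence a deadlock) cannot occur.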

From your thread dump it seems that the rolling is triggered by the
maxOpenFiles limit. Is it overridden in your config file? A very low value
could increase the chances of this deadlock.
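If maxOpenFiles was lowered, raising it back toward its default of 5000 reduces how often the sink evicts (and therefore closes) cached BucketWriters, which is the path seen in the PollingRunner stack. A sketch, assuming a sink named sk as in the config below:

```
# Per-sink cap on cached open bucket writers; evicting the eldest
# entry triggers the BucketWriter.close() seen in the thread dump.
a1.sinks.sk.hdfs.maxOpenFiles = 5000
```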

I'd also recommend using the --no-reload-conf command line parameter if
the live config reload feature is not needed.
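For reference, disabling live reload is just an extra flag on the agent command line (the paths and agent name here mirror the command from the original message):

```
nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name a1 \
    --no-reload-conf > /path/to/test.log 2>&1 &
```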

Kind regards,
Denes



On Mon, Feb 6, 2017 at 6:08 PM Chia-Hung Lin <clin4j@googlemail.com> wrote:

> I use flume 1.6.0 (revision 2561a23240a71ba20bf288c7c2cda88f443c2080)
> for testing, to move files from the local file system to S3. Only one
> flume process (a single JVM) is launched. The problem is that each time,
> after running for a while, a deadlock occurs between the roll timer and
> PollingRunner threads. A thread dump is shown below:
>
> "hdfs-sk-roll-timer-0":
>   waiting to lock monitor 0x00007f46c40b5578 (object
> 0x00000000e002dc90, a java.lang.Object),
>   which is held by "SinkRunner-PollingRunner-DefaultSinkProcessor"
> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>   waiting to lock monitor 0x00007f4684004db8 (object
> 0x00000000e17b64d8, a org.apache.flume.sink.hdfs.BucketWriter),
>   which is held by "hdfs-sk-roll-timer-0"
>
> Java stack information for the threads listed above:
> ===================================================
> "hdfs-sk-roll-timer-0":
>         at
> org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:396)
>         - waiting to lock <0x00000000e002dc90> (a java.lang.Object)
>         at
> org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:447)
>         at
> org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:408)
>         - locked <0x00000000e17b64d8> (a
> org.apache.flume.sink.hdfs.BucketWriter)
>         at
> org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:280)
>         at
> org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:274)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>         at
> org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:304)
>         - waiting to lock <0x00000000e17b64d8> (a
> org.apache.flume.sink.hdfs.BucketWriter)
>         at
> org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:163)
>         at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
>         at java.util.HashMap.put(HashMap.java:505)
>         at
> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:407)
>         - locked <0x00000000e002dc90> (a java.lang.Object)
>         at
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at
> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:745)
>
> Found 1 deadlock.
>
> The configuration is below:
>
> a1.sources = src
> a1.sinks = sk
> a1.channels = ch
> ...
> a1.sinks.sk.type = hdfs
> a1.sinks.sk.channel = ch
> ...
> a1.sinks.sk.hdfs.fileType = DataStream
> ...
> a1.sinks.sk.hdfs.rollCount = 0
> a1.sinks.sk.hdfs.rollSize = 0
> a1.sinks.sk.hdfs.rollInterval = 100
> ...
> a1.channels.ch.type = file
> a1.channels.ch.checkpointDir = /path/to/chechkpointDir
> a1.channels.ch.dataDirs = /path/to/dataDir
>
> The command to run flume is
>
> nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name
> a1 ... > /path/to/test.log 2>&1 &
>
> Is this a bug, or is there something I can tune to fix it?
>
> Thanks
>
