flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: .SpoolingFileLineReader warning....
Date Tue, 20 Nov 2012 12:25:23 GMT
Thinking about this more, I think it's probably going to be quite
common for people to cp large files into the spooling directory.
Patrick, what do you think about waiting until a file's mtime is at
least, say, 1 second old before reading it?

Brock
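
The mtime idea above can be illustrated with a small shell sketch. This is not Flume code (an actual change would live in the Java source); `is_settled` is a hypothetical helper name, and the sketch assumes GNU coreutils, where `stat -c %Y` prints a file's mtime in epoch seconds.

```shell
# Succeeds (exit 0) when FILE was last modified at least MIN_AGE seconds ago.
# MIN_AGE defaults to 1 second, matching the value floated in the message.
# Assumption: GNU stat, where `-c %Y` prints the mtime in epoch seconds.
is_settled() {
    file=$1
    min_age=${2:-1}
    mtime=$(stat -c %Y "$file") || return 2   # file missing or not stat-able
    now=$(date +%s)
    [ $((now - mtime)) -ge "$min_age" ]
}
```

A caller would skip any file for which `is_settled "$f" 1` fails and retry on the next pass. A check like this only narrows the window: a copy that stalls for longer than the threshold would still be picked up while incomplete.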

On Mon, Nov 19, 2012 at 5:29 PM, Brock Noland <brock@cloudera.com> wrote:
> My guess is that the file does not have the correct permissions while
> being copied.
>
> [noland@localhost cp-test]$ cp -p test-0 test-1 & sleep 0.1; ls -al test*
> [1] 18780
> -rw-rw-r-- 1 noland noland 1048576000 Nov 19 17:25 test-0
> -rw------- 1 noland noland   52334592 Nov 19 17:27 test-1
>
>
> For large files, it probably makes sense to copy the file in as .file
> and then rename it to file.
>
> Brock
>
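
The copy-then-rename suggestion above might be scripted along these lines. It is a minimal sketch, not a tested logrotate recipe: `spool_copy` is a hypothetical helper name, and it assumes the spooling source skips dot-prefixed names, which should be verified for the Flume version in use.

```shell
# Copy SRC into SPOOL_DIR under a temporary dot-prefixed name, then rename it
# to its final name. A rename within one filesystem is atomic, so the final
# name only ever refers to a fully written file with its final permissions.
# Assumption: the reader ignores names that start with a dot.
spool_copy() {
    src=$1
    spool_dir=$2
    name=$(basename "$src")
    tmp="$spool_dir/.$name"
    cp -p "$src" "$tmp" && mv "$tmp" "$spool_dir/$name"
}
```

For example, `spool_copy /var/log/clickstream/clickstream.log-2012-11-17-1353163623 /mnt/flume/clickstream` would replace the plain `cp -p`. If dot-prefixed names are not skipped, staging the copy in a separate directory on the same filesystem gives the same atomic rename.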
> On Mon, Nov 19, 2012 at 5:04 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>> The spooling source gets a directory listing, then reads each file, then
>> renames it to X.COMPLETED. Is it possible some other process deleted that
>> file between when Flume listed the directory and when it tried to open the
>> file? Otherwise, I'm confused why the file would not be present in the
>> listing you give here.
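
The list, read, rename cycle described above can be mimicked with a toy shell loop, which makes the gap between listing a file and opening it easier to see. This only illustrates the described behaviour and is not Flume's implementation; `spool_pass` is a hypothetical name, and `cat` stands in for reading events.

```shell
# One polling pass over DIR: list the files, "read" each one, then rename it
# to NAME.COMPLETED. The list is taken once, up front, so a file can vanish
# or still be unreadable by the time the loop gets around to opening it.
spool_pass() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                     # skip directories and an empty listing
        case $f in *.COMPLETED) continue ;; esac    # already processed
        if cat "$f" >/dev/null 2>&1; then           # stand-in for reading the file
            mv "$f" "$f.COMPLETED"
        else
            echo "Could not read file: $f" >&2      # listed, but gone or unreadable
        fi
    done
}
```

A file that is still being copied in with restrictive permissions would take the `else` branch on this pass and be picked up on a later one.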
>>
>>
>> On Mon, Nov 19, 2012 at 6:03 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>>>
>>> Hey Dan,
>>>
>>> You say that it seems like Flume has already processed the log... why do
>>> you think that?
>>>
>>> When you listed the directory contents I don't see the original or the
>>> COMPLETED version of the file that Flume is complaining about:
>>>
>>> /clickstream.log-2012-11-17-1353163623
>>>
>>> doesn't appear in the
>>>
>>> /mnt/flume/clickstream/
>>>
>>> directory listing anywhere.
>>>
>>>
>>> On Mon, Nov 19, 2012 at 2:33 PM, Dan Young <danoyoung@gmail.com> wrote:
>>>>
>>>> Hello Brock,
>>>>
>>>> It seems like we get this message each time that logrotate runs and is in
>>>> the process of copying the file to the SpoolingDirectory. It seems that
>>>> Flume starts reading the file as soon as it shows up in the
>>>> SpoolingDirectory. Maybe it's trying to read the file while it's still
>>>> being written to?
>>>>
>>>> 2012-11-19 19:27:27,924 (pool-12-thread-1) [WARN - org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:328)] Could not find file: /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239
>>>> java.io.FileNotFoundException: /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239 (Permission denied)
>>>>     at java.io.FileInputStream.open(Native Method)
>>>>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>>>     at java.io.FileReader.<init>(FileReader.java:72)
>>>>     at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>>>>     at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>>>>     at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>     at java.lang.Thread.run(Thread.java:722)
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Nov 17, 2012 at 9:15 AM, Brock Noland <brock@cloudera.com> wrote:
>>>>>
>>>>> Ok, do you mind sharing your log rotate config to see if we can
>>>>> reproduce?
>>>>>
>>>>> --
>>>>> Brock Noland
>>>>> Sent with Sparrow
>>>>>
>>>>> On Saturday, November 17, 2012 at 10:01 AM, Dan Young wrote:
>>>>>
>>>>> Hey Brock,
>>>>>
>>>>> No I have not modified the conf while the agent was running.
>>>>>
>>>>> /mnt/flume is local. Note that this is running on an ec2 instance and
>>>>> the disk is the ephemeral drive, not EBS.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Dano
>>>>>
>>>>> On Nov 17, 2012 8:58 AM, "Brock Noland" <brock@cloudera.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I highly doubt it's related to
>>>>> (https://issues.apache.org/jira/browse/FLUME-1721) but have you
>>>>> modified the configuration file since starting the agent?  If so, can
>>>>> you restart the agent and see if the error continues?
>>>>>
>>>>> Also, is /mnt/flume local disk or NAS?
>>>>>
>>>>> Brock
>>>>>
>>>>> On Sat, Nov 17, 2012 at 9:02 AM, Dan Young <danoyoung@gmail.com> wrote:
>>>>> > First a bit of context, I'm using logrotate to monitor and copy (cp -p)
>>>>> > log files to a flume spooling directory source.  So every hour,
>>>>> > logrotate checks for and copies a file from the source to the flume
>>>>> > destination. I see the following warning message in the flume logs.
>>>>> >
>>>>> >
>>>>> > 17 Nov 2012 14:47:07,682 WARN  [pool-10-thread-1] (org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile:328) - Could not find file: /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623
>>>>> > java.io.FileNotFoundException: /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623 (Permission denied)
>>>>> >     at java.io.FileInputStream.open(Native Method)
>>>>> >     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>>>> >     at java.io.FileReader.<init>(FileReader.java:72)
>>>>> >     at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>>>>> >     at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>>>>> >     at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>>>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>> >     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>> >     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>> >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>>> >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>> >     at java.lang.Thread.run(Thread.java:722)
>>>>> >
>>>>> >
>>>>> > Although it appears that Flume processes the log, I'm curious why I'm
>>>>> > seeing this and whether any of my permissions are incorrect.
>>>>> >
>>>>> >
>>>>> >
>>>>> > Here's the permissions:
>>>>> >
>>>>> > source log directory under /var/log:
>>>>> > drwxrwxr-x 2 ubuntu    ubuntu   4096 Nov 17 14:47 clickstream
>>>>> >
>>>>> > source files:
>>>>> > -rw-rw-r-- 1 ubuntu ubuntu   9055750 Nov 17 13:29 clickstream.log-2012-11-17-1353158953.gz
>>>>> > -rw-rw-r-- 1 ubuntu ubuntu  13583565 Nov 17 14:17 clickstream.log-2012-11-17-1353161821.gz
>>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47 clickstream.log-2012-11-17-1353163623
>>>>> > -rw-rw-r-- 1 ubuntu ubuntu  65648336 Nov 17 14:52 clickstream.log
>>>>> >
>>>>> > flume source directory under /mnt/flume:
>>>>> > drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 17 14:48 clickstream
>>>>> >
>>>>> > flume source files:
>>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 13:29 clickstream.log-2012-11-17-1353158953.COMPLETED
>>>>> > -rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 17 14:17 clickstream.log-2012-11-17-1353161821.COMPLETED
>>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47 clickstream.log-2012-11-17-1353163623.COMPLETED
>>>>> >
>>>>> > Any insight would be appreciated.
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > Dan
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Apache MRUnit - Unit testing MapReduce -
>>>>> http://incubator.apache.org/mrunit/
>>>>>
>>>>>
>>>>
>>>
>>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
