flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brock Noland <br...@cloudera.com>
Subject Re: .SpoolingFileLineReader warning....
Date Mon, 19 Nov 2012 23:29:57 GMT
My guess is that the file does not have the correct permissions while
being copied.

[noland@localhost cp-test]$ cp -p test-0 test-1 & sleep 0.1; ls -al test*
[1] 18780
-rw-rw-r-- 1 noland noland 1048576000 Nov 19 17:25 test-0
-rw------- 1 noland noland   52334592 Nov 19 17:27 test-1


For large files, it probably makes sense to copy the file in as .file
and then rename it to file.

Brock

On Mon, Nov 19, 2012 at 5:04 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> The spooling source gets a directory listing, then reads each file, then
> renames it to X.COMPLETED. Is it possible some other process deleted that
> file between when Flume listed the directory and when it tried to open the
> file? Otherwise, I'm confused why the file would not be present in the
> listing you give here.
>
>
> On Mon, Nov 19, 2012 at 6:03 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>>
>> Hey Dan,
>>
>> You say that it seems like Flume has already processed the log... why do
>> you think that?
>>
>> When you listed the directory contents I don't see the original or the
>> COMPLETED version of the file that Flume is complaining about:
>>
>> /clickstream.log-2012-11-17-1353163623
>>
>> doesn't appear in the
>>
>> /mnt/flume/clickstream/
>>
>> directory listing anywhere.
>>
>>
>> On Mon, Nov 19, 2012 at 2:33 PM, Dan Young <danoyoung@gmail.com> wrote:
>>>
>>> Hello Brock,
>>>
>>> It seems like we get this message each time that logrotate runs and is in
>>> the process of copying the file to the SpoolingDirectory. It seems that
>>> Flume starts reading the file as soon as it shows up in the
>>> SpoolingDirectory.....  Maybe it's trying to read the file while it's still
>>> being written to????
>>>
>>> 2012-11-19 19:27:27,924 (pool-12-thread-1) [WARN -
>>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:328)]
>>> Could not find file:
>>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239
>>> java.io.FileNotFoundException:
>>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239 (Permission
>>> denied)
>>> at java.io.FileInputStream.open(Native Method)
>>> at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>> at java.io.FileReader.<init>(FileReader.java:72)
>>> at
>>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>>> at
>>> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>>> at
>>> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> at
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> at java.lang.Thread.run(Thread.java:722)
>>>
>>>
>>>
>>>
>>> On Sat, Nov 17, 2012 at 9:15 AM, Brock Noland <brock@cloudera.com> wrote:
>>>>
>>>> Ok, do you mind sharing your log rotate config to see if we can
>>>> reproduce?
>>>>
>>>> --
>>>> Brock Noland
>>>> Sent with Sparrow
>>>>
>>>> On Saturday, November 17, 2012 at 10:01 AM, Dan Young wrote:
>>>>
>>>> Hey Brock,
>>>>
>>>> No I have not modified the conf while the agent was running.
>>>>
>>>> /mnt/flume is local. Note that this is running on an ec2 instance and
>>>> the disk is the ephemeral drive, not EBS.
>>>>
>>>> Regards ,
>>>>
>>>> Dano
>>>>
>>>> On Nov 17, 2012 8:58 AM, "Brock Noland" <brock@cloudera.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I highly doubt it's related to
>>>> (https://issues.apache.org/jira/browse/FLUME-1721) but have you
>>>> modified the configuration file since starting the agent?  If so, can
>>>> you restart the agent and see if the error continues?
>>>>
>>>> Also, is /mnt/flume local disk or NAS?
>>>>
>>>> Brock
>>>>
>>>> On Sat, Nov 17, 2012 at 9:02 AM, Dan Young <danoyoung@gmail.com> wrote:
>>>> > First a bit of context, I'm using logrotate to monitor and copy (cp
>>>> > -p) log
>>>> > files to a flume spooling directory source.  So every hour, logrotate
>>>> > checks
>>>> > for and copies a file from the source to the flume destination. I see
>>>> > the
>>>> > following warning message in the flume logs.
>>>> >
>>>> >
>>>> > 17 Nov 2012 14:47:07,682 WARN  [pool-10-thread-1]
>>>> > (org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile:328)
>>>> > -
>>>> > Could not find file:
>>>> > /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623
>>>> > java.io.FileNotFoundException:
>>>> > /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623
>>>> > (Permission
>>>> > denied)
>>>> > at java.io.FileInputStream.open(Native Method)
>>>> > at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>>> > at java.io.FileReader.<init>(FileReader.java:72)
>>>> > at
>>>> >
>>>> > org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>>>> > at
>>>> >
>>>> > org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>>>> > at
>>>> >
>>>> > org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>>> > at
>>>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> > at
>>>> >
>>>> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>> > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>> > at
>>>> >
>>>> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>> > at
>>>> >
>>>> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>> > at
>>>> >
>>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> > at
>>>> >
>>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> > at java.lang.Thread.run(Thread.java:722)
>>>> >
>>>> >
>>>> > Although it appears that Flume processes the log, I'm curious why I''m
>>>> > seeing this and if I have anything with permissions incorrect?
>>>> >
>>>> >
>>>> >
>>>> > Here's the permissions:
>>>> >
>>>> > source log directory under /var/log:
>>>> > drwxrwxr-x 2 ubuntu    ubuntu   4096 Nov 17 14:47 clickstream
>>>> >
>>>> > source files:
>>>> > -rw-rw-r-- 1 ubuntu ubuntu   9055750 Nov 17 13:29
>>>> > clickstream.log-2012-11-17-1353158953.gz
>>>> > -rw-rw-r-- 1 ubuntu ubuntu  13583565 Nov 17 14:17
>>>> > clickstream.log-2012-11-17-1353161821.gz
>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47
>>>> > clickstream.log-2012-11-17-1353163623
>>>> > -rw-rw-r-- 1 ubuntu ubuntu  65648336 Nov 17 14:52 clickstream.log
>>>> >
>>>> > flume source directory under /mnt/flume:
>>>> > drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 17 14:48 clickstream
>>>> >
>>>> > flume source files:
>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 13:29
>>>> > clickstream.log-2012-11-17-1353158953.COMPLETED
>>>> > -rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 17 14:17
>>>> > clickstream.log-2012-11-17-1353161821.COMPLETED
>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47
>>>> > clickstream.log-2012-11-17-1353163623.COMPLETED
>>>> >
>>>> > Any insight would be appreciated.
>>>> >
>>>> > Regards,
>>>> >
>>>> > Dan
>>>>
>>>>
>>>>
>>>> --
>>>> Apache MRUnit - Unit testing MapReduce -
>>>> http://incubator.apache.org/mrunit/
>>>>
>>>>
>>>
>>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Mime
View raw message