flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brock Noland <br...@cloudera.com>
Subject Re: .SpoolingFileLineReader warning....
Date Tue, 20 Nov 2012 16:21:43 GMT
Yeah I think that makes sense, I have created a JIRA for this

https://issues.apache.org/jira/browse/FLUME-1733

Brock

On Tue, Nov 20, 2012 at 9:02 AM, Dan Young <danoyoung@gmail.com> wrote:
> Hey Brock,
>
> I can do some more testing on my side with smaller files as well as doing a
> mv vs a cp . I do believe that a slight delay would be helpful since people
> will be moving/copying large files around.
>
> Regards ,
>
> Dano
>
> On Nov 20, 2012 5:26 AM, "Brock Noland" <brock@cloudera.com> wrote:
>>
>> Thinking about this more, I think it's probably going to be quite
>> common for people to cp large files into the spooling directory.
>> Patrick, what do you think about waiting until the mtime is say 1
>> second old?
>>
>> Brock
>>
>> On Mon, Nov 19, 2012 at 5:29 PM, Brock Noland <brock@cloudera.com> wrote:
>> > My guess is that the file does not have the correct permissions while
>> > being copied.
>> >
>> > [noland@localhost cp-test]$ cp -p test-0 test-1 & sleep 0.1; ls -al
>> > test*
>> > [1] 18780
>> > -rw-rw-r-- 1 noland noland 1048576000 Nov 19 17:25 test-0
>> > -rw------- 1 noland noland   52334592 Nov 19 17:27 test-1
>> >
>> >
>> > For large files, it probably makes sense to copy the file in as .file
>> > and then rename it to file.
>> >
>> > Brock
>> >
>> > On Mon, Nov 19, 2012 at 5:04 PM, Patrick Wendell <pwendell@gmail.com>
>> > wrote:
>> >> The spooling source gets a directory listing, then reads each file,
>> >> then
>> >> renames it to X.COMPLETED. Is it possible some other process deleted
>> >> that
>> >> file between when Flume listed the directory and when it tried to open
>> >> the
>> >> file? Otherwise, I'm confused why the file would not be present in the
>> >> listing you give here.
>> >>
>> >>
>> >> On Mon, Nov 19, 2012 at 6:03 PM, Patrick Wendell <pwendell@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hey Dan,
>> >>>
>> >>> You say that it seems like Flume has already processed the log... why
>> >>> do
>> >>> you think that?
>> >>>
>> >>> When you listed the directory contents I don't see the original or the
>> >>> COMPLETED version of the file that Flume is complaining about:
>> >>>
>> >>> /clickstream.log-2012-11-17-1353163623
>> >>>
>> >>> doesn't appear in the
>> >>>
>> >>> /mnt/flume/clickstream/
>> >>>
>> >>> directory listing anywhere.
>> >>>
>> >>>
>> >>> On Mon, Nov 19, 2012 at 2:33 PM, Dan Young <danoyoung@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hello Brock,
>> >>>>
>> >>>> It seems like we get this message each time that logrotate runs
and
>> >>>> is in
>> >>>> the process of copying the file to the SpoolingDirectory. It seems
>> >>>> that
>> >>>> Flume starts reading the file as soon as it shows up in the
>> >>>> SpoolingDirectory.....  Maybe it's trying to read the file while
it's
>> >>>> still
>> >>>> being written to????
>> >>>>
>> >>>> 2012-11-19 19:27:27,924 (pool-12-thread-1) [WARN -
>> >>>>
>> >>>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:328)]
>> >>>> Could not find file:
>> >>>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239
>> >>>> java.io.FileNotFoundException:
>> >>>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239
>> >>>> (Permission
>> >>>> denied)
>> >>>> at java.io.FileInputStream.open(Native Method)
>> >>>> at java.io.FileInputStream.<init>(FileInputStream.java:138)
>> >>>> at java.io.FileReader.<init>(FileReader.java:72)
>> >>>> at
>> >>>>
>> >>>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>> >>>> at
>> >>>>
>> >>>> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>> >>>> at
>> >>>>
>> >>>> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>> >>>> at
>> >>>>
>> >>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>>> at
>> >>>>
>> >>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>> >>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>> >>>> at
>> >>>>
>> >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>> >>>> at
>> >>>>
>> >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>> >>>> at
>> >>>>
>> >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>>> at
>> >>>>
>> >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>>> at java.lang.Thread.run(Thread.java:722)
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Sat, Nov 17, 2012 at 9:15 AM, Brock Noland <brock@cloudera.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Ok, do you mind sharing your log rotate config to see if we
can
>> >>>>> reproduce?
>> >>>>>
>> >>>>> --
>> >>>>> Brock Noland
>> >>>>> Sent with Sparrow
>> >>>>>
>> >>>>> On Saturday, November 17, 2012 at 10:01 AM, Dan Young wrote:
>> >>>>>
>> >>>>> Hey Brock,
>> >>>>>
>> >>>>> No I have not modified the conf while the agent was running.
>> >>>>>
>> >>>>> /mnt/flume is local. Note that this is running on an ec2 instance
>> >>>>> and
>> >>>>> the disk is the ephemeral drive, not EBS.
>> >>>>>
>> >>>>> Regards ,
>> >>>>>
>> >>>>> Dano
>> >>>>>
>> >>>>> On Nov 17, 2012 8:58 AM, "Brock Noland" <brock@cloudera.com>
wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I highly doubt it's related to
>> >>>>> (https://issues.apache.org/jira/browse/FLUME-1721) but have
you
>> >>>>> modified the configuration file since starting the agent?  If
so,
>> >>>>> can
>> >>>>> you restart the agent and see if the error continues?
>> >>>>>
>> >>>>> Also, is /mnt/flume local disk or NAS?
>> >>>>>
>> >>>>> Brock
>> >>>>>
>> >>>>> On Sat, Nov 17, 2012 at 9:02 AM, Dan Young <danoyoung@gmail.com>
>> >>>>> wrote:
>> >>>>> > First a bit of context, I'm using logrotate to monitor
and copy
>> >>>>> > (cp
>> >>>>> > -p) log
>> >>>>> > files to a flume spooling directory source.  So every hour,
>> >>>>> > logrotate
>> >>>>> > checks
>> >>>>> > for and copies a file from the source to the flume destination.
I
>> >>>>> > see
>> >>>>> > the
>> >>>>> > following warning message in the flume logs.
>> >>>>> >
>> >>>>> >
>> >>>>> > 17 Nov 2012 14:47:07,682 WARN  [pool-10-thread-1]
>> >>>>> >
>> >>>>> > (org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile:328)
>> >>>>> > -
>> >>>>> > Could not find file:
>> >>>>> > /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623
>> >>>>> > java.io.FileNotFoundException:
>> >>>>> > /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623
>> >>>>> > (Permission
>> >>>>> > denied)
>> >>>>> > at java.io.FileInputStream.open(Native Method)
>> >>>>> > at java.io.FileInputStream.<init>(FileInputStream.java:138)
>> >>>>> > at java.io.FileReader.<init>(FileReader.java:72)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>> >>>>> > at
>> >>>>> >
>> >>>>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>> >>>>> > at
>> >>>>> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>>>> > at
>> >>>>> >
>> >>>>> >
>> >>>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>>>> > at java.lang.Thread.run(Thread.java:722)
>> >>>>> >
>> >>>>> >
>> >>>>> > Although it appears that Flume processes the log, I'm curious
why
>> >>>>> > I''m
>> >>>>> > seeing this and if I have anything with permissions incorrect?
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > Here's the permissions:
>> >>>>> >
>> >>>>> > source log directory under /var/log:
>> >>>>> > drwxrwxr-x 2 ubuntu    ubuntu   4096 Nov 17 14:47 clickstream
>> >>>>> >
>> >>>>> > source files:
>> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu   9055750 Nov 17 13:29
>> >>>>> > clickstream.log-2012-11-17-1353158953.gz
>> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu  13583565 Nov 17 14:17
>> >>>>> > clickstream.log-2012-11-17-1353161821.gz
>> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47
>> >>>>> > clickstream.log-2012-11-17-1353163623
>> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu  65648336 Nov 17 14:52 clickstream.log
>> >>>>> >
>> >>>>> > flume source directory under /mnt/flume:
>> >>>>> > drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 17 14:48 clickstream
>> >>>>> >
>> >>>>> > flume source files:
>> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 13:29
>> >>>>> > clickstream.log-2012-11-17-1353158953.COMPLETED
>> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 17 14:17
>> >>>>> > clickstream.log-2012-11-17-1353161821.COMPLETED
>> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47
>> >>>>> > clickstream.log-2012-11-17-1353163623.COMPLETED
>> >>>>> >
>> >>>>> > Any insight would be appreciated.
>> >>>>> >
>> >>>>> > Regards,
>> >>>>> >
>> >>>>> > Dan
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Apache MRUnit - Unit testing MapReduce -
>> >>>>> http://incubator.apache.org/mrunit/
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Apache MRUnit - Unit testing MapReduce -
>> > http://incubator.apache.org/mrunit/
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Mime
View raw message