flume-user mailing list archives

From Marina <ppi...@yahoo.com>
Subject Re: MalformedInputException processing logs from Varnish server
Date Mon, 09 Mar 2015 20:09:14 GMT
Hi Jeff,
Thank you for your quick response! I could not easily find the exact log entry that had the
issue, as all I had were 30M input log files :).
After further debugging, I figured out what the issue was. Here is what happened.
For production, we use the Exec source with 'tail -f'. For my local testing I use a spooling dir.
The issue happened when I was using the spooldir source and a log file had non-UTF-8 characters.

However, the exception that I posted did not come from processing the log file itself! The flow
was as follows:
1. Flume is started with the spooldir source.
2. A log file with non-UTF-8 chars is moved into the spooldir.
3. Flume starts processing, encounters a "bad" character and stops (no errors or anything).
4. I kill Flume manually and restart it, without cleaning out its .flumespool dir.
5. Flume starts up and now chokes while processing its own .flumespool dir and the left-over
   file in there! This is where the MalformedInputException came from.

When I processed the same file via the Exec source with a 'tail -n 10000 ..' command, it was
processed successfully, which told me the issue is specific to the spooldir source.
The solution was to add this parameter to the spooldir source:
a1.sources.r1.inputCharset = ISO8859-1
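
To illustrate why switching inputCharset helps, here is a minimal standalone Java sketch. It is not Flume code, and the class and method names are made up for illustration. It assumes the log contains a byte such as 0xE9, which is a valid ISO-8859-1 character but is not valid UTF-8 when followed by a plain ASCII byte. A strict UTF-8 decoder rejects it with a MalformedInputException, while ISO-8859-1 maps every possible byte to a character and so never fails:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Flume.
public class CharsetDemo {

    // Decode bytes strictly: report bad input as an exception
    // instead of silently substituting a replacement character.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "caf", then 0xE9 (e-acute in ISO-8859-1), then " ok".
        byte[] line = {'c', 'a', 'f', (byte) 0xE9, ' ', 'o', 'k'};

        // Every byte value is a valid ISO-8859-1 character, so this succeeds.
        System.out.println(decodeStrict(line, StandardCharsets.ISO_8859_1));

        // In UTF-8, 0xE9 must be followed by continuation bytes, so this fails.
        try {
            decodeStrict(line, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 decode failed: " + e);
        }
    }
}
```

The trade-off is that any genuine UTF-8 multi-byte characters in a file would be read as several Latin-1 characters each, so this setting fits best when the logs really are single-byte encoded.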

From: Jeff Lord <jlord@cloudera.com>
To: "user@flume.apache.org" <user@flume.apache.org>; Marina <ppine7@yahoo.com>
Sent: Monday, March 9, 2015 11:17 AM
Subject: Re: MalformedInputException processing logs from Varnish server

Hi Marina,
Do you have a sample of the characters/data which you believe to be causing this?
Can you confirm whether you are using the Apache version of Flume or a specific distro?
Also, in your message you mention that you are using tail -f, which would be the exec source,
but the stack trace looks like you are actually using the spooldir source.

On Mon, Mar 9, 2015 at 10:26 AM, Marina <ppine7@yahoo.com> wrote:

Hi,
I have configured Flume to "tail -f" logs from my Varnish server, pretty much standard
Apache HTTP logs.
However, sometimes Flume chokes on some special characters and dies: it stops processing new
log entries.
See below for a stack trace.
It seems like this exact issue was reported as a Flume bug in the 1.4.x version:
https://issues.apache.org/jira/browse/FLUME-2052
and it was marked as resolved in version 1.5.0. The version I am using is Flume 1.5.2, and I am
still seeing this issue...
Could somebody confirm/deny whether what I am seeing is the same issue and should have been
fixed? Or is this completely different?
Thank you!
Marina

06 Mar 2015 18:16:57,820 ERROR [pool-3-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:256) - FATAL: Spool Directory source r1: { spoolDir: /data1/varnish-logs-active }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:195)
    at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:134)
    at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:72)
    at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:91)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:227)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
