flume-user mailing list archives

From Paul Chavez <pcha...@ntent.com>
Subject RE: UTF-8 data mangled in flight
Date Tue, 09 Dec 2014 22:23:21 GMT
Thank you, Jeff. I tried adding that property to the Java command line to start Flume but unfortunately
it didn't change the observed behavior.


From: j.guilmard@accenture.com [mailto:j.guilmard@accenture.com]
Sent: Tuesday, December 09, 2014 1:03 PM
To: user@flume.apache.org
Subject: RE: UTF-8 data mangled in flight

Hi Paul,

I haven't used special characters in Flume, but I have had issues with character encoding
in Java before, and they were solved by specifying the JVM default character encoding with:
"-Dfile.encoding=UTF-8" (here for UTF-8)

Might be worth trying to add that to the Flume command line options? Or maybe on the
front-end application?
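
To illustrate why a default character encoding can matter, here is a minimal sketch in Python (standing in for the JVM; this is not Flume code). The sample string and the choice of ISO-8859-1 as the "wrong" default are assumptions made for illustration only.

```python
# Illustration only: Python stands in for the JVM here; this is not Flume code.
text = "привет"                     # sample Cyrillic string (an assumption)
wire_bytes = text.encode("utf-8")   # what a UTF-8 client puts on the wire


def decode_with(raw: bytes, charset: str) -> str:
    """Decode raw bytes the way a reader with that default charset would."""
    return raw.decode(charset)


correct = decode_with(wire_bytes, "utf-8")    # reader agrees with the writer
mangled = decode_with(wire_bytes, "latin-1")  # reader assumes ISO-8859-1

print(correct)
print(repr(mangled))
```

Each Cyrillic letter occupies two bytes in UTF-8, so the ISO-8859-1 reader sees twice as many characters, none of them Cyrillic. The bytes themselves are unchanged at this point; the damage only becomes permanent if the misread string is re-encoded and passed on.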


From: Paul Chavez [mailto:pchavez@ntent.com]
Sent: Tuesday, December 9, 2014 20:25
To: user@flume.apache.org
Subject: UTF-8 data mangled in flight


Hoping to get some insight on where to further troubleshoot this issue. The scenario is: we
have a web application that accepts URL-encoded UTF-8 characters (Cyrillic text in this instance)
and sends this data to a Flume agent via HTTPSource with the JSONHandler. That agent in turn
forwards the event via an Avro sink to another Flume agent, which writes it to HDFS using the
HDFS sink.

We initially noticed the data was no longer valid in the HDFS file and after investigating
have found the following:

- The initial POST is correct, verified via a network trace and looking at binary
  data on the wire.

- The Avro event sent from the Flume agent is mangled, again verified via network
  trace and looking at the binary payload.
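
When comparing the two captures, it can help to check the mangled bytes against a couple of common corruption patterns. The following is a rough sketch in Python; `classify_payload` is a hypothetical helper, and the patterns it checks (double encoding through ISO-8859-1, and replacement with `?`) are assumptions about what might be happening, not a diagnosis of this case.

```python
def classify_payload(expected: str, captured: bytes) -> str:
    """Return a rough label for how captured bytes relate to the expected text."""
    good = expected.encode("utf-8")
    if captured == good:
        return "intact"

    # UTF-8 bytes misread as ISO-8859-1, then written out again as UTF-8.
    double = good.decode("latin-1").encode("utf-8")
    if captured == double:
        return "double-encoded"

    # Text encoded to a charset that cannot represent it, so each
    # unrepresentable character was replaced with '?'.
    lossy = expected.encode("ascii", errors="replace")
    if captured == lossy:
        return "replaced"

    return "unknown"


# Hypothetical usage with bytes copied out of a network trace:
print(classify_payload("привет", "привет".encode("utf-8")))
print(classify_payload("привет", b"??????"))
```

A "replaced" result would suggest the text was encoded to a charset without Cyrillic somewhere in the agent, while "double-encoded" would suggest the bytes were decoded with the wrong charset and then re-encoded.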

We do not explicitly set the content type header on the POST from our application, as the
documentation states that if it is not set then UTF-8 will be assumed.
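
One way to rule out charset defaulting as a factor would be to declare the charset explicitly on the POST. Below is a minimal sketch in Python that only builds the request and does not send it. The URL, port, and the `source` header value are hypothetical; the event layout (a JSON array of objects with `headers` and `body`) follows the format the JSONHandler is documented to accept.

```python
import json
import urllib.request

FLUME_URL = "http://flume-agent.example:44444"  # hypothetical host and port


def build_request(body_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST with an explicit UTF-8 charset."""
    # JSONHandler-style payload: an array of events with headers and a body.
    events = [{"headers": {"source": "webapp"}, "body": body_text}]
    # ensure_ascii=False keeps the Cyrillic text as raw UTF-8 bytes
    # instead of \uXXXX escapes, so the wire bytes are easy to inspect.
    payload = json.dumps(events, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        FLUME_URL,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


req = build_request("привет")
print(req.get_header("Content-type"))
print(req.data)
```

If the event is still mangled on the Avro side with the charset stated explicitly, the HTTP request headers are unlikely to be the cause.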

Can anyone elaborate on when/why this data is being corrupted?

Paul Chavez



