flume-user mailing list archives

From Paul Chavez <pcha...@ntent.com>
Subject UTF-8 data mangled in flight
Date Tue, 09 Dec 2014 19:25:21 GMT

I'm hoping to get some insight on where to troubleshoot this issue further. We have a web
application that accepts URL-encoded UTF-8 characters (Cyrillic text in this instance) and
sends the data to a Flume agent via HTTPSource with the JSONHandler. That agent in turn
forwards the event via an Avro sink to another Flume agent, which writes it to HDFS using
the HDFS sink.

We first noticed that the data in the HDFS file was no longer valid. After investigating,
we found the following:

- The initial POST is correct, verified via a network trace and by inspecting the binary
  data on the wire.

- The Avro event sent from the Flume agent is mangled, again verified via a network
  trace and by inspecting the binary payload.
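
To characterize the corruption seen in the Avro payload, it can help to compare the captured bytes against a few common failure patterns. The following is a minimal Python sketch, not a confirmed diagnosis: the helper names and the sample string are hypothetical, and ISO-8859-1 and ASCII are only assumed candidates for a wrong charset being applied somewhere inside the agent.

```python
def to_hex(data: bytes) -> str:
    """Render bytes as space-separated hex, for comparison with a network trace."""
    return " ".join(f"{b:02x}" for b in data)


def simulate_double_encoding(text: str, wrong_charset: str = "iso-8859-1") -> bytes:
    """Bytes produced when correct UTF-8 is decoded with the wrong
    single-byte charset and the result is re-encoded as UTF-8."""
    return text.encode("utf-8").decode(wrong_charset).encode("utf-8")


def simulate_lossy_encoding(text: str, lossy_charset: str = "ascii") -> bytes:
    """Bytes produced when the text is encoded with a charset that cannot
    represent it; each unencodable character becomes '?'."""
    return text.encode(lossy_charset, errors="replace")


# Hypothetical sample; substitute the text from the actual POST.
sample = "Привет"

original = sample.encode("utf-8")
double_encoded = simulate_double_encoding(sample)
lossy = simulate_lossy_encoding(sample)

print("correct UTF-8 :", to_hex(original))
print("double-encoded:", to_hex(double_encoded))
print("lossy (ASCII) :", to_hex(lossy))
```

If the mangled bytes match the double-encoded pattern, the original text is usually recoverable by reversing the two steps. If they match the lossy pattern (runs of `3f`), the information was discarded and cannot be restored downstream.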

We do not explicitly set the Content-Type header on the POST from our application, because
the documentation states that UTF-8 is assumed if the charset is not set.
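
One way to rule out the charset default as a factor is to declare the charset explicitly on the request. The following is a rough Python sketch that only builds the request without sending it. The URL, port, and function name are placeholders, and the body layout (a JSON array of events with `headers` and `body` fields) reflects my understanding of what JSONHandler expects, so check it against the Flume documentation for the version in use.

```python
import json
import urllib.request


def build_flume_request(url: str, body_text: str, event_headers=None):
    """Build (but do not send) a POST carrying one event, with the
    charset stated explicitly in the Content-Type header."""
    events = [{"headers": event_headers or {}, "body": body_text}]
    # ensure_ascii=False keeps the Cyrillic text as raw UTF-8 bytes,
    # so the declared charset actually matters on the wire.
    payload = json.dumps(events, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


# Placeholder endpoint and sample text; not taken from the real setup.
req = build_flume_request("http://localhost:5140", "Привет")
print(req.get_header("Content-type"))
print(req.data)
```

Leaving `ensure_ascii` at its default of `True` would instead emit `\uXXXX` escapes, making the HTTP payload pure ASCII. If the data is still mangled in that case, the corruption is more likely happening after the JSON is parsed than during the HTTP request itself.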

Can anyone elaborate on when/why this data is being corrupted?

Paul Chavez
