flume-user mailing list archives

From Mingjie Lai <mjla...@gmail.com>
Subject Re: Multi-bytes letters are garbled in the output formatter "avrojson".
Date Tue, 08 Nov 2011 08:31:51 GMT
Interesting issue.

Since you can see the output as expected with the raw format, Flume is
handling the event as a byte stream correctly; the problem appears to be
limited to the sink's avrojson encoding.

I took a look at the code and saw that Flume uses Avro 1.5.4 to encode
the output as avrojson, and there are a few open Avro bugs reported
against Unicode encoding in 1.5.x:


Since the patch hasn't been committed yet, I'm not sure whether your
problem is caused by that Avro issue.
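One possible mechanism, which this thread does not confirm: if the sink writes the event body as an Avro `bytes` field (an assumption about Flume 0.9.4's event schema), the Avro JSON encoding represents each byte as a single code point in the range 0-255. A UTF-8 string such as "日本語" (3 characters, 9 bytes) then shows up as nine Latin-1 characters instead of three Japanese ones. The sketch below mimics that mapping with Python's standard `json` module; `avro_json_bytes` and `recover_text` are hypothetical helpers for illustration, not Flume or Avro APIs.

```python
import json

def avro_json_bytes(raw: bytes) -> str:
    # Assumption: mimics Avro's JSON encoding of a `bytes` value, where
    # each byte 0-255 becomes the Unicode code point with the same value
    # (equivalent to decoding the bytes as Latin-1).
    return json.dumps(raw.decode("latin-1"))

def recover_text(encoded: str) -> str:
    # Reverse the mapping: code points back to bytes, then decode those
    # bytes as UTF-8 to get the original multi-byte text.
    return json.loads(encoded).encode("latin-1").decode("utf-8")

body = "日本語".encode("utf-8")  # 9 bytes: 3 bytes per character
encoded = avro_json_bytes(body)
print(encoded)                   # escaped Latin-1 code points, not 日本語
print(recover_text(encoded))     # 日本語
```

If the garbled avrojson output matches this pattern, the data itself is intact and can be recovered downstream by reversing the mapping; if it does not, the Avro Unicode bugs mentioned above are the more likely cause.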

Do you have to use the avrojson formatter? How about raw?


On 11/05/2011 06:41 AM, Yoshiki Kajihara wrote:
> Hi,
> We are having trouble transferring multi-byte characters.
> When we send plain text containing multi-byte characters such as "日本語" (meaning
> "the Japanese language") to the sink, the characters do not appear as expected
> when the output formatter is "avrojson".
> When the output formatter is "raw", they appear as expected.
> Why can we see those characters correctly only when the formatter is set to "raw"?
> We are running version "Flume 0.9.4-cdh3u2".
> Please tell us how to solve this problem.
> Thanks.
> ----
> Yoshiki Kajihara
