flume-user mailing list archives

From Mingjie Lai <mjla...@gmail.com>
Subject Re: Multi-bytes letters are garbled in the output formatter "avrojson".
Date Thu, 10 Nov 2011 06:42:14 GMT
> Even if the formatter is set to "raw", multi-byte characters are not shown
> as expected when we use the "text" source to send the data to the collector
> from the agent side.
>
> What causes this? Please advise.

This is no doubt a Flume bug in the text source. It reads each line with
RandomAccessFile.readLine(), which doesn't support the full Unicode
character set.
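
To illustrate the readLine() behaviour described above, here is a minimal
sketch in plain Java. It is not Flume's actual code: the class name
ReadLineDemo and its helper methods are made up for this example, and the
input is assumed to be UTF-8. RandomAccessFile.readLine() turns every byte
into one char, so a 3-byte UTF-8 character comes back as three unrelated
characters. The sketch also shows two possible ways around it: re-decoding
the returned string, or reading through a Reader with an explicit charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of Flume.
final class ReadLineDemo {

    // Reads the first line the way the text source is described as doing:
    // each byte becomes one char with the high eight bits set to zero.
    static String readRaw(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            return raf.readLine();
        }
    }

    // Recovers the original bytes from a readLine() result and decodes
    // them as UTF-8 (assumes the file really is UTF-8).
    static String redecodeUtf8(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Alternative: read the first line through a charset-aware Reader.
    static String readUtf8(Path file) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // The three characters from the original report, written as
        // escapes so the source file encoding does not matter.
        String text = "\u65e5\u672c\u8a9e";
        Path file = Files.createTempFile("flume-text", ".log");
        try {
            Files.write(file, (text + "\n").getBytes(StandardCharsets.UTF_8));

            String raw = readRaw(file);
            System.out.println(raw.length());                    // 9
            System.out.println(redecodeUtf8(raw).equals(text));  // true
            System.out.println(readUtf8(file).equals(text));     // true
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The re-decoding trick works only because the byte-to-char mapping used by
readLine() is the same as ISO-8859-1, so no byte is lost along the way.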

Can you file a JIRA? (That said, I think the text source is mostly for
debugging purposes, not really for any serious application.)

Thanks,
Mingjie

On 11/09/2011 01:16 AM, Yoshiki Kajihara wrote:
> Mingjie
> 
> Thank you for your prompt reply.
> We understand you to mean that it looks like an Avro bug, not a Flume one.
> 
> Just one more thing.
> 
> Even if the formatter is set to "raw", multi-byte characters are not shown as expected
> when we use the "text" source to send the data to the collector from the agent side.
> 
> What causes this? Please advise.
> 
> 
> FYI, it was with the "tail" source that the data was shown as expected,
> as explained in the previous email.
> As for using just "raw" instead of "avrojson", we are now studying whether we can.
> 
> ----
> Yoshiki Kajihara
> 
> 
> --- On Tue, 2011/11/8, Mingjie Lai wrote:
> 
>> Interesting issue.
>>
>> Since you can see the output as expected with the raw format, Flume is
>> processing the event correctly as a byte stream; the problem is limited
>> to the avrojson encoding in the sink.
>>
>> I took a look at the code and saw that flume uses avro 1.5.4 to encode
>> the output as avrojson, and I found there are a few open avro bugs
>> reported for 1.5.x unicode encoding:
>>
>> https://issues.apache.org/jira/browse/AVRO-851
>> https://issues.apache.org/jira/browse/AVRO-860
>>
>> Since the patch hasn't been committed I'm not sure whether it's caused
>> by the avro issue or not.
>>
>> Do you have to use the avrojson formatter? How about raw?
>>
>> -mingjie
>>
>> On 11/05/2011 06:41 AM, Yoshiki Kajihara wrote:
>>> Hi,
>>>
>>> We are having trouble transferring multi-byte characters.
>>>
>>> When we send plain text containing multi-byte characters such as "日本語"
>>> (meaning "the Japanese language") to the sink, the multi-byte characters
>>> do not appear as expected in the configuration where the output formatter
>>> is "avrojson".
>>>
>>> In the configuration where the output formatter is "raw", they appear as expected.
>>>
>>> Why do the characters appear as expected only when the formatter is set to "raw"?
>>>
>>> We are running version "Flume 0.9.4-cdh3u2".
>>>
>>> Please tell us how to solve this problem.
>>>
>>> Thanks.
>>>
>>> ----
>>> Yoshiki Kajihara
>>>
>>>
>>
> 
