flume-user mailing list archives

From Yoshiki Kajihara <yosshiki_kajiharara_1...@yahoo.co.jp>
Subject Re: Multi-bytes letters are garbled in the output formatter "avrojson".
Date Thu, 10 Nov 2011 17:59:04 GMT
Mingjie

Thank you for your reply and advice.
We will post the bug to jira as soon as possible.

As to your comment starting with "However I usually think text source is usually ... ", we
agree that this happens only when debugging.
However, we need to see the multi-byte letters displayed correctly even when debugging.

Thanks.

----
Yoshiki Kajihara



On 2011/11/10, at 15:42, Mingjie Lai <mjlai09@gmail.com> wrote:

>> Even if the formatter is set to "raw", multi-byte characters are not shown as expected
>> when we use the "text" source to send the data from the agent side to the collector.
>> 
>> What causes this? Please advise.
> 
> This is no doubt a Flume bug in the text source. It reads a line by using
> RandomAccessFile.readLine(), which doesn't support the full Unicode
> character set.
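
The readLine() behavior described above can be reproduced outside Flume. The
sketch below is not Flume code: the class name ReadLineDemo and the helpers
readRawLine and repair are hypothetical, and it assumes the input file is
UTF-8. RandomAccessFile.readLine() turns each byte into one char, so a
three-character Japanese string comes back as nine chars. Mapping those chars
back to bytes via ISO-8859-1 and decoding them as UTF-8 restores the text.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ReadLineDemo {
    // Reads the first line with RandomAccessFile.readLine(),
    // which produces one char per byte of input.
    static String readRawLine(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            return raf.readLine();
        }
    }

    // Recovers the original text. Assumption: the file was UTF-8.
    // ISO-8859-1 maps chars 0..255 back to the same byte values.
    static String repair(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String japanese = "\u65E5\u672C\u8A9E"; // the three characters of "Japanese language"
        File f = File.createTempFile("text-source", ".log");
        f.deleteOnExit();
        Files.write(f.toPath(), (japanese + "\n").getBytes(StandardCharsets.UTF_8));

        String raw = readRawLine(f);
        System.out.println(raw.length());                  // prints 9, not 3
        System.out.println(repair(raw).equals(japanese));  // prints true
    }
}
```

Whether a fix in the text source should re-decode like this or read through a
charset-aware reader instead is a design choice for the jira discussion.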
> 
> Can you file a jira? (However I usually think text source is usually for
> debugging purpose, while not really for any serious application. )
> 
> Thanks,
> Mingjie
> 
> On 11/09/2011 01:16 AM, Yoshiki Kajihara wrote:
>> Mingjie
>> 
>> Thank you for your prompt reply.
>> We understand that you mean it looks like an Avro bug, not a Flume one.
>> 
>> Just one more thing.
>> 
>> Even if the formatter is set to "raw", multi-byte characters are not shown as expected
>> when we use the "text" source to send the data from the agent side to the collector.
>> 
>> What causes this? Please advise.
>> 
>> 
>> FYI, it was with the "tail" source that the data was shown as expected,
>> as explained in the previous email.
>> As for using just "raw" instead of "avrojson", we are now studying whether we can.
>> 
>> ----
>> Yoshiki Kajihara
>> 
>> 
>> --- On Tue, 2011/11/8, Mingjie Lai wrote:
>> 
>>> Interesting issue.
>>> 
>>> Since you can see the output as expected for raw format, it means that
>>> Flume processes the event as a byte stream correctly, except for the
>>> avrojson encoding of the sink.
>>> 
>>> I took a look at the code and saw that flume uses avro 1.5.4 to encode
>>> the output as avrojson, and I found there are a few open avro bugs
>>> reported for 1.5.x unicode encoding:
>>> 
>>> https://issues.apache.org/jira/browse/AVRO-851
>>> https://issues.apache.org/jira/browse/AVRO-860
>>> 
>>> Since the patch hasn't been committed I'm not sure whether it's caused
>>> by the avro issue or not.
>>> 
>>> Do you have to use the avrojson formatter? How about raw?
>>> 
>>> -mingjie
>>> 
>>> On 11/05/2011 06:41 AM, Yoshiki Kajihara wrote:
>>>> Hi,
>>>> 
>>>> We have trouble in multi-bytes letters transfer.
>>>> 
>>>> When we send plain text containing multi-byte characters such as "日本語"
>>>> (meaning "the Japanese language")
>>>> to the sink, we cannot see the multi-byte letters as expected
>>>> in the configuration where the output formatter is "avrojson".
>>>> 
>>>> In the configuration where the output formatter is "raw", we can see them as expected.
>>>> 
>>>> Why can we see those letters as expected only when the formatter is set to "raw"?
>>>> 
>>>> We are running version "Flume 0.9.4-cdh3u2".
>>>> 
>>>> Please tell us how to solve this problem.
>>>> 
>>>> Thanks.
>>>> 
>>>> ----
>>>> Yoshiki Kajihara
>>>> 
>>>> 
>>> 
>> 
