flume-user mailing list archives

From NerdyNick <nerdyn...@gmail.com>
Subject Re: regex not matching 0 properly
Date Wed, 05 Oct 2011 03:41:22 GMT
If you plan on using the attributes you extract in any of the
escaped/formatted output paths or strings, they will be fine, as those
decorators/sinks/sources actually convert the byte array. The fact that
console doesn't makes me think this should be flagged as a bug and
fixed to reduce confusion. However, I do see it as beneficial for
developers to have the raw byte values, so maybe we should also log a
DEBUG-level message with that version of the output.
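
To make the 48 concrete: the regex captures the text "0", which is stored
as a single byte with value 48 (the ASCII code for '0'), and the 1-byte
branch in the quoted Attributes.java prints that numeric value. Here is a
minimal standalone sketch of the effect. The class and method names are
made up for illustration, and the fallback to a plain string for other
lengths is an assumption, not Flume's actual behavior:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Flume.
class AttrPrintDemo {

    // Mimics only the 1-byte branch quoted from Attributes.java:
    // a single byte is printed as its unsigned numeric value.
    // The string fallback for other lengths is an assumption.
    static String asConsoleWouldPrint(byte[] bytes) {
        if (bytes.length == 1) {
            return "" + (((int) bytes[0]) & 0xff);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes as text, which is what a string-based
    // output path would effectively do.
    static String asText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] captured = "0".getBytes(StandardCharsets.UTF_8);
        System.out.println(asConsoleWouldPrint(captured)); // prints 48
        System.out.println(asText(captured));              // prints 0
    }
}
```

The stored attribute is the same in both cases; only the rendering
differs, which is why the value should come out right anywhere the bytes
are decoded as a string.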

On Tue, Oct 4, 2011 at 7:19 PM, AD <straightflush@gmail.com> wrote:
> Thanks, so is this a bug?  My issue is that I am storing the number of
> "bytes" served from my Apache log, and when it's 0, I will end up storing 48
> and skewing the reports.
> Any thoughts?
>
> Thanks for the find.
> _AD
>
> On Tue, Oct 4, 2011 at 2:56 PM, Mingjie Lai <mjlai09@gmail.com> wrote:
>>
>> AD.
>>
>> I noticed the issue before. It's actually not a regex problem, but the way
>> Flume prints a byte array as a string on the collector side.
>>
>> You can also reproduce it by:
>> # bin/flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>> value("bb", "b") => console};'
>>
>> Below is the piece of code (Attributes.java). It takes a byte array whose
>> length is 1, 4, or 8 and prints it as an int or long. In the case of length 1,
>> it only prints the byte value.
>>
>> ---------------
>>      // this is a hack that prints in int, string and double format when there
>>      // are 8 bytes.
>>      // TODO (jon) this gets grosser and grosser. make a final decision on how
>>      // these attributes are going to be
>>      if (bytes.length == 8) {
>>
>>        return "(long)" + readLong(e, attr).toString() + "  (string) '"
>>            + readString(e, attr) + "'" + " (double)"
>>            + readDouble(e, attr).toString();
>>      }
>>
>>      // this is a similar hack that prints in int and string format when there
>>      // are 4 bytes.
>>      if (bytes.length == 4) {
>>        return readInt(e, attr).toString() + " '" + readString(e, attr) + "'";
>>      }
>>
>>      if (bytes.length == 1) {
>>        return "" + (((int) bytes[0]) & 0xff);
>>      }
>>
>> ---------------
>>
>> -mingjie
>>
>> On 10/03/2011 07:40 PM, AD wrote:
>>>
>>> Hello,
>>>
>>>  I noticed when trying to use regex to parse an integer from a file, a
>>> number of 0 was populating the number 48 into the output on the flume
>>> command line instead.  has anyone come across this before?  Example
>>> below:
>>>
>>> bash-3.2# cat /tmp/integer
>>> 0
>>>
>>> bash-3.2# cat parse.int
>>> ./flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>>> regexAll("^(\\d+)","mynum") => console }; '
>>>
>>> bash-3.2# ./parse.int 2>&1 | grep mynum
>>>
>>> 2011-10-03 22:37:49,526 [main] INFO agent.FlumeNode: System property
>>> sun.java.command=com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c
>>> dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console };
>>> 2011-10-03 22:37:49,966 [main] INFO agent.FlumeNode: Loading spec from
>>> command line: 'dump: tail("/tmp/integer") | {
>>> regexAll("^(\\d+)","mynum") => console }; '
>>> lilmac.home [INFO Mon Oct 03 22:37:50 EDT 2011] { *mynum : 48* } {
>>> tailSrcFile : integer } 0
>>>
>>> Cheers,
>>> AD
>
>



-- 
Nick Verbeck - NerdyNick
----------------------------------------------------
NerdyNick.com
Coloco.ubuntu-rocks.org
