flume-user mailing list archives

From Mingjie Lai <mjla...@gmail.com>
Subject Re: regex not matching 0 properly
Date Tue, 04 Oct 2011 18:56:10 GMT
AD.

I noticed this issue before. It's actually not a regex problem, but the
way Flume prints a byte array as a string on the collector side.

You can also reproduce it with:
# bin/flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | { value("bb", "b") => console };'

Below is the relevant piece of code (Attributes.java). It takes a byte
array whose length is 1, 4, or 8 and prints it as an int or long. For
length 1, it prints only the numeric byte value, so the single character
"0" (ASCII 48) shows up as 48.

---------------
       // this is a hack that prints in int, string and double format when there
       // are 8 bytes.
       // TODO (jon) this gets grosser and grosser. make a final decision on how
       // these attributes are going to be
       if (bytes.length == 8) {

         return "(long)" + readLong(e, attr).toString() + "  (string) '"
             + readString(e, attr) + "'" + " (double)"
             + readDouble(e, attr).toString();
       }

       // this is a similar hack that prints in int and string format when there
       // are 4 bytes.
       if (bytes.length == 4) {
         return readInt(e, attr).toString() + " '" + readString(e, attr) + "'";
       }

       if (bytes.length == 1) {
         return "" + (((int) bytes[0]) & 0xff);
       }

---------------
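
A minimal standalone sketch of the length-1 case, assuming the captured
group reaches the attribute as the single ASCII byte for "0". The class
and method names are hypothetical and not part of Flume; only the
expression in asNumber is copied from the snippet above.

```java
import java.nio.charset.StandardCharsets;

public class SingleByteAttr {
    // Same expression as the length-1 branch quoted above:
    // the byte is rendered as an unsigned number, not as a character.
    static String asNumber(byte[] bytes) {
        return "" + (((int) bytes[0]) & 0xff);
    }

    // Hypothetical alternative for comparison: decode the bytes as text.
    static String asText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What a regex group matching the line "0" would contain.
        byte[] captured = "0".getBytes(StandardCharsets.UTF_8);
        System.out.println(asNumber(captured)); // prints 48
        System.out.println(asText(captured));   // prints 0
    }
}
```

The value is stored correctly; only the console rendering of a one-byte
attribute turns the character "0" into its byte value 48.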

-mingjie

On 10/03/2011 07:40 PM, AD wrote:
> Hello,
>
>   I noticed when trying to use regex to parse an integer from a file, a
> number of 0 was populating the number 48 into the output on the flume
> command line instead.  has anyone come across this before?  Example below:
>
> bash-3.2# cat /tmp/integer
> 0
>
> bash-3.2# cat parse.int
> ./flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
> regexAll("^(\\d+)","mynum") => console }; '
>
> bash-3.2# ./parse.int 2>&1 | grep mynum
>
> 2011-10-03 22:37:49,526 [main] INFO agent.FlumeNode: System property
> sun.java.command=com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c
> dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console };
> 2011-10-03 22:37:49,966 [main] INFO agent.FlumeNode: Loading spec from
> command line: 'dump: tail("/tmp/integer") | {
> regexAll("^(\\d+)","mynum") => console }; '
> lilmac.home [INFO Mon Oct 03 22:37:50 EDT 2011] { *mynum : 48* } {
> tailSrcFile : integer } 0
>
> Cheers,
> AD
