If you plan on using the attributes you extract in any of the escaped/formatted
output paths or strings, they will be fine, as those decorators/sinks/sources
actually convert the byte array. The fact that console doesn't makes me think
this should be flagged as a bug and fixed to reduce confusion.

However, I do see it as beneficial for developers to have the raw byte values,
so maybe we should also log a DEBUG-level message with that version of the
output.

On Tue, Oct 4, 2011 at 7:19 PM, AD wrote:
> Thanks, so is this a bug?  My issue is that I am storing the number of
> "bytes" served from my Apache log, and when it's 0, I will end up storing 48
> and skewing the reports.
> Any thoughts?
>
> Thanks for the find.
> _AD
>
> On Tue, Oct 4, 2011 at 2:56 PM, Mingjie Lai wrote:
>>
>> AD.
>>
>> I noticed the issue before. It's actually not a regex problem, but the way
>> Flume prints a byte array as a string on the collector side.
>>
>> You can also reproduce it by:
>> # bin/flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>> value("bb", "b") => console};'
>>
>> Below is the piece of code (Attributes.java). It takes a byte array whose
>> length is 1, 4, or 8 and prints it as an int or long. In the case of
>> length 1, it only prints the byte value.
>>
>> ---------------
>>      // this is a hack that prints in int, string and double format when there
>>      // are 8 bytes.
>>      // TODO (jon) this gets grosser and grosser. make a final decision on how
>>      // these attributes are going to be
>>      if (bytes.length == 8) {
>>
>>        return "(long)" + readLong(e, attr).toString() + "  (string) '"
>>            + readString(e, attr) + "'" + " (double)"
>>            + readDouble(e, attr).toString();
>>      }
>>
>>      // this is a similar hack that prints in int and string format when there
>>      // are 4 bytes.
>>      if (bytes.length == 4) {
>>        return readInt(e, attr).toString() + " '" + readString(e, attr) + "'";
>>      }
>>
>>      if (bytes.length == 1) {
>>        return "" + (((int) bytes[0]) & 0xff);
>>      }
>>
>> ---------------
>>
>> -mingjie
>>
>> On 10/03/2011 07:40 PM, AD wrote:
>>>
>>> Hello,
>>>
>>>  I noticed that when trying to use a regex to parse an integer from a
>>> file, a value of 0 shows up as the number 48 in the output on the flume
>>> command line instead.  Has anyone come across this before?  Example
>>> below:
>>>
>>> bash-3.2# cat /tmp/integer
>>> 0
>>>
>>> bash-3.2# cat parse.int
>>> ./flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>>> regexAll("^(\\d+)","mynum") => console }; '
>>>
>>> bash-3.2# ./parse.int 2>&1 | grep mynum
>>>
>>> 2011-10-03 22:37:49,526 [main] INFO agent.FlumeNode: System property
>>> sun.java.command=com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c
>>> dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console };
>>> 2011-10-03 22:37:49,966 [main] INFO agent.FlumeNode: Loading spec from
>>> command line: 'dump: tail("/tmp/integer") | {
>>> regexAll("^(\\d+)","mynum") => console }; '
>>> lilmac.home [INFO Mon Oct 03 22:37:50 EDT 2011] { *mynum : 48* } {
>>> tailSrcFile : integer } 0
>>>
>>> Cheers,
>>> AD
>
>

--
Nick Verbeck - NerdyNick
----------------------------------------------------
NerdyNick.com
Coloco.ubuntu-rocks.org
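
For anyone following the thread, here is a minimal standalone sketch of why a captured "0" shows up as 48. It does not use any Flume classes: the class name ByteAttrDemo and both method names are hypothetical, and only the length == 1 branch is copied from the Attributes.java excerpt quoted above. The fallback for other lengths is an assumption made purely so the sketch runs on its own.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Flume.
class ByteAttrDemo {

    // Mirrors the quoted length == 1 branch: a one-byte attribute is
    // rendered as its unsigned numeric value, not as a character.
    static String renderLikeConsole(byte[] bytes) {
        if (bytes.length == 1) {
            return "" + (((int) bytes[0]) & 0xff);
        }
        // Assumption for this sketch only: decode other lengths as text.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes as text, which is the value the regex
    // actually captured.
    static String renderAsString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The file contains the single character '0', whose byte value is 48.
        byte[] zero = "0".getBytes(StandardCharsets.UTF_8);
        System.out.println(renderLikeConsole(zero)); // prints 48
        System.out.println(renderAsString(zero));    // prints 0
    }
}
```

The stored attribute is still the single byte for the character '0'; only the numeric rendering of that byte produces 48, which fits the point above that this is a display issue rather than a regex problem.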