flume-user mailing list archives

From Tao Li <litao.bupt...@gmail.com>
Subject ScribeSource string encoding problem when message not UTF-8 encoding
Date Wed, 10 Jun 2015 06:02:15 GMT
Hi,

I am using ScribeSource and have run into a string encoding problem.

I found that the LogEntry class calls *iprot.readString()* when it reads a
message. However, Thrift's *TBinaryProtocol* implementation of readString()
always converts the byte array to a string using "*UTF-8*". My Scribe data
is sent in "*GBK*" encoding, so Thrift decodes my message as "*UTF-8*" and
the text comes out garbled.

Does the Flume Scribe source currently accept only UTF-8 encoded messages?
It would be nice if other message encodings could be supported, either
detected automatically or set through configuration.


LogEntry

public void read(org.apache.thrift.protocol.TProtocol iprot)
    throws org.apache.thrift.TException {
  org.apache.thrift.protocol.TField field;
  iprot.readStructBegin();
  while (true)
  {
    field = iprot.readFieldBegin();
    if (field.type == org.apache.thrift.protocol.TType.STOP) {
      break;
    }
    switch (field.id) {
      case 1: // CATEGORY
        if (field.type == org.apache.thrift.protocol.TType.STRING) {
          this.category = iprot.readString();
        } else {
          org.apache.thrift.protocol.TProtocolUtil.skip(iprot, field.type);
        }
        break;
      case 2: // MESSAGE
        if (field.type == org.apache.thrift.protocol.TType.STRING) {
          this.message = iprot.readString();
        } else {
          org.apache.thrift.protocol.TProtocolUtil.skip(iprot, field.type);
        }
        break;
      default:
        org.apache.thrift.protocol.TProtocolUtil.skip(iprot, field.type);
    }
    iprot.readFieldEnd();
  }
  iprot.readStructEnd();

  // check for required fields of primitive type, which can't be
  // checked in the validate method
  validate();
}


TBinaryProtocol

public String readString() throws TException {
  int size = this.readI32();
  if(this.trans_.getBytesRemainingInBuffer() >= size) {
    try {
      String e = new String(this.trans_.getBuffer(),
          this.trans_.getBufferPosition(), size, "UTF-8");
      this.trans_.consumeBuffer(size);
      return e;
    } catch (UnsupportedEncodingException var3) {
      throw new TException("JVM DOES NOT SUPPORT UTF-8");
    }
  } else {
    return this.readStringBody(size);
  }
}
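
The effect of the hard-coded "UTF-8" in readString() can be reproduced
without Thrift or Flume. The following is only a minimal sketch: the class
name GbkDecodeDemo and the helper decode() are made up for illustration, and
it assumes the JVM provides the GBK charset (full JDK builds normally do).
It encodes a two-character Chinese string as GBK, then decodes the same
bytes once as UTF-8 and once as GBK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GbkDecodeDemo {
    // Hypothetical helper: decode raw message bytes with an explicit charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumption: GBK is available in this JVM.
        Charset gbk = Charset.forName("GBK");

        // Two Chinese characters, written as escapes so the source file
        // encoding does not matter.
        String original = "\u4e2d\u6587";

        // These are the bytes a GBK-encoding sender would put on the wire.
        byte[] wire = original.getBytes(gbk);

        // Decoding GBK bytes as UTF-8 yields replacement characters.
        String asUtf8 = decode(wire, StandardCharsets.UTF_8);
        // Decoding with the sender's charset recovers the text.
        String asGbk = decode(wire, gbk);

        System.out.println(original.equals(asUtf8)); // prints false
        System.out.println(original.equals(asGbk));  // prints true
    }
}
```

Because the UTF-8 decode replaces malformed byte sequences, the original
bytes generally cannot be recovered from the resulting String afterwards,
so the charset has to be applied at the point where the bytes are read.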
