Hi Dev,

I am planning to write data from HDFS to HBase, and I am using Phoenix to read it.

Phoenix stores a composite primary key as a single HBase row key, with the column values separated by a zero byte ("\x00").
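To make the expected layout concrete, here is a minimal standalone sketch in plain Java (no Hive or HBase). It assumes every primary-key column is an ascending VARCHAR whose value never contains a zero byte; the class and method names are hypothetical. It joins the UTF-8 bytes of each value with 0x00 and then splits the key back at each 0x00, which is the round trip Phoenix would need to perform when reading the row.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assumes all primary-key columns are ascending VARCHARs,
// so each value is stored as its UTF-8 bytes and values are joined by 0x00.
public class PhoenixVarcharKeySketch {

    static final byte SEPARATOR = 0x00;

    // Join the column values into one row key, with 0x00 between values.
    static byte[] encode(String... columns) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                out.write(SEPARATOR);
            }
            byte[] b = columns[i].getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Split a row key back into column values at each 0x00 byte.
    static List<String> decode(byte[] rowKey) {
        List<String> columns = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= rowKey.length; i++) {
            if (i == rowKey.length || rowKey[i] == SEPARATOR) {
                columns.add(new String(rowKey, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        byte[] key = encode("cust42", "2016-01-01");
        System.out.println(key.length);   // 17 (6 + 1 separator + 10)
        System.out.println(decode(key));  // [cust42, 2016-01-01]
    }
}
```

If any value could itself contain a zero byte, splitting on 0x00 would no longer be unambiguous, so that assumption matters.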

I want to write a custom Hive UDF that builds the HBase row key so that Phoenix can split it back into multiple columns.

Following is the custom UDF I am trying to write:


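import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// The output depends only on the inputs, so the UDF is deterministic and not
// stateful (stateful = true tells Hive the function keeps state between rows).
@UDFType(deterministic = true, stateful = false)
@Description(name = "hbasekeygenerator",
        value = "_FUNC_(col1, col2, ...) - Returns an HBase row key with the values separated by a zero byte")
public class CIHbaseKeyGenerator extends UDF {

    // The separator must be the NUL character itself. Byte.toString((byte) 0)
    // returns the one-character text "0", not a zero byte.
    private static final char ZERO_BYTE = '\u0000';

    public String evaluate(String... args) {
        if (args == null || args.length == 0) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            if (i > 0) {
                sb.append(ZERO_BYTE); // separator between key columns
            }
            // Treat SQL NULL as an empty value rather than the text "null".
            if (args[i] != null) {
                sb.append(args[i]);
            }
        }
        return sb.toString();
    }
}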


My questions are:


1. Is it possible to emulate Phoenix's row key encoding with a custom Hive UDF, so that Phoenix can decode the key correctly?


2. If so, what is the best approach? Any pointers would be appreciated.
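One complication behind question 1, as far as I understand Phoenix's row key format (to be verified against the Phoenix version in use): only variable-length types such as VARCHAR are followed by the \x00 separator. Fixed-width types such as a signed INTEGER are, I believe, stored as 4 big-endian bytes with the sign bit flipped, so that byte order matches numeric order, and with no separator after them. A string-joining UDF would then only match Phoenix when every primary-key column is a VARCHAR. A hypothetical sketch of the INTEGER case under that assumption:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch, assuming Phoenix writes a signed INTEGER key column as
// 4 big-endian bytes with the sign bit flipped and no 0x00 separator after it.
public class PhoenixIntKeySketch {

    static byte[] encodeInt(int value) {
        byte[] b = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        b[0] ^= (byte) 0x80; // flip the sign bit so negatives sort first
        return b;
    }

    static int decodeInt(byte[] key, int offset) {
        byte[] copy = Arrays.copyOfRange(key, offset, offset + Integer.BYTES);
        copy[0] ^= (byte) 0x80; // undo the sign-bit flip
        return ByteBuffer.wrap(copy).getInt();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeInt(1)));   // [-128, 0, 0, 1]
        System.out.println(Arrays.toString(encodeInt(-1)));  // [127, -1, -1, -1]
        System.out.println(decodeInt(encodeInt(-42), 0));    // -42
    }
}
```

Compared as unsigned bytes (as HBase compares row keys), the encoding of -1 starts with 0x7F and the encoding of 1 starts with 0x80, so -1 sorts first; that ordering is the reason for the flip.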


Thanks,

Chethan.