phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From universal localhost <universal.localh...@gmail.com>
Subject Suggestions to define schema for storing aggregates
Date Wed, 26 Mar 2014 07:18:38 GMT
Hey All,

I am using Phoenix Jdbc Api to insert raw data, do OLAP processing (mainly
groupby aggregations) and insert the aggregated data back in HBase.

Sample phoenix table:
CREATE TABLE ABCD (TXID INTEGER NOT NULL, INSTANCEID INTEGER NOT NULL,
DATELOGGED TIMESTAMP NOT NULL, ORGNAME VARCHAR, MERCHANT_NAME VARCHAR,
COUNTRY VARCHAR, AMOUNT FLOAT, CODE INTEGER, CONTYPE VARCHAR,
IP_ROUTINGTYPE VARCHAR, SCORE INTEGER,RULE VARCHAR, STATUS INTEGER, ACTION
VARCHAR CONSTRAINT pk PRIMARY KEY (DATELOGGED, ORGNAME, TXID, INSTANCEID ))

I want to do aggregations on many dimensions over the data in this table
and store the output in a Phoenix table.

Sample aggregation query:
1.  Select TRUNC(DATELOGGED, 'hour'), ORGNAME, 'fraudrep1' as
reportid, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'hour'), ORGNAME,
STATUS WHERE DATELOGGED > X && <Y

2. Select TRUNC(DATELOGGED, 'minute'), ORGNAME, 'fraudrep2' ' as reportid,
STATUS, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'minute'), ORGNAME,
STATUS  WHERE DATELOGGED > X && <Y

3. Select TRUNC(DATELOGGED, 'minute'), ORGNAME,  'fraudrep3'' as reportid,
CODE, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'minute'), ORGNAME,
CODE WHERE DATELOGGED > X && <Y;

4.  Select TRUNC(DATELOGGED, 'hour'), ORGNAME, 'fraudrep4'' as reportid,
STATUS, CODE, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'hour'),
ORGNAME, STATUS, CODE  WHERE DATELOGGED > X && <Y;

5. Select TRUNC(DATELOGGED, 'hour'), ORGNAME, 'fraudrep5'' as reportid,
CONTYPE, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'hour'), ORGNAME,
CONTYPE  WHERE DATELOGGED > X && <Y;
There can be many queries with 2-4 dimensions clubbed.

*How should I design the schema of the table to store the aggregated data ?*
FYI, When these processed data is queried then Datelogged, reportId,
timedimension(hour mor minute or day) are known.


--Unilocal

Mime
View raw message