phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: RFE: using native HBase timestamp(version) to optimize date range queries in Phoenix
Date Sun, 06 Apr 2014 18:10:58 GMT
James,

for many applications one of the column of a table can be (and must
be) naturally mapped

to HBase timestamp. What it gives us is the optimization on
StoreScanner where HFiles with timestamps out of range of

a Scan operator will be omitted. Let us say that we have time-series
type of data (EVENTS) and custom compaction, where we create
series of HFiles with continuous non-overlapping timestamp ranges.

CREATE TABLE IF NOT EXISTS ODS.EVENTS (
    METRICID  VARCHAR NOT NULL,
    METRICNAME VARCHAR,
    SERVICENAME VARCHAR NOT NULL,
    ORIGIN VARCHAR NOT NULL,
    APPID VARCHAR,
    IPID VARCHAR,
    NVALUE DOUBLE,
    TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
    DATA VARCHAR,
    SVALUE VARCHAR
    CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
TIME)
) SALT_BUCKETS=40,
IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';

Make note on   TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ - this is the
Hint to Phoenix that the column

TIME must be mapped to HBase timestamp.

The Query:

Select all events of type 'X' for last 7 days

SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME >
NOW() - 7*24*3600000; (this may be not correct SQL syntax of course)

These types of queries will be efficiently optimized if:

1. Phoenix maps  TIME column to HBase timestamp

2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan timerange


Although this :

Properties props = new Properties();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
Connection conn = DriverManager.connect(myUrl, props);

conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
conn.commit();

will work in my case- it may not be efficient from performance point
of view because for every INSERT/UPSERT
new Connection object and new Statement is created, beside this we
still need the optimization 2. (see above).


-Vladimir









On Sun, Apr 6, 2014 at 8:45 AM, James Taylor <jamestaylor@apache.org> wrote:

> Hi Vladimir,
> By default, Phoenix queries show you the "latest" data. You can override
> this at connection time and do "flashback" queries, DDL, and DML as
> described here:
> http://phoenix.incubator.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
>
> What ever optimizations HBase does, Phoenix will see too, since at the end
> of the day, we're issuing a regular set of scans.
>
> Is this the kind of functionality for which you're looking? Or is it more
> along the lines of a query returning multiple versions of the same row:
> https://issues.apache.org/jira/browse/PHOENIX-590? This isn't supported
> yet.
>
> Or maybe something else?
>
> Thanks,
> James
>
>
> On Sat, Apr 5, 2014 at 10:42 AM, Vladimir Rodionov <vladrodionov@gmail.com
> > wrote:
>
>> I am not sure it was implemented already, or it was not  ...
>> It would be nice to have a way to tell Phoenix which column is naturally
>> mapped to HBase timestamp. This will greatly improve range queries on
>> this column by utilizing internal HBase optimizations (skipping store
>> files, which are completely out of the specified timestamp range).
>>
>>
>> -Vladimir Rodionov
>>
>
>

Mime
View raw message