phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: RFE: using native HBase timestamp(version) to optimize date range queries in Phoenix
Date Sun, 06 Apr 2014 18:40:24 GMT
Thanks, Vladimir - that's a very nice writeup. Would you mind adding it as
a comment on https://issues.apache.org/jira/browse/PHOENIX-590? Also, would
you be interested in contributing this functionality?
Regards,
James


On Sun, Apr 6, 2014 at 11:10 AM, Vladimir Rodionov
<vladrodionov@gmail.com>wrote:

> James,
>
> for many applications one of the column of a table can be (and must be) naturally mapped
>
> to HBase timestamp. What it gives us is the optimization on StoreScanner where HFiles
with timestamps out of range of
>
> a Scan operator will be omitted. Let us say that we have time-series type of data (EVENTS)
and custom compaction, where we create
> series of HFiles with continuous non-overlapping timestamp ranges.
>
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>     METRICID  VARCHAR NOT NULL,
>     METRICNAME VARCHAR,
>     SERVICENAME VARCHAR NOT NULL,
>     ORIGIN VARCHAR NOT NULL,
>     APPID VARCHAR,
>     IPID VARCHAR,
>     NVALUE DOUBLE,
>     TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
>     DATA VARCHAR,
>     SVALUE VARCHAR
>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
> TIME)
> ) SALT_BUCKETS=40,
> IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
>
> Make note on   TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ - this is the Hint to Phoenix
that the column
>
> TIME must be mapped to HBase timestamp.
>
> The Query:
>
> Select all events of type 'X' for last 7 days
>
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() - 7*24*3600000;
(this may be not correct SQL syntax of course)
>
> These types of queries will be efficiently optimized if:
>
> 1. Phoenix maps  TIME column to HBase timestamp
>
> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan timerange
>
>
> Although this :
>
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.connect(myUrl, props);
>
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
>
> will work in my case- it may not be efficient from performance point of view because
for every INSERT/UPSERT
> new Connection object and new Statement is created, beside this we still need the optimization
2. (see above).
>
>
> -Vladimir
>
>
>
>
>
>
>
>
>
> On Sun, Apr 6, 2014 at 8:45 AM, James Taylor <jamestaylor@apache.org>wrote:
>
>> Hi Vladimir,
>> By default, Phoenix queries show you the "latest" data. You can override
>> this at connection time and do "flashback" queries, DDL, and DML as
>> described here:
>> http://phoenix.incubator.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
>>
>> What ever optimizations HBase does, Phoenix will see too, since at the
>> end of the day, we're issuing a regular set of scans.
>>
>> Is this the kind of functionality for which you're looking? Or is it more
>> along the lines of a query returning multiple versions of the same row:
>> https://issues.apache.org/jira/browse/PHOENIX-590? This isn't supported
>> yet.
>>
>> Or maybe something else?
>>
>> Thanks,
>> James
>>
>>
>> On Sat, Apr 5, 2014 at 10:42 AM, Vladimir Rodionov <
>> vladrodionov@gmail.com> wrote:
>>
>>> I am not sure it was implemented already, or it was not  ...
>>> It would be nice to have a way to tell Phoenix which column is naturally
>>> mapped to HBase timestamp. This will greatly improve range queries on
>>> this column by utilizing internal HBase optimizations (skipping store
>>> files, which are completely out of the specified timestamp range).
>>>
>>>
>>> -Vladimir Rodionov
>>>
>>
>>
>

Mime
View raw message