phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: RFE: using native HBase timestamp(version) to optimize date range queries in Phoenix
Date Sun, 06 Apr 2014 18:58:42 GMT
I created
https://issues.apache.org/jira/browse/PHOENIX-914

You can assign it to me, James

-Vladimir


On Sun, Apr 6, 2014 at 11:40 AM, James Taylor <jamestaylor@apache.org>wrote:

> Thanks, Vladimir - that's a very nice writeup. Would you mind adding it as
> a comment on https://issues.apache.org/jira/browse/PHOENIX-590? Also,
> would you be interested in contributing this functionality?
> Regards,
> James
>
>
> On Sun, Apr 6, 2014 at 11:10 AM, Vladimir Rodionov <vladrodionov@gmail.com
> > wrote:
>
>> James,
>>
>> for many applications one of the column of a table can be (and must be) naturally
mapped
>>
>> to HBase timestamp. What it gives us is the optimization on StoreScanner where HFiles
with timestamps out of range of
>>
>> a Scan operator will be omitted. Let us say that we have time-series type of data
(EVENTS) and custom compaction, where we create
>> series of HFiles with continuous non-overlapping timestamp ranges.
>>
>> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>>     METRICID  VARCHAR NOT NULL,
>>     METRICNAME VARCHAR,
>>     SERVICENAME VARCHAR NOT NULL,
>>     ORIGIN VARCHAR NOT NULL,
>>     APPID VARCHAR,
>>     IPID VARCHAR,
>>     NVALUE DOUBLE,
>>     TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
>>     DATA VARCHAR,
>>     SVALUE VARCHAR
>>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID,
>> IPID, TIME)
>> ) SALT_BUCKETS=40,
>> IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
>>
>> Make note on   TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ - this is the Hint to Phoenix
that the column
>>
>> TIME must be mapped to HBase timestamp.
>>
>> The Query:
>>
>> Select all events of type 'X' for last 7 days
>>
>> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW()
- 7*24*3600000; (this may be not correct SQL syntax of course)
>>
>> These types of queries will be efficiently optimized if:
>>
>>
>> 1. Phoenix maps  TIME column to HBase timestamp
>>
>> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan timerange
>>
>>
>> Although this :
>>
>> Properties props = new Properties();
>> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
>> Connection conn = DriverManager.connect(myUrl, props);
>>
>> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
>> conn.commit();
>>
>> will work in my case- it may not be efficient from performance point of view because
for every INSERT/UPSERT
>> new Connection object and new Statement is created, beside this we still need the
optimization 2. (see above).
>>
>>
>>
>> -Vladimir
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Sun, Apr 6, 2014 at 8:45 AM, James Taylor <jamestaylor@apache.org>wrote:
>>
>>> Hi Vladimir,
>>> By default, Phoenix queries show you the "latest" data. You can override
>>> this at connection time and do "flashback" queries, DDL, and DML as
>>> described here:
>>> http://phoenix.incubator.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
>>>
>>> What ever optimizations HBase does, Phoenix will see too, since at the
>>> end of the day, we're issuing a regular set of scans.
>>>
>>> Is this the kind of functionality for which you're looking? Or is it
>>> more along the lines of a query returning multiple versions of the same
>>> row: https://issues.apache.org/jira/browse/PHOENIX-590? This isn't
>>> supported yet.
>>>
>>> Or maybe something else?
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On Sat, Apr 5, 2014 at 10:42 AM, Vladimir Rodionov <
>>> vladrodionov@gmail.com> wrote:
>>>
>>>> I am not sure it was implemented already, or it was not  ...
>>>> It would be nice to have a way to tell Phoenix which column is
>>>> naturally mapped to HBase timestamp. This will greatly improve range
>>>> queries on
>>>> this column by utilizing internal HBase optimizations (skipping store
>>>> files, which are completely out of the specified timestamp range).
>>>>
>>>>
>>>> -Vladimir Rodionov
>>>>
>>>
>>>
>>
>

Mime
View raw message