phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Problem in finding the largest value of an indexed column
Date Thu, 02 Jul 2015 05:31:47 GMT
Yufan,
What version of Phoenix are you using?
Thanks,
James

On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <yliu33@kent.edu> wrote:

> When I made more tests, I find that this problem happens after table got
> split.
>
> Here is the DDL I use to create table and index:
> CREATE TABLE IF NOT EXISTS t1 (
> uid BIGINT NOT NULL,
> timestamp BIGINT NOT NULL,
> eventName VARCHAR
> CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';
>
> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)
>
> Attach is the sample data I used for test. It has about 4000 rows, when
> the timestamp_index table has one region, the query returns correct result:
> 1433334048443, but when I manually split it into 4 regions (use hbase
> tool), it returns 1433333024961.
>
> Let know if you find anything. Thanks!
>
>
> 2015-07-01 11:27 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>
>> If you could put a complete test (including your DDL and upsert of data),
>> that would be much appreciated.
>> Thanks,
>> James
>>
>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <yliu33@kent.edu> wrote:
>>
>>> I have tried to use query "SELECT timestamp FROM t1 ORDER BY timestamp
>>> DESC NULLS LAST LIMIT 1". But it still returns the same unexpected result.
>>> There seems to be some internal problems related.
>>>
>>> 2015-06-30 18:03 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>
>>>> Yes, reverse scan will be leveraged when possible. Make you use NULLS
>>>> LAST in your ORDER BY as rows are ordered with nulls first.
>>>>
>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <yliu33@kent.edu> wrote:
>>>>
>>>>> I used the HBase reverse scan to find the last row on the index table.
>>>>> It returned the expected result. I would like to know is Phoenix's "ORDER
>>>>> BY"
>>>>> and "DESC" implemented based on HBase reverse scan?
>>>>>
>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <yliu33@kent.edu>:
>>>>>
>>>>>> Thank you anyway, Michael!
>>>>>>
>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister <
>>>>>> mmcallister@homeaway.com>:
>>>>>>
>>>>>>>  OK, I’m a Phoenix newbie, so that was the extent of the advice
I
>>>>>>> could give you. There are people here far more experienced than
I am who
>>>>>>> should be able to give you deeper advice. Have a great weekend!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM
>>>>>>> *To:* user@phoenix.apache.org
>>>>>>> *Subject:* Re: Problem in finding the largest value of an indexed
>>>>>>> column
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> Thanks for the advice, for the first one, it's "CLIENT 67-CHUNK
>>>>>>> PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER
BY FIRST KEY
>>>>>>> ONLY; SERVER AGGREGATE INTO SINGLE ROW" which is as expected.
For the
>>>>>>> second one, it's "CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN
OVER
>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW
LIMIT" which
>>>>>>> looks correct, but still returns the unexpected result.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister <
>>>>>>> mmcallister@homeaway.com>:
>>>>>>>
>>>>>>> Yufan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Have you tried using the EXPLAIN command to see what plan is
being
>>>>>>> used to access the data?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Michael McAllister
>>>>>>>
>>>>>>> Staff Data Warehouse Engineer | Decision Systems
>>>>>>>
>>>>>>> mmcallister@HomeAway.com | C: 512.423.7447 | skype:
>>>>>>> michael.mcallister.ha <zimmkate@hotmail.com> | webex:
>>>>>>> https://h.a/mikewebex
>>>>>>>
>>>>>>> [image: Description: Description: cid:3410354473_30269081]
>>>>>>>
>>>>>>> This electronic communication (including any attachment) is
>>>>>>> confidential.  If you are not an intended recipient of this communication,
>>>>>>> please be advised that any disclosure, dissemination, distribution,
copying
>>>>>>> or other use of this communication or any attachment is strictly
>>>>>>> prohibited.  If you have received this communication in error,
please
>>>>>>> notify the sender immediately by reply e-mail and promptly destroy
all
>>>>>>> electronic and printed copies of this communication and any attachment.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM
>>>>>>> *To:* user@phoenix.apache.org
>>>>>>> *Subject:* Problem in finding the largest value of an indexed
column
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have created a table (eg, t1), and a global index of one numeric
>>>>>>> column of t1 (eg, timestamp). Now we want to find the largest
value of
>>>>>>> timestamp, we have tried two approaches:
>>>>>>>
>>>>>>>
>>>>>>> 1. select max(timestamp) from t1; This query takes forever to
>>>>>>> finish, so I think it maybe doing a full table scan/comparison
.
>>>>>>>
>>>>>>> 2. select timestamp from t1 order by timestamp desc limit 1;
This
>>>>>>> query finished fast, but the result it returns is far from the
largest
>>>>>>> value. It seems it just return the largest value for a certain
range of
>>>>>>> data.
>>>>>>>
>>>>>>> Did anyone else encounter this issue/have any suggestion?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yufan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> best,
>>>>>>> Yufan
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> best,
>>>>>> Yufan
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> best,
>>>>> Yufan
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> best,
>>> Yufan
>>>
>>>
>>
>
>
> --
> best,
> Yufan
>
>

Mime
View raw message