phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yufan Liu <yli...@kent.edu>
Subject Re: Problem in finding the largest value of an indexed column
Date Thu, 02 Jul 2015 16:58:59 GMT
Sure, let me have a try

2015-07-02 9:46 GMT-07:00 James Taylor <jamestaylor@apache.org>:

> Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would
> you mind confirming that this fixes the issue you're seeing?
>
>     James
>
> On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu <yliu33@kent.edu> wrote:
>
>> I'm using 4.4.0-HBase-0.98
>>
>> 2015-07-01 22:31 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>
>>> Yufan,
>>> What version of Phoenix are you using?
>>> Thanks,
>>> James
>>>
>>> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <yliu33@kent.edu> wrote:
>>>
>>>> When I made more tests, I find that this problem happens after table
>>>> got split.
>>>>
>>>> Here is the DDL I use to create table and index:
>>>> CREATE TABLE IF NOT EXISTS t1 (
>>>> uid BIGINT NOT NULL,
>>>> timestamp BIGINT NOT NULL,
>>>> eventName VARCHAR
>>>> CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';
>>>>
>>>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)
>>>>
>>>> Attach is the sample data I used for test. It has about 4000 rows, when
>>>> the timestamp_index table has one region, the query returns correct result:
>>>> 1433334048443, but when I manually split it into 4 regions (use hbase
>>>> tool), it returns 1433333024961.
>>>>
>>>> Let know if you find anything. Thanks!
>>>>
>>>>
>>>> 2015-07-01 11:27 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>
>>>>> If you could put a complete test (including your DDL and upsert of
>>>>> data), that would be much appreciated.
>>>>> Thanks,
>>>>> James
>>>>>
>>>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <yliu33@kent.edu> wrote:
>>>>>
>>>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER BY
>>>>>> timestamp DESC NULLS LAST LIMIT 1". But it still returns the same
>>>>>> unexpected result. There seems to be some internal problems related.
>>>>>>
>>>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>>
>>>>>>> Yes, reverse scan will be leveraged when possible. Make you use
>>>>>>> NULLS LAST in your ORDER BY as rows are ordered with nulls first.
>>>>>>>
>>>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <yliu33@kent.edu>
wrote:
>>>>>>>
>>>>>>>> I used the HBase reverse scan to find the last row on the
index
>>>>>>>> table. It returned the expected result. I would like to know
is Phoenix's
>>>>>>>> "ORDER BY"
>>>>>>>> and "DESC" implemented based on HBase reverse scan?
>>>>>>>>
>>>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <yliu33@kent.edu>:
>>>>>>>>
>>>>>>>>> Thank you anyway, Michael!
>>>>>>>>>
>>>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister <
>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>
>>>>>>>>>>  OK, I’m a Phoenix newbie, so that was the extent
of the advice
>>>>>>>>>> I could give you. There are people here far more
experienced than I am who
>>>>>>>>>> should be able to give you deeper advice. Have a
great weekend!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM
>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>> *Subject:* Re: Problem in finding the largest value
of an
>>>>>>>>>> indexed column
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Michael,
>>>>>>>>>>
>>>>>>>>>> Thanks for the advice, for the first one, it's "CLIENT
67-CHUNK
>>>>>>>>>> PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER
FILTER BY FIRST KEY
>>>>>>>>>> ONLY; SERVER AGGREGATE INTO SINGLE ROW" which is
as expected. For the
>>>>>>>>>> second one, it's "CLIENT 67-CHUNK SERIAL 1-WAY REVERSE
FULL SCAN OVER
>>>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY;
SERVER 1 ROW LIMIT" which
>>>>>>>>>> looks correct, but still returns the unexpected result.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister <
>>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>>
>>>>>>>>>> Yufan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Have you tried using the EXPLAIN command to see what
plan is
>>>>>>>>>> being used to access the data?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Michael McAllister
>>>>>>>>>>
>>>>>>>>>> Staff Data Warehouse Engineer | Decision Systems
>>>>>>>>>>
>>>>>>>>>> mmcallister@HomeAway.com | C: 512.423.7447 | skype:
>>>>>>>>>> michael.mcallister.ha <zimmkate@hotmail.com>
| webex:
>>>>>>>>>> https://h.a/mikewebex
>>>>>>>>>>
>>>>>>>>>> [image: Description: Description: cid:3410354473_30269081]
>>>>>>>>>>
>>>>>>>>>> This electronic communication (including any attachment)
is
>>>>>>>>>> confidential.  If you are not an intended recipient
of this communication,
>>>>>>>>>> please be advised that any disclosure, dissemination,
distribution, copying
>>>>>>>>>> or other use of this communication or any attachment
is strictly
>>>>>>>>>> prohibited.  If you have received this communication
in error, please
>>>>>>>>>> notify the sender immediately by reply e-mail and
promptly destroy all
>>>>>>>>>> electronic and printed copies of this communication
and any attachment.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM
>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>> *Subject:* Problem in finding the largest value of
an indexed
>>>>>>>>>> column
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We have created a table (eg, t1), and a global index
of one
>>>>>>>>>> numeric column of t1 (eg, timestamp). Now we want
to find the largest value
>>>>>>>>>> of timestamp, we have tried two approaches:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 1. select max(timestamp) from t1; This query takes
forever to
>>>>>>>>>> finish, so I think it maybe doing a full table scan/comparison
.
>>>>>>>>>>
>>>>>>>>>> 2. select timestamp from t1 order by timestamp desc
limit 1; This
>>>>>>>>>> query finished fast, but the result it returns is
far from the largest
>>>>>>>>>> value. It seems it just return the largest value
for a certain range of
>>>>>>>>>> data.
>>>>>>>>>>
>>>>>>>>>> Did anyone else encounter this issue/have any suggestion?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yufan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> best,
>>>>>>>>>> Yufan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> best,
>>>>>>>>> Yufan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> best,
>>>>>>>> Yufan
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> best,
>>>>>> Yufan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> best,
>>>> Yufan
>>>>
>>>>
>>>
>>
>>
>> --
>> best,
>> Yufan
>>
>>
>


-- 
best,
Yufan

Mime
View raw message