phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yufan Liu <yli...@kent.edu>
Subject Re: Problem in finding the largest value of an indexed column
Date Fri, 03 Jul 2015 00:26:01 GMT
The query on test dataset is returning the expected result with the patch.
But on the original dataset (10million rows, 6 regions), it still return
the same unexpected result, I will dig more into this. Thank you, James!

2015-07-02 9:58 GMT-07:00 Yufan Liu <yliu33@kent.edu>:

> Sure, let me have a try
>
> 2015-07-02 9:46 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>
>> Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
>> Would you mind confirming that this fixes the issue you're seeing?
>>
>>     James
>>
>> On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu <yliu33@kent.edu> wrote:
>>
>>> I'm using 4.4.0-HBase-0.98
>>>
>>> 2015-07-01 22:31 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>
>>>> Yufan,
>>>> What version of Phoenix are you using?
>>>> Thanks,
>>>> James
>>>>
>>>> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <yliu33@kent.edu> wrote:
>>>>
>>>>> When I made more tests, I find that this problem happens after table
>>>>> got split.
>>>>>
>>>>> Here is the DDL I use to create table and index:
>>>>> CREATE TABLE IF NOT EXISTS t1 (
>>>>> uid BIGINT NOT NULL,
>>>>> timestamp BIGINT NOT NULL,
>>>>> eventName VARCHAR
>>>>> CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';
>>>>>
>>>>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)
>>>>>
>>>>> Attach is the sample data I used for test. It has about 4000 rows,
>>>>> when the timestamp_index table has one region, the query returns correct
>>>>> result: 1433334048443, but when I manually split it into 4 regions (use
>>>>> hbase tool), it returns 1433333024961.
>>>>>
>>>>> Let know if you find anything. Thanks!
>>>>>
>>>>>
>>>>> 2015-07-01 11:27 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>
>>>>>> If you could put a complete test (including your DDL and upsert of
>>>>>> data), that would be much appreciated.
>>>>>> Thanks,
>>>>>> James
>>>>>>
>>>>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <yliu33@kent.edu>
wrote:
>>>>>>
>>>>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER BY
>>>>>>> timestamp DESC NULLS LAST LIMIT 1". But it still returns the
same
>>>>>>> unexpected result. There seems to be some internal problems related.
>>>>>>>
>>>>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>>>
>>>>>>>> Yes, reverse scan will be leveraged when possible. Make you
use
>>>>>>>> NULLS LAST in your ORDER BY as rows are ordered with nulls
first.
>>>>>>>>
>>>>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <yliu33@kent.edu>
wrote:
>>>>>>>>
>>>>>>>>> I used the HBase reverse scan to find the last row on
the index
>>>>>>>>> table. It returned the expected result. I would like
to know is Phoenix's
>>>>>>>>> "ORDER BY"
>>>>>>>>> and "DESC" implemented based on HBase reverse scan?
>>>>>>>>>
>>>>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <yliu33@kent.edu>:
>>>>>>>>>
>>>>>>>>>> Thank you anyway, Michael!
>>>>>>>>>>
>>>>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister <
>>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>>
>>>>>>>>>>>  OK, I’m a Phoenix newbie, so that was the
extent of the advice
>>>>>>>>>>> I could give you. There are people here far more
experienced than I am who
>>>>>>>>>>> should be able to give you deeper advice. Have
a great weekend!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Mike
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM
>>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>>> *Subject:* Re: Problem in finding the largest
value of an
>>>>>>>>>>> indexed column
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Michael,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the advice, for the first one, it's
"CLIENT 67-CHUNK
>>>>>>>>>>> PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX;
SERVER FILTER BY FIRST KEY
>>>>>>>>>>> ONLY; SERVER AGGREGATE INTO SINGLE ROW" which
is as expected. For the
>>>>>>>>>>> second one, it's "CLIENT 67-CHUNK SERIAL 1-WAY
REVERSE FULL SCAN OVER
>>>>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY;
SERVER 1 ROW LIMIT" which
>>>>>>>>>>> looks correct, but still returns the unexpected
result.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister
<
>>>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>>>
>>>>>>>>>>> Yufan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Have you tried using the EXPLAIN command to see
what plan is
>>>>>>>>>>> being used to access the data?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Michael McAllister
>>>>>>>>>>>
>>>>>>>>>>> Staff Data Warehouse Engineer | Decision Systems
>>>>>>>>>>>
>>>>>>>>>>> mmcallister@HomeAway.com | C: 512.423.7447 |
skype:
>>>>>>>>>>> michael.mcallister.ha <zimmkate@hotmail.com>
| webex:
>>>>>>>>>>> https://h.a/mikewebex
>>>>>>>>>>>
>>>>>>>>>>> [image: Description: Description: cid:3410354473_30269081]
>>>>>>>>>>>
>>>>>>>>>>> This electronic communication (including any
attachment) is
>>>>>>>>>>> confidential.  If you are not an intended recipient
of this communication,
>>>>>>>>>>> please be advised that any disclosure, dissemination,
distribution, copying
>>>>>>>>>>> or other use of this communication or any attachment
is strictly
>>>>>>>>>>> prohibited.  If you have received this communication
in error, please
>>>>>>>>>>> notify the sender immediately by reply e-mail
and promptly destroy all
>>>>>>>>>>> electronic and printed copies of this communication
and any attachment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM
>>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>>> *Subject:* Problem in finding the largest value
of an indexed
>>>>>>>>>>> column
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We have created a table (eg, t1), and a global
index of one
>>>>>>>>>>> numeric column of t1 (eg, timestamp). Now we
want to find the largest value
>>>>>>>>>>> of timestamp, we have tried two approaches:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 1. select max(timestamp) from t1; This query
takes forever to
>>>>>>>>>>> finish, so I think it maybe doing a full table
scan/comparison .
>>>>>>>>>>>
>>>>>>>>>>> 2. select timestamp from t1 order by timestamp
desc limit 1;
>>>>>>>>>>> This query finished fast, but the result it returns
is far from the largest
>>>>>>>>>>> value. It seems it just return the largest value
for a certain range of
>>>>>>>>>>> data.
>>>>>>>>>>>
>>>>>>>>>>> Did anyone else encounter this issue/have any
suggestion?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yufan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> best,
>>>>>>>>>>> Yufan
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> best,
>>>>>>>>>> Yufan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> best,
>>>>>>>>> Yufan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> best,
>>>>>>> Yufan
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> best,
>>>>> Yufan
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> best,
>>> Yufan
>>>
>>>
>>
>
>
> --
> best,
> Yufan
>
>


-- 
best,
Yufan

Mime
View raw message