phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Problem in finding the largest value of an indexed column
Date Fri, 03 Jul 2015 02:55:06 GMT
On further investigation, I believe it should have been working before. I
did a bit of cleanup and attached a new patch to PHOENIX-2096, but this
would only prevent a merge sort when one is not required (basically
improving performance).

Maybe your index is invalid? You can try rebuilding with this command:
https://phoenix.apache.org/language/index.html#alter_index

On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu <yliu33@kent.edu> wrote:

> The query on test dataset is returning the expected result with the patch.
> But on the original dataset (10million rows, 6 regions), it still return
> the same unexpected result, I will dig more into this. Thank you, James!
>
> 2015-07-02 9:58 GMT-07:00 Yufan Liu <yliu33@kent.edu>:
>
>> Sure, let me have a try
>>
>> 2015-07-02 9:46 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>
>>> Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
>>> Would you mind confirming that this fixes the issue you're seeing?
>>>
>>>     James
>>>
>>> On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu <yliu33@kent.edu> wrote:
>>>
>>>> I'm using 4.4.0-HBase-0.98
>>>>
>>>> 2015-07-01 22:31 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>
>>>>> Yufan,
>>>>> What version of Phoenix are you using?
>>>>> Thanks,
>>>>> James
>>>>>
>>>>> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <yliu33@kent.edu> wrote:
>>>>>
>>>>>> When I made more tests, I find that this problem happens after table
>>>>>> got split.
>>>>>>
>>>>>> Here is the DDL I use to create table and index:
>>>>>> CREATE TABLE IF NOT EXISTS t1 (
>>>>>> uid BIGINT NOT NULL,
>>>>>> timestamp BIGINT NOT NULL,
>>>>>> eventName VARCHAR
>>>>>> CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';
>>>>>>
>>>>>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)
>>>>>>
>>>>>> Attach is the sample data I used for test. It has about 4000 rows,
>>>>>> when the timestamp_index table has one region, the query returns
correct
>>>>>> result: 1433334048443, but when I manually split it into 4 regions
(use
>>>>>> hbase tool), it returns 1433333024961.
>>>>>>
>>>>>> Let know if you find anything. Thanks!
>>>>>>
>>>>>>
>>>>>> 2015-07-01 11:27 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>>
>>>>>>> If you could put a complete test (including your DDL and upsert
of
>>>>>>> data), that would be much appreciated.
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <yliu33@kent.edu>
wrote:
>>>>>>>
>>>>>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER
BY
>>>>>>>> timestamp DESC NULLS LAST LIMIT 1". But it still returns
the same
>>>>>>>> unexpected result. There seems to be some internal problems
related.
>>>>>>>>
>>>>>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>>>>
>>>>>>>>> Yes, reverse scan will be leveraged when possible. Make
you use
>>>>>>>>> NULLS LAST in your ORDER BY as rows are ordered with
nulls first.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <yliu33@kent.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I used the HBase reverse scan to find the last row
on the index
>>>>>>>>>> table. It returned the expected result. I would like
to know is Phoenix's
>>>>>>>>>> "ORDER BY"
>>>>>>>>>> and "DESC" implemented based on HBase reverse scan?
>>>>>>>>>>
>>>>>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <yliu33@kent.edu>:
>>>>>>>>>>
>>>>>>>>>>> Thank you anyway, Michael!
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister
<
>>>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>>>
>>>>>>>>>>>>  OK, I’m a Phoenix newbie, so that was
the extent of the
>>>>>>>>>>>> advice I could give you. There are people
here far more experienced than I
>>>>>>>>>>>> am who should be able to give you deeper
advice. Have a great weekend!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Mike
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM
>>>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>>>> *Subject:* Re: Problem in finding the largest
value of an
>>>>>>>>>>>> indexed column
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Michael,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the advice, for the first one,
it's "CLIENT 67-CHUNK
>>>>>>>>>>>> PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX;
SERVER FILTER BY FIRST KEY
>>>>>>>>>>>> ONLY; SERVER AGGREGATE INTO SINGLE ROW" which
is as expected. For the
>>>>>>>>>>>> second one, it's "CLIENT 67-CHUNK SERIAL
1-WAY REVERSE FULL SCAN OVER
>>>>>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY
ONLY; SERVER 1 ROW LIMIT" which
>>>>>>>>>>>> looks correct, but still returns the unexpected
result.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister
<
>>>>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>>>>
>>>>>>>>>>>> Yufan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Have you tried using the EXPLAIN command
to see what plan is
>>>>>>>>>>>> being used to access the data?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Michael McAllister
>>>>>>>>>>>>
>>>>>>>>>>>> Staff Data Warehouse Engineer | Decision
Systems
>>>>>>>>>>>>
>>>>>>>>>>>> mmcallister@HomeAway.com | C: 512.423.7447
| skype:
>>>>>>>>>>>> michael.mcallister.ha <zimmkate@hotmail.com>
| webex:
>>>>>>>>>>>> https://h.a/mikewebex
>>>>>>>>>>>>
>>>>>>>>>>>> [image: Description: Description: cid:3410354473_30269081]
>>>>>>>>>>>>
>>>>>>>>>>>> This electronic communication (including
any attachment) is
>>>>>>>>>>>> confidential.  If you are not an intended
recipient of this communication,
>>>>>>>>>>>> please be advised that any disclosure, dissemination,
distribution, copying
>>>>>>>>>>>> or other use of this communication or any
attachment is strictly
>>>>>>>>>>>> prohibited.  If you have received this communication
in error, please
>>>>>>>>>>>> notify the sender immediately by reply e-mail
and promptly destroy all
>>>>>>>>>>>> electronic and printed copies of this communication
and any attachment.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM
>>>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>>>> *Subject:* Problem in finding the largest
value of an indexed
>>>>>>>>>>>> column
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> We have created a table (eg, t1), and a global
index of one
>>>>>>>>>>>> numeric column of t1 (eg, timestamp). Now
we want to find the largest value
>>>>>>>>>>>> of timestamp, we have tried two approaches:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 1. select max(timestamp) from t1; This query
takes forever to
>>>>>>>>>>>> finish, so I think it maybe doing a full
table scan/comparison .
>>>>>>>>>>>>
>>>>>>>>>>>> 2. select timestamp from t1 order by timestamp
desc limit 1;
>>>>>>>>>>>> This query finished fast, but the result
it returns is far from the largest
>>>>>>>>>>>> value. It seems it just return the largest
value for a certain range of
>>>>>>>>>>>> data.
>>>>>>>>>>>>
>>>>>>>>>>>> Did anyone else encounter this issue/have
any suggestion?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yufan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> best,
>>>>>>>>>>>> Yufan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> best,
>>>>>>>>>>> Yufan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> best,
>>>>>>>>>> Yufan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> best,
>>>>>>>> Yufan
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> best,
>>>>>> Yufan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> best,
>>>> Yufan
>>>>
>>>>
>>>
>>
>>
>> --
>> best,
>> Yufan
>>
>>
>
>
> --
> best,
> Yufan
>
>

Mime
View raw message