phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yufan Liu <yli...@kent.edu>
Subject Re: Problem in finding the largest value of an indexed column
Date Thu, 09 Jul 2015 20:44:34 GMT
Just got a chance to revisit this issue: I have rebuilt the index and it
still returns the unexpected result. By using the test case, I tried to
insert enough rows to make it auto-split and it reproduces the problem too.
It seems it still has trouble returning last row sorted by first component
of primary key on split tables. Maybe there is another issue than
PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98
branch which includes the patch of PHOENIX-2096.

2015-07-02 19:55 GMT-07:00 James Taylor <jamestaylor@apache.org>:

> On further investigation, I believe it should have been working before. I
> did a bit of cleanup and attached a new patch to PHOENIX-2096, but this
> would only prevent a merge sort when one is not required (basically
> improving performance).
>
> Maybe your index is invalid? You can try rebuilding with this command:
> https://phoenix.apache.org/language/index.html#alter_index
>
> On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu <yliu33@kent.edu> wrote:
>
>> The query on test dataset is returning the expected result with the
>> patch. But on the original dataset (10million rows, 6 regions), it still
>> return the same unexpected result, I will dig more into this. Thank you,
>> James!
>>
>> 2015-07-02 9:58 GMT-07:00 Yufan Liu <yliu33@kent.edu>:
>>
>>> Sure, let me have a try
>>>
>>> 2015-07-02 9:46 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>
>>>> Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
>>>> Would you mind confirming that this fixes the issue you're seeing?
>>>>
>>>>     James
>>>>
>>>> On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu <yliu33@kent.edu> wrote:
>>>>
>>>>> I'm using 4.4.0-HBase-0.98
>>>>>
>>>>> 2015-07-01 22:31 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>
>>>>>> Yufan,
>>>>>> What version of Phoenix are you using?
>>>>>> Thanks,
>>>>>> James
>>>>>>
>>>>>> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <yliu33@kent.edu>
wrote:
>>>>>>
>>>>>>> When I made more tests, I find that this problem happens after
table
>>>>>>> got split.
>>>>>>>
>>>>>>> Here is the DDL I use to create table and index:
>>>>>>> CREATE TABLE IF NOT EXISTS t1 (
>>>>>>> uid BIGINT NOT NULL,
>>>>>>> timestamp BIGINT NOT NULL,
>>>>>>> eventName VARCHAR
>>>>>>> CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';
>>>>>>>
>>>>>>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)
>>>>>>>
>>>>>>> Attach is the sample data I used for test. It has about 4000
rows,
>>>>>>> when the timestamp_index table has one region, the query returns
correct
>>>>>>> result: 1433334048443, but when I manually split it into 4 regions
(use
>>>>>>> hbase tool), it returns 1433333024961.
>>>>>>>
>>>>>>> Let know if you find anything. Thanks!
>>>>>>>
>>>>>>>
>>>>>>> 2015-07-01 11:27 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>>>
>>>>>>>> If you could put a complete test (including your DDL and
upsert of
>>>>>>>> data), that would be much appreciated.
>>>>>>>> Thanks,
>>>>>>>> James
>>>>>>>>
>>>>>>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <yliu33@kent.edu>
wrote:
>>>>>>>>
>>>>>>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER
BY
>>>>>>>>> timestamp DESC NULLS LAST LIMIT 1". But it still returns
the same
>>>>>>>>> unexpected result. There seems to be some internal problems
related.
>>>>>>>>>
>>>>>>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <jamestaylor@apache.org>:
>>>>>>>>>
>>>>>>>>>> Yes, reverse scan will be leveraged when possible.
Make you use
>>>>>>>>>> NULLS LAST in your ORDER BY as rows are ordered with
nulls first.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <yliu33@kent.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I used the HBase reverse scan to find the last
row on the index
>>>>>>>>>>> table. It returned the expected result. I would
like to know is Phoenix's
>>>>>>>>>>> "ORDER BY"
>>>>>>>>>>> and "DESC" implemented based on HBase reverse
scan?
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <yliu33@kent.edu>:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you anyway, Michael!
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister
<
>>>>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>>  OK, I’m a Phoenix newbie, so that
was the extent of the
>>>>>>>>>>>>> advice I could give you. There are people
here far more experienced than I
>>>>>>>>>>>>> am who should be able to give you deeper
advice. Have a great weekend!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM
>>>>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>>>>> *Subject:* Re: Problem in finding the
largest value of an
>>>>>>>>>>>>> indexed column
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Michael,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the advice, for the first
one, it's "CLIENT
>>>>>>>>>>>>> 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER
TIMESTAMP_INDEX; SERVER FILTER BY
>>>>>>>>>>>>> FIRST KEY ONLY; SERVER AGGREGATE INTO
SINGLE ROW" which is as expected. For
>>>>>>>>>>>>> the second one, it's "CLIENT 67-CHUNK
SERIAL 1-WAY REVERSE FULL SCAN OVER
>>>>>>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST
KEY ONLY; SERVER 1 ROW LIMIT" which
>>>>>>>>>>>>> looks correct, but still returns the
unexpected result.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister
<
>>>>>>>>>>>>> mmcallister@homeaway.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yufan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have you tried using the EXPLAIN command
to see what plan is
>>>>>>>>>>>>> being used to access the data?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Michael McAllister
>>>>>>>>>>>>>
>>>>>>>>>>>>> Staff Data Warehouse Engineer | Decision
Systems
>>>>>>>>>>>>>
>>>>>>>>>>>>> mmcallister@HomeAway.com | C: 512.423.7447
| skype:
>>>>>>>>>>>>> michael.mcallister.ha <zimmkate@hotmail.com>
| webex:
>>>>>>>>>>>>> https://h.a/mikewebex
>>>>>>>>>>>>>
>>>>>>>>>>>>> [image: Description: Description: cid:3410354473_30269081]
>>>>>>>>>>>>>
>>>>>>>>>>>>> This electronic communication (including
any attachment) is
>>>>>>>>>>>>> confidential.  If you are not an intended
recipient of this communication,
>>>>>>>>>>>>> please be advised that any disclosure,
dissemination, distribution, copying
>>>>>>>>>>>>> or other use of this communication or
any attachment is strictly
>>>>>>>>>>>>> prohibited.  If you have received this
communication in error, please
>>>>>>>>>>>>> notify the sender immediately by reply
e-mail and promptly destroy all
>>>>>>>>>>>>> electronic and printed copies of this
communication and any attachment.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *From:* Yufan Liu [mailto:yliu33@kent.edu]
>>>>>>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM
>>>>>>>>>>>>> *To:* user@phoenix.apache.org
>>>>>>>>>>>>> *Subject:* Problem in finding the largest
value of an indexed
>>>>>>>>>>>>> column
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have created a table (eg, t1), and
a global index of one
>>>>>>>>>>>>> numeric column of t1 (eg, timestamp).
Now we want to find the largest value
>>>>>>>>>>>>> of timestamp, we have tried two approaches:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. select max(timestamp) from t1; This
query takes forever to
>>>>>>>>>>>>> finish, so I think it maybe doing a full
table scan/comparison .
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. select timestamp from t1 order by
timestamp desc limit 1;
>>>>>>>>>>>>> This query finished fast, but the result
it returns is far from the largest
>>>>>>>>>>>>> value. It seems it just return the largest
value for a certain range of
>>>>>>>>>>>>> data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did anyone else encounter this issue/have
any suggestion?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yufan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> best,
>>>>>>>>>>>>> Yufan
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> best,
>>>>>>>>>>>> Yufan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> best,
>>>>>>>>>>> Yufan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> best,
>>>>>>>>> Yufan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> best,
>>>>>>> Yufan
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> best,
>>>>> Yufan
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> best,
>>> Yufan
>>>
>>>
>>
>>
>> --
>> best,
>> Yufan
>>
>>
>


-- 
best,
Yufan

Mime
View raw message