phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maryann Xue <maryann....@gmail.com>
Subject Re: Error when using aggregates with correlated subqueries
Date Sun, 26 Apr 2015 15:33:44 GMT
Glad that your queries worked. Please do let us know if any further
problems.

The bug is with EXISTS. Right now Phoenix does not handle aggregate
function calls correctly within an EXISTS subquery or EXISTS nested
subquery. I have opened a JIRA:
https://issues.apache.org/jira/browse/PHOENIX-1923.


Thanks,
Maryann

On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <khellom007@gmail.com>
wrote:

>
> Dear Maryann,
>
> I tried your query and it worked. I also executed a more complex query
> which I need in my testing, which is:
>
> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
> AS W
> where EXISTS (select * from "History" AS H
>                         where (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')
> = W."rowId")
>                         AND H."timestamp" = (select MAX(H2."timestamp")
> from "History" AS H2
>                                                           where
> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>
> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>                                                           AND (H2."status"
> = 'approved')) AND (TO_NUMBER(H."value") > 1000))
> AND NOT EXISTS (select * from "History" AS H3
>                               where
> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>                                 AND (H3."status" = 'pending') AND
> (TO_NUMBER(H3."value") < 1000));
>
> and it also worked after I understood your fix of the original query.
>
> So the trick here is that we can use the reference to the outer query
> within the next subquery level only, and not up to two levels as I was
> doing? Maybe this limitation exists because Phoenix joins the tables from
> the outer and the inner correlated query, but it can perform this join up
> to one level only?
>
>
>
> Best,
> Khaleel
>
>
> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <maryann.xue@gmail.com>
> wrote:
>
>> Hi Khaleel,
>>
>> Mind if you try the following query? I think it's the same semantics as
>> you meant for your query.
>>
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W
>> where
>> W."rowId" in (select H"rowId" from "History" AS H
>>                          where H."timestamp" =
>>                                 (select MAX(H2."timestamp") from
>> "History" AS H2 where H2."rowId" = H."rowId")
>>                          AND
>>  H."status" = 'approved')
>>
>>
>>
>> Thanks,
>> Maryann
>>
>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <khellom007@gmail.com>
>> wrote:
>>
>>> Dear Maryann,
>>>
>>> Thanks for your question. You are right: the query that I was writing
>>> wasn't the correct one for my purpose. The query that will satisfy my
>>> request would be:
>>>
>>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>>> AS W
>>> where
>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>> where (H2."rowId" = W."rowId")) AND
>>>  (H."status" = 'approved') )
>>>
>>> In this query I specify that I need to select the exact version which
>>> has the Maximum timestamp among all versions of the same data item by using
>>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>
>>> However I tried such query and it still produces the same error as the
>>> old query, which is: (Aggregate may not contain columns not in GROUP BY.
>>> )
>>>
>>> Thank you for your help.
>>>
>>>
>>> Best,
>>> Khaleel
>>>
>>>
>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <maryann.xue@gmail.com>
>>> wrote:
>>>
>>>> Hi Khaleel,
>>>>
>>>> Thanks for the explanation! But my question was since this is an
>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>> status, the EXISTS will return true regardless of whether you are testing
>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>> something?
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <khellom007@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hello Maryann,
>>>>>
>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in the
>>>>> next release so that I can continue working with this part in my research
>>>>> project. Thanks for keeping me posted.
>>>>>
>>>>> With respect to your question, I am using the "History" table as a
>>>>> data store of all versions of a certain data item. When I say Max(
>>>>> H."timestamp") inside the query, I am selecting the latest version
>>>>> (most recent) that is "approved" (which is checked using the condition
H."status"
>>>>> = 'approved')
>>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Khaleel
>>>>>
>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <maryann.xue@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Khaleel,
>>>>>>
>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>> will file a JIRA and keep you posted.
>>>>>>
>>>>>> One question though, why would we use MAX(H."timestamp") instead
of H."timestamp"?
>>>>>> What difference would it make?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>> khellom007@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I tried executing the following query in Phoenix:
>>>>>>> select distinct W."rowId", W."md5_current", W."size_current"
from
>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History"
AS H
>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>
>>>>>>> I got the following error:
>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in
GROUP
>>>>>>> BY. (state=42Y27,code=1018)
>>>>>>>
>>>>>>> If I replace W."rowId" (in the where clause) with a constant
value
>>>>>>> (for example: '3587'), the query works fine. Also, if I replace
the
>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also
works fine.
>>>>>>>
>>>>>>> So it seems that Phoenix generates error when using a reference
to
>>>>>>> an outer query while using an aggregate within the inner query.
>>>>>>>
>>>>>>> Any solutions?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>> Research Associate
>>>>>>>
>>>>>>> American University of Beirut
>>>>>>> Department of Electrical and Computer Engineering
>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>> email: kwm03@aub.edu.lb
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message