phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simon Mottram <Simon.Mott...@cucumber.co.nz>
Subject Re: Issue using ANY ARRAY feature
Date Sun, 08 Dec 2019 23:12:39 GMT
Update:

Just in case anyone hits this issue in future with the AWS managed
HBase, the fix is to use a very specific version of the driver

For thick client:

        <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-core</artifactId>
            <version>4.14.1-HBase-1.4</version>
        </dependency>

For thin client:

        <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-queryserver-client</artifactId>
            <version>4.14.1-HBase-1.4</version>
        </dependency>

This required support help from AWS as this driver version is not
mentioned on the official Apache Phoenix download page

Regards

Simon


On Sun, 2019-11-17 at 21:03 +0000, Simon Mottram wrote:
> Phoenix Version: 4.14.2-HBase-1.4
> HBase Version: AWS EMR 
> Release
> label:emr-5.24.1
> Hadoop distribution:Amazon
> Applications:Phoenix 4.14.1,
> Hue 4.4.0, HBase 1.4.9
> 
> 
> Having an issue where ANY(ARRAY[]) stops the query returning any
> results when used in a 'AND' conjunction
> 
> e.g (fielda = 'a') AND (fieldb = any(array['a','b','c]))
> 
> Always returns zero results, if i change to disjunction (OR) it works
> fine but obviously isn't what's wanted here.  Excuse the long post
> but
> I wanted to be as clear as possible.
> 
> It's quite possible I have misunderstood the way ANY() works but...
> 
> Here's a simple query that returns a correct number of results:
> 
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
> 'TRIAL00015')  
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
> 
> So there's definitely records where biomaterial name and trialname
> have
> these values
> 
> If I change it to
> 
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))  
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
> 
> I get valid results
> 
> So the ANY(ARRAY[]) function works
> 
> Here's the explain which looks very odd to more, but it works
> 
> PLAN                                                                 
>   
>                                                                      
>   
> CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN
> OVER
> DEV_OAPI.OBSERVATION                                                 
>   
>                                                                      
>   
> SERVER FILTER BY
> org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
> [children=[ARRAY['TRIAL00015'], TRIALNAME =
> org.apache.phoenix.expression.function.ArrayElemRefExpression
> [children=[ARRAY['TRIAL00015'], 1]]]]
>     SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE
> DESC]                                                                
>  C
> CLIENT MERGE
> SORT                                                                 
>   
> CLIENT LIMIT 10    
> 
> So far so good, BUT.
> 
> However if I combine the ARRAY expression with any expression using
> AND
> I get zero results, even tho as above both sides of the conjunction
> return true.
> 
> e.g.
> 
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
> ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
> 
> Explain (newlines added):
>  SERVER FILTER BY 
>  (BIOMATERIALNAME = 'SCION00424' 
>  AND 
>  org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 
>  TRIALNAME =
> org.apache.phoenix.expression.function.ArrayElemRefExpression
> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]])
> 
> Just out of interest I tried with strings only in the array check
> 
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' =
> ANY(ARRAY['TRIAL00015']))  
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
> 
> This works fine (in a kind of unhelpful way)
> 
> I have tested using the thick client:
>         <!-- HBASE THICK CLIENT DEPS -->
>         <dependency>
>             <groupId>org.apache.phoenix</groupId>
>             <artifactId>phoenix-core</artifactId>
>             <version>4.14.2-HBase-1.4</version>
>         </dependency>
> 
> and the thin client
> 
>          <dependency>
>             <groupId>org.apache.phoenix</groupId>
>             <artifactId>phoenix-queryserver-client</artifactId>
>             <version>4.14.2-HBase-1.4</version>
>         </dependency>
> 
> I've tried using braces and re-ordering but any query of the form:
> <anyclause> AND field = ANY(ARRAY['value1'...])  
> Returns zero results regardless of values
> 
> We can't change version as we are using the Amazon AWS EMR managed
> stack and no other phoenix libraries work.
> 
> Thanks for taking the time to read this far!
> 
> Cheers
> 
> Simon
Mime
View raw message