Hi Maryann,

 

Thanks for that - I will schedule an update to the latest version of Phoenix then for later this week (and try out the merge-join hints).

 

In the meantime, here are my explain plans:

 

JOIN WITH NO SQUIRREL LIMIT

 

PLAN

CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1

    PARALLEL INNER-JOIN TABLE 0

        CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2

 

JOIN WITH SQUIRREL LIMIT 100

 

PLAN

CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1

CLIENT 100 ROW LIMIT

    PARALLEL INNER-JOIN TABLE 0

        CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2

            SERVER FILTER BY PageFilter 100

            SERVER 100 ROW LIMIT

        CLIENT 100 ROW LIMIT

 

 

I’m not really sure how to read that, but it does seem to suggest that ‘mytable2’ is being limited to 100 – thoughts?

 

Cheers,

Matt

 

From: Maryann Xue [mailto:maryann.xue@gmail.com]
Sent: 23 February 2015 18:10
To: user@phoenix.apache.org
Subject: Re: Inner Join not returning any results in Phoenix

 

Hi Matt,

 

Yes, the upgrade is as easy as that. I believe things will work fine with existing tables.

I tried with a similar query but didn't see that it was a Phoenix bug. So could you please try the following explain statement and see the execution plan:

 

EXPLAIN SELECT * FROM "mytable1" hc

INNER JOIN “mytable2” bs

On hc."myId" = bs.”myId”

 

 

Thanks,

Maryann

 

 

On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson <matt.johnson@algomi.com> wrote:

Hi Maryann,

 

That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then that would explain why it’s not working. Is upgrading versions of Phoenix as simple as removing the previous jar from HBase lib folder and dropping the new Phoenix jar in (and restarting HBase)? Will all the existing Phoenix tables and views be backwards-compatible and work with the new version?

 

Cheers,

Matt

 

 

From: Maryann Xue [mailto:maryann.xue@gmail.com]
Sent: 20 February 2015 17:46


To: user@phoenix.apache.org
Subject: Re: Inner Join not returning any results in Phoenix

 

Which version of Phoenix are you using, Matt? This feature is only available in the latest releases of 4.3/3.3.

 

On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson <matt.johnson@algomi.com> wrote:

Hi Maryann,

 

Unfortunately my two tables are roughly the same size (~500k), but I have tested a different join where one table is ~500k and the other is ~20k and putting the larger one first is definitely far more performant. I believe you are right about running out of memory, I can see this repeated a few times in the region server logs followed by what appears to be a restart or disconnect:

 

[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1083ms

 

I have been looking at the Phoenix page on joins (http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge joins for large tables by using a hint. I have tried this though with no success:

 

SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc

INNER JOIN “mytable2” bs

On hc.”myId” = bs.”myId”

 

Am I putting the hint in the wrong place? Does it need to go next to the JOIN rather than the SELECT?

 

I will try increasing the memory available to the Region Servers as well to see if that helps.

 

Thanks!

Matt

 

 

From: Maryann Xue [mailto:maryann.xue@gmail.com]
Sent: 20 February 2015 16:28
To: user@phoenix.apache.org


Subject: Re: Inner Join not returning any results in Phoenix

 

Hi Matt,

 

The error you got with "Limit Rows" off might be related to insufficient memory on region servers for one of your tables. Which is the larger table between table1 and table2? You might want to try putting the larger table as the first table in your join query and see if it works. 

 

And I will quickly check if the LIMIT problem is a Phoenix bug and will keep you posted.

 

 

Thanks,

Maryann

 

 

On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson <matt.johnson@algomi.com> wrote:

Hi Abe,

 

Glad to hear I’m not alone! Will try and figure out exactly what’s happening and maybe raise a Jira :-)

 

 

@Constantin – I have tried with and without the “Limit Rows” – but without it, and without any indexes, the query runs for a while (about 10 minutes?) and then throws an error:

 

Error: Encountered exception in sub plan [0] execution.

 

Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows” does not affect any aggregate functions like ‘count’, because the actual number of result rows for a count is just 1 row (the count itself). But in HBase it seems that the Row Limit, as Abe mentioned, is applied to one of the table BEFORE it does the join, so it affects the results of the ‘count’ function.

 

When I try to create my indexes so I am able to do the join without Row Limit, I get the following error:

 

ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server

 

Which I am happy to do (will have to wait until outside of business hours though), but I am curious, will this have any impact on the rest of my cluster and could it have any unforeseen consequences?

 

Thanks again for the input!

 

Cheers,

Matt

 

 

From: Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com]
Sent: 20 February 2015 15:48


To: user@phoenix.apache.org
Subject: RE: Inner Join not returning any results in Phoenix

 

Hello Matt,

 

http://codingclues.eu/2008/the-squirrel-100-rows-problem/

 

Can you please test again after unchecking “Contents- Limit rows” and “SQL – Limit rows”?

 

SQL tab

 

P.S. Off-topic – it’s as funny as this “problem” (not possible to send an email for more than 500 miles away J  http://www.ibiblio.org/harris/500milemail.html )

 

Regards,

  Constantin

 

From: Abe Weinograd [mailto:abe@flonet.com]
Sent: Friday, February 20, 2015 4:18 PM
To: user
Subject: Re: Inner Join not returning any results in Phoenix

 

Matt,

 

I have seen this same issue.  When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe.  I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what I have seen.

 

Abe

 

On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson <matt.johnson@algomi.com> wrote:

Hi Constantin,

 

Many thanks for your reply – the quotes were both of the same type (double quotes for table and column names, single quotes for string literals), it is just my email client that formatted them weirdly, sorry!

 

I have discovered what I believe is an important piece of the puzzle to my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a “Limit Rows” feature. When I try and count the number of rows in a single table:

 

select count(*) from “mytable1”

 

I get the expected number of results (eg 20,000). But when I join two tables together, it seems that the “Limit Rows” from Squirrel is somehow being applied before the join is performed, and if “Limit Rows” is set to 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 rows in a table but only 100 of these will join with a second table) then I believe it tries to join the first 100 it finds and returns no results. In my experience of Oracle or MySQL, joins are done entirely on server side and then you just get back the number of rows you limited, rather than what appears to be happening which is the row limit is applied to the first table before the join is attempted with the second table. Is that how Phoenix works?

 

I have also discovered that I get different results (with “Limit Rows” turned on) depending on which order I join the tables:

 

SELECT count(*) FROM “mytable1” hc

INNER JOIN “mytable2” bs

On hc.”myId” = bs.”myId”

 

Gives me a very different number of results than:

 

SELECT count(*) FROM “mytable2” bs

INNER JOIN “mytable1” hc

On hc.”myId” = bs.”myId”

 

 

Unfortunately I cannot test whether I get the same number of results with “Limit Rows” turned off because my query times out! So I am now looking at creating secondary indexes on the “myId” column in both tables to see if I am able to do this join quicker. Does a join like this use a lot of memory on server side? Is something likely to be running out of resources?

 

Many thanks again for your time.

 

Cheers,

Matt

 

 

From: Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com]
Sent: 20 February 2015 14:40
To: user@phoenix.apache.org
Subject: RE: Inner Join not returning any results in Phoenix

 

Hi Matthew,

 

Is it working without the quotes “ / "  ?   (I see you are using 2 types of quotes, weird)

I guess that’s not needed, and probably causing troubles.   I don’t have to use quotes anyway.

 

Alternatively check the types of data in those 2 tables (if the field types are not the same in both tables, the join will not work).

 

Good luck,

  Constantin

 

From: Matthew Johnson [mailto:matt.johnson@algomi.com]
Sent: Friday, February 20, 2015 12:54 PM
To: user@phoenix.apache.org
Subject: Inner Join not returning any results in Phoenix

 

Hi guys,

 

I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one environment, created tables in HBase, and then created views (rather than tables) in Phoenix, and am able to query as expected (when I join my two tables I see results). I’ve just promoted to another environment, with the exact same setup, but my Inner Join returns no results!

 

I run the following two individual queries:

 

SELECT * FROM "mytable1" hc

where hc."myId" = 'XS0'

 

SELECT * FROM "mytable2" bs

where bs."myId" = 'XS0'

 

And both of these queries give results. But when I run:

 

SELECT * FROM "mytable1" hc

INNER JOIN “mytable2” bs

On hc."myId" = bs.”myId”

 

I get no results. I also get no results if I try:

 

SELECT * FROM "mytable1" hc

where hc."myId" in (select distinct “myId” from “mytable2”)

 

I have checked in HBase shell and can see the “myId” value is as expected (XS0 in both tables). I am not sure if there are any logs that I can look at to get some insight?

 

Many thanks in advance for any suggestions!

 

Cheers,

Matt

 

PS Something that may or may not be of note: In the environments I am using:

WORKING: hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar

FAILING:     hbase-0.98.9-hadoop2 / phoenix-4.2.2-server.jar