phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Inner Join not returning any results in Phoenix
Date Tue, 24 Feb 2015 16:27:01 GMT
FYI, SQuirrel sets the max rows to return as 100. You can change this in
the tool, though.

On Tuesday, February 24, 2015, Maryann Xue <maryann.xue@gmail.com> wrote:

> Thanks a lot, Matt, for the reply! Very helpful. "*SERVER FILTER BY
> PageFilter 100*" does look like a but here. I will try again to reproduce
> it.
>
>
> Thanks,
> Maryann
>
> On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson <matt.johnson@algomi.com
> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>> wrote:
>
>> Hi Maryann,
>>
>>
>>
>> Thanks for that - I will schedule an update to the latest version of
>> Phoenix then for later this week (and try out the merge-join hints).
>>
>>
>>
>> In the meantime, here are my explain plans:
>>
>>
>>
>> *JOIN WITH NO SQUIRREL LIMIT*
>>
>>
>>
>> *PLAN*
>>
>> *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*
>>
>> *    PARALLEL INNER-JOIN TABLE 0*
>>
>> *        CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2*
>>
>>
>>
>> *JOIN WITH SQUIRREL LIMIT 100*
>>
>>
>>
>> *PLAN*
>>
>> *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*
>>
>> *CLIENT 100 ROW LIMIT*
>>
>> *    PARALLEL INNER-JOIN TABLE 0*
>>
>> *        CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2*
>>
>> *            SERVER FILTER BY PageFilter 100*
>>
>> *            SERVER 100 ROW LIMIT*
>>
>> *        CLIENT 100 ROW LIMIT*
>>
>>
>>
>>
>>
>> I’m not really sure how to read that, but it does seem to suggest that
>> ‘mytable2’ is being limited to 100 – thoughts?
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann.xue@gmail.com
>> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>]
>> *Sent:* 23 February 2015 18:10
>>
>> *To:* user@phoenix.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>
>> *Subject:* Re: Inner Join not returning any results in Phoenix
>>
>>
>>
>> Hi Matt,
>>
>>
>>
>> Yes, the upgrade is as easy as that. I believe things will work fine with
>> existing tables.
>>
>> I tried with a similar query but didn't see that it was a Phoenix bug. So
>> could you please try the following explain statement and see the execution
>> plan:
>>
>>
>>
>> EXPLAIN *SELECT * FROM "mytable1" hc*
>>
>> *INNER JOIN “mytable2” bs*
>>
>> *On hc."myId" = bs.”myId”*
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>>
>>
>> On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson <matt.johnson@algomi.com
>> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>> wrote:
>>
>> Hi Maryann,
>>
>>
>>
>> That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then
>> that would explain why it’s not working. Is upgrading versions of Phoenix
>> as simple as removing the previous jar from HBase lib folder and dropping
>> the new Phoenix jar in (and restarting HBase)? Will all the existing
>> Phoenix tables and views be backwards-compatible and work with the new
>> version?
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann.xue@gmail.com
>> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>]
>> *Sent:* 20 February 2015 17:46
>>
>>
>> *To:* user@phoenix.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>
>> *Subject:* Re: Inner Join not returning any results in Phoenix
>>
>>
>>
>> Which version of Phoenix are you using, Matt? This feature is only
>> available in the latest releases of 4.3/3.3.
>>
>>
>>
>> On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson <
>> matt.johnson@algomi.com
>> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>> wrote:
>>
>> Hi Maryann,
>>
>>
>>
>> Unfortunately my two tables are roughly the same size (~500k), but I have
>> tested a different join where one table is ~500k and the other is ~20k and
>> putting the larger one first is definitely far more performant. I believe
>> you are right about running out of memory, I can see this repeated a few
>> times in the region server logs followed by what appears to be a restart or
>> disconnect:
>>
>>
>>
>> *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
>> machine (eg GC): pause of approximately 1083ms*
>>
>>
>>
>> I have been looking at the Phoenix page on joins (
>> http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
>> joins for large tables by using a hint. I have tried this though with no
>> success:
>>
>>
>>
>> *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*
>>
>> *INNER JOIN “mytable2” bs*
>>
>> *On hc.”myId” = bs.”myId”*
>>
>>
>>
>> Am I putting the hint in the wrong place? Does it need to go next to the
>> JOIN rather than the SELECT?
>>
>>
>>
>> I will try increasing the memory available to the Region Servers as well
>> to see if that helps.
>>
>>
>>
>> Thanks!
>>
>> Matt
>>
>>
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann.xue@gmail.com
>> <javascript:_e(%7B%7D,'cvml','maryann.xue@gmail.com');>]
>> *Sent:* 20 February 2015 16:28
>> *To:* user@phoenix.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>
>>
>>
>> *Subject:* Re: Inner Join not returning any results in Phoenix
>>
>>
>>
>> Hi Matt,
>>
>>
>>
>> The error you got with "Limit Rows" off might be related to insufficient
>> memory on region servers for one of your tables. Which is the larger table
>> between table1 and table2? You might want to try putting the larger table
>> as the first table in your join query and see if it works.
>>
>>
>>
>> And I will quickly check if the LIMIT problem is a Phoenix bug and will
>> keep you posted.
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>>
>>
>> On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson <
>> matt.johnson@algomi.com
>> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>> wrote:
>>
>> Hi Abe,
>>
>>
>>
>> Glad to hear I’m not alone! Will try and figure out exactly what’s
>> happening and maybe raise a Jira :-)
>>
>>
>>
>>
>>
>> @Constantin – I have tried with and without the “Limit Rows” – but
>> without it, and without any indexes, the query runs for a while (about 10
>> minutes?) and then throws an error:
>>
>>
>>
>> *Error: Encountered exception in sub plan [0] execution.*
>>
>>
>>
>> Which I’m guessing is either HBase or Zookeeper timeout. The weird thing
>> is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit
>> Rows” does not affect any aggregate functions like ‘count’, because the
>> actual number of result rows for a count is just 1 row (the count itself).
>> But in HBase it seems that the Row Limit, as Abe mentioned, is applied to
>> one of the table BEFORE it does the join, so it affects the results of the
>> ‘count’ function.
>>
>>
>>
>> When I try to create my indexes so I am able to do the join without Row
>> Limit, I get the following error:
>>
>>
>>
>> ERROR 1029 (42Y88): Mutable secondary indexes must have the
>> hbase.regionserver.wal.codec property set to
>> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
>> hbase-sites.xml of every region server
>>
>>
>>
>> Which I am happy to do (will have to wait until outside of business hours
>> though), but I am curious, will this have any impact on the rest of my
>> cluster and could it have any unforeseen consequences?
>>
>>
>>
>> Thanks again for the input!
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>>
>>
>> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com
>> <javascript:_e(%7B%7D,'cvml','Constantin.Ciureanu@gfk.com');>]
>> *Sent:* 20 February 2015 15:48
>>
>>
>> *To:* user@phoenix.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>
>> *Subject:* RE: Inner Join not returning any results in Phoenix
>>
>>
>>
>> Hello Matt,
>>
>>
>>
>> http://codingclues.eu/2008/the-squirrel-100-rows-problem/
>>
>>
>>
>> Can you please test again after unchecking “Contents- Limit rows” and
>> “SQL – Limit rows”?
>>
>>
>>
>> [image: SQL tab]
>>
>>
>>
>> P.S. Off-topic – it’s as funny as this “problem” (not possible to send an
>> email for more than 500 miles away J
>> http://www.ibiblio.org/harris/500milemail.html )
>>
>>
>>
>> Regards,
>>
>>   Constantin
>>
>>
>>
>> *From:* Abe Weinograd [mailto:abe@flonet.com
>> <javascript:_e(%7B%7D,'cvml','abe@flonet.com');>]
>> *Sent:* Friday, February 20, 2015 4:18 PM
>> *To:* user
>> *Subject:* Re: Inner Join not returning any results in Phoenix
>>
>>
>>
>> Matt,
>>
>>
>>
>> I have seen this same issue.  When passing a LIMIT to a query with joins
>> (most query tools do it implicitly), Phoenix seems to apply that to the
>> table on the right of the join I believe.  I hadn't had a chance to play
>> with it more and file a JIRA, but what you are describing is consistent
>> with what I have seen.
>>
>>
>>
>> Abe
>>
>>
>>
>> On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson <
>> matt.johnson@algomi.com
>> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>> wrote:
>>
>> Hi Constantin,
>>
>>
>>
>> Many thanks for your reply – the quotes were both of the same type
>> (double quotes for table and column names, single quotes for string
>> literals), it is just my email client that formatted them weirdly, sorry!
>>
>>
>>
>> I have discovered what I believe is an important piece of the puzzle to
>> my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it
>> has a “Limit Rows” feature. When I try and count the number of rows in a
>> single table:
>>
>>
>>
>> *select count(*) from “mytable1”*
>>
>>
>>
>> I get the expected number of results (eg 20,000). But when I join two
>> tables together, it seems that the “Limit Rows” from Squirrel is somehow
>> being applied before the join is performed, and if “Limit Rows” is set to
>> 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000
>> rows in a table but only 100 of these will join with a second table) then I
>> believe it tries to join the first 100 it finds and returns no results. In
>> my experience of Oracle or MySQL, joins are done entirely on server side
>> and then you just get back the number of rows you limited, rather than what
>> appears to be happening which is the row limit is applied to the first
>> table before the join is attempted with the second table. Is that how
>> Phoenix works?
>>
>>
>>
>> I have also discovered that I get different results (with “Limit Rows”
>> turned on) depending on which order I join the tables:
>>
>>
>>
>> *SELECT count(*) FROM “mytable1” hc*
>>
>> *INNER JOIN “mytable2” bs*
>>
>> *On hc.”myId” = bs.”myId”*
>>
>>
>>
>> Gives me a very different number of results than:
>>
>>
>>
>> *SELECT count(*) FROM “mytable2” bs*
>>
>> *INNER JOIN “mytable1” hc*
>>
>> *On hc.”myId” = bs.”myId”*
>>
>>
>>
>>
>>
>> Unfortunately I cannot test whether I get the same number of results with
>> “Limit Rows” turned off because my query times out! So I am now looking at
>> creating secondary indexes on the “myId” column in both tables to see if I
>> am able to do this join quicker. Does a join like this use a lot of memory
>> on server side? Is something likely to be running out of resources?
>>
>>
>>
>> Many thanks again for your time.
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>>
>>
>> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com
>> <javascript:_e(%7B%7D,'cvml','Constantin.Ciureanu@gfk.com');>]
>> *Sent:* 20 February 2015 14:40
>> *To:* user@phoenix.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>
>> *Subject:* RE: Inner Join not returning any results in Phoenix
>>
>>
>>
>> Hi Matthew,
>>
>>
>>
>> Is it working without the quotes “ / *"*  ?   (I see you are using 2
>> types of quotes, weird)
>>
>> I guess that’s not needed, and probably causing troubles.   I don’t have
>> to use quotes anyway.
>>
>>
>>
>> Alternatively check the types of data in those 2 tables (if the field
>> types are not the same in both tables, the join will not work).
>>
>>
>>
>> Good luck,
>>
>>   Constantin
>>
>>
>>
>> *From:* Matthew Johnson [mailto:matt.johnson@algomi.com
>> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>]
>> *Sent:* Friday, February 20, 2015 12:54 PM
>> *To:* user@phoenix.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>
>> *Subject:* Inner Join not returning any results in Phoenix
>>
>>
>>
>> Hi guys,
>>
>>
>>
>> I’m a little bit stuck with doing an Inner Join with Phoenix. I set up
>> one environment, created tables in HBase, and then created views (rather
>> than tables) in Phoenix, and am able to query as expected (when I join my
>> two tables I see results). I’ve just promoted to another environment, with
>> the exact same setup, but my Inner Join returns no results!
>>
>>
>>
>> I run the following two individual queries:
>>
>>
>>
>> *SELECT * FROM "mytable1" hc*
>>
>> *where hc."myId" = 'XS0'*
>>
>>
>>
>> *SELECT * FROM "mytable2" bs*
>>
>> *where bs."myId" = 'XS0'*
>>
>>
>>
>> And both of these queries give results. But when I run:
>>
>>
>>
>> *SELECT * FROM "mytable1" hc*
>>
>> *INNER JOIN “mytable2” bs*
>>
>> *On hc."myId" = bs.”myId”*
>>
>>
>>
>> I get no results. I also get no results if I try:
>>
>>
>>
>> *SELECT * FROM "mytable1" hc*
>>
>> *where hc."myId" in (select distinct “myId” from “mytable2”)*
>>
>>
>>
>> I have checked in HBase shell and can see the “myId” value is as expected
>> (XS0 in both tables). I am not sure if there are any logs that I can look
>> at to get some insight?
>>
>>
>>
>> Many thanks in advance for any suggestions!
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>> *PS* Something that may or may not be of note: In the environments I am
>> using:
>>
>> WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*
>>
>> FAILING:     *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>

Mime
View raw message