phoenix-user mailing list archives

From Abe Weinograd <...@flonet.com>
Subject Re: Inner Join not returning any results in Phoenix
Date Tue, 24 Feb 2015 19:34:29 GMT
Thanks!

On Tue, Feb 24, 2015 at 1:25 PM, Matthew Johnson <matt.johnson@algomi.com>
wrote:

> Legend! Thanks :-)
>
>
>
>
>
> *From:* Maryann Xue [mailto:maryann.xue@gmail.com]
> *Sent:* 24 February 2015 18:15
>
> *To:* user@phoenix.apache.org
> *Subject:* Re: Inner Join not returning any results in Phoenix
>
>
>
> Thanks, James and Matt!
>
>
>
> I successfully reproduced the bug using setMaxRows(). Adding a LIMIT to the
> query itself won't cause such a bug.
>
>
>
> Please watch https://issues.apache.org/jira/browse/PHOENIX-1680. I will
> post a patch right away.
>
>
>
>
>
> Maryann
>
>
>
>
>
> On Tue, Feb 24, 2015 at 12:57 PM, Matthew Johnson <matt.johnson@algomi.com>
> wrote:
>
> Thanks James.
>
>
>
> Maryann, with the upserts I sent, I run the following code (using
> phoenix-client-4.2.2.jar):
>
>
>
> // Requires java.sql.Connection, DriverManager, ResultSet and Statement
>
> // Register the Phoenix JDBC driver
> Class.forName("org.apache.phoenix.jdbc.PhoenixDriver").newInstance();
>
> Connection conn =
>     DriverManager.getConnection("jdbc:phoenix:box1,box2,box3", "", "");
>
> // Create a Statement capped at 10 rows (as SQuirreL does with its row limit)
> Statement stmtLimited = conn.createStatement();
> stmtLimited.setMaxRows(10);
>
> // Execute the join and count the rows in the ResultSet
> ResultSet rsLimited = stmtLimited.executeQuery(
>     "select * from mytable1 m1 inner join mytable2 m2"
>         + " on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'");
> int rsLimitedCount = 0;
> while (rsLimited.next()) {
>     rsLimitedCount++;
> }
>
> // Create a second Statement with no row limit
> Statement stmtNoLimit = conn.createStatement();
>
> // Execute the same join and count the rows in the ResultSet
> ResultSet rsNoLimit = stmtNoLimit.executeQuery(
>     "select * from mytable1 m1 inner join mytable2 m2"
>         + " on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'");
> int rsNoLimitedCount = 0;
> while (rsNoLimit.next()) {
>     rsNoLimitedCount++;
> }
>
> System.out.println("Results found when LIMIT 10 was " + rsLimitedCount
>     + " but with NO LIMIT was " + rsNoLimitedCount);
>
>
>
>
>
> And the output is:
>
>
>
> Results found when LIMIT 10 was 0 but with NO LIMIT was 1
>
>
>
>
>
> Hope that helps!
>
>
>
> Cheers,
>
> Matt
>
>
>
>
>
> *From:* James Taylor [mailto:jamestaylor@apache.org]
> *Sent:* 24 February 2015 17:27
>
>
> *To:* user
> *Subject:* Re: Inner Join not returning any results in Phoenix
>
>
>
> Maryann,
>
> You can just call Statement.setMaxRows(10) to emulate what SQuirrel is
> doing.
>
> Thanks,
>
> James
>
>
>
> On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson <matt.johnson@algomi.com>
> wrote:
>
> Hi Maryann,
>
>
>
> I have two environments, one with just some dummy data and one with real
> data in it. The dummy-data environment gave me results when I queried and
> the real-data one didn't, but it turns out that is just because the dummy
> data was much more coherent (it was manually created for a specific test),
> so the RHS table always matches the LHS and therefore the join gives me
> results.
>
>
>
> I have attached a script that demonstrates my problem (create 2 Phoenix
> tables, insert some rows, and run a query using a join). When I run this on
> my cluster, I consistently see the issue I am having. If I set the LIMIT in
> Squirrel to 10, I get no results, but if I set it to 26, I get a result
> (since I have 26 rows in the RHS table and I am intentionally querying for
> the last one).
>
>
>
> Please give it a go and see if it reproduces for you – are you using
> Squirrel? If so, what version?
>
>
>
> Thanks!
>
> Matt
>
>
>
>
>
> *From:* Maryann Xue [mailto:maryann.xue@gmail.com]
> *Sent:* 24 February 2015 16:41
>
>
> *To:* user@phoenix.apache.org
> *Subject:* Re: Inner Join not returning any results in Phoenix
>
>
>
> Hi Matt,
>
>
>
> I just noticed these lines in your very first message:
>
>
>
> *PS* Something that may or may not be of note: In the environments I am
> using:
>
> WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*
>
> FAILING:     *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*
>
>
>
> What do you mean by WORKING and FAILING?
>
>
>
> I still cannot reproduce the bug here. Could you please post DDLs you used
> for related tables?
>
>
>
>
>
> Thanks,
>
> Maryann
>
>
>
>
>
>
>
> On Tue, Feb 24, 2015 at 11:27 AM, James Taylor <jamestaylor@apache.org>
> wrote:
>
> FYI, SQuirrel sets the max rows to return as 100. You can change this in
> the tool, though.
>
>
>
> On Tuesday, February 24, 2015, Maryann Xue <maryann.xue@gmail.com> wrote:
>
> Thanks a lot, Matt, for the reply! Very helpful. "*SERVER FILTER BY
> PageFilter 100*" does look like a bug here. I will try again to reproduce
> it.
>
>
>
>
>
> Thanks,
>
> Maryann
>
>
>
> On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson <matt.johnson@algomi.com>
> wrote:
>
> Hi Maryann,
>
>
>
> Thanks for that - I will schedule an update to the latest version of
> Phoenix then for later this week (and try out the merge-join hints).
>
>
>
> In the meantime, here are my explain plans:
>
>
>
> *JOIN WITH NO SQUIRREL LIMIT*
>
>
>
> *PLAN*
>
> *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*
>
> *    PARALLEL INNER-JOIN TABLE 0*
>
> *        CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2*
>
>
>
> *JOIN WITH SQUIRREL LIMIT 100*
>
>
>
> *PLAN*
>
> *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*
>
> *CLIENT 100 ROW LIMIT*
>
> *    PARALLEL INNER-JOIN TABLE 0*
>
> *        CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2*
>
> *            SERVER FILTER BY PageFilter 100*
>
> *            SERVER 100 ROW LIMIT*
>
> *        CLIENT 100 ROW LIMIT*
>
>
>
>
>
> I’m not really sure how to read that, but it does seem to suggest that
> ‘mytable2’ is being limited to 100 – thoughts?
>
>
>
> Cheers,
>
> Matt
>
>
>
> *From:* Maryann Xue [mailto:maryann.xue@gmail.com <maryann.xue@gmail.com>]
>
> *Sent:* 23 February 2015 18:10
>
>
> *To:* user@phoenix.apache.org
> *Subject:* Re: Inner Join not returning any results in Phoenix
>
>
>
> Hi Matt,
>
>
>
> Yes, the upgrade is as easy as that. I believe things will work fine with
> existing tables.
>
> I tried a similar query but did not see any sign of a Phoenix bug. So could
> you please try the following EXPLAIN statement and share the execution
> plan:
>
>
>
> EXPLAIN *SELECT * FROM "mytable1" hc*
>
> *INNER JOIN “mytable2” bs*
>
> *On hc."myId" = bs.”myId”*
>
>
>
>
>
> Thanks,
>
> Maryann
>
>
>
>
>
> On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson <matt.johnson@algomi.com>
> wrote:
>
> Hi Maryann,
>
>
>
> That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then
> that would explain why it’s not working. Is upgrading versions of Phoenix
> as simple as removing the previous jar from the HBase lib folder and dropping
> the new Phoenix jar in (and restarting HBase)? Will all the existing
> Phoenix tables and views be backwards-compatible and work with the new
> version?
>
>
>
> Cheers,
>
> Matt
>
>
>
>
>
> *From:* Maryann Xue [mailto:maryann.xue@gmail.com <maryann.xue@gmail.com>]
>
> *Sent:* 20 February 2015 17:46
>
>
> *To:* user@phoenix.apache.org
> *Subject:* Re: Inner Join not returning any results in Phoenix
>
>
>
> Which version of Phoenix are you using, Matt? This feature is only
> available in the latest releases of 4.3/3.3.
>
>
>
> On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson <matt.johnson@algomi.com>
> wrote:
>
> Hi Maryann,
>
>
>
> Unfortunately my two tables are roughly the same size (~500k), but I have
> tested a different join where one table is ~500k and the other is ~20k and
> putting the larger one first is definitely far more performant. I believe
> you are right about running out of memory; I can see this repeated a few
> times in the region server logs, followed by what appears to be a restart
> or disconnect:
>
>
>
> *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
> machine (eg GC): pause of approximately 1083ms*
>
>
>
> I have been looking at the Phoenix page on joins (
> http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
> joins for large tables by using a hint. I have tried this though with no
> success:
>
>
>
> *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*
>
> *INNER JOIN “mytable2” bs*
>
> *On hc.”myId” = bs.”myId”*
>
>
>
> Am I putting the hint in the wrong place? Does it need to go next to the
> JOIN rather than the SELECT?
>
>
>
> I will try increasing the memory available to the Region Servers as well
> to see if that helps.
>
>
>
> Thanks!
>
> Matt
>
>
>
>
>
> *From:* Maryann Xue [mailto:maryann.xue@gmail.com <maryann.xue@gmail.com>]
>
> *Sent:* 20 February 2015 16:28
> *To:* user@phoenix.apache.org
>
>
> *Subject:* Re: Inner Join not returning any results in Phoenix
>
>
>
> Hi Matt,
>
>
>
> The error you got with "Limit Rows" off might be related to insufficient
> memory on region servers for one of your tables. Which is the larger table
> between table1 and table2? You might want to try putting the larger table
> as the first table in your join query and see if it works.
>
>
>
> And I will quickly check if the LIMIT problem is a Phoenix bug and will
> keep you posted.
>
>
>
>
>
> Thanks,
>
> Maryann
>
>
>
>
>
> On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson <matt.johnson@algomi.com>
> wrote:
>
> Hi Abe,
>
>
>
> Glad to hear I’m not alone! Will try and figure out exactly what’s
> happening and maybe raise a Jira :-)
>
>
>
>
>
> @Constantin – I have tried with and without the “Limit Rows” – but without
> it, and without any indexes, the query runs for a while (about 10 minutes?)
> and then throws an error:
>
>
>
> *Error: Encountered exception in sub plan [0] execution.*
>
>
>
> Which I'm guessing is either an HBase or a Zookeeper timeout. The weird
> thing is that in standard SQL databases (e.g. Oracle, MySQL) the "Limit
> Rows" setting does not affect aggregate functions like 'count', because the
> actual number of result rows for a count is just 1 row (the count itself).
> But in HBase it seems that the Row Limit, as Abe mentioned, is applied to
> one of the tables BEFORE the join is done, so it affects the results of the
> 'count' function.
>
>
>
> When I try to create my indexes so I am able to do the join without Row
> Limit, I get the following error:
>
>
>
> ERROR 1029 (42Y88): Mutable secondary indexes must have the
> hbase.regionserver.wal.codec property set to
> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
> hbase-sites.xml of every region server
>
>
>
> Which I am happy to do (will have to wait until outside of business hours
> though), but I am curious, will this have any impact on the rest of my
> cluster and could it have any unforeseen consequences?
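>
>
> (For reference, a minimal sketch of the hbase-site.xml entry the error is
> asking for; the property name and value are taken verbatim from the error
> text above, and the change would need to be made on every region server and
> followed by a region server restart:)
>
> <property>
>   <name>hbase.regionserver.wal.codec</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
> </property>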
>
>
>
> Thanks again for the input!
>
>
>
> Cheers,
>
> Matt
>
>
>
>
>
> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com
> <Constantin.Ciureanu@gfk.com>]
> *Sent:* 20 February 2015 15:48
>
>
> *To:* user@phoenix.apache.org
> *Subject:* RE: Inner Join not returning any results in Phoenix
>
>
>
> Hello Matt,
>
>
>
> http://codingclues.eu/2008/the-squirrel-100-rows-problem/
>
>
>
> Can you please test again after unchecking “Contents- Limit rows” and “SQL
> – Limit rows”?
>
>
>
> [image: SQL tab]
>
>
>
> P.S. Off-topic – it’s as funny as this “problem” (not being able to send an
> email more than 500 miles away :-)
> http://www.ibiblio.org/harris/500milemail.html )
>
>
>
> Regards,
>
>   Constantin
>
>
>
> *From:* Abe Weinograd [mailto:abe@flonet.com <abe@flonet.com>]
> *Sent:* Friday, February 20, 2015 4:18 PM
> *To:* user
> *Subject:* Re: Inner Join not returning any results in Phoenix
>
>
>
> Matt,
>
>
>
> I have seen this same issue. When passing a LIMIT to a query with joins
> (most query tools do it implicitly), Phoenix seems to apply that limit to
> the table on the right of the join, I believe. I haven't had a chance to
> play with it more and file a JIRA, but what you are describing is
> consistent with what I have seen.
>
>
>
> Abe
>
>
>
> On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson <matt.johnson@algomi.com>
> wrote:
>
> Hi Constantin,
>
>
>
> Many thanks for your reply – the quotes were both of the same type (double
> quotes for table and column names, single quotes for string literals), it
> is just my email client that formatted them weirdly, sorry!
>
>
>
> I have discovered what I believe is an important piece of the puzzle to my
> problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
> a “Limit Rows” feature. When I try and count the number of rows in a single
> table:
>
>
>
> *select count(*) from “mytable1”*
>
>
>
> I get the expected number of results (e.g. 20,000). But when I join two
> tables together, it seems that the "Limit Rows" setting from Squirrel is
> somehow applied before the join is performed, and if "Limit Rows" is set to
> 100 I get 100 results or fewer. If the inner join is quite sparse (e.g.
> 20,000 rows in a table but only 100 of them join with the second table),
> then I believe it tries to join only the first 100 rows it finds and
> returns no results. In my experience with Oracle or MySQL, joins are done
> entirely on the server side and you then get back the number of rows you
> limited to, rather than what appears to be happening here, where the row
> limit is applied to the first table before the join with the second table
> is attempted. Is that how Phoenix works?
>
>
>
> I have also discovered that I get different results (with “Limit Rows”
> turned on) depending on which order I join the tables:
>
>
>
> *SELECT count(*) FROM “mytable1” hc*
>
> *INNER JOIN “mytable2” bs*
>
> *On hc.”myId” = bs.”myId”*
>
>
>
> Gives me a very different number of results than:
>
>
>
> *SELECT count(*) FROM “mytable2” bs*
>
> *INNER JOIN “mytable1” hc*
>
> *On hc.”myId” = bs.”myId”*
>
>
>
>
>
> Unfortunately I cannot test whether I get the same number of results with
> “Limit Rows” turned off because my query times out! So I am now looking at
> creating secondary indexes on the “myId” column in both tables to see if I
> am able to do this join quicker. Does a join like this use a lot of memory
> on server side? Is something likely to be running out of resources?
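>
>
> (As a rough sketch of the secondary indexes I have in mind; the index names
> here are just placeholders:)
>
> CREATE INDEX myid_idx1 ON "mytable1" ("myId");
>
> CREATE INDEX myid_idx2 ON "mytable2" ("myId");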
>
>
>
> Many thanks again for your time.
>
>
>
> Cheers,
>
> Matt
>
>
>
>
>
> *From:* Ciureanu, Constantin (GfK) [mailto:Constantin.Ciureanu@gfk.com
> <Constantin.Ciureanu@gfk.com>]
> *Sent:* 20 February 2015 14:40
> *To:* user@phoenix.apache.org
> *Subject:* RE: Inner Join not returning any results in Phoenix
>
>
>
> Hi Matthew,
>
>
>
> Is it working without the quotes (“ / ")?  (I see you are using two types
> of quotes, which is odd.)
>
> I guess they are not needed, and are probably causing trouble. I don't have
> to use quotes anyway.
>
>
>
> Alternatively check the types of data in those 2 tables (if the field
> types are not the same in both tables, the join will not work).
>
>
>
> Good luck,
>
>   Constantin
>
>
>
> *From:* Matthew Johnson [mailto:matt.johnson@algomi.com
> <matt.johnson@algomi.com>]
> *Sent:* Friday, February 20, 2015 12:54 PM
> *To:* user@phoenix.apache.org
> *Subject:* Inner Join not returning any results in Phoenix
>
>
>
> Hi guys,
>
>
>
> I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one
> environment, created tables in HBase, and then created views (rather than
> tables) in Phoenix, and am able to query as expected (when I join my two
> tables I see results). I’ve just promoted to another environment, with the
> exact same setup, but my Inner Join returns no results!
>
>
>
> I run the following two individual queries:
>
>
>
> *SELECT * FROM "mytable1" hc*
>
> *where hc."myId" = 'XS0'*
>
>
>
> *SELECT * FROM "mytable2" bs*
>
> *where bs."myId" = 'XS0'*
>
>
>
> And both of these queries give results. But when I run:
>
>
>
> *SELECT * FROM "mytable1" hc*
>
> *INNER JOIN “mytable2” bs*
>
> *On hc."myId" = bs.”myId”*
>
>
>
> I get no results. I also get no results if I try:
>
>
>
> *SELECT * FROM "mytable1" hc*
>
> *where hc."myId" in (select distinct “myId” from “mytable2”)*
>
>
>
> I have checked in HBase shell and can see the “myId” value is as expected
> (XS0 in both tables). I am not sure if there are any logs that I can look
> at to get some insight?
>
>
>
> Many thanks in advance for any suggestions!
>
>
>
> Cheers,
>
> Matt
>
>
>
> *PS* Something that may or may not be of note: In the environments I am
> using:
>
> WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*
>
> FAILING:     *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*
>
