phoenix-user mailing list archives

From David Wang <davidwang...@gmail.com>
Subject Cannot run query containing inner join on Phoenix 3.0.0 when data size increased from 10 MB to 1 GB
Date Sat, 01 Mar 2014 04:56:36 GMT
Hi,


I successfully ran a query containing an inner join in Phoenix 3.0 on a 10
MB data set.

But when I increased the data size from 10 MB to 1 GB and tried to run the
same query, I got an error.  Below I summarize each step I took and the
error I encountered.  I would appreciate any help or advice.


1). Below is the original TPC-H Query 5, before I translated it to
Phoenix-style syntax:

select
   n_name,
   sum(l_extendedprice * (1 - l_discount)) as revenue
from
   customer,
   orders,
   lineitem,
   supplier,
   nation,
   region
where
   c_custkey = o_custkey
   and l_orderkey = o_orderkey
   and l_suppkey = s_suppkey
   and c_nationkey = s_nationkey
   and s_nationkey = n_nationkey
   and n_regionkey = r_regionkey
   and r_name = '[REGION]'
   and o_orderdate >= date '[DATE]'
   and o_orderdate < date '[DATE]' + interval '1' year
group by
   n_name
order by
   revenue desc;
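
For reference, here is the same template with the placeholders filled in. The values ('AMERICA' for [REGION] and 1993-01-01 for [DATE]) are taken from the translated statement in step 3; this is still the standard TPC-H form, and I am not claiming Phoenix 3.0 accepts it as written:

```sql
-- TPC-H Query 5 with the parameter values used in step 3.
-- Standard SQL form (implicit joins, date/interval literals), shown for
-- comparison only.
select
   n_name,
   sum(l_extendedprice * (1 - l_discount)) as revenue
from
   customer,
   orders,
   lineitem,
   supplier,
   nation,
   region
where
   c_custkey = o_custkey
   and l_orderkey = o_orderkey
   and l_suppkey = s_suppkey
   and c_nationkey = s_nationkey
   and s_nationkey = n_nationkey
   and n_regionkey = r_regionkey
   and r_name = 'AMERICA'
   and o_orderdate >= date '1993-01-01'
   and o_orderdate < date '1993-01-01' + interval '1' year
group by
   n_name
order by
   revenue desc;
```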


2). The sizes of each table in my query are as follows:

lineitem - 725 MB
orders - 164 MB
customer - 24 MB
supplier - 1.4 MB
nation - 2.2 KB
region - 400 B
The heap size of my region servers is 4 GB.

3). I modified the statement as follows, according to Maryann's
suggestion (which was to place the largest table first):

select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem inner join orders on l_orderkey = o_orderkey
              inner join supplier on l_suppkey = s_suppkey
              inner join customer on c_nationkey = s_nationkey
                                  and c_custkey = o_custkey
              inner join nation on s_nationkey = n_nationkey
              inner join region on n_regionkey = r_regionkey
where r_name = 'AMERICA'
  and o_orderdate >= '1993-01-01'
  and o_orderdate < '1994-01-01'
group by n_name
order by revenue desc
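
A possible diagnostic step, which I have not included output for: Phoenix accepts EXPLAIN in front of a query to print its plan. The sketch below just prefixes the statement above with EXPLAIN; that the plan would make the join order and the cached tables visible is my assumption, not something shown in this message:

```sql
-- Print the query plan instead of executing the query.
-- Same statement as in step 3; only the EXPLAIN keyword is added.
explain
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem inner join orders on l_orderkey = o_orderkey
              inner join supplier on l_suppkey = s_suppkey
              inner join customer on c_nationkey = s_nationkey
                                  and c_custkey = o_custkey
              inner join nation on s_nationkey = n_nationkey
              inner join region on n_regionkey = r_regionkey
where r_name = 'AMERICA'
  and o_orderdate >= '1993-01-01'
  and o_orderdate < '1994-01-01'
group by n_name
order by revenue desc
```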

4). When I executed the query for the first time, I got the following error:

java.lang.RuntimeException:
com.salesforce.phoenix.exception.PhoenixIOException:
com.salesforce.phoenix.exception.PhoenixIOException: Failed after
attempts=14, exceptions:
Mon Feb 24 19:36:50 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:36:51 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:36:52 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:36:54 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:36:56 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:37:00 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:37:04 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:37:12 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:37:28 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId:  @2[]
Mon Feb 24 19:38:00 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId: i? 0
Mon Feb 24 19:39:05 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId: i? 0
Mon Feb 24 19:40:09 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId: i? 0
Mon Feb 24 19:41:13 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId: i? 0
Mon Feb 24 19:42:18 EST 2014,
org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
java.io.IOException: java.io.IOException: Could not find hash cache for
joinId: i? 0

        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)
        at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2074)
        at sqlline.SqlLine.print(SqlLine.java:1735)
        at sqlline.SqlLine$Commands.execute(SqlLine.java:3683)
        at sqlline.SqlLine$Commands.sql(SqlLine.java:3584)
        at sqlline.SqlLine.dispatch(SqlLine.java:821)
        at sqlline.SqlLine.begin(SqlLine.java:699)
        at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)
        at sqlline.SqlLine.main(SqlLine.java:424)

5). When I re-executed the query a second time, it returned a result that
matches the expected solution.

My cluster setup is one master and three slaves. Each machine has
8 cores and 8 GB of RAM. The 1 GB of data was distributed across the three
slaves, and the query ran on all three machines (monitored with the top
command on each machine).

Thank you so much,

David
