phoenix-user mailing list archives

From David Wang <davidwang...@gmail.com>
Subject Re: Cannot run query containing inner join on Phoenix 3.0.0 when data size increased from 10 MB to 1 GB
Date Tue, 25 Feb 2014 01:17:53 GMT
Hi Maryann,

Below is the original TPC-H Query 5 before I translated it to Phoenix-style:

select
   n_name,
   sum(l_extendedprice * (1 - l_discount)) as revenue
from
   customer,
   orders,
   lineitem,
   supplier,
   nation,
   region
where
   c_custkey = o_custkey
   and l_orderkey = o_orderkey
   and l_suppkey = s_suppkey
   and c_nationkey = s_nationkey
   and s_nationkey = n_nationkey
   and n_regionkey = r_regionkey
   and r_name = '[REGION]'
   and o_orderdate >= date '[DATE]'
   and o_orderdate < date '[DATE]' + interval '1' year
group by
   n_name
order by
   revenue desc;

Thanks,

David


On Mon, Feb 24, 2014 at 7:58 PM, David Wang <davidwang400@gmail.com> wrote:

> Hi Maryann,
>
> Could you please explain what the complete test case is?
>
> I just tried to run TPC-H query #5, which I provided in an
> earlier email. I just wanted to make sure the inner join works on Phoenix.
>
> Thank you so much,
>
> David
>
>
> On Mon, Feb 24, 2014 at 7:52 PM, Maryann Xue <maryann.xue@gmail.com> wrote:
>
>> Could you please give the complete test case so that I can repro the issue?
>>
>>
>> On Mon, Feb 24, 2014 at 7:48 PM, David Wang <davidwang400@gmail.com> wrote:
>>
>>> Hi Maryann,
>>>
>>> I tried moving the lineitem table (the largest table) to the first place
>>> in the query below:
>>> select
>>>    n_name,
>>>    sum(l_extendedprice * (1 - l_discount)) as revenue
>>> from lineitem
>>>    inner join orders on l_orderkey = o_orderkey
>>>    inner join supplier on l_suppkey = s_suppkey
>>>    inner join customer on c_nationkey = s_nationkey and c_custkey = o_custkey
>>>    inner join nation on s_nationkey = n_nationkey
>>>    inner join region on n_regionkey = r_regionkey
>>> where
>>>    r_name = 'AMERICA'
>>>    and o_orderdate >= '1993-01-01'
>>>    and o_orderdate < '1994-01-01'
>>> group by n_name
>>> order by revenue desc
>>>
>>>
>>> But when I execute I get the following error:
>>>
>>> java.lang.RuntimeException:
>>> com.salesforce.phoenix.exception.PhoenixIOException:
>>> com.salesforce.phoenix.exception.PhoenixIOException: Failed after
>>> attempts=14, exceptions:
>>> Mon Feb 24 19:36:50 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:36:51 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:36:52 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:36:54 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:36:56 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:37:00 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:37:04 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:37:12 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:37:28 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: �@2[]
>>> Mon Feb 24 19:38:00 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: i?�0��
>>> Mon Feb 24 19:39:05 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: i?�0��
>>> Mon Feb 24 19:40:09 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: i?�0��
>>> Mon Feb 24 19:41:13 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: i?�0��
>>> Mon Feb 24 19:42:18 EST 2014,
>>> org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6,
>>> java.io.IOException: java.io.IOException: Could not find hash cache for
>>> joinId: i?�0��
>>>
>>>         at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)
>>>         at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2074)
>>>         at sqlline.SqlLine.print(SqlLine.java:1735)
>>>         at sqlline.SqlLine$Commands.execute(SqlLine.java:3683)
>>>         at sqlline.SqlLine$Commands.sql(SqlLine.java:3584)
>>>         at sqlline.SqlLine.dispatch(SqlLine.java:821)
>>>         at sqlline.SqlLine.begin(SqlLine.java:699)
>>>         at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)
>>>         at sqlline.SqlLine.main(SqlLine.java:424)
>>>
>>>
>>> Do you know of any way I can fix this?
>>>
>>> Thank you so much,
>>>
>>> David
>>>
>>>
>>> On Mon, Feb 24, 2014 at 7:02 PM, Maryann Xue <maryann.xue@gmail.com> wrote:
>>>
>>>> Hi David,
>>>>
>>>> Since joins are implemented as hash joins, and table statistics are not
>>>> yet available to help decide the join order (and which tables are small
>>>> enough to fit in memory), users may sometimes have to be careful about
>>>> the table order in the query.
>>>> In your case, you can try moving the lineitem table (which is the
>>>> largest here) to the first place in the query. I think that should work.
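>>>>
>>>> Something along these lines (just your original query with the tables
>>>> reordered, untested):
>>>>
>>>>    select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
>>>>    from lineitem
>>>>        inner join orders on l_orderkey = o_orderkey
>>>>        inner join supplier on l_suppkey = s_suppkey
>>>>        inner join customer on c_nationkey = s_nationkey and c_custkey = o_custkey
>>>>        inner join nation on s_nationkey = n_nationkey
>>>>        inner join region on n_regionkey = r_regionkey
>>>>    where r_name = 'AMERICA' and o_orderdate >= '1993-01-01'
>>>>        and o_orderdate < '1994-01-01'
>>>>    group by n_name order by revenue desc;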
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>>
>>>>
>>>> On Mon, Feb 24, 2014 at 6:08 PM, David Wang <davidwang400@gmail.com> wrote:
>>>>
>>>>> Hi Maryann,
>>>>>
>>>>> The sizes of each table in my query are as follows:
>>>>> customer - 24 MB
>>>>> orders - 164 MB
>>>>> lineitem - 725 MB
>>>>> supplier - 1.4 MB
>>>>> nation - 2.2 KB
>>>>> region - 400 B
>>>>>
>>>>> The heap size of my region servers is 4 GB.
>>>>>
>>>>> Thank you,
>>>>> David
>>>>>
>>>>>
>>>>> On Mon, Feb 24, 2014 at 5:58 PM, Maryann Xue <maryann.xue@gmail.com> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> What are the sizes of each table in your query? And what is the heap
>>>>>> size of your region server?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 24, 2014 at 5:50 PM, David Wang <davidwang400@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I had a question about running queries containing inner joins on
>>>>>>> Phoenix, but wasn't sure where to post.
>>>>>>> I downloaded phoenix-master (3.0.0) and inserted 10 MB of data
>>>>>>> through psql.sh. I found I was able to run a query containing an
>>>>>>> inner join in sqlline.
>>>>>>>
>>>>>>> The query I tried:
>>>>>>>
>>>>>>> select
>>>>>>>    n_name,
>>>>>>>    sum(l_extendedprice * (1 - l_discount)) as revenue
>>>>>>> from customer
>>>>>>>    inner join orders on c_custkey = o_custkey
>>>>>>>    inner join lineitem on l_orderkey = o_orderkey
>>>>>>>    inner join supplier on l_suppkey = s_suppkey and c_nationkey = s_nationkey
>>>>>>>    inner join nation on s_nationkey = n_nationkey
>>>>>>>    inner join region on n_regionkey = r_regionkey
>>>>>>> where
>>>>>>>    r_name = 'AMERICA'
>>>>>>>    and o_orderdate >= '1993-01-01'
>>>>>>>    and o_orderdate < '1994-01-01'
>>>>>>> group by n_name
>>>>>>> order by revenue desc;
>>>>>>>
>>>>>>> But when I increased the data size from 10 MB to 1 GB and tried to
>>>>>>> run the same query, I got the following error:
>>>>>>> Error: Encountered exception in hash plan execution. (state=,code=0)
>>>>>>>
>>>>>>> I have included the path to snappy-java-1.0.3.2.jar in sqlline.sh
>>>>>>> like the following:
>>>>>>> java -cp ".:$phoenix_client_jar:/usr/local/hbase-0.94.8/lib/snappy-java-1.0.3.2.jar" \
>>>>>>>     -Dlog4j.configuration=file:$current_dir/log4j.properties \
>>>>>>>     sqlline.SqlLine -d com.salesforce.phoenix.jdbc.PhoenixDriver \
>>>>>>>     -u jdbc:phoenix:$1 -n none -p none \
>>>>>>>     --color=true --fastConnect=false --verbose=true \
>>>>>>>     --isolation=TRANSACTION_READ_COMMITTED $sqlfile
>>>>>>>
>>>>>>> Do you have any idea where the problem might be?
>>>>>>>
>>>>>>> Thank you so much,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Maryann
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks,
>> Maryann
>>
>
>
