phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacobo Coll <jacobo...@gmail.com>
Subject Unable to filter using local indexes
Date Tue, 15 Dec 2015 13:48:02 GMT
Hi,

I recently started testing HBase/Phoenix as a storage solution for our
data. The problem is that I am not able to execute some "simple queries". I
am using phoenix 4.5.2 and hbase 1.0.0-cdh5.4.0. After creating a table and
making some selects (full scan), I started using local indexes to
accelerate it. As I know, global indexes only work for covered queries, but
"Unlike global indexes, local indexes *will* use an index even when all
columns referenced in the query are not contained in the index". Knowing
this, I prepared this simple test where the query will use one index and
filter the result using other columns:

-- Create table
create table test_table (mykey varchar primary key, col1 varchar, col2
varchar);
create local index idx2 on test_table (col2);
upsert into test_table (mykey, col1, col2) values('k1', 'v1-1', 'v1-2');
upsert into test_table (mykey, col1, col2) values('k2', 'v2-1', 'v2-2');

-- select using the index
select * from test_table where col2 = 'v1-2';
explain select * from test_table where col2 = 'v1-2';

-- select using the index and filtering the result, my goal
select * from test_table where col2 = 'v1-2' and col1 = 'v1-1';
explain select * from test_table where col2 = 'v1-2' and col1 = 'v1-1';


The first query is using the index, as it's seen in the explanation:
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE
[-32768,'v1-2']
    SERVER FILTER BY FIRST KEY ONLY

But the second query is executing a table scan, instead of using the index
and then filtering:
CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST_TABLE
    SERVER FILTER BY (COL2 = 'v1-2' AND COL1 = 'v1-1')

I was expecting something like:
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE
[-32768,'v1-2']
    SERVER FILTER BY "COL1" = 'v1-1'

Some of the workarounds that I have tried:

- Include col1 in the index, and it works, but I expect to have many
columns, and the overhead is unaffordable.
create local index idx2inc1 on test_table (col2) include (col1);

- I had read in this Huawei slices about hindex (
http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon_hindex_0.pdf)
that local indexes are supposed to work together, so I tried creating an
index on col1 to see if the query works:
create local index idx1 on test_table (col1);
Ideally this would be the best solution for my problem, so I can filter
using any column. But still not using any index.

- I also tried to force to use a hint, but throws an
ColumnNotFoundException:
select /*+ INDEX(TEST_TABLE IDX2) */ * from test_table where col2 = 'v1-2'
and col1 = 'v1-1';
Error: ERROR 504 (42703): Undefined column. columnName=COL1
(state=42703,code=504)
org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703):
Undefined column. columnName=COL1
    at
org.apache.phoenix.compile.WhereCompiler$WhereExpressionCompiler.resolveColumn(WhereCompiler.java:190)
    at
org.apache.phoenix.compile.WhereCompiler$WhereExpressionCompiler.visit(WhereCompiler.java:169)
    at
org.apache.phoenix.compile.WhereCompiler$WhereExpressionCompiler.visit(WhereCompiler.java:156)
    at
org.apache.phoenix.parse.ColumnParseNode.accept(ColumnParseNode.java:56)
    at
org.apache.phoenix.parse.CompoundParseNode.acceptChildren(CompoundParseNode.java:64)
    .......
I can imagine that it's attempting to filter col1 directly from the index
table.

- I also tried to make a join with itself, but I don't think that this is a
good solution:
explain select mykey,col1,col2 from test_table join ( select mykey as k
from test_table where col2 = 'v1-2') as sq1 on mykey = sq1.k where col1 =
'v1-1'
CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST_TABLE
    SERVER FILTER BY COL1 = 'v1-1'
    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)
        CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE
[-32768,'v1-2']
            SERVER FILTER BY FIRST KEY ONLY
    DYNAMIC SERVER FILTER BY TEST_TABLE.MYKEY IN (SQ1.K)

Am I doing something wrong, or missing some step? Or it is impossible what
I am trying to do?
Is there some way to do this without include the other columns in the index?

Thanks,
Jacobo Coll

Mime
View raw message