phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Indexes
Date Mon, 03 Feb 2014 16:52:36 GMT
There will be an upgrade step required to go from 2.x to 3.0, as the system
table has changed (and probably will a bit more still before we release).

For now, you can do the following if you want to test out 3.0.0-snapshot:
- Remove com.salesforce.* coprocessors on existing tables. If you haven't
added any of your own, probably easiest to just remove all coprocessors.
- Re-issue your DDL commands. If you have existing data against that table,
it'd be best to open a connection at a timestamp earlier than any of your
data using the CURRENT_SCN connection property. If you don't care about
doing point-in-time queries at an earlier timestamp (or flash-back
queries), than you don't need to worry about this, and you can just
re-issue the DDL.

Thanks,
James


On Mon, Feb 3, 2014 at 8:40 AM, Justin Workman <justinjworkman@gmail.com>wrote:

> We updated to the 3.0.0-SNAPSHOT in an effort to also test the Flume
> component, and we are not able to query any of our existing tables now
> through sqlline or a java jdbc connection. However the Flume component
> works fine writing to a new table. Here is the error we are getting when
> doing a select count(1) from keywords;
>
> Error: org.apache.hadoop.hbase.DoNotRetryIOException: keywords: at index 4
> at
> com.salesforce.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:83)
>  at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1034)
> at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
>  at org.apache.hadoop.hbase.regionserver.HRegion.exec(HRegion.java:5482)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.execCoprocessor(HRegionServer.java:3720)
>  at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
>  at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> Caused by: java.lang.NullPointerException: at index 4
> at
> com.google.common.collect.ImmutableList.checkElementNotNull(ImmutableList.java:305)
>  at
> com.google.common.collect.ImmutableList.construct(ImmutableList.java:296)
> at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:272)
>  at com.salesforce.phoenix.schema.PTableImpl.init(PTableImpl.java:290)
> at com.salesforce.phoenix.schema.PTableImpl.<init>(PTableImpl.java:219)
>  at
> com.salesforce.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:212)
> at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:436)
>  at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:254)
> at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1082)
>  at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.addIndexToTable(MetaDataEndpointImpl.java:279)
> at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:430)
>  at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:254)
> at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1082)
>  at
> com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1028)
> ... 10 more (state=08000,code=101)
>
>
> On Thu, Jan 30, 2014 at 4:01 PM, Justin Workman <justinjworkman@gmail.com>wrote:
>
>> I will test with the latest master build. When this table goes live we
>> will shorten the cf name, that was a mistake. Thanks for all the info. I do
>> think going forward we will be creating these tables via Phoenix. We are
>> still testing the flume sink and pig handlers before completely committing.
>>
>> I'll update the list once I've had a chance to test with the latest build
>> and file a Jira if the problem persists.
>>
>> Thanks!
>> Justin
>>
>> Sent from my iPhone
>>
>> On Jan 30, 2014, at 1:25 PM, James Taylor <jamestaylor@apache.org> wrote:
>>
>> Thanks for all the detail, Justin. Based on this, it looks like a bug
>> related to using case sensitive column names. Maryann checked in a fix
>> related to this, so it might be fixed in the latest on master.
>>
>> If it's not fixed, would you mind filing a JIRA?
>>
>> FWIW, you may want to consider a shorter column family name, like "k" or
>> "kw" as that'll make your table smaller. Also, did you know you can provide
>> your HBase table and column family config parameters in your CREATE TABLE
>> statement and it'll create the HBase table and the column families, like
>> below?
>>
>> CREATE TABLE SEO.KEYWORDIDEAS (
>>     "pk" VARCHAR PRIMARY KEY,
>>     "keyword"."jobId" VARCHAR,
>>     "keyword"."jobName" VARCHAR,
>>     "keyword"."jobType" VARCHAR,
>>     "keyword"."keywordText" VARCHAR,
>>     "keyword"."parentKeywordText" VARCHAR,
>>     "keyword"."refinementName" VARCHAR,
>>     "keyword"."refinementValue" VARCHAR,
>>     "keyword"."relatedKeywordRank" VARCHAR
>>     ) IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY' ;
>>
>>
>>
>>
>> On Thu, Jan 30, 2014 at 8:50 AM, Justin Workman <justinjworkman@gmail.com
>> > wrote:
>>
>>> I don't think that is the issue we are hitting. Details are below. The
>>> Hbase table does have more columns than we are defining the phoenix table.
>>> We were hoping to just be able to use the dynamic column features for
>>> if/when we need to access data in other columns in the underlying table. As
>>> you can see from the output of the explain statement below, a simple query
>>> does not use the index.
>>>
>>> However if I create another identical table using Phoenix and upsert
>>> into that new table from the table below, create the same index on that
>>> table and then run the same select query, it does use the index on that
>>> table.
>>>
>>> So I am still very confused as to why the index is not invoked when the
>>> table is created on top of an existing Hbase table.
>>>
>>> Hbase Create Table
>>> > create 'SEO.KEYWORDIDEAS', { NAME=>'keyword', COMPRESSION=>'SNAPPY'
}
>>>
>>> Phoenix Create Table
>>> CREATE TABLE SEO.KEYWORDIDEAS (
>>>     "pk" VARCHAR PRIMARY KEY,
>>>     "keyword"."jobId" VARCHAR,
>>>     "keyword"."jobName" VARCHAR,
>>>     "keyword"."jobType" VARCHAR,
>>>     "keyword"."keywordText" VARCHAR,
>>>     "keyword"."parentKeywordText" VARCHAR,
>>>     "keyword"."refinementName" VARCHAR,
>>>     "keyword"."refinementValue" VARCHAR,
>>>     "keyword"."relatedKeywordRank" VARCHAR
>>>     ) IMMUTABLE_ROWS=true;
>>>
>>> Create Index
>>> CREATE INDEX KWDIDX ON SEO.KEYWORDIDEAS ("parentKeywordText");
>>>
>>> Show and count indexes
>>>
>>> +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+-------------+--------+------------------+-----------+-----------+
>>> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER |
>>> INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC |
>>> CARDINALITY | PAGES  | FILTER_CONDITION | DATA_TYPE | TYPE_NAME |
>>>
>>> +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+-------------+--------+------------------+-----------+-----------+
>>> | null      | SEO         | KEYWORDIDEAS | true       | null
>>>  | KWDIDX     | 3    | 1                | keyword:parentKeywordText | A
>>>       | null        | null   | null             | |
>>> | null      | SEO         | KEYWORDIDEAS | true       | null
>>>  | KWDIDX     | 3    | 2                | :pk         | A           | null
>>>        | null   | null             | 12        | V |
>>> | null      | SEO         | KEYWORDIDEAS | true       | null
>>>  | RA_TEST_ID | 3    | 1                | keyword:jobId | A           |
>>> null        | null   | null             | 12        | |
>>> | null      | SEO         | KEYWORDIDEAS | true       | null
>>>  | RA_TEST_ID | 3    | 2                | :pk         | A           | null
>>>        | null   | null             | 12        | V |
>>>
>>> +-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+-------------+--------+------------------+-----------+-----------+
>>>
>>> > select count(1) from seo.keywordideas;
>>> +----------+
>>> | COUNT(1) |
>>> +----------+
>>> | 423229   |
>>> +----------+
>>> > select count(1) from seo.kwdidx;
>>> +----------+
>>> | COUNT(1) |
>>> +----------+
>>> | 423229   |
>>> +----------+
>>>
>>> > explain select count(1) from seo.keywords where "parentKeywordText" =
>>> 'table';
>>> +------------+
>>> |    PLAN    |
>>> +------------+
>>> | CLIENT PARALLEL 18-WAY FULL SCAN OVER SEO.KEYWORDIDEAS |
>>> |     SERVER FILTER BY keyword.parentKeywordText = 'sheets' |
>>> |     SERVER AGGREGATE INTO SINGLE ROW |
>>> +------------+
>>>
>>> Now here is where I can get the index to be invoked.
>>> > CREATE TABLE SEO.NEW_KEYWORDIDEAS (
>>>     PK VARCHAR PRIMARY KEY,
>>>     JOB_ID VARCHAR
>>>     JOB_NAME VARCHAR,
>>>     JOB_TYPE VARCHAR,
>>>     KEYWORD_TEXT VARCHAR,
>>>     PARENT_KEYWORD_TEXT VARCHAR,
>>>     REFINEMENT_NAME VARCHAR,
>>>     REFINEMENT_VALUE VARCHAR,
>>>     RELATED_KEYWORD_RANK VARCHAR
>>>     ) IMMUTABLE_ROWS=true;
>>>
>>> > UPSERT INTO SEO.NEW_KEYWORDIEAS SELECT * FROM SEO.KEYWORDIDEAS;
>>>
>>> > CREATE INDEX NEW_KWD_IDX ON SEO.NEW_KEYWORDIDEAS (PARENT_KEYWORD_TEXT);
>>>
>>> > explain select count(1) from seo.new_keywordideas where
>>> parent_keyword_text = 'table';
>>>
>>> +------------+
>>>
>>> |    PLAN    |
>>>
>>> +------------+
>>>
>>> | CLIENT PARALLEL 1-WAY RANGE SCAN OVER SEO.NEW_KWD_IDX ['table'] |
>>>
>>> |     SERVER AGGREGATE INTO SINGLE ROW |
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jan 29, 2014 at 5:21 PM, James Taylor <jamestaylor@apache.org>wrote:
>>>
>>>> Hi Justin,
>>>> Please take a look at this FAQ:
>>>> http://phoenix.incubator.apache.org/faq.html#/Why_isnnullt_my_secondary_index_being_used
>>>>
>>>> If that's not the case for you, can you include your CREATE TABLE,
>>>> CREATE INDEX, SELECT statement, and EXPLAIN plan?
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>
>>>> On Wed, Jan 29, 2014 at 4:13 PM, Justin Workman <
>>>> justinjworkman@gmail.com> wrote:
>>>>
>>>>> I am seeing some odd behavior with indexes and want some clarification
>>>>> on how they are used.
>>>>>
>>>>> When I create an table in phoenix on top of an existing Hbase table,
>>>>> and then create an index on this table, I can see the index get built
and
>>>>> populated properly, however no queries show that they are using this
index
>>>>> when I run an explain on the query.
>>>>>
>>>>> However, if I create an seperete table in Phoenix and do an upsert
>>>>> from my hbase table into the new table that I created, and create the
same
>>>>> index as on the previous table. Then my queries show that they would
use
>>>>> the index when running them through the explain plan.
>>>>>
>>>>> Are we not able to create or use an index on a table we create over an
>>>>> exiting Hbase table, or am I doing something wrong?
>>>>>
>>>>> Thanks in advance for any help.
>>>>> Justin
>>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message