phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: hint to use a global index is not working - need to find out why
Date Fri, 20 Apr 2018 20:29:49 GMT
Ron - Salting is only recommended when your primary key is monotonically
increasing. It's mainly used to prevent write hotspotting. Also, I think
Ron forgot to mention, but I was working with him a bit earlier on this,
and I couldn't repro the issue either (in current 4.x or in 4.7 release).
Here's the unit test I put together which hints a non covered global index:

    @Test
    public void testIndexHintWithNonCoveredColumnSelected() throws
Exception {
        String schemaName = "";
        String dataTableName = "T_FOO";
        String indexTableName = "I_FOO";
        String dataTableFullName = SchemaUtil.getTableName(schemaName,
dataTableName);
        String indexTableFullName = SchemaUtil.getTableName(schemaName,
indexTableName);
        try (Connection conn = DriverManager.getConnection(getUrl())) {
            conn.createStatement().execute("CREATE TABLE " +
dataTableFullName + "(k INTEGER PRIMARY KEY, v1 VARCHAR, v2 VARCHAR)");
            conn.createStatement().execute("CREATE INDEX " + indexTableName
+ " ON " + dataTableName + "(v1)");
            PhoenixStatement stmt =
conn.createStatement().unwrap(PhoenixStatement.class);
            QueryPlan plan;
            stmt.executeQuery("SELECT v1, v2 " +
                    " FROM " + dataTableFullName +
                    " WHERE v1='a'");
            plan = stmt.getQueryPlan();
            assertEquals("Expected (" + dataTableFullName + ") but was " +
plan.getSourceRefs(), 1, plan.getSourceRefs().size());

            stmt.executeQuery("SELECT /*+ INDEX(" + dataTableFullName + " "
+ indexTableName + ") */ v1, v2" +
                    " FROM " + dataTableFullName +
                    " WHERE v1='a'");
            plan = stmt.getQueryPlan();
            assertEquals("Expected (" + dataTableFullName + "," +
indexTableFullName + ") but was " +  plan.getSourceRefs(), 2,
plan.getSourceRefs().size());
        }
    }



On Fri, Apr 20, 2018 at 1:11 PM, Taylor, Ronald (Ronald) <
Ronald.Taylor@cchmc.org> wrote:

> Hello Sergey,
>
>
>
> Per your request, here are the commands that I used to create the table
> and its indexes.  Hopefully you can find something in here that provides a
> guide as to what we are doing wrong.
>
>
>
> BTW – as I said, we are novices with Phoenix here. One thing we are doing
> is playing around a bit with salting numbers. We believed that the data in
> our test table was small enough to fit on one region server ( < 10 GB), so
> we used a high salt number (24) to try to force HBase to use more than one
> region server, to parallelize over more than one node. Did we get that
> concept right?
>
>
>
> Ron
>
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
>
> CREATE TABLE variantjoin_RT_salted24 (
> chrom VARCHAR,
> genomic_range VARCHAR,
> reference VARCHAR,
> alternate VARCHAR,
> annotations VARCHAR,
> consequence VARCHAR,
> chrom_int INTEGER,
> onekg_maf DOUBLE,
> coding VARCHAR,
> esp_aa_maf DOUBLE,
> esp_ea_maf DOUBLE,
> exac_maf DOUBLE,
> filter VARCHAR,
> gene VARCHAR,
> impact VARCHAR,
> polyphen VARCHAR,
> sift VARCHAR,
> viva_maf DOUBLE,
> variant_id INTEGER PRIMARY KEY,
> genomic_range_start INTEGER,
> genomic_range_end INTEGER
> ) SALT_BUCKETS = 24, IMMUTABLE_ROWS=false;
>
>
>
>
>
> 0: jdbc:phoenix:> !describe variantjoin_RTsalted24
>
>
> 0: jdbc:phoenix:> !describe variantjoin_RTsalted24
> +------------+--------------+-------------------------+-----
> -----------------+-+
> | TABLE_CAT  | TABLE_SCHEM  |       TABLE_NAME        |
> COLUMN_NAME      | |
> +------------+--------------+-------------------------+-----
> -----------------+-+
> |            |              | VARIANTJOIN_RTSALTED24  |
> CHROM                | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> GENOMIC_RANGE        | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> REFERENCE            | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> ALTERNATE            | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> ANNOTATIONS          | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> CONSEQUENCE          | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> CHROM_INT            | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> ONEKG_MAF            | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> CODING               | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> ESP_AA_MAF           | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> ESP_EA_MAF           | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> EXAC_MAF             | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> FILTER               | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> GENE                 | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> IMPACT               | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> POLYPHEN             | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> SIFT                 | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> VIVA_MAF             | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> VARIANT_ID           | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> GENOMIC_RANGE_START  | |
> |            |              | VARIANTJOIN_RTSALTED24  |
> GENOMIC_RANGE_END    | |
> +------------+--------------+-------------------------+-----
> -----------------+-+
> 0: jdbc:phoenix:>
>
>
>
>
>
> 0: jdbc:phoenix:> create index vj2_chrom on variantjoin_RTsalted24 (chrom);
> create index vj2_chrom on variantjoin_RTsalted24 (chrom);
> No rows affected (13.322 seconds)
> 0: jdbc:phoenix:>
>
> reference    - no index at present
> alternate    - no index at present
> annotations  - JSONB blob no index at present
>
>
> 0: jdbc:phoenix:> create index vj2_genomic_range on variantjoin_RTsalted24
> (genomic_range);
> create index vj2_genomic_range on variantjoin_RTsalted24 (genomic_range);
> No rows affected (11.953 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_chrom_int on variantjoin_RTsalted24
> (chrom_int);
> create index vj2_chrom_int on variantjoin_RTsalted24 (chrom_int);
> No rows affected (12.518 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_onekg_maf on variantjoin_RTsalted24
> (onekg_maf);
> create index vj2_onekg_maf on variantjoin_RTsalted24 (onekg_maf);
> No rows affected (13.727 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_coding_maf on variantjoin_RTsalted24
> (onekg_maf);
> create index vj2_coding_maf on variantjoin_RTsalted24 (onekg_maf);
> No rows affected (12.45 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_consequence on variantjoin_RTsalted24
> (consequence);
> create index vj2_consequence on variantjoin_RTsalted24 (consequence);
> No rows affected (13.906 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_esp_aa_maf on variantjoin_RTsalted24
> (esp_aa_maf);
> create index vj2_esp_ea_maf on variantjoin_RTsalted24 (esp_ea_maf);
> No rows affected (12.355 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_exac_maf on variantjoin_RTsalted24
> (exac_maf);
> create index vj2_exac_maf on variantjoin_RTsalted24 (exac_maf);
> No rows affected (12.146 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_filter on variantjoin_RTsalted24
> (filter);
> create index vj2_filter on variantjoin_RTsalted24 (filter);
> No rows affected (12.178 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_gene on variantjoin_RTsalted24 (gene);
> create index vj2_gene on variantjoin_RTsalted24 (gene);
> No rows affected (12.375 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_impact on variantjoin_RTsalted24
> (impact);
> create index vj2_impact on variantjoin_RTsalted24 (impact);
> No rows affected (14.806 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_polyphen on variantjoin_RTsalted24
> (polyphen);
> create index vj2_polyphen on variantjoin_RTsalted24 (polyphen);
> No rows affected (13.218 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_sift on variantjoin_RTsalted24 (sift);
> create index vj2_sift on variantjoin_RTsalted24 (sift);
> No rows affected (12.12 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_viva_maf on variantjoin_RTsalted24
> (viva_maf);
> create index vj2_viva_maf on variantjoin_RTsalted24 (viva_maf);
> No rows affected (12.032 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_genomic_range_start on
> variantjoin_RTsalted24 (genomic_range_start);
> create index vj2_genomic_range_start on variantjoin_RTsalted24
> (genomic_range_start);
> No rows affected (12.682 seconds)
> 0: jdbc:phoenix:>
>
> 0: jdbc:phoenix:> create index vj2_genomic_range_end on
> variantjoin_RTsalted24 (genomic_range_end);
> create index vj2_genomic_range_end on variantjoin_RTsalted24
> (genomic_range_end);
> No rows affected (13.112 seconds)
> 0: jdbc:phoenix:>
>
>
>
> %%%%%%%%%%%%%%%%%%%%%%%
>
>
>
> Ronald C. Taylor, Ph.D.
> Divisions of Immunobiology and Biomedical Informatics
>
> Cincinnati Children's Hospital Medical Center
>
> Office phone: 513-803-4880
>
> Cell phone:     509-783-7308
>
> Email: ronald.taylor@cchmc.org
>
>
>
>
>
> *From: *<sergey.soldatov@gmail.com> on behalf of Sergey Soldatov <
> sergeysoldatov@gmail.com>
> *Reply-To: *"user@phoenix.apache.org" <user@phoenix.apache.org>
> *Date: *Thursday, April 19, 2018 at 5:19 PM
> *To: *"user@phoenix.apache.org" <user@phoenix.apache.org>
> *Subject: *Re: hint to use a global index is not working - need to find
> out why
>
>
>
> That looks strange. Could you please provide full DDLs for table and
> indexes? I just tried a similar scenario and obviously index is used:
>
>
>
> 0: jdbc:phoenix:> create table VARIANTJOIN_RTSALTED24 (id integer primary
> key, chrom_int integer, genomic_range integer);
>
> No rows affected (6.339 seconds)
>
> 0: jdbc:phoenix:>    create index jv2_chrom_int on VARIANTJOIN_RTSALTED24
> (chrom_int);
>
> No rows affected (10.016 seconds)
>
> 0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */
> VJ.chrom_int, genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE
> (chrom_int =18 ) limit 5;
>
> +-----------------------------------------------------------
> ----------------------------+
>
> |                                         PLAN
>               |
>
> +-----------------------------------------------------------
> ----------------------------+
>
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER
> VARIANTJOIN_RTSALTED24       |
>
> | CLIENT 5 ROW LIMIT
>               |
>
> |     SKIP-SCAN-JOIN TABLE 0
>               |
>
> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER
> JV2_CHROM_INT [18]  |
>
> |             SERVER FILTER BY FIRST KEY ONLY
>              |
>
> |     DYNAMIC SERVER FILTER BY "VJ.ID" IN ($2.$4)
>                |
>
> |     JOIN-SCANNER 5 ROW LIMIT
>               |
>
> +-----------------------------------------------------------
> ----------------------------+
>
> 7 rows selected (0.936 seconds)
>
>
>
>
>
> Thanks,
>
> Sergey
>
>
>
> On Thu, Apr 19, 2018 at 7:31 PM, Taylor, Ronald (Ronald) <
> Ronald.Taylor@cchmc.org> wrote:
>
> Hello Phoenix users,
>
> I am a novice Phoenix user and this is my first post to this user list. I
> did some searching in the list archives, but could not find an answer to
> what I hope is a simple question: my global index is being ignored, even
> after I add a Hint, and I want to know why.
>
> We are using Phoenix 4.7 in the Hortonworks distribution. Looks like
> Hortonworks has been backporting at least some phoenix updates into their
> version of phoenix 4.7, so I guess it is a custom distribution. See
>
>
>
>      https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_
> release-notes/content/patch_phoenix.html
>
>
>
> I have created a simple table of about 8 million rows, and about 15
> columns, with several fields having global indexes. I created the main
> table (variantjoin_rtsalted24) and its indexes, and then used a bulk loader
> to populate them from a tab-delimited file. That appeared to work fine.
>
> chrom_int is one field on which there is a global index, named
> vj2_chrom_int. And you can see the index being automatically being used
> below, where it is the only field being returned. Time required is 0.124
> sec.
>
> 0: jdbc:phoenix:> SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as VJ
> WHERE (chrom_int =18 ) limit 5;
>
> +------------+
>
> | CHROM_INT  |
>
> +------------+
>
> | 18         |
>
> | 18         |
>
> | 18         |
>
> | 18         |
>
> | 18         |
>
> +------------+
>
> 5 rows selected (0.124 seconds)
>
> 0: jdbc:phoenix:>
>
> You can see that the vj2_chrom_int index is automatically being used, as I
> understand things  by the "RANGE SCAN" wording and "[0,1" in the explain
> plan:
>
> 0: jdbc:phoenix:> explain SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24
> as VJ WHERE (chrom_int =18 ) limit 5;
>
> +-----------------------------------------------------------
> -------------------+
>
> |                                           PLAN
> |
>
> +-----------------------------------------------------------
> -------------------+
>
> | CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT
> [0,1 |
>
> |     SERVER FILTER BY FIRST KEY ONLY
> |
>
> |     SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5 ROW LIMIT
>                                   |
>
> +-----------------------------------------------------------
> -------------------+
>
> 4 rows selected (0.043 seconds)
>
> 0: jdbc:phoenix:>
>
>
> I can use a Hint to tell Phoenix to NOT use this index, as seen below. And
> that increases the time needed to 1.97 sec, over an order of magnitude more
> time than the 0.124 sec required with index use.
>
> 0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> +------------+
>
> | CHROM_INT  |
>
> +------------+
>
> | 18         |
>
> | 18         |
>
> | 18         |
>
> | 18         |
>
> | 18         |
>
> +------------+
>
> 5 rows selected (1.977 seconds)
>
> 0: jdbc:phoenix:>
>
> And here is the explain plan for that:
>
>
> 0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> +-----------------------------------------------------------
> -------------------+
>
> |
> PLAN                  |
>
> +-----------------------------------------------------------
> -------------------+
>
> | CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND
> ROBIN  |
>
> |     SERVER FILTER BY CHROM_INT = 18
> |
>
> |     SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5 ROW LIMIT
>                                   |
>
> +-----------------------------------------------------------
> -------------------+
>
> 4 rows selected (0.009 seconds)
>
> Now,  I want to add other fields for retrieval. For example,
> "genomic_range". The Phoenix documentation says in such a case I must add a
> Hint to force Phoenix to make use of the index (since it is a simple global
> index, not a covered index wherein genomic_range has been added.) So I
> tried that. See below. Alas, the response time is about the same as what I
> get with NO_INDEX.  It appears that, even with the Hint, the index is not
> being used.
>
> 0: jdbc:phoenix:> SELECT/*+ INDEX(VJ jv2_chrom_int) */ VJ.chrom_int,
> genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 )
> limit 5;
> +------------+----------------------+
> | CHROM_INT  |    GENOMIC_RANGE     |
> +------------+----------------------+
> | 18         | [49546,49547)        |
> | 18         | [20003625,20003626)  |
> | 18         | [19618749,19618752)  |
> | 18         | [47561,47583)        |
> | 18         | [20024261,20024272)  |
> +------------+----------------------+
> 5 rows selected (1.799 seconds)
> 0: jdbc:phoenix:>
>
>
> And below is the explain plan for the query with the index failure. No
> indication of index use, that I can tell.
>
> 0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */
> VJ.chrom_int, genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE
> (chrom_int =18 ) limit 5;
>
> +-----------------------------------------------------------
> -------------------+
> |
> PLAN                  |
> +-----------------------------------------------------------
> -------------------+
> | CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND
> ROBIN  |
> |     SERVER FILTER BY CHROM_INT = 18
> |
> |     SERVER 5 ROW LIMIT
> |
> | CLIENT 5 ROW LIMIT
>                                   |
> +-----------------------------------------------------------
> -------------------+
> 4 rows selected (0.042 seconds)
> 0: jdbc:phoenix:>
>
> So I am puzzled and frustrated. How do I get my index Hint to work?  The
> difference in timing between automatic use (when the index is the only
> field being retrieved) and when NO_INDEX is used tells me that the index
> table is there and can indeed be used. But something is going wrong when I
> try to force its use via a Hint. Guidance would be very much appreciated on
> this basic point.
>
>   - Ron Taylor
>
> Ronald C. Taylor, Ph.D.
> Divisions of Immunobiology and Biomedical Informatics
>
> Cincinnati Children's Hospital Medical Center
>
> Office phone: 513-803-4880
>
> Cell phone:     509-783-7308
>
> Email: ronald.taylor@cchmc.org
>
>
>
>
>
>
>

Mime
View raw message