phoenix-user mailing list archives

From Juvenn Woo <mach...@gmail.com>
Subject Re: Query TimeOut on Azure HDInsight
Date Fri, 10 Feb 2017 15:37:43 GMT
Sumanta,

Actually, DISTINCT makes a big difference: it may have to scan as many rows
as it takes to find 10 (LIMIT 10) distinct values. If COLX has fewer than 10
distinct values, it will scan the whole table just to learn that there are
fewer than that.
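
To make the point concrete, here is a rough sketch in Python. It is only a
single-threaded toy model, not how Phoenix actually executes the query
(Phoenix aggregates per region on the server and merges on the client). The
function name rows_scanned and the sample data are made up for illustration.

```python
def rows_scanned(rows, limit, distinct):
    """Count how many rows a scan reads before it can stop.

    rows: COLX values of rows that already match the WHERE filter.
    limit: the LIMIT value.
    distinct: whether only unseen values count toward the limit.
    """
    seen = set()
    scanned = 0
    returned = 0
    for value in rows:
        scanned += 1
        if distinct:
            if value in seen:
                continue  # duplicate value: read, but not returned
            seen.add(value)
        returned += 1
        if returned == limit:
            break  # limit reached, the scan can stop early
    return scanned


# Hypothetical data: 1,000,000 matching rows, only 3 distinct COLX values.
few_distinct = [i % 3 for i in range(1_000_000)]
# Hypothetical data: every matching row has a different COLX value.
many_distinct = list(range(1_000_000))

print(rows_scanned(few_distinct, limit=10, distinct=False))  # 10
print(rows_scanned(few_distinct, limit=10, distinct=True))   # 1000000
print(rows_scanned(many_distinct, limit=10, distinct=True))  # 10
```

Without DISTINCT the scan stops after 10 rows. With DISTINCT and fewer than
10 distinct values it never reaches the limit, so it reads every row.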

On Feb 10, 2017 11:25 PM, "Sumanta Gh" <sumanta.gh@tcs.com> wrote:

> If we remove DISTINCT from the below query, everything works fine.
> Any pointers on why DISTINCT would fail?
>
>
> Regards
> Sumanta
>
>
>  -----Mark Heppner <heppner.mark@gmail.com> wrote: -----
>
>  =======================
>  To: user@phoenix.apache.org
>  From: Mark Heppner <heppner.mark@gmail.com>
>  Date: 02/10/2017 08:02PM
>  Subject: Re: Query TimeOut on Azure HDInsight
>  =======================
>    Sumanta,
> Doing the full scan over 100 million rows is going to be costly. How many
> region servers do you have? If this is a common query, you could add a
> secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to
> increase hbase.rpc.timeout to something higher than 60000 and maybe even
> phoenix.query.timeoutMs. I'm sure there are other optimizations too, but
> I'll let someone else answer that.
>
> On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh <sumanta.gh@tcs.com> wrote:
>
> > Hi,
> > We have a production system on Azure HDInsight.
> > There is a table called TABLE1 which has approx 100 million rows.
> >
> > Recently, the following query has been timing out every time:
> >
> > *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;*
> >
> > java.lang.RuntimeException:
> > org.apache.phoenix.exception.PhoenixIOException:
> > org.apache.phoenix.exception.PhoenixIOException: Failed after
> > attempts=36, exceptions:
> > Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException:
> > callTimeout=60000, callDuration=72705: row '?  ?' on table 'TABLE1' at
> > region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0.,
> > hostname=workernode1.xxxxxx.d1.internal.cloudapp.net,60020,1483615853438,
> > seqNum=173240701
> >
> >
> > The explain plan is -
> > +-------------------------------------------------------------------+
> > |                               PLAN                                |
> > +-------------------------------------------------------------------+
> > | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1]        |
> > |     SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS |
> > | CLIENT MERGE SORT                                                 |
> > | CLIENT 10 ROW LIMIT                                               |
> > +-------------------------------------------------------------------+
> >
> >
> > How can we make this above query successful? Kindly reply urgently.
> >
> > Regards
> > Sumanta
> >
> >
>
>
> --
> Mark Heppner
>
>
