phoenix-user mailing list archives

From Maryann Xue <maryann....@gmail.com>
Subject Re: When will be the stats based join selector be implemented?
Date Thu, 08 Oct 2015 20:08:28 GMT
Hi Li,

Your questions seem to be mostly about how Calcite itself works.

In short, Calcite works with rules. You can think of it this way: applying
a set of rules gives you a number of different query plans you could go
with. Calcite then calculates the cumulative cost of each candidate (this
is only the idea; the implementation differs a little) and picks the
cheapest plan among these candidates.
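
As a rough illustration of "cumulative cost, then pick the cheapest", here
is a toy sketch. It is not Calcite's planner or API: the Plan class, the
cheapest() helper and the cost numbers are all made up for the example.

```java
import java.util.Comparator;
import java.util.List;

final class CheapestPlanSketch {

    /** Hypothetical plan node: an operator with its own cost and its input plans. */
    static final class Plan {
        final String name;
        final double selfCost;
        final List<Plan> inputs;

        Plan(String name, double selfCost, List<Plan> inputs) {
            this.name = name;
            this.selfCost = selfCost;
            this.inputs = inputs;
        }

        /** Cumulative cost = this operator's cost plus the cost of everything below it. */
        double cumulativeCost() {
            double total = selfCost;
            for (Plan input : inputs) {
                total += input.cumulativeCost();
            }
            return total;
        }
    }

    /** Picks the candidate with the lowest cumulative cost. */
    static Plan cheapest(List<Plan> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(Plan::cumulativeCost))
                .orElseThrow(() -> new IllegalArgumentException("no candidate plans"));
    }

    public static void main(String[] args) {
        Plan scanA = new Plan("scan A", 100, List.of());
        Plan scanB = new Plan("scan B", 40, List.of());
        // Two alternative plans over the same inputs (made-up costs).
        Plan planX = new Plan("join variant X", 300, List.of(scanA, scanB)); // 300 + 140
        Plan planY = new Plan("join variant Y", 120, List.of(scanA, scanB)); // 120 + 140
        System.out.println(cheapest(List.of(planX, planY)).name);
    }
}
```

The real planner does not enumerate complete plans one by one like this,
which is the "implementation differs a little" part.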

So for example, we have several different join implementations in
Phoenix, and they correspond to different physical operators in Calcite
(PhoenixServerJoin.java, PhoenixClientJoin.java). We override the cost
function ("computeSelfCost") and try to model the runtime overhead as
closely as we can. Both versions (using PhoenixServerJoin and
PhoenixClientJoin) exist among the candidates, and which one comes out
cheaper usually depends on the join's inputs. For instance, if both sides
of the join operator are sorted on the join keys, the merge-join is most
likely going to be chosen.
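
To show why sortedness of the inputs can flip the choice, here is a toy
cost model. The formulas and method names are assumptions made up for
this sketch; they are not the actual computeSelfCost implementations in
PhoenixServerJoin or PhoenixClientJoin.

```java
final class JoinCostSketch {

    /** Made-up cost of sorting n rows: n * log2(n). */
    static double sortCost(double rows) {
        return rows <= 1 ? 0 : rows * (Math.log(rows) / Math.log(2));
    }

    /** Made-up hash join cost: build a table on the right side, then probe with the left. */
    static double hashJoinCost(double leftRows, double rightRows) {
        return leftRows + 2 * rightRows;
    }

    /** Made-up merge join cost: one pass over both sides, plus a sort for each unsorted side. */
    static double mergeJoinCost(double leftRows, boolean leftSorted,
                                double rightRows, boolean rightSorted) {
        double cost = leftRows + rightRows;
        if (!leftSorted) {
            cost += sortCost(leftRows);
        }
        if (!rightSorted) {
            cost += sortCost(rightRows);
        }
        return cost;
    }

    /** Both variants are candidates; the cheaper one wins. */
    static String chooseJoin(double leftRows, boolean leftSorted,
                             double rightRows, boolean rightSorted) {
        double hash = hashJoinCost(leftRows, rightRows);
        double merge = mergeJoinCost(leftRows, leftSorted, rightRows, rightSorted);
        return merge < hash ? "merge-join" : "hash-join";
    }

    public static void main(String[] args) {
        // Same row counts, different sortedness of the inputs.
        System.out.println(chooseJoin(1000, true, 1000, true));
        System.out.println(chooseJoin(1000, false, 1000, false));
    }
}
```

With these made-up formulas, inputs already sorted on the join keys make
the merge-join cheaper, while unsorted inputs of the same size have to pay
for the sorts and lose to the hash join.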

There are already quite a lot of general optimization rules provided by
Calcite (in the Calcite project), such as the filter push-down rule. There
are also some Phoenix-specific rules under
org.apache.phoenix.calcite.rel.rules.
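
To give a feel for what a rule does, here is a toy filter push-down over a
tiny plan tree. It is only a sketch: the Scan, Filter and Join classes and
the pushFilterIntoJoin() method are hypothetical and much simpler than
Calcite's rule classes (for one thing, the filter here reads a single
table and the join is assumed to be an inner join).

```java
import java.util.Optional;

final class FilterPushDownSketch {

    interface Node {}

    static final class Scan implements Node {
        final String table;
        Scan(String table) { this.table = table; }
        @Override public String toString() { return "Scan(" + table + ")"; }
    }

    static final class Filter implements Node {
        final String table;     // the only table the predicate reads (toy simplification)
        final String predicate;
        final Node input;
        Filter(String table, String predicate, Node input) {
            this.table = table;
            this.predicate = predicate;
            this.input = input;
        }
        @Override public String toString() { return "Filter(" + predicate + ", " + input + ")"; }
    }

    static final class Join implements Node {
        final Node left;
        final Node right;
        Join(Node left, Node right) { this.left = left; this.right = right; }
        @Override public String toString() { return "Join(" + left + ", " + right + ")"; }
    }

    static boolean scans(Node node, String table) {
        return node instanceof Scan && ((Scan) node).table.equals(table);
    }

    /**
     * Matches Filter(Join(l, r)) and, if the predicate reads only one side,
     * returns the equivalent plan with the filter moved below the (inner) join.
     * Returns empty when the rule does not apply.
     */
    static Optional<Node> pushFilterIntoJoin(Node node) {
        if (!(node instanceof Filter)) {
            return Optional.empty();
        }
        Filter filter = (Filter) node;
        if (!(filter.input instanceof Join)) {
            return Optional.empty();
        }
        Join join = (Join) filter.input;
        if (scans(join.left, filter.table)) {
            return Optional.of(new Join(
                    new Filter(filter.table, filter.predicate, join.left), join.right));
        }
        if (scans(join.right, filter.table)) {
            return Optional.of(new Join(
                    join.left, new Filter(filter.table, filter.predicate, join.right)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Node plan = new Filter("orders", "orders.total > 100",
                new Join(new Scan("orders"), new Scan("items")));
        System.out.println(plan);
        System.out.println(pushFilterIntoJoin(plan).orElse(plan));
    }
}
```

The rewritten plan becomes one more candidate next to the original one,
and the cost comparison decides between them.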

For examples, you can look at CalciteIT.java, which contains some basic
test cases as well as some interesting stuff.


Thanks,
Maryann



On Thu, Oct 8, 2015 at 2:37 PM, Li Gao <gaol@marinsoftware.com> wrote:

> Hi Maryann,
>
> I am wondering if you could help me understand how the Phoenix calcite
> branch is using Calcite to do query optimizations
>
> i.e.
>
>    - some pointers to the code where the joins can detect whether a hash
>    join or a sort merge join should be used for a given case
>    - pointers to how the cost is calculated in the code
>    - pointers to how the filter predicate push down is implemented in the
>    code
>
> Examples  would be greatly appreciated.
>
> Thanks,
> Li
>
>
> On Mon, Oct 5, 2015 at 5:49 PM, Maryann Xue <maryann.xue@gmail.com> wrote:
>
>> Hi Li,
>>
>> Sorry, I forgot to mention that this calcite branch now depends on
>> Apache Calcite's master branch instead of any of its releases. So you need
>> to check out Calcite (git://github.com/apache/incubator-calcite.git)
>> first and run `mvn install` for that project before going back to the
>> Phoenix project and running mvn commands.
>>
>> On Mon, Oct 5, 2015 at 6:43 PM, Li Gao <gaol@marinsoftware.com> wrote:
>>
>>> Hi Maryann,
>>>
>>> This looks great. Thanks for pointing me to the right branch!  For some
>>> reason I am getting the following errors when I do mvn package
>>>
>>> [WARNING] The POM for
>>> org.apache.calcite:calcite-avatica:jar:1.5.0-incubating-SNAPSHOT is
>>> missing, no dependency information available
>>>
>>> [WARNING] The POM for
>>> org.apache.calcite:calcite-core:jar:1.5.0-incubating-SNAPSHOT is missing,
>>> no dependency information available
>>>
>>> [WARNING] The POM for
>>> org.apache.calcite:calcite-core:jar:tests:1.5.0-incubating-SNAPSHOT is
>>> missing, no dependency information available
>>>
>>> [WARNING] The POM for
>>> org.apache.calcite:calcite-linq4j:jar:1.5.0-incubating-SNAPSHOT is missing,
>>> no dependency information available
>>>
>>> Where can I find these dependencies?
>>>
>>> Thanks,
>>>
>>> Li
>>>
>>>
>>>
>>> On Mon, Oct 5, 2015 at 12:19 PM, Maryann Xue <maryann.xue@gmail.com>
>>> wrote:
>>>
>>>> Hi Li,
>>>>
>>>> We are now moving towards integrating with Calcite for our stats-based
>>>> optimization. You can check out our calcite
>>>> <https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=shortlog;h=refs/heads/calcite>
>>>> branch and play with it if you are interested. It's still under
>>>> development, but you can already see some amazing optimization examples in
>>>> our test file CalciteIT.java. You can also go to
>>>> http://www.slideshare.net/HBaseCon/ecosystem-session-2-49044349 for
>>>> more information.
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 5, 2015 at 2:08 PM, Li Gao <gaol@marinsoftware.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am currently looking into getting optimized joins based on table
>>>>> stats. I noticed that QueryCompiler at lines 232-234 still says
>>>>> "TODO".
>>>>>
>>>>>
>>>>> https://github.com/apache/phoenix/blob/4.x-HBase-1.0/phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java
>>>>>
>>>>> We have a need to get the selector enabled based on the size of the
>>>>> LHS and RHS tables.
>>>>>
>>>>> Thanks,
>>>>> Li
>>>>>
>>>>
>>>>
>>>
>>
>