Hi Li,

What you are asking about here is really more about how Calcite works. In short, Calcite works with rules: you can think of applying a set of rules as producing a bunch of different query plans you could go with. Calcite then calculates the cumulative cost for each candidate (that is the basic idea; the implementation differs a little) and picks the cheapest plan out of these candidates.

So for example, we have several different implementations for joins in Phoenix, and those correspond to different physical operators in Calcite (PhoenixServerJoin.java, PhoenixClientJoin.java). We override the cost function ("computeSelfCost") to model the runtime overhead as closely as possible. Both versions (using PhoenixServerJoin and PhoenixClientJoin) exist among the candidates, and which comes out cheaper is usually based on the join's inputs. For instance, if both sides of the join operator are sorted on the join keys, the merge join will most likely be chosen.

There are quite a lot of general optimization rules provided by Calcite already (in the Calcite project), like the filter push-down rule. There are also some Phoenix-specific rules under org.apache.phoenix.calcite.rel.rules. For examples, you can look at CalciteIT.java, which contains some basic test cases as well as some interesting stuff.

Thanks,
Maryann

On Thu, Oct 8, 2015 at 2:37 PM, Li Gao <email@example.com> wrote:

Hi Maryann,

I am wondering if you could help me understand how the Phoenix calcite branch is using Calcite to do query optimizations, i.e.:
- some pointers to the code where the planner decides whether a hash join or a sort-merge join should be used for a given case
- pointers to how the cost is calculated in the code
- pointers to how the filter predicate push-down is implemented in the code

Examples would be greatly appreciated.

Thanks,
Li

On Mon, Oct 5, 2015 at 5:49 PM, Maryann Xue <firstname.lastname@example.org> wrote:

Hi Li,

Sorry, I forgot to mention that this calcite branch now depends on Apache Calcite's master branch instead of any of its releases. So you need to check out Calcite (git://github.com/apache/incubator-calcite.git) first and run `mvn install` for that project before going back to the Phoenix project and running mvn commands there.

On Mon, Oct 5, 2015 at 6:43 PM, Li Gao <email@example.com> wrote:

Hi Maryann,

This looks great. Thanks for pointing me to the right branch! For some reason I am getting the following errors when I do mvn package:
[WARNING] The POM for org.apache.calcite:calcite-avatica:jar:1.5.0-incubating-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for org.apache.calcite:calcite-core:jar:1.5.0-incubating-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for org.apache.calcite:calcite-core:jar:tests:1.5.0-incubating-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for org.apache.calcite:calcite-linq4j:jar:1.5.0-incubating-SNAPSHOT is missing, no dependency information available
Where can I find these dependencies?
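(The reply further up in the thread answers this: the SNAPSHOT artifacts have to be built from Calcite's master branch and installed into the local Maven repository first. A minimal sketch of those steps; the checkout directory names and the `-DskipTests` flag are just conventions, adjust paths to your own layout:)

```shell
# Build Calcite's master branch locally so its 1.5.0-incubating-SNAPSHOT
# artifacts land in the local Maven repository (~/.m2/repository).
git clone git://github.com/apache/incubator-calcite.git calcite
cd calcite
mvn install -DskipTests

# Then build Phoenix against those freshly installed snapshots.
cd ../phoenix   # path to your Phoenix checkout; adjust as needed
mvn package
```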
On Mon, Oct 5, 2015 at 12:19 PM, Maryann Xue <firstname.lastname@example.org> wrote:

Hi Li,

We are now moving towards integrating with Calcite for our stats-based optimization. You can check out our calcite branch and play with it if you are interested. It is still under development, but you can already see some amazing optimization examples in our test file CalciteIT.java. You can also go to http://www.slideshare.net/HBaseCon/ecosystem-session-2-49044349 for more information.

Thanks,
Maryann

On Mon, Oct 5, 2015 at 2:08 PM, Li Gao <email@example.com> wrote:

Hi all,

I am currently looking into getting optimized joins based on table stats. I noticed that QueryCompiler at lines 232-234 still says "TODO":

https://github.com/apache/phoenix/blob/4.x-HBase-1.0/phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java

We need the join strategy selector enabled based on the size of the LHS and RHS tables.

Thanks,
Li
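(The cost-comparison idea Maryann describes at the top of the thread, where each candidate plan's costs are summed bottom-up and the cheapest root wins, can be sketched without any Calcite dependency. This is a toy illustration only; the class names, operator names, and cost numbers below are all made up and bear no relation to Calcite's or Phoenix's actual APIs:)

```java
import java.util.*;

// A candidate plan node: an operator name, a self-cost, and its inputs.
final class PlanNode {
    final String name;
    final double selfCost;
    final List<PlanNode> inputs;

    PlanNode(String name, double selfCost, PlanNode... inputs) {
        this.name = name;
        this.selfCost = selfCost;
        this.inputs = List.of(inputs);
    }

    // Cumulative cost = this node's self-cost plus the cumulative cost of
    // every input, mirroring how a cost-based planner totals a candidate plan.
    double cumulativeCost() {
        double total = selfCost;
        for (PlanNode in : inputs) total += in.cumulativeCost();
        return total;
    }
}

public class CheapestPlanDemo {
    public static void main(String[] args) {
        PlanNode lhs = new PlanNode("scan(orders)", 10.0);
        PlanNode rhs = new PlanNode("scan(items)", 5.0);
        // Two join implementations over the same inputs; the invented
        // self-costs assume both sides arrive sorted on the join key,
        // which favors the merge join.
        PlanNode hashJoin  = new PlanNode("hash-join", 8.0, lhs, rhs);
        PlanNode mergeJoin = new PlanNode("merge-join", 3.0, lhs, rhs);
        PlanNode cheapest = Collections.min(
                List.of(hashJoin, mergeJoin),
                Comparator.comparingDouble(PlanNode::cumulativeCost));
        // prints "merge-join wins at cost 18.0"
        System.out.println(cheapest.name + " wins at cost " + cheapest.cumulativeCost());
    }
}
```

In the real integration, the per-operator number comes from Phoenix's "computeSelfCost" overrides in the physical operators (e.g. PhoenixServerJoin.java, PhoenixClientJoin.java) rather than from hard-coded constants.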