phoenix-user mailing list archives

From James Taylor <>
Subject Re: Column Cardinality and Stats table as an "interface"
Date Thu, 14 Apr 2016 20:28:31 GMT
The stats table would purely be used to drive optimizer decisions in
Phoenix. The data in the table is only collected during major compaction
(or when an update stats is run manually), so it's not really meant for
satisfying queries.
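
For reference, a rough sketch of the commands involved (MY_TABLE is a hypothetical table name; the schema of SYSTEM.STATS is internal and subject to change, so don't build on it directly):

```sql
-- Manually refresh statistics for a table
-- (otherwise stats are only gathered during major compaction)
UPDATE STATISTICS MY_TABLE;

-- Peek at the collected guidepost stats for that table
SELECT * FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_TABLE';
```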

For Kylin integration, we'd rely on Kylin to maintain the cubes and Calcite
would be the glue that allows both Phoenix and Kylin to cooperate at
planning time. I'm sure there'd be other runtime pieces required to make it
work.

I have no idea on the feasibility of BlinkDB integration, but conceptually
BlinkDB could probably be used as a statistics provider for Phoenix.

On Thu, Apr 14, 2016 at 1:05 PM, Nick Dimiduk <> wrote:

> Ah, okay. Thanks for the pointer to PHOENIX-1178. Do you think the stats
> table is the right place for this kind of info? Seems like the only choice.
> Is there a plan to make the stats table a stable internal API? For
> instance, integration with Kylin for building Cubes off of denormalized
> event tables in Phoenix, or supporting BlinkDB approximation queries could
> both be facilitated by the stats table.
> -n
> On Thu, Apr 14, 2016 at 12:24 PM, James Taylor <>
> wrote:
>> FYI, Lars H. is looking at PHOENIX-258 for improving performance of
>> DISTINCT. We don't yet keep any cardinality info in our stats
>> (see PHOENIX-1178).
>> Thanks,
>> James
>> On Thu, Apr 14, 2016 at 11:22 AM, Nick Dimiduk <>
>> wrote:
>>> Hello,
>>> I'm curious if there are any tricks for estimating the cardinality of
>>> the values in a Phoenix column. Even for a leading rowkey column, a select
>>> distinct query on a large table requires a full scan (PHOENIX-258). Maybe
>>> one could reach into the stats table and derive some knowledge? How much of
>>> a "bad thing" would this be?
>>> Thanks,
>>> Nick
