phoenix-user mailing list archives

From Vincent Poon <vincentp...@apache.org>
Subject Re: Query All Dynamic Columns
Date Wed, 26 Dec 2018 20:05:15 GMT
A lot of work is currently going into handling large numbers of views
(splittable syscat, view management, etc.), but I agree that it's not ideal.

There's currently no built-in way to do what you want AFAIK, but you can
manage the columns yourself in a separate table:
- store them all in a single column value, and read that value before doing
your query.  Use HBase checkAndMutate to guard concurrent updates.
or
- store each column as a separate row.  Then you can do things like filter
by column name efficiently.
You could 'soft delete' a column by removing its entry.
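
As a rough sketch of the second option: assume a hypothetical registry table
that holds one (column name, Phoenix type) row per dynamic column. Once those
rows have been read, the query can be assembled with Phoenix's dynamic column
syntax, which declares the extra columns in parentheses after the table name.
The function name, the table and column names, and the identifier checks below
are all illustrative assumptions, not part of Phoenix.

```python
import re

# Names and types are interpolated into SQL text, so this sketch only accepts
# a conservative subset of identifiers and type names (an assumption made
# here; Phoenix's real grammar is broader).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TYPE = re.compile(r"^[A-Za-z_ ]+(\(\d+(,\s*\d+)?\))?$")


def _check(pattern, value, kind):
    if not pattern.match(value):
        raise ValueError(f"unsafe {kind}: {value!r}")


def build_dynamic_select(table, static_columns, registry_rows):
    """Build a SELECT that names every registered dynamic column.

    registry_rows is an iterable of (column_name, phoenix_type) pairs, as
    they might be read from the hypothetical registry table.
    """
    _check(_IDENT, table, "table name")
    for col in static_columns:
        _check(_IDENT, col, "column name")

    seen = set()
    dynamic = []
    for name, sql_type in registry_rows:
        _check(_IDENT, name, "column name")
        _check(_TYPE, sql_type, "column type")
        if name not in seen:  # ignore duplicate registry entries
            seen.add(name)
            dynamic.append((name, sql_type))

    select_list = ", ".join(list(static_columns) + [n for n, _ in dynamic])
    if not dynamic:
        return f"SELECT {select_list} FROM {table}"
    declarations = ", ".join(f"{n} {t}" for n, t in dynamic)
    return f"SELECT {select_list} FROM {table} ({declarations})"
```

For example, build_dynamic_select("EVENT_LOG", ["EVENT_ID"],
[("USED_MEMORY", "BIGINT")]) yields a statement that declares USED_MEMORY as a
dynamic BIGINT column. Removing a row from the registry drops that column from
later queries without touching the underlying data.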

Would be a nice improvement to have an option to persist dynamic column
names+types in Phoenix.

On Fri, Dec 21, 2018 at 12:18 PM Clay Baenziger (BLOOMBERG/ 731 LEX) <
cbaenziger@bloomberg.net> wrote:

> Hello,
>
> A user of mine brought up a question around dynamic columns in Phoenix
> today. The quantity of columns should become asymptotic to a few tens of
> thousands of columns as their data fills in.
>
> The user wants to query all columns in a table and they are today thinking
> of using views to do this -- but it is ugly management. They have an
> unbounded number of views -- which will pollute the global catalog and fail
> relatively quickly.
>
> Has anyone thought about the potentially wasteful[1] approach of scanning
> all rows in a query to determine columns and then re-running the query for
> the rows once we know what columns the SQL result will contain? Maybe
> something cleaner like persisting the set of columns in the statistics
> table and a SELECT * may return columns with nothing but nulls. Or, even
> better, is there an overall better way to model such a wide schema in
> Phoenix?
>
> -Clay
>
> [1]: Perhaps some heuristics could allow for not needing to do 2n reads in
> all cases?
>
