phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Miles Spielberg <mi...@box.com>
Subject Re: Runtime DDL supported?
Date Thu, 22 Feb 2018 07:57:24 GMT
I believe each query in a UNION needs to have the same result tuple format, which would work
in this toy example, but in the general case each view would have a different schema. We could
make the result tuples conform with each other by selecting NULL literals for every column
except those in a view. It would get quite verbose though. Assuming f1,f2,f3 all have incompatible
types, were you suggesting something like this?

Select f1, null, null from v1 where PK=?
Union all
Select null, f2, null from v2 where PK=?
Union all
Select null, null, f3 from v3 where PK=?

We might just run separate parallel queries against each view and merge the results client
side. I would guess this should perform well since the block cache can be leveraged for queries
after the first.

We could also use the HBase API to run a point row get. We'd have to reimplement decoding
for Phoenix's column values, which is not ideal but quite doable.

Sent from my iPhone

> On Feb 21, 2018, at 9:09 PM, James Taylor <jamestaylor@apache.org> wrote:
> 
> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re on a good
track with multiple views over a single (or handful) of physical table(s).
> 
>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <miles@box.com> wrote:
>> I've done some experimentation with views, with a schema resembling this:
>> 
>>>> create table t1(
>>>>     pk bigint not null primary key
>>>> );
>>>> 
>>>> create view v1(
>>>>     f1 varchar
>>>> ) AS SELECT * FROM t1;
>>>> create INDEX v1_f1 ON v1(f1);
>>>> 
>>>> create view v2(
>>>>     f2 varchar
>>>> ) AS SELECT * FROM t1;
>>>> create INDEX v2_f2 ON v2(f2);
>>>> 
>>>> create view v3(
>>>>     f3 varchar
>>>> ) AS SELECT * FROM t1;
>>>> create INDEX v3_f3 ON v3(f3);
>> 
>> Most of the time we'll be accessing data via the indexed views, but we'd also like
to be able to query all columns (f1, f2, f3) for a given pk. At the HBase level, this should
be doable as a point get on t1. The SQL-y way to express this would probably be with JOINs,
but the EXPLAIN plan is not encouraging.
>> 
>>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on v2.pk=t1.pk
left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON
1 KEY OVER T1
>>> |     PARALLEL LEFT-JOIN TABLE 0
>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> |     PARALLEL LEFT-JOIN TABLE 1
>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> |     PARALLEL LEFT-JOIN TABLE 2
>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> 
>> This is pushing me back towards a design of having a single table, except for the
issue of proliferating physical HBase tables for the indexes. Would you advise having a single
table + a single view on it containing all columns, to coerce Phoenix to consolidate the indexes
into a single physical table? Are there other alternatives we should be considering?
>> 
>> Miles Spielberg
>> Staff Software Engineer
>> 
>> O. 650.485.1102
>> 900 Jefferson Ave
>> Redwood City, CA 94063
>> 
>> 
>>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestaylor@apache.org>
wrote:
>>> All indexes on views are stored in a single physical table, so you'll be ok in
that regard.
>>> 
>>> If you could file bugs for any local index issues, we'd really appreciate it.
We've been steadily improving local indexes (see PHOENIX-3941 for some recent perf improvements
- applicable for multi-tenant tables in particular - these will appear in our 4.14 release).
Handling non covered columns is pretty isolated, so we should be able to fix bugs you find.
Plus, there's a workaround - you can cover your indexes until any issues are fixed.
>>> 
>>> Global, mutable indexes have had many improvements over the last several releases
too, but there's more operational overhead if/when a data table gets out of sync with it's
index table (plus some amount of configurable eventual consistency or index disablement).
With local indexes (and HBase 1.3), this isn't possible.
>>> 
>>> Thanks,
>>> James
>>> 
>>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <miles@box.com> wrote:
>>>> Hi James,
>>>> 
>>>> Thanks for the tips around reducing the number of physical tables while still
maintaining the appearance of multiple tables via view definitions. In our use case we don't
anticipate having much if any immutable data, so unfortunately I don't expect to be able to
take advantage of Phoenix's optimizations there.
>>>> 
>>>> We're expecting many indexes, mostly likely with several per logical per-tenant
table. Given that global indexes are implemented as physical HBase tables, will the view-oriented
optimizations help very much? We've done some experiments with local indexes on 4.13.2 and
found bugs, particularly with the rewrite optimization to read non-covered columns from the
main table, so we're not confident in using local indexes to optimize queries. (I've looked
through the 5.0-alpha release notes and couldn't find anything related to this issue, so if
desired I'll collect info for a separate bug report.)
>>>> 
>>>> Miles Spielberg
>>>> Staff Software Engineer
>>>> 
>>>> O. 650.485.1102
>>>> 900 Jefferson Ave
>>>> Redwood City, CA 94063
>>>> 
>>>> 
>>>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestaylor@apache.org>
wrote:
>>>>> Hi Miles,
>>>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the
number of physical HBase tables. Make sure you read about the limitations of views too [3].
>>>>> 
>>>>> Here's the way I've seen this modeled successfully:
>>>>> - create one schema per use case. This will let you leverage some nice
features in HBase for quotas and throttling. If you'll have a single use case, you don't have
to worry about it. Read about namespaces here [4] and make sure to enable them before you
start creating tables.
>>>>> - define an immutable, multi-tenant base table that has TENANT_ID + TYPE_ID
primary key. There are optimizations Phoenix does over immutable tables that you'll want to
leverage (assuming you have use cases that fit into this category). This Phoenix table will
be backed by a physical HBase table, but you won't execute Phoenix DML against it. Think of
it as a kind of "abstract" type. Instead, you'll create updatable views over it.
>>>>> - define a regular/mutable, multi-tenant base table that has TENANT_ID
+ TYPE_ID primary key. Same deal as above, but this would be the base table for any tables
in which the rows change in place.
>>>>> - define global views per "logical" table (against either your immutable
base table or mutable base table depending on the functionality needed) with each view having
a WHERE TYPE_ID='your type identifier' clause which adds specific columns to the primary key.
This view will be updatable (i.e. you can execute DML against it). The columns you add to
your PK will depend on your most common query patterns. 
>>>>> - optionally define indexes on these global views.
>>>>> - each tenant can further extend or just use the global views.
>>>>> 
>>>>> FYI, lots of good performance/tuning tips can be found here[5].
>>>>> 
>>>>> Thanks,
>>>>> James
>>>>> 
>>>>> 
>>>>> [1] https://phoenix.apache.org/views.html
>>>>> [2] https://phoenix.apache.org/multi-tenancy.html
>>>>> [3] https://phoenix.apache.org/views.html#Limitations
>>>>> [4] https://phoenix.apache.org/namspace_mapping.html
>>>>> [5] https://phoenix.apache.org/tuning_guide.html
>>>>> 
>>>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <miles@box.com>
wrote:
>>>>>> We're looking at employing Phoenix in a multi-tenant use case where
tenants can create their own tables and indexes, running into totals of tens-of-thousands
of each. Is this a supported scenario, or are we headed for trouble?

Mime
View raw message