phoenix-user mailing list archives

From Miles Spielberg <mi...@box.com>
Subject Re: Runtime DDL supported?
Date Tue, 27 Feb 2018 23:39:46 GMT
As we discussed, indexes across views are stored in a single HBase table
associated with the original table (_IDX_<tablename>). That's grand for
limiting the number of HBase tables created, but I just realized that the
actual index data within is differentiated by the 16-bit "viewIndexId",
which limits us to 64K indexes across all views for a given table. That's
concerning for our use case, especially if it's a cumulative auto-increment
across all CREATE INDEX and DROP INDEX operations over the lifetime of the
base table.

Is there any workaround for this? A quick grep across the source indicates
that the length of viewIndexId is currently hard-coded.
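
(For reference, a rough way to see which view index IDs have already been
allocated -- this is only a sketch, assuming the VIEW_INDEX_ID column of
SYSTEM.CATALOG is queryable the way it appears to be in 4.x:)

-- List indexes that have been assigned a viewIndexId (illustrative sketch).
SELECT TABLE_SCHEM, TABLE_NAME, VIEW_INDEX_ID
FROM SYSTEM.CATALOG
WHERE VIEW_INDEX_ID IS NOT NULL
ORDER BY VIEW_INDEX_ID;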

At least, this limitation should probably be added to the list of caveats
and warnings at https://phoenix.apache.org/views.html.

Miles Spielberg
Staff Software Engineer


O. 650.485.1102
900 Jefferson Ave
Redwood City, CA 94063

On Thu, Feb 22, 2018 at 7:42 AM, James Taylor <jamestaylor@apache.org>
wrote:

> Another option would be to use dynamic columns[1] when querying across
> views. You’d have to disable column encoding [2] in this case.
>
> [1] http://phoenix.apache.org/dynamic_columns.html
> [2] http://phoenix.apache.org/columnencoding.html
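>
> (For illustration only -- a sketch of what such a dynamic-column point lookup
> might look like, assuming f1/f2/f3 are VARCHAR columns declared on the views
> rather than on t1, and that column encoding is disabled on t1:)
>
> SELECT pk, f1, f2, f3
> FROM t1 (f1 VARCHAR, f2 VARCHAR, f3 VARCHAR)
> WHERE pk = 12345;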
>
> On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg <miles@box.com> wrote:
>
>> I believe each query in a UNION needs to have the same result tuple
>> format, which would work in this toy example, but in the general case each
>> view would have a different schema. We could make the result tuples conform
>> to each other by selecting NULL literals for every column except those
>> belonging to the view being queried, though that would get quite verbose.
>> Assuming f1, f2, f3 all have
>> incompatible types, were you suggesting something like this?
>>
>> Select f1, null, null from v1 where PK=?
>> Union all
>> Select null, f2, null from v2 where PK=?
>> Union all
>> Select null, null, f3 from v3 where PK=?
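>>
>> (If the bare NULL literals don't satisfy the type checker across the UNION
>> branches, a hedged variant -- assuming, purely for illustration, f1 VARCHAR,
>> f2 DATE, f3 INTEGER -- would cast them explicitly:)
>>
>> Select f1, CAST(NULL AS DATE), CAST(NULL AS INTEGER) from v1 where PK=?
>> Union all
>> Select CAST(NULL AS VARCHAR), f2, CAST(NULL AS INTEGER) from v2 where PK=?
>> Union all
>> Select CAST(NULL AS VARCHAR), CAST(NULL AS DATE), f3 from v3 where PK=?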
>>
>> We might just run separate parallel queries against each view and merge
>> the results client side. I would guess this should perform well since the
>> block cache can be leveraged for queries after the first.
>>
>> We could also use the HBase API to run a point row get. We'd have to
>> reimplement decoding for Phoenix's column values, which is not ideal but
>> quite doable.
>>
>> Sent from my iPhone
>>
>> On Feb 21, 2018, at 9:09 PM, James Taylor <jamestaylor@apache.org> wrote:
>>
>> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems
>> you’re on a good track with multiple views over a single physical table (or
>> a handful of them).
>>
>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <miles@box.com> wrote:
>>
>>> I've done some experimentation with views, with a schema resembling this:
>>>
>>>> create table t1(
>>>>     pk bigint not null primary key
>>>> );
>>>>
>>>> create view v1(
>>>>     f1 varchar
>>>> ) AS SELECT * FROM t1;
>>>> create INDEX v1_f1 ON v1(f1);
>>>>
>>>> create view v2(
>>>>     f2 varchar
>>>> ) AS SELECT * FROM t1;
>>>> create INDEX v2_f2 ON v2(f2);
>>>>
>>>> create view v3(
>>>>     f3 varchar
>>>> ) AS SELECT * FROM t1;
>>>> create INDEX v3_f3 ON v3(f3);
>>>
>>> Most of the time we'll be accessing data via the indexed views, but we'd
>>> also like to be able to query all columns (f1, f2, f3) for a given pk. At
>>> the HBase level, this should be doable as a point get on t1. The SQL-y
>>> way to express this would probably be with JOINs, but the EXPLAIN plan is
>>> not encouraging.
>>>
>>>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
>>>> v2.pk=t1.pk left join v3 on v3.pk=t1.pk where t1.pk=12345;
>>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT
>>>> LOOKUP ON 1 KEY OVER T1
>>>> |     PARALLEL LEFT-JOIN TABLE 0
>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>> |     PARALLEL LEFT-JOIN TABLE 1
>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>> |     PARALLEL LEFT-JOIN TABLE 2
>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>>
>>>
>>> This is pushing me back towards a design of having a single table,
>>> except for the issue of proliferating physical HBase tables for the
>>> indexes. Would you advise having a single table + a single view on it
>>> containing all columns, to coerce Phoenix to consolidate the indexes into a
>>> single physical table? Are there other alternatives we should be
>>> considering?
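>>>
>>> (For concreteness, the single-table-plus-one-view shape being asked about
>>> might look roughly like this -- a sketch only, with illustrative names:)
>>>
>>> create table t1(
>>>     pk bigint not null primary key
>>> );
>>> create view v_all(
>>>     f1 varchar,
>>>     f2 varchar,
>>>     f3 varchar
>>> ) AS SELECT * FROM t1;
>>> create INDEX v_all_f1 ON v_all(f1);
>>> create INDEX v_all_f2 ON v_all(f2);
>>> create INDEX v_all_f3 ON v_all(f3);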
>>>
>>> Miles Spielberg
>>> Staff Software Engineer
>>>
>>>
>>> O. 650.485.1102
>>> 900 Jefferson Ave
>>> Redwood City, CA 94063
>>>
>>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestaylor@apache.org>
>>> wrote:
>>>
>>>> All indexes on views are stored in a single physical table, so you'll
>>>> be ok in that regard.
>>>>
>>>> If you could file bugs for any local index issues, we'd really
>>>> appreciate it. We've been steadily improving local indexes
>>>> (see PHOENIX-3941 for some recent perf improvements - applicable for
>>>> multi-tenant tables in particular - these will appear in our 4.14 release).
>>>> Handling non-covered columns is pretty isolated, so we should be able to
>>>> fix bugs you find. Plus, there's a workaround - you can cover your indexes
>>>> until any issues are fixed.
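>>>>
>>>> (A sketch of that covering workaround, with purely illustrative names -- the
>>>> INCLUDE clause stores the listed columns in the index itself so queries that
>>>> touch them don't need to go back to the data table:)
>>>>
>>>> CREATE INDEX my_covered_idx ON my_view (indexed_col)
>>>>     INCLUDE (covered_col1, covered_col2);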
>>>>
>>>> Global, mutable indexes have had many improvements over the last
>>>> several releases too, but there's more operational overhead if/when a data
>>>> table gets out of sync with its index table (plus some amount of
>>>> configurable eventual consistency or index disablement). With local indexes
>>>> (and HBase 1.3), this isn't possible.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <miles@box.com> wrote:
>>>>
>>>>> Hi James,
>>>>>
>>>>> Thanks for the tips around reducing the number of physical tables
>>>>> while still maintaining the appearance of multiple tables via view
>>>>> definitions. In our use case we don't anticipate having much if any
>>>>> immutable data, so unfortunately I don't expect to be able to take
>>>>> advantage of Phoenix's optimizations there.
>>>>>
>>>>> We're expecting many indexes, most likely with several per logical
>>>>> per-tenant table. Given that global indexes are implemented as physical
>>>>> HBase tables, will the view-oriented optimizations help very much? We've
>>>>> done some experiments with local indexes on 4.13.2 and found bugs,
>>>>> particularly with the rewrite optimization to read non-covered columns from
>>>>> the main table, so we're not confident in using local indexes to optimize
>>>>> queries. (I've looked through the 5.0-alpha release notes and couldn't find
>>>>> anything related to this issue, so if desired I'll collect info for a
>>>>> separate bug report.)
>>>>>
>>>>> Miles Spielberg
>>>>> Staff Software Engineer
>>>>>
>>>>>
>>>>> O. 650.485.1102
>>>>> 900 Jefferson Ave
>>>>> Redwood City, CA 94063
>>>>>
>>>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestaylor@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Miles,
>>>>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit
>>>>>> the number of physical HBase tables. Make sure you read about the
>>>>>> limitations of views too [3].
>>>>>>
>>>>>> Here's the way I've seen this modeled successfully:
>>>>>> - create one schema per use case. This will let you leverage some
>>>>>> nice features in HBase for quotas and throttling. If you'll have a single
>>>>>> use case, you don't have to worry about it. Read about namespaces here [4]
>>>>>> and make sure to enable them before you start creating tables.
>>>>>> - define an immutable, multi-tenant base table that has TENANT_ID +
>>>>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable
>>>>>> tables that you'll want to leverage (assuming you have use cases that fit
>>>>>> into this category). This Phoenix table will be backed by a physical
>>>>>> HBase table, but you won't execute Phoenix DML against it. Think of it as a
>>>>>> kind of "abstract" type. Instead, you'll create updatable views over it.
>>>>>> - define a regular/mutable, multi-tenant base table that has
>>>>>> TENANT_ID + TYPE_ID primary key. Same deal as above, but this would be the
>>>>>> base table for any tables in which the rows change in place.
>>>>>> - define global views per "logical" table (against either your
>>>>>> immutable base table or mutable base table depending on the functionality
>>>>>> needed) with each view having a WHERE TYPE_ID='your type identifier' clause
>>>>>> which adds specific columns to the primary key. This view will be updatable
>>>>>> (i.e. you can execute DML against it). The columns you add to your PK will
>>>>>> depend on your most common query patterns.
>>>>>> - optionally define indexes on these global views.
>>>>>> - each tenant can further extend or just use the global views (see the
>>>>>> rough DDL sketch below).
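>>>>>>
>>>>>> (A rough, hedged sketch of those steps -- every name below is illustrative,
>>>>>> and the exact view and primary-key-extension syntax should be double-checked
>>>>>> against the views and multi-tenancy docs for your version:)
>>>>>>
>>>>>> -- mutable, multi-tenant base table with TENANT_ID + TYPE_ID primary key
>>>>>> CREATE TABLE base_mutable (
>>>>>>     tenant_id VARCHAR NOT NULL,
>>>>>>     type_id VARCHAR NOT NULL,
>>>>>>     CONSTRAINT pk PRIMARY KEY (tenant_id, type_id)
>>>>>> ) MULTI_TENANT=true;
>>>>>>
>>>>>> -- global view for one "logical" table, filtered by its TYPE_ID and
>>>>>> -- extending the primary key with its own row identifier
>>>>>> CREATE VIEW event_v (
>>>>>>     event_id BIGINT PRIMARY KEY,
>>>>>>     payload VARCHAR
>>>>>> ) AS SELECT * FROM base_mutable WHERE type_id = 'event';
>>>>>>
>>>>>> -- optional index on the global view
>>>>>> CREATE INDEX event_v_payload_idx ON event_v (payload);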
>>>>>>
>>>>>> FYI, lots of good performance/tuning tips can be found here[5].
>>>>>>
>>>>>> Thanks,
>>>>>> James
>>>>>>
>>>>>>
>>>>>> [1] https://phoenix.apache.org/views.html
>>>>>> [2] https://phoenix.apache.org/multi-tenancy.html
>>>>>> [3] https://phoenix.apache.org/views.html#Limitations
>>>>>> [4] https://phoenix.apache.org/namespace_mapping.html
>>>>>> [5] https://phoenix.apache.org/tuning_guide.html
>>>>>>
>>>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <miles@box.com>
>>>>>> wrote:
>>>>>>
>>>>>>> We're looking at employing Phoenix in a multi-tenant use case where
>>>>>>> tenants can create their own tables and indexes, running into totals of
>>>>>>> tens-of-thousands of each. Is this a supported scenario, or are we headed
>>>>>>> for trouble?
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
