phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Runtime DDL supported?
Date Thu, 08 Mar 2018 19:05:04 GMT
Thanks for digging that up, Miles. I've added a comment on the JIRA on how
to go about implementing it here:
https://issues.apache.org/jira/browse/PHOENIX-3547?focusedCommentId=16391739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391739

That would be a good first contribution if you're up for it.

Thanks,
James

On Wed, Mar 7, 2018 at 5:09 PM, Miles Spielberg <miles@box.com> wrote:

> We found https://issues.apache.org/jira/browse/PHOENIX-3547, which seems
> to be precisely our problem. We would want at least the option to use a
> bigint rather than the int in the JIRA to accommodate massive growth. While
> we intend to have many tenants, we don't intend to use the Phoenix
> "tenant_id" to differentiate them, and instead manage them at our
> application layer, so separate counters per Phoenix tenant would not help
> in our situation.
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102 <(650)%20485-1102>
> 900 Jefferson Ave
> <https://maps.google.com/?q=900+Jefferson+Ave+%0D%0A%0D%0ARedwood+City+,+CA+94063&entry=gmail&source=g>
> Redwood City, CA 94063
> <https://maps.google.com/?q=900+Jefferson+Ave+%0D%0A%0D%0ARedwood+City+,+CA+94063&entry=gmail&source=g>
>
> On Wed, Feb 28, 2018 at 10:27 PM, James Taylor <jamestaylor@apache.org>
> wrote:
>
>> Please file a JIRA as it’d be feasible to change this limitation. The
>> easiest way would be to have a separate counter for each tenant. Another
>> way to reduce the number of indexes on tenant specific views would be to
>> factor out common columns to global views and create indexes there.
>>
>> On Tue, Feb 27, 2018 at 3:40 PM Miles Spielberg <miles@box.com> wrote:
>>
>>> As we discussed, indexes across views are stored in a single HBase table
>>> associated with the original table (_IDX_<tablename>). That's grand for
>>> limiting the number of HBase tables created, but I just realized that the
>>> actual index data within is differentiated by the 16-bit "viewIndexId",
>>> which limits us to 64K indexes across all views for a given table. That's
>>> concerning for our use case, especially if its a cumulative autoincrement
>>> across all CREATE INDEX and DROP INDEX operations over the lifetime of the
>>> base table.
>>>
>>> Is there any workaround for this? A quick grep across the source
>>> indicates that the length of viewIndexId is currently hard-coded.
>>>
>>> At least, this limitation should probably be added to the list of
>>> caveats and warnings at https://phoenix.apache.org/views.html.
>>>
>>> Miles Spielberg
>>> Staff Software Engineer
>>>
>>>
>>> O. 650.485.1102 <(650)%20485-1102>
>>> 900 Jefferson Ave
>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D%0A%0D%0ARedwood+City+,+CA+94063&entry=gmail&source=g>
>>> Redwood City
>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D%0A%0D%0ARedwood+City+,+CA+94063&entry=gmail&source=g>,
>>> CA 94063
>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D%0A%0D%0ARedwood+City+,+CA+94063&entry=gmail&source=g>
>>>
>>> On Thu, Feb 22, 2018 at 7:42 AM, James Taylor <jamestaylor@apache.org>
>>> wrote:
>>>
>>>> Another option would be to use dynamic columns[1] when querying across
>>>> views. You’d have to disable column encoding [2] in this case.
>>>>
>>>> [1] http://phoenix.apache.org/dynamic_columns.html
>>>> [2] http://phoenix.apache.org/columnencoding.html
>>>>
>>>> On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg <miles@box.com> wrote:
>>>>
>>>>> I believe each query in a UNION needs to have the same result tuple
>>>>> format, which would work in this toy example, but in the general case
each
>>>>> view would have a different schema. We could make the result tuples conform
>>>>> with each other by selecting NULL literals for every column except those
in
>>>>> a view. It would get quite verbose though. Assuming f1,f2,f3 all have
>>>>> incompatible types, were you suggesting something like this?
>>>>>
>>>>> Select f1, null, null from v1 where PK=?
>>>>> Union all
>>>>> Select null, f2, null from v2 where PK=?
>>>>> Union all
>>>>> Select null, null, f3 from v3 where PK=?
>>>>>
>>>>> We might just run separate parallel queries against each view and
>>>>> merge the results client side. I would guess this should perform well
since
>>>>> the block cache can be leveraged for queries after the first.
>>>>>
>>>>> We could also use the HBase API to run a point row get. We'd have to
>>>>> reimplement decoding for Phoenix's column values, which is not ideal
but
>>>>> quite doable.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Feb 21, 2018, at 9:09 PM, James Taylor <jamestaylor@apache.org>
>>>>> wrote:
>>>>>
>>>>> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems
>>>>> you’re on a good track with multiple views over a single (or handful)
of
>>>>> physical table(s).
>>>>>
>>>>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <miles@box.com>
wrote:
>>>>>
>>>>>> I've done some experimentation with views, with a schema resembling
>>>>>> this:
>>>>>>
>>>>>> create table t1(
>>>>>>>
>>>>>>>     pk bigint not null primary key
>>>>>>>
>>>>>>> );
>>>>>>>
>>>>>>>
>>>>>>>> create view v1(
>>>>>>>
>>>>>>>     f1 varchar
>>>>>>>
>>>>>>> ) AS SELECT * FROM t1;
>>>>>>>
>>>>>>> create INDEX v1_f1 ON v1(f1);
>>>>>>>
>>>>>>>
>>>>>>>> create view v2(
>>>>>>>
>>>>>>>     f2 varchar
>>>>>>>
>>>>>>> ) AS SELECT * FROM t1;
>>>>>>>
>>>>>>> create INDEX v2_f2 ON v2(f2);
>>>>>>>
>>>>>>>
>>>>>>>> create view v3(
>>>>>>>
>>>>>>>     f3 varchar
>>>>>>>
>>>>>>> ) AS SELECT * FROM t1;
>>>>>>>
>>>>>>> create INDEX v3_f3 ON v3(f3);
>>>>>>>
>>>>>>>
>>>>>> Most of the time we'll be accessing data via the indexed views, but
>>>>>> we'd also like to be able to query all columns (f1, f2, f3) for a
given pk. At
>>>>>> the HBase level, this should be doable as a point get on t1. The
>>>>>> SQL-y way to express this would probably be with JOINs, but the EXPLAIN
>>>>>> plan is not encouraging.
>>>>>>
>>>>>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join
v2
>>>>>>> on v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>>>>>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN
POINT
>>>>>>> LOOKUP ON 1 KEY OVER T1
>>>>>>> |     PARALLEL LEFT-JOIN TABLE 0
>>>>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN
OVER T1
>>>>>>> |     PARALLEL LEFT-JOIN TABLE 1
>>>>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN
OVER T1
>>>>>>> |     PARALLEL LEFT-JOIN TABLE 2
>>>>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN
OVER T1
>>>>>>>
>>>>>>
>>>>>> This is pushing me back towards a design of having a single table,
>>>>>> except for the issue of proliferating physical HBase tables for the
>>>>>> indexes. Would you advise having a single table + a single view on
it
>>>>>> containing all columns, to coerce Phoenix to consolidate the indexes
into a
>>>>>> single physical table? Are there other alternatives we should be
>>>>>> considering?
>>>>>>
>>>>>> Miles Spielberg
>>>>>> Staff Software Engineer
>>>>>>
>>>>>>
>>>>>> O. 650.485.1102 <(650)%20485-1102>
>>>>>> 900 Jefferson Ave
>>>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>>>>> Redwood City
>>>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>,
>>>>>> CA 94063
>>>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>>>>>
>>>>>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestaylor@apache.org
>>>>>> > wrote:
>>>>>>
>>>>>>> All indexes on views are stored in a single physical table, so
>>>>>>> you'll be ok in that regard.
>>>>>>>
>>>>>>> If you could file bugs for any local index issues, we'd really
>>>>>>> appreciate it. We've been steadily improving local indexes
>>>>>>> (see PHOENIX-3941 for some recent perf improvements - applicable
for
>>>>>>> multi-tenant tables in particular - these will appear in our
4.14 release).
>>>>>>> Handling non covered columns is pretty isolated, so we should
be able to
>>>>>>> fix bugs you find. Plus, there's a workaround - you can cover
your indexes
>>>>>>> until any issues are fixed.
>>>>>>>
>>>>>>> Global, mutable indexes have had many improvements over the last
>>>>>>> several releases too, but there's more operational overhead if/when
a data
>>>>>>> table gets out of sync with it's index table (plus some amount
of
>>>>>>> configurable eventual consistency or index disablement). With
local indexes
>>>>>>> (and HBase 1.3), this isn't possible.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <miles@box.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi James,
>>>>>>>>
>>>>>>>> Thanks for the tips around reducing the number of physical
tables
>>>>>>>> while still maintaining the appearance of multiple tables
via view
>>>>>>>> definitions. In our use case we don't anticipate having much
if any
>>>>>>>> immutable data, so unfortunately I don't expect to be able
to take
>>>>>>>> advantage of Phoenix's optimizations there.
>>>>>>>>
>>>>>>>> We're expecting many indexes, mostly likely with several
per
>>>>>>>> logical per-tenant table. Given that global indexes are implemented
as
>>>>>>>> physical HBase tables, will the view-oriented optimizations
help very much?
>>>>>>>> We've done some experiments with local indexes on 4.13.2
and found bugs,
>>>>>>>> particularly with the rewrite optimization to read non-covered
columns from
>>>>>>>> the main table, so we're not confident in using local indexes
to optimize
>>>>>>>> queries. (I've looked through the 5.0-alpha release notes
and couldn't find
>>>>>>>> anything related to this issue, so if desired I'll collect
info for a
>>>>>>>> separate bug report.)
>>>>>>>>
>>>>>>>> Miles Spielberg
>>>>>>>> Staff Software Engineer
>>>>>>>>
>>>>>>>>
>>>>>>>> O. 650.485.1102 <(650)%20485-1102>
>>>>>>>> 900 Jefferson Ave
>>>>>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>>>>>>> Redwood City
>>>>>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063+%3Chttps://maps.google.com/?q%3D900%2BJefferson%2BAve%2B%250D%2B%250D%2BRedwood%2BCity,%2BCA%2B94063%26entry%3Dgmail%26source%3Dg%3E&entry=gmail&source=g>,
>>>>>>>> CA 94063
>>>>>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>>>>>>>
>>>>>>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <
>>>>>>>> jamestaylor@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Miles,
>>>>>>>>> You'll be fine if you use views [1] and multi-tenancy
[2] to limit
>>>>>>>>> the number of physical HBase tables. Make sure you read
about the
>>>>>>>>> limitations of views too [3].
>>>>>>>>>
>>>>>>>>> Here's the way I've seen this modeled successfully:
>>>>>>>>> - create one schema per use case. This will let you leverage
some
>>>>>>>>> nice features in HBase for quotas and throttling. If
you'll have a single
>>>>>>>>> use case, you don't have to worry about it. Read about
namespaces here [4]
>>>>>>>>> and make sure to enable them before you start creating
tables.
>>>>>>>>> - define an immutable, multi-tenant base table that has
TENANT_ID
>>>>>>>>> + TYPE_ID primary key. There are optimizations Phoenix
does over immutable
>>>>>>>>> tables that you'll want to leverage (assuming you have
use cases that fit
>>>>>>>>> into this category). This Phoenix table will be backed
by a
>>>>>>>>> physical HBase table, but you won't execute Phoenix DML
against it. Think
>>>>>>>>> of it as a kind of "abstract" type. Instead, you'll create
updatable views
>>>>>>>>> over it.
>>>>>>>>> - define a regular/mutable, multi-tenant base table that
has
>>>>>>>>> TENANT_ID + TYPE_ID primary key. Same deal as above,
but this would be the
>>>>>>>>> base table for any tables in which the rows change in
place.
>>>>>>>>> - define global views per "logical" table (against either
your
>>>>>>>>> immutable base table or mutable base table depending
on the functionality
>>>>>>>>> needed) with each view having a WHERE TYPE_ID='your type
identifier' clause
>>>>>>>>> which adds specific columns to the primary key. This
view will be updatable
>>>>>>>>> (i.e. you can execute DML against it). The columns you
add to your PK will
>>>>>>>>> depend on your most common query patterns.
>>>>>>>>> - optionally define indexes on these global views.
>>>>>>>>> - each tenant can further extend or just use the global
views.
>>>>>>>>>
>>>>>>>>> FYI, lots of good performance/tuning tips can be found
here[5].
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> James
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] https://phoenix.apache.org/views.html
>>>>>>>>> [2] https://phoenix.apache.org/multi-tenancy.html
>>>>>>>>> [3] https://phoenix.apache.org/views.html#Limitations
>>>>>>>>> [4] https://phoenix.apache.org/namspace_mapping.html
>>>>>>>>> [5] https://phoenix.apache.org/tuning_guide.html
>>>>>>>>>
>>>>>>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <miles@box.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> We're looking at employing Phoenix in a multi-tenant
use case
>>>>>>>>>> where tenants can create their own tables and indexes,
running into totals
>>>>>>>>>> of tens-of-thousands of each. Is this a supported
scenario, or are we
>>>>>>>>>> headed for trouble?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>
>

Mime
View raw message