phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Select dynamic column content
Date Wed, 07 Jan 2015 08:04:52 GMT
Hi Sumanta,
Another alternative option is to leverage support for VIEWs in Phoenix (
http://phoenix.apache.org/views.html). In many use cases I've seen where
there are hundreds of sparse columns defined for a schema, there's a column
that determines *which* sparse columns are applicable for a given row. An
example would be a metric_event table in which an event_type column value
defines which columns are associated with a given event_type. This might be
modelled with a base table and a series of views like this:

-- base table
CREATE TABLE metric_event (
    event_type CHAR(2),
    event_id VARCHAR,
    created_date DATE,
    CONSTRAINT pk PRIMARY KEY (event_type, event_id));

-- updatable view for garbage collection event
CREATE VIEW gc_event( -- define columns specific to a GC event
    duration BIGINT,
    eden_space BIGINT,
    survivor_space BIGINT,
    ...)
AS SELECT * FROM event WHERE event_type = 'GC'

-- updatable view for IO event
CREATE VIEW io_event( -- define columns specific to an IO event
    io_wait_time BIGINT,
    iops BIGINT,
    read_workload INTEGER,
    write_workload INTEGER,
    ...)
AS SELECT * FROM event WHERE event_type = 'IO'

Using this approach, Phoenix will keep track of the columns each view for
you so you don't have to. All views reside in the same HBase table (the one
corresponding to your base table). When you do a SELECT * FROM io_event,
you'll get back only those columns defined for that view plus the columns
from the base table.

A further orthogonal level of granularity allows you to define your base
table as a MULTI_TENANT table and have each VIEW vary depending on the
particular user (identified at connection time through a property). Just as
with the scenario described above, each user may define different columns
that only exist for that particular user. See
http://phoenix.apache.org/multi-tenancy.html for more info.

HTH,

    James

On Tue, Jan 6, 2015 at 9:43 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:

> And it looks like you already figured that out :)
>
>
> On Tue, Jan 6, 2015, 9:43 AM Jesse Yates <jesse.k.yates@gmail.com> wrote:
>
>> You wouldn't even need another table, just a single VARCHAR[] column to
>> keep the column names. Its ideal to keep it in the same row (possibly in
>> another cf if you expect it to be large) since you get ACID compliance on
>> that row, which you wouldn't get from using another table. You then just
>> upsert the names column at the same time you upsert the dynamic columns.
>>
>> Phoenix does something similar for tracing where there is an unknown
>> number of annotations - we keep a trace annotation count column which then
>> can be used to figure out the dynamic annotation column names (which are
>> things like annotations.a0, .a1, .a2, etc)
>>
>> The downside is that you then need to do two queries to get all the
>> columns, but until we implement the cf.* logic for dynamic columns, that's
>> the best you can do.
>>
>> - jesse
>>
>> On Tue, Jan 6, 2015, 9:23 AM Sumanta Gh <sumanta.gh@tcs.com> wrote:
>>
>>> Thanks Nicolas for replying.
>>> I am already managing dynamic column names either putting them in a
>>> separate column or keeping the names in cache.
>>> But looking at the pace Phoenix is evolving, IMHO this cf.* query
>>> pattern would be very much helpful for users.
>>> The stock HBase client is capable of doing that.
>>> Because my table is extremely sparsed and I allow quite a random schema
>>> in every row of the table, getting the content of the dynamic column in a
>>> single query will save lot of time.
>>> Looking forward to your completion of the work...
>>>
>>> Regards
>>> Sumanta
>>>
>>>
>>> -----Nicolas Maillard <nmaillard@hortonworks.com> wrote: -----
>>> To: user@phoenix.apache.org
>>> From: Nicolas Maillard <nmaillard@hortonworks.com>
>>> Date: 01/06/2015 03:08PM
>>> Subject: Re: Select dynamic column content
>>>
>>>
>>> Hello Sumanta
>>>
>>> This is a last bit missing in the select cf.* query pattern that would
>>> bring back not only known columns but all dynamic ones also. I never got
>>> around to finishing that bit for different reasons. The best way would be
>>> to finish that, other than that I do not see an easy way to retrieve
>>>  dynamic columns of which you have lost the column name. I guess if there
>>> is a logic ti these column names you could try to reconstruct or keep a
>>> second table of keys to dynamic column names to find them in the after math.
>>>
>>> regards
>>>
>>> On Tue, Jan 6, 2015 at 8:01 AM, Sumanta Gh <sumanta.gh@tcs.com> wrote:
>>>
>>>> Hi,
>>>> It has been a puzzle for me to get the content of dynamic columns in a
>>>> single SELECT * FROM query.
>>>> Challenge is that I do not know the dynamic column names in advance so
>>>> I can not mention a dynamic column in the SELECT query.
>>>> Is there any way out? Please suggest.
>>>>
>>>> Regards
>>>> Sumanta
>>>>
>>>> =====-----=====-----=====
>>>> Notice: The information contained in this e-mail
>>>> message and/or attachments to it may contain
>>>> confidential or privileged information. If you are
>>>> not the intended recipient, any dissemination, use,
>>>> review, distribution, printing or copying of the
>>>> information contained in this e-mail message
>>>> and/or attachments to it are strictly prohibited. If
>>>> you have received this communication in error,
>>>> please notify us by reply e-mail or telephone and
>>>> immediately and permanently delete the message
>>>> and any attachments. Thank you
>>>>
>>>>
>>>
>>>
>>> --
>>> Nicolas Maillard Solution Engineer
>>> Phone:     +33 (0)6 68 17 66 05
>>> Email:      nmaillard*@hortonworks.com* <nmaillard@hortonworks.com>
>>> Website:   http://www.hortonworks.com
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>>
>>

Mime
View raw message