phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: MAP type?
Date Fri, 16 May 2014 19:14:43 GMT
I think having JSON support would be great. If you're up for contributing
these set of built-in functions, I'm +1.

I think to get good performance, you'd need support for functional indexes
(PHOENIX-514). This would enable you to create an index like this (assuming
you have a book_store table with a column named json):
    CREATE INDEX json_book_idx ON book_store (json_extract(json,
'$.store.book'))
This would allow a query like this to be executed efficiently:
    SELECT json FROM book_store WHERE json_extract(json, '$.store.book') =
'Huckleberry Finn'
Another improvement would be if we had STRUCT support to optimize the
storage such that all the superfluous tag information isn't stored over and
over again (PHOENIX-477). We'd want to allow the STRUCT definition to be
supplied at query time (much like our dynamic columns) to handle use cases
where the JSON structure is not known in advance.
Another alternative would be to have a JSON type and optimize the storage
in other (warning: hand wavy) interesting ways (PHOENIX-628).

I think all this would be awesome - each part could be done independently,
in parallel, and phased in. It just needs someone to step up and own it.
Thanks,
James



On Thu, May 15, 2014 at 9:50 AM, Stephen Sprague <spragues@gmail.com> wrote:

> hate to bang on the json drum again but having a json_extract() function
> this or any other non-scalar datatype could be implemented using a varchar
> datatype.
>
> granted the syntax isn't as clean:
>
> column['key'] vs. json_extract(column,'$.key')
>
> but i think it gets the job done.
>
> presto uses json_extract_scalar() and json_extract() among others. cf.
> http://prestodb.io/docs/current/functions/json.html.  This could be used
> as a guide, perhaps.
>
>
>
>
> On Wed, May 14, 2014 at 2:30 PM, Rob Roland <rob@simplymeasured.com>wrote:
>
>> Hi all,
>>
>> I have a use-case that would benefit well from a MAP type, similar to how
>> Hive uses a map. I'd like to avoid using a JOIN, due to the amount of rows
>> on each side of the JOIN. In Hive's case, you would assign a column family
>> to the MAP field.
>>
>> Essentially, I have an entity that I want to add quite a bit of dynamic
>> data that would be unknown at both initial query time and query time. A
>> contrived example is below:
>>
>> I'd like to do something like this:
>>
>> CREATE TABLE facebook_post (
>> id VARCHAR PRIMARY KEY,
>> post_body VARCHAR,
>>  bitly_links MAP<VARCHAR, VARCHAR>
>> );
>>
>> SELECT * FROM facebook_post;
>>
>> As opposed to:
>>
>> CREATE TABLE facebook_post (
>> id VARCHAR PRIMARY KEY,
>> post_body VARCHAR
>> );
>>
>> CREATE TABLE facebook_post_links (
>> facebook_post_id VARCHAR,
>> bitly_id VARCHAR,
>> expanded_link VARCHAR,
>>  CONSTRAINT pk PRIMARY KEY (facebook_post_id, bitly_id)
>> );
>>
>> SELECT * FROM facebook_post AS fp
>> LEFT OUTER JOIN facebook_post_links AS fpl ON (fpl.facebook_post_id =
>> fp.id);
>>
>> I realize that a MAP type is outside of the SQL standard, but it would be
>> excellent in my use-case. I could accomplish this with an ARRAY type, but
>> the expansion of bit.ly links happens as a post-processing step, and
>> would probably involve locking, as appending to an ARRAY has to happen
>> client-side.
>>
>> I'd be willing to take a shot at making a MAP type, but I would want to
>> make sure this change would be welcomed into mainline Phoenix before
>> embarking on something like this.
>>
>> Thanks,
>>
>> Rob Roland
>>
>
>

Mime
View raw message