phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re:
Date Fri, 03 Jun 2016 15:36:00 GMT
That's pretty slick, Steve. Another variant along the same lines would be
to create/alter a view over a base table where you use the IF NOT EXISTS.
In this way, you let Phoenix track the dynamic columns for you. We have a
new feature in 4.8 for declaring a table as an APPEND_ONLY_SCHEMA and we'll
minimize the RPCs for ensuring the meta data is up to date. It works well
for dynamically tracking schema as data is processed, a pretty common
pattern.

Thanks,
James

On Friday, June 3, 2016, Steve Terrell <sterrell@oculus360.us> wrote:

> I have a similar situation.  I have records with varying fields that I
> wanted to access individually and also as a group.
>
> My actual records are JSON objects, so they look like like this:
> {"field1": value1, "field2": value2, …}
> To make matter harder, the fields are also varying types:  ints, strings,
> boolean, floats.
>
> What I ended up doing that has worked really well is make each field in
> the JSON record a dynamic field in a Phoenix table.  And, to preserve the
> integrity of the original record, I have one static string field in which
> the entire original JSON goes into.
>
> Because of the dynamic fields, I have to execute an UPSERT for each row
> instead of some kind of batch loading.  But, I routinely load millions of
> rows like this, sometimes with 20+ fields, and it has never failed on an
> upsert and runs fast enough.
>
> A few of the fields are actually always in each record, so I use those for
> my primary keys,
>
> The end result is great!  I can query by any field, quickly pull entire
> records from the key fields, and any JSON object can be loaded.
>
> In my code that translates the JSON into UPSERT statements I also keep a
> running tab of field names, their type, and number of occurrences.  Then a
> the end of my millions of UPSERTS I populate a simple table that just lists
> all the fields/types/counts so that there is no mystery about what fields
> are available when I want to query the big table.  I did this because I
> don't always know what field names are in the JSON, and I don't know how to
> get a list of dynamic field names from Phoenix SQL.
>
> Hope this helps,
>     Steve
>
> On Thu, Jun 2, 2016 at 9:28 PM, Stephen Wilcoxon <wilcoxon@gmail.com
> <javascript:_e(%7B%7D,'cvml','wilcoxon@gmail.com');>> wrote:
>
>> The simple solution is to transform the multi-valued field into a
>> separate linked table with multiple rows per "id".  If this is what you
>> mean by "exploding the data", why do you not want to do so?
>>
>> On Thu, Jun 2, 2016 at 7:11 PM, Rahul Jain <rjain7@gmail.com
>> <javascript:_e(%7B%7D,'cvml','rjain7@gmail.com');>> wrote:
>>
>>>
>>> Folks,
>>>
>>> I used this question earlier on the wrong list; posting on apache
>>> phoenix user group on the advise of James Taylor:
>>>
>>> We have a use case of migration from legacy database to hbase/phoenix.
>>> I'd want to hear your thoughts on how to manage the  given multi-valued
>>>  field:
>>>
>>> Field name : namesList
>>>
>>> Contains: comma separated list of strings.
>>>
>>>  e.g. "Jack, Richard, Fred"
>>>
>>> The field does not have an inherent limit on the number of strings,
>>> however for successful migration we need to support unto 100 strings in the
>>> field
>>>
>>> This field is currently indexed in our legacy DB technology where the
>>> user is able to query by any of the contained strings.
>>>
>>> e.g. query:
>>>
>>> select * from myTable where namesList LIKE '%Fred%';
>>>
>>> Is there any way Phoenix supports generating multiple index values from
>>> a single column in a row ? We are looking for a method where we don't
>>> explicitly specify individual offsets for indexing, (nameList[0],
>>> nameList[1] etc) , rather say a custom functional index equivalent
>>> generates the indexes as a list and each of the index in the list is
>>> persisted individually.
>>>
>>> Effectively our goal is to avoid full table scans when doing queries
>>> that match values from multi valued fields such as example above. We do
>>> want to avoid exploding the data however possible.
>>>
>>> Any comments / suggestions are welcome.
>>>
>>> Thank you !
>>>
>>> Rahul
>>>
>>
>>
>

Mime
View raw message