phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: hbase cell storage different between bulk load and direct api
Date Thu, 19 Apr 2018 21:20:13 GMT
I believe we still rely on that empty key value, even for compact storage
formats (though in theory we could probably remove that dependency; JIRA,
please?). A quick test would confirm:
- upsert a row with no last_name or first_name
- select * from T where last_name IS NULL
If the row isn't returned, then we need that empty key value.

Thanks,
James
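
The test above works because HBase stores only cells: a row with no cells does not exist, so no query can return it. Below is a toy Python model of that rule, not Phoenix or HBase code; `upsert`, `select_where_null` and `LAST_NAME` are hypothetical names. It assumes one cell per column and that a null column writes no cell at all, which is exactly the behavior the test probes for the compact format.

```python
# Toy model of HBase row visibility (hypothetical; not Phoenix or HBase code).
# A table maps row key -> {column qualifier: value}; a row with no cells
# does not exist, so a scan can never return it.
from typing import Dict, List, Optional

Table = Dict[int, Dict[bytes, bytes]]

EMPTY_KV_QUALIFIER = b"\x00\x00\x00\x00"  # qualifier of the extra cell in the psql dump
EMPTY_KV_VALUE = b"x"                     # its value in the same dump


def upsert(table: Table, pk: int, columns: Dict[bytes, Optional[bytes]],
           write_empty_kv: bool) -> None:
    """Store one cell per non-null column, plus the empty KV if requested."""
    cells = {q: v for q, v in columns.items() if v is not None}
    if write_empty_kv:
        cells[EMPTY_KV_QUALIFIER] = EMPTY_KV_VALUE
    if cells:  # a row with zero cells leaves nothing behind in HBase
        table.setdefault(pk, {}).update(cells)


def select_where_null(table: Table, qualifier: bytes) -> List[int]:
    """Emulate `SELECT ... WHERE col IS NULL`: only rows that exist can match."""
    return sorted(pk for pk, cells in table.items() if qualifier not in cells)


LAST_NAME = b"last_name"  # hypothetical qualifier for the LAST_NAME column

with_empty_kv: Table = {}
upsert(with_empty_kv, 1, {LAST_NAME: None}, write_empty_kv=True)

without_empty_kv: Table = {}
upsert(without_empty_kv, 1, {LAST_NAME: None}, write_empty_kv=False)

print(select_where_null(with_empty_kv, LAST_NAME))     # [1]
print(select_where_null(without_empty_kv, LAST_NAME))  # []
```

In the model, the row with only a primary key survives only when the empty key value is written; whether the compact single-cell format already writes some cell for such a row is the open question the test answers.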

On Thu, Apr 19, 2018 at 1:58 PM, Sergey Soldatov <sergeysoldatov@gmail.com>
wrote:

> Heh. That actually looks like a bug. This is a 'dummy' KV (
> https://phoenix.apache.org/faq.html#Why_empty_key_value), but I doubt
> that we need it for compacted rows.
>
> Thanks,
> Sergey
>
> On Thu, Apr 19, 2018 at 11:30 PM, Lew Jackman <lew9090@netzero.net> wrote:
>
>> I have not tried the master branch yet; however, on Phoenix 4.13 this
>> storage discrepancy in HBase is still present: loading with psql or
>> sqlline still produces the extra column=M:\x00\x00\x00\x00 cells.
>>
>> Does anyone know what the column qualifier \x00\x00\x00\x00 means?
>>
>>
>> ---------- Original Message ----------
>> From: "Lew Jackman" <lew9090@netzero.net>
>> To: user@phoenix.apache.org
>> Cc: user@phoenix.apache.org
>> Subject: Re: hbase cell storage different between bulk load and direct api
>> Date: Thu, 19 Apr 2018 13:59:16 GMT
>>
>> Loading with plain UPSERT statements gives the same result as psql, i.e.
>> the extra cells. I will try the master branch next. Thanks for the tip.
>>
>> ---------- Original Message ----------
>> From: Sergey Soldatov <sergeysoldatov@gmail.com>
>> To: user@phoenix.apache.org
>> Subject: Re: hbase cell storage different between bulk load and direct api
>> Date: Thu, 19 Apr 2018 12:26:25 +0600
>>
>> Hi Lew,
>> No. The first one looks incorrect, and you may file a bug on that. (I
>> believe the second case is correct, but you can also check by loading the
>> data with regular upserts.) You may also check whether the master branch
>> has this issue.
>>
>> Thanks,
>> Sergey
>>
>> On Thu, Apr 19, 2018 at 10:19 AM, Lew Jackman <lew9090@netzero.net>
>> wrote:
>>
>>> Under Phoenix 4.11 we are seeing storage discrepancies in HBase
>>> between a load via psql and a bulk load.
>>>
>>> To illustrate with a simple case, we modified the example table from
>>> the bulk load reference https://phoenix.apache.org/bulk_dataload.html
>>>
>>> CREATE TABLE example (
>>>     my_pk bigint not null,
>>>     m.first_name varchar(50),
>>>     m.last_name varchar(50)
>>>     CONSTRAINT pk PRIMARY KEY (my_pk))
>>>     IMMUTABLE_ROWS=true,
>>>     IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
>>>     COLUMN_ENCODED_BYTES = 1;
>>>
>>> HBase rows when loading via psql
>>>
>>>  \x80\x00\x00\x00\x00\x0009     column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
>>>  \x80\x00\x00\x00\x00\x0009     column=M:1, timestamp=1524109827690, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
>>>  \x80\x00\x00\x00\x00\x01\x092  column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
>>>  \x80\x00\x00\x00\x00\x01\x092  column=M:1, timestamp=1524109827690, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02
>>>
>>> HBase rows when loading via MapReduce using CsvBulkLoadTool
>>>
>>>  \x80\x00\x00\x00\x00\x0009     column=M:1, timestamp=1524110486638, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
>>>  \x80\x00\x00\x00\x00\x01\x092  column=M:1, timestamp=1524110486638, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02
>>>
>>>
>>> So, the table loaded via psql has 4 cells for the two rows, whereas the
>>> bulk-loaded table has only 2: it lacks the cells with column qualifier
>>> M:\x00\x00\x00\x00.
>>>
>>> Is this behavior correct?
>>>
>>> Thanks much for any insight.
>>>
>>>
>>
>>
>
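
In both dumps quoted above, the M:1 cell packs all non-key columns of a row into one value. The Python sketch below decodes it, assuming a layout inferred only from the two sample values, not from the Phoenix source: the element bytes concatenated, then one 2-byte big-endian start offset per element, then a 4-byte position of that offset array, a 4-byte element count, and one trailing byte (0x02 in both rows). The row-key decoder likewise assumes Phoenix stores a BIGINT as 8 big-endian bytes with the sign bit flipped, which fits the leading \x80 in both keys. `decode_single_cell` and `decode_row_key` are hypothetical helper names.

```python
# Sketch: decode the packed M:1 values and row keys from the hbase dumps.
# ASSUMPTION: layout inferred from the two sample values only, not from Phoenix source:
#   [element bytes][2-byte offset per element][4-byte offset-array position]
#   [4-byte element count][1 trailing byte]
import struct
from typing import List, Tuple


def decode_single_cell(value: bytes) -> Tuple[List[bytes], int]:
    """Return (elements, trailing byte) for a value in the assumed layout."""
    trailer = value[-1]
    offsets_pos, count = struct.unpack(">ii", value[-9:-1])
    offsets = list(struct.unpack(f">{count}H", value[offsets_pos:offsets_pos + 2 * count]))
    ends = offsets[1:] + [offsets_pos]  # each element ends where the next one starts
    elements = [value[start:end] for start, end in zip(offsets, ends)]
    return elements, trailer


def decode_row_key(key: bytes) -> int:
    """ASSUMPTION: 8-byte big-endian BIGINT with its sign bit flipped."""
    return struct.unpack(">q", bytes([key[0] ^ 0x80]) + key[1:])[0]


john = b"xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02"
mary = b"xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02"

print(decode_single_cell(john))  # ([b'x', b'John', b'Doe'], 2)
print(decode_single_cell(mary))  # ([b'x', b'Mary', b'Poppins'], 2)
print(decode_row_key(b"\x80\x00\x00\x00\x00\x0009"))      # 12345
print(decode_row_key(b"\x80\x00\x00\x00\x00\x01\x092"))  # 67890
```

With only two short rows to go on, this is not a general Phoenix decoder; larger values may use a different offset width or header. It does show that the M:1 cell is byte-for-byte identical in both loads, so the discrepancy is limited to the separate M:\x00\x00\x00\x00 cell.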
