phoenix-user mailing list archives

From Sergey Soldatov <sergeysolda...@gmail.com>
Subject Re: hbase cell storage different between bulk load and direct api
Date Thu, 19 Apr 2018 06:26:25 GMT
Hi Lew,
No, the first one looks incorrect. You may file a bug on that (I believe
that the second case is correct, but you may also check by uploading data
using regular upserts). Also, you may check whether the master branch has
this issue.

Thanks,
Sergey
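
A minimal sketch of the regular-upsert check suggested above, assuming the example table from the quoted message already exists. The values are taken from the quoted scan output: the two row keys decode to the BIGINT values 12345 and 67890, and the names appear in the cell values.

```sql
-- Assumption: these two rows mirror the data seen in the quoted scan output.
-- Run them through any Phoenix SQL client against the example table.
UPSERT INTO example (my_pk, m.first_name, m.last_name) VALUES (12345, 'John', 'Doe');
UPSERT INTO example (my_pk, m.first_name, m.last_name) VALUES (67890, 'Mary', 'Poppins');
```

Scanning the table from the HBase shell afterwards should show whether a cell with qualifier M:\\x00\\x00\\x00\\x00 is written alongside M:1 for each row.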

On Thu, Apr 19, 2018 at 10:19 AM, Lew Jackman <lew9090@netzero.net> wrote:

> Under Phoenix 4.11 we are seeing some storage discrepancies in hbase
> between a load via psql and a bulk load.
>
> To illustrate in a simple case we have modified the example table from the
> load reference https://phoenix.apache.org/bulk_dataload.html
>
> CREATE TABLE example (
>    my_pk bigint not null,
>    m.first_name varchar(50),
>    m.last_name varchar(50)
>    CONSTRAINT pk PRIMARY KEY (my_pk))
>    IMMUTABLE_ROWS=true,
>    IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
>    COLUMN_ENCODED_BYTES = 1;
>
> Hbase Rows when Loading via PSQL
>
> \\x80\\x00\\x00\\x00\\x00\\x0009     column=M:\\x00\\x00\\x00\\x00,
> timestamp=1524109827690, value=x
> \\x80\\x00\\x00\\x00\\x00\\x0009     column=M:1, timestamp=1524109827690,
> value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
>
> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:\\x00\\x00\\x00\\x00,
> timestamp=1524109827690, value=x
> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1, timestamp=1524109827690,
> value=xMaryPoppins\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
>
>
> Hbase Rows when Loading via MapReduce using CsvBulkLoadTool
>
> \\x80\\x00\\x00\\x00\\x00\\x0009     column=M:1, timestamp=1524110486638,
> value=xJohnDoe\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\x03\\x02
>
> \\x80\\x00\\x00\\x00\\x00\\x01\\x092  column=M:1, timestamp=1524110486638,
> value=xMaryPoppins\\x00\\x00\\x00\\x01\\x00\\x05\\x00\\x00\\x00\\x0C\\x00\\x00\\x00\\x03\\x02
>
>
>
> So the table has 4 cells for the two rows when loaded via psql, whereas the
> bulk-loaded table has only two, since it lacks the cells with column
> qualifier :\\x00\\x00\\x00\\x00
>
> Is this behavior correct?
>
> Thanks much for any insight.
