phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: BigDecimal casting issue?
Date Thu, 26 Feb 2015 18:02:15 GMT
The problem with the old way that HBase represents BigDecimal is that
the serialized bytes don't sort the same way that the BigDecimal does
(FWIW, but orthogonal to this discussion and not something that will
help you with this particular situation, a new type system was
introduced in HBase to fix this). If it's a matter of writing a
conversion script that runs overnight, I'd encourage you to do that.
If that's not possible, there are a few alternatives, but performance
won't be as good.

Phoenix requires that the serialized bytes sort order matches the sort
order of the objects they represent so that range scans and order bys
work correctly. For example, let's say you had the following schema:

CREATE TABLE t (n DECIMAL PRIMARY KEY);

and issued the following query:

SELECT * FROM t WHERE n > 0.5 AND n < 0.9;

Phoenix will compile this into a scan that has a start row of the
serialized bytes of 0.5 and a stop row of the serialized bytes of 0.9.
If the serialized byte sort order doesn't match the object sort order,
you'd get incorrect results. We do support UNSIGNED_* types for byte,
short, int, and long because the positive numbers meet this criteria.
We can't have an UNSIGNED_DECIMAL, because even restricted to just
positive BigDecimal values, this contract is not met.

1) You could declare your data byte for BigDecimal as VARBINARY and do
the byte[]->BigDecimal conversion in your code. Note that the
filtering shown above wouldn't work correctly, but an equality filter
would be ok.
CREATE TABLE t (n VARBINARY PRIMARY KEY);
SELECT * FROM t WHERE n = 0.5;

2) You could stringify your BigDecimal and convert it on-the-fly using
our TO_NUMBER built-in:
CREATE TABLE t (n VARCHAR PRIMARY KEY);
SELECT * FROM t WHERE TO_NUMBER(n) > 0.5 AND TO_NUMBER(n) < 0.9;

Note that this query would be a full table scan.

HTH,
James


On Thu, Feb 26, 2015 at 6:39 AM, Matthew Johnson
<matt.johnson@algomi.com> wrote:
> Thanks guys,
>
>
>
> Unfortunately I cannot easily change the way the value is serialized into
> HBase because that value is used by multiple other projects that don’t use
> Phoenix. Over time I could potentially migrate all these projects but I was
> hoping for a slightly more immediate solution. I could write an overnight
> batch job that scans through the entire table, and for every value that is a
> BigDecimal I could duplicate it into a new column with the PDataType
> Decimal, but this seems kinda inefficient and would duplicate a lot of data.
>
>
>
> I had a look at the documentation (and tried a few combinations) but could
> not find an UNSIGNED_DECIMAL type for Phoenix.
>
>
>
> Is there any other way I can retrieve BigDecimal values with Phoenix? If I
> wanted to write my own wrapper for Phoenix that supports BigDecimal, would I
> have to patch just the client or would I have to patch the server as well?
>
>
>
> Cheers,
>
> Matt
>
>
>
>
>
> From: anil gupta [mailto:anilgupta84@gmail.com]
> Sent: 26 February 2015 13:44
> To: user@phoenix.apache.org
> Subject: Re: BigDecimal casting issue?
>
>
>
> Hi Matthew,
>
> If you want phoenix to read a BigDecimal value out of HBase. You will need
> to use PDataType.DECIMAL.toBytes(BigDecimal) to serialize that value in
> HBase. IMO, serialization of of BigDecimal via PDataType is better because
> it enables byte array comparison on BigDecimal values.
>
> Thanks,
>
> Anil
>
>
>
> On Thu, Feb 26, 2015 at 4:30 AM, Gabriel Reid <gabriel.reid@gmail.com>
> wrote:
>
> Hi Matt,
>
> Although the object representation of the Phoenix DECIMAL type is
> BigDecimal, the byte-level encoding is different than that of
> Bytes.toBytes(BigDecimal). The reason for this is to allow for
> ordering of these values based comparison of binary values. Sorting
> the values with binary value comparison based on the return value of
> Bytes.toBytes(BigDecimal) will not result in the correct ordering of
> values.
>
> As you may have noticed, many data types in Phoenix have an UNSIGNED_*
> counter part which uses the same underlying encoding as Bytes.toBytes,
> although these datatypes suffer from the same binary comparison issue
> as outlined above.
>
> - Gabriel
>
>
>
>
> On Thu, Feb 26, 2015 at 12:06 PM, Matthew Johnson
> <matt.johnson@algomi.com> wrote:
>> Hi all,
>>
>>
>>
>> I am trying to map an HBase column where I store java.math.BigDecimal
>> values
>> using:
>>
>>
>>
>> Bytes.toBytes(myBigDecimalValueInJava)
>>
>>
>>
>> My understanding from the “Data Types” page
>> (http://phoenix.apache.org/language/datatypes.html#decimal_type) is that
>> the
>> DECIMAL type in Phoenix should map to this. However, when I store:
>>
>>
>>
>> 102.1
>>
>>
>>
>> in HBase, Phoenix reads it back as:
>>
>>
>>
>> -1.02020201E+126
>>
>>
>>
>> Not sure whether I am using it wrong? I have tried creating the view with
>> data type DECIMAL and also DECIMAL(15,5) but both give the same problem.
>> Is
>> anyone else able to successfully insert BigDecimal values via the HBase
>> client and retrieve them using Phoenix?
>>
>>
>>
>> FYI when I retrieve the value using HBase client:
>>
>>
>>
>> BigDecimal bd = Bytes.toBigDecimal(value);
>>
>>
>>
>> It correctly prints the value of 102.1.
>>
>>
>>
>> Any thoughts?
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>>
>
>
>
>
> --
>
> Thanks & Regards,
> Anil Gupta

Mime
View raw message