phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: BigDecimal casting issue?
Date Fri, 27 Feb 2015 16:54:12 GMT
Yes, AVG(TO_NUMBER(s_price)) would work fine. FWIW, even with the
BigDecimal not in the row key, it be a problem if a secondary index were
ever added on it.

One other option, if to_number doesn't meet your perf requirements would be
to store it as a VARBINARY and create a couple of new built in functions to
convert it to/from a BigDecimal. See
http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html?m=1

Thanks,
James

On Friday, February 27, 2015, Matthew Johnson <matt.johnson@algomi.com>
wrote:

> Thanks James,
>
>
>
> We are  not using BigDecimal for primary key at all, the main use case
> would be something like *(rowid varchar primary key, producttype varchar,
> price decimal)* and then we would want to do aggregate number functions
> like AVG, MAX etc. So the vast majority would be full table scans anyway.
>
>
>
> If I had a separate column where the price was stored as a string, eg *(rowid
> varchar primary key, producttype varchar, price decimal, s_price varchar)*
> will Phoenix allow me to use aggregate number functions along with the
> to_number function? Something like AVG(TO_NUMBER(s_price))?
>
>
>
> Thanks!
>
> Matt
>
>
>
>
>
> -----Original Message-----
> From: James Taylor [mailto:jamestaylor@apache.org
> <javascript:_e(%7B%7D,'cvml','jamestaylor@apache.org');>]
> Sent: 26 February 2015 18:02
> To: user
> Subject: Re: BigDecimal casting issue?
>
>
>
> The problem with the old way that HBase represents BigDecimal is that the
> serialized bytes don't sort the same way that the BigDecimal does (FWIW,
> but orthogonal to this discussion and not something that will help you with
> this particular situation, a new type system was introduced in HBase to fix
> this). If it's a matter of writing a conversion script that runs overnight,
> I'd encourage you to do that.
>
> If that's not possible, there are a few alternatives, but performance
> won't be as good.
>
>
>
> Phoenix requires that the serialized bytes sort order matches the sort
> order of the objects they represent so that range scans and order bys work
> correctly. For example, let's say you had the following schema:
>
>
>
> CREATE TABLE t (n DECIMAL PRIMARY KEY);
>
>
>
> and issued the following query:
>
>
>
> SELECT * FROM t WHERE n > 0.5 AND n < 0.9;
>
>
>
> Phoenix will compile this into a scan that has a start row of the
> serialized bytes of 0.5 and a stop row of the serialized bytes of 0.9.
>
> If the serialized byte sort order doesn't match the object sort order,
> you'd get incorrect results. We do support UNSIGNED_* types for byte,
> short, int, and long because the positive numbers meet this criteria.
>
> We can't have an UNSIGNED_DECIMAL, because even restricted to just
> positive BigDecimal values, this contract is not met.
>
>
>
> 1) You could declare your data byte for BigDecimal as VARBINARY and do the
> byte[]->BigDecimal conversion in your code. Note that the filtering shown
> above wouldn't work correctly, but an equality filter would be ok.
>
> CREATE TABLE t (n VARBINARY PRIMARY KEY); SELECT * FROM t WHERE n = 0.5;
>
>
>
> 2) You could stringify your BigDecimal and convert it on-the-fly using our
> TO_NUMBER built-in:
>
> CREATE TABLE t (n VARCHAR PRIMARY KEY);
>
> SELECT * FROM t WHERE TO_NUMBER(n) > 0.5 AND TO_NUMBER(n) < 0.9;
>
>
>
> Note that this query would be a full table scan.
>
>
>
> HTH,
>
> James
>
>
>
>
>
> On Thu, Feb 26, 2015 at 6:39 AM, Matthew Johnson <matt.johnson@algomi.com
> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>> wrote:
>
> > Thanks guys,
>
> >
>
> >
>
> >
>
> > Unfortunately I cannot easily change the way the value is serialized
>
> > into HBase because that value is used by multiple other projects that
>
> > don’t use Phoenix. Over time I could potentially migrate all these
>
> > projects but I was hoping for a slightly more immediate solution. I
>
> > could write an overnight batch job that scans through the entire
>
> > table, and for every value that is a BigDecimal I could duplicate it
>
> > into a new column with the PDataType Decimal, but this seems kinda
> inefficient and would duplicate a lot of data.
>
> >
>
> >
>
> >
>
> > I had a look at the documentation (and tried a few combinations) but
>
> > could not find an UNSIGNED_DECIMAL type for Phoenix.
>
> >
>
> >
>
> >
>
> > Is there any other way I can retrieve BigDecimal values with Phoenix?
>
> > If I wanted to write my own wrapper for Phoenix that supports
>
> > BigDecimal, would I have to patch just the client or would I have to
> patch the server as well?
>
> >
>
> >
>
> >
>
> > Cheers,
>
> >
>
> > Matt
>
> >
>
> >
>
> >
>
> >
>
> >
>
> > From: anil gupta [mailto:anilgupta84@gmail.com
> <javascript:_e(%7B%7D,'cvml','anilgupta84@gmail.com');>]
>
> > Sent: 26 February 2015 13:44
>
> > To: user@phoenix.apache.org
> <javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');>
>
> > Subject: Re: BigDecimal casting issue?
>
> >
>
> >
>
> >
>
> > Hi Matthew,
>
> >
>
> > If you want phoenix to read a BigDecimal value out of HBase. You will
>
> > need to use PDataType.DECIMAL.toBytes(BigDecimal) to serialize that
>
> > value in HBase. IMO, serialization of of BigDecimal via PDataType is
>
> > better because it enables byte array comparison on BigDecimal values.
>
> >
>
> > Thanks,
>
> >
>
> > Anil
>
> >
>
> >
>
> >
>
> > On Thu, Feb 26, 2015 at 4:30 AM, Gabriel Reid <gabriel.reid@gmail.com
> <javascript:_e(%7B%7D,'cvml','gabriel.reid@gmail.com');>>
>
> > wrote:
>
> >
>
> > Hi Matt,
>
> >
>
> > Although the object representation of the Phoenix DECIMAL type is
>
> > BigDecimal, the byte-level encoding is different than that of
>
> > Bytes.toBytes(BigDecimal). The reason for this is to allow for
>
> > ordering of these values based comparison of binary values. Sorting
>
> > the values with binary value comparison based on the return value of
>
> > Bytes.toBytes(BigDecimal) will not result in the correct ordering of
>
> > values.
>
> >
>
> > As you may have noticed, many data types in Phoenix have an UNSIGNED_*
>
> > counter part which uses the same underlying encoding as Bytes.toBytes,
>
> > although these datatypes suffer from the same binary comparison issue
>
> > as outlined above.
>
> >
>
> > - Gabriel
>
> >
>
> >
>
> >
>
> >
>
> > On Thu, Feb 26, 2015 at 12:06 PM, Matthew Johnson
>
> > <matt.johnson@algomi.com
> <javascript:_e(%7B%7D,'cvml','matt.johnson@algomi.com');>> wrote:
>
> >> Hi all,
>
> >>
>
> >>
>
> >>
>
> >> I am trying to map an HBase column where I store java.math.BigDecimal
>
> >> values
>
> >> using:
>
> >>
>
> >>
>
> >>
>
> >> Bytes.toBytes(myBigDecimalValueInJava)
>
> >>
>
> >>
>
> >>
>
> >> My understanding from the “Data Types” page
>
> >> (http://phoenix.apache.org/language/datatypes.html#decimal_type) is
>
> >> that the DECIMAL type in Phoenix should map to this. However, when I
>
> >> store:
>
> >>
>
> >>
>
> >>
>
> >> 102.1
>
> >>
>
> >>
>
> >>
>
> >> in HBase, Phoenix reads it back as:
>
> >>
>
> >>
>
> >>
>
> >> -1.02020201E+126
>
> >>
>
> >>
>
> >>
>
> >> Not sure whether I am using it wrong? I have tried creating the view
>
> >> with data type DECIMAL and also DECIMAL(15,5) but both give the same
> problem.
>
> >> Is
>
> >> anyone else able to successfully insert BigDecimal values via the
>
> >> HBase client and retrieve them using Phoenix?
>
> >>
>
> >>
>
> >>
>
> >> FYI when I retrieve the value using HBase client:
>
> >>
>
> >>
>
> >>
>
> >> BigDecimal bd = Bytes.toBigDecimal(value);
>
> >>
>
> >>
>
> >>
>
> >> It correctly prints the value of 102.1.
>
> >>
>
> >>
>
> >>
>
> >> Any thoughts?
>
> >>
>
> >>
>
> >>
>
> >> Cheers,
>
> >>
>
> >> Matt
>
> >>
>
> >>
>
> >>
>
> >>
>
> >
>
> >
>
> >
>
> >
>
> > --
>
> >
>
> > Thanks & Regards,
>
> > Anil Gupta
>

Mime
View raw message