phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Binu Mathew <bmat...@groupon.com>
Subject Re: to_date not working as expected
Date Mon, 01 Feb 2016 20:12:11 GMT
If and when a patch becomes available, can you please update this thread.

This is a blocker:
1. We have existing HBase tables that users do not necessarily want to port
over to Phoenix tables
2. Building Phoenix views to the existing HBase tables is a better solution
3. The inability to cast the date data types in the WHERE clause and
compare against another date prevents users from performance range scans.

Unfortunately, we are unable to upgrade our Hortonworks Hadoop
distribution, HDP 2.3, to use Phoenix 4.6.

HDP 2.3 came default with Phoenix 4.4 and we can't seem to upgrade a single
package, 4.4, to 4.6 on the HDP 2.3 distribution.

Can you provide us with a patch to resolve this issue?

Thanks,

On Sat, Jan 30, 2016 at 11:45 AM, Thomas D'Silva <tdsilva@salesforce.com>
wrote:

> Binu,
>
> I am able to repro the issue by manually running the test from the patch
> from https://issues.apache.org/jira/browse/PHOENIX-1769 .
> I will investigate further.
>
> Thanks,
> Thomas
>
>
> On Fri, Jan 29, 2016 at 4:26 PM, Binu Mathew <bmathew@groupon.com> wrote:
>
>> That doesn't seem to work.
>>
>> Phoenix is not recognizing that created_at in the WERE clause is a
>> derived column from to_date("created_at_date").
>>
>> Some relational databases support this type of functionality.
>>
>> On Fri, Jan 29, 2016 at 4:16 PM, Alok Singh <alok@cloudability.com>
>> wrote:
>>
>>> Does this work:
>>>
>>> select 1, to_date("created_at_date") as created_at from
>>> "gp_subscriptions" where created_at > to_date('2010-10-10') limit 3;
>>>
>>> Alok
>>>
>>> Alok
>>>
>>> alok@cloudability.com
>>>
>>> On Fri, Jan 29, 2016 at 3:54 PM, Binu Mathew <bmathew@groupon.com>
>>> wrote:
>>>
>>>> Thank you for the reply.
>>>>
>>>> I mistakenly wrote that we are using Phoenix with HBase .96. This was a
>>>> typo. We are using HBase .98 with Phoenix 4.4
>>>>
>>>> I tried the UNSIGNED types and still encountering the same issue.
>>>>
>>>> My field has the following data:
>>>>
>>>> select "created_at_date" from "gp_subscriptions" limit 3;
>>>>
>>>> +------------------------------------------+
>>>> |             created_at_date              |
>>>> +------------------------------------------+
>>>> | 2012-11-22                               |
>>>> | 2012-11-22                               |
>>>> | 2012-11-26                               |
>>>> +------------------------------------------+
>>>>
>>>> to_date function works when I convert the VARCHAR field to a DATE:
>>>>
>>>> select to_date("created_at_date") from "gp_subscriptions" limit 3;
>>>>
>>>> +---------------------------------------------------+
>>>> | TO_DATE(subscriber."created_at_date", null, null) |
>>>> +---------------------------------------------------+
>>>> | 2012-11-22 00:00:00.000                           |
>>>> | 2012-11-22 00:00:00.000                           |
>>>> | 2012-11-26 00:00:00.000                           |
>>>> +---------------------------------------------------+
>>>>
>>>>
>>>> However, I can't use the to_date function in the WHERE clause:
>>>>
>>>> select 1 from "gp_subscriptions" where to_date("created_at_date") >
>>>> to_date('2010-10-10') limit 3;
>>>>
>>>> java.lang.RuntimeException:
>>>> org.apache.phoenix.exception.PhoenixIOException:
>>>> org.apache.phoenix.exception.PhoenixIOException:
>>>> org.apache.hadoop.hbase.DoNotRetryIOException:
>>>> java.lang.reflect.InvocationTargetException
>>>>
>>>> Another issue is how Phoenix evaluates dates. In the queries below, I'm
>>>> evaluating string literals:
>>>>
>>>> 2009-05-05 is greater than (in the future) than 1970-05-05
>>>>
>>>> The following query should return 2 rows, however, it does not return
>>>> any rows:
>>>>
>>>> select '1' from "gp_subscriptions" where to_date('2009-05-05') >
>>>> to_date('1970-05-05') limit 2;
>>>> No rows selected (0.024 seconds)
>>>>
>>>>
>>>> The following query should return no rows, however, it returns 2 rows:
>>>>
>>>> select '1' from "gp_subscriptions" where to_date('2009-05-05') <
>>>> to_date('1970-05-05') limit 2;
>>>> 2 rows selected (0.033 seconds)
>>>>
>>>> Thanks,
>>>>
>>>> On Thu, Jan 28, 2016 at 8:07 PM, James Taylor <jamestaylor@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Binu,
>>>>> Phoenix has never supported HBase 0.96, so I'm not sure where you got
>>>>> the release from.
>>>>>
>>>>> I recommend upgrading to a later, supported version of HBase and a
>>>>> later version of Phoenix. Give the 4.7.0 RC a try.
>>>>>
>>>>> One other tip in particular for views you create over existing HBase
>>>>> tables. Use the UNSIGNED types documented here[1] as these use the same
>>>>> serialization as the Bytes methods provided by HBase. If you tell Phoenix
>>>>> the wrong type, it won't know so would produce erroneous data and queries.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>> [1] https://phoenix.apache.org/language/datatypes.html
>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__phoenix.apache.org_language_datatypes.html&d=CwMFaQ&c=LNdz7nrxyGFUIUTz2qIULQ&r=kx0aYbS3d_bROblEBwnOWEEpQQo7ummi3XKbSfjcDxI&m=DYIhYC8MzfMvMlb_vB97jlBOyX5-ROGZnBR8uHjL9AM&s=qZf4wLz966HePHU8YevWHvcdJWrlFV84gXZLYftu1II&e=>
>>>>>
>>>>> On Thu, Jan 28, 2016 at 5:57 PM, Binu Mathew <bmathew@groupon.com>
>>>>> wrote:
>>>>>
>>>>>> Phoenix version 4.4.0
>>>>>>
>>>>>> Issues with Phoenix when used with HBase 0.96.0.2.0
>>>>>>
>>>>>> 2 Issues:
>>>>>>
>>>>>> *ISSUE:* to_date Function is not converting string data types in
>>>>>> valid date formats to a DATE data type when used in the WHERE clause
for
>>>>>> date comparison.
>>>>>>
>>>>>> Below is a query I ran against a Phoenix view in which I use the
>>>>>> ‘to_date’ function to convert 2 VARCHAR columns to date.
>>>>>> 1. column ‘created_at_ts’ stored as VARCHAR in format such as
>>>>>> 2009-05-05 15:40:10.000
>>>>>> 2. column ‘created_at_date’ stored as VARCHAR in format such
as
>>>>>> 2009-05-05
>>>>>>
>>>>>> Observe that the ‘to_date’ function coverts the 2 VARCHAR columns
to
>>>>>> dates:
>>>>>>
>>>>>>
>>>>>> select to_date("created_at_ts"), to_date("created_at_date") from
>>>>>> "gp_subscriptions" limit 5;
>>>>>>
>>>>>>
>>>>>> +-------------------------------------------------+---------------------------------------------------+
>>>>>> | TO_DATE(subscriber."created_at_ts", null, null) |
>>>>>> TO_DATE(subscriber."created_at_date", null, null) |
>>>>>>
>>>>>> +-------------------------------------------------+---------------------------------------------------+
>>>>>> | 2009-05-05 15:40:10.000                         | 2009-05-05
>>>>>> 00:00:00.000                           |
>>>>>> | 2012-11-22 07:37:34.000                         | 2012-11-22
>>>>>> 00:00:00.000                           |
>>>>>> | 2010-07-24 14:12:33.000                         | 2010-07-24
>>>>>> 00:00:00.000                           |
>>>>>> | 2012-11-22 07:38:04.000                         | 2012-11-22
>>>>>> 00:00:00.000                           |
>>>>>> | 2012-11-22 07:38:10.000                         | 2012-11-22
>>>>>> 00:00:00.000                           |
>>>>>>
>>>>>> +-------------------------------------------------+---------------------------------------------------+
>>>>>>
>>>>>>
>>>>>> Here is another query in which I’m using the ‘to_date’ function
on
>>>>>> string literals in the WHERE clause for date comparison .
>>>>>>
>>>>>> Observer that the ‘to_date’ function coverts the string literals
to
>>>>>> dates and the the date comparison correctly evaluates:
>>>>>>
>>>>>> select '1' from "gp_subscriptions" where to_date('2009-05-05
>>>>>> 15:40:10.000') = to_date('2009-05-05 15:40:10.000') limit 2;
>>>>>> 2 rows selected (0.035 seconds)
>>>>>>
>>>>>>
>>>>>> Now when I try the date comparison using the columns from my view,
it
>>>>>> fails:
>>>>>>
>>>>>> select '1' from "gp_subscriptions" where to_date("created_at_ts")
=
>>>>>> to_date('2009-05-05 15:40:10.000') limit 2;
>>>>>>
>>>>>> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException:
>>>>>> BooleanExpressionFilter failed during reading: Could not initialize
class
>>>>>> org.apache.phoenix.util.DateUtil$ISODateFormatParser
>>>>>> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
>>>>>> org.apache.phoenix.util.DateUtil$ISODateFormatParser
>>>>>>
>>>>>> Also fails with same error when I try: select '1' from
>>>>>> "gp_subscriptions" where to_date("created_at_ts") = to_date('2009-05-05')
>>>>>> limit 2;
>>>>>>
>>>>>> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException:
>>>>>> BooleanExpressionFilter failed during reading: Could not initialize
class
>>>>>> org.apache.phoenix.util.DateUtil$ISODateFormatParser
>>>>>> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
>>>>>> org.apache.phoenix.util.DateUtil$ISODateFormatParser
>>>>>>
>>>>>>
>>>>>> *ISSUE: *Date comparisons on string literals are not evaluating
>>>>>> correctly such that dates in the future get interpreted as being
less than
>>>>>> dates in the past.
>>>>>>
>>>>>> Test case 1:
>>>>>> 2009-05-05 15:40:10.000 is greater than (in the future) 2005-05-05
>>>>>> 15:40:10.000
>>>>>>
>>>>>> The following query should return 2 rows, however, it does not return
>>>>>> any rows:
>>>>>>
>>>>>> select '1' from "gp_subscriptions" where to_date('2009-05-05
>>>>>> 15:40:10.000') > to_date('2005-05-05 15:40:10.000') limit 2;
>>>>>> No rows selected (0.024 seconds)
>>>>>>
>>>>>>
>>>>>> The following query should return no rows, however, it returns 2
rows:
>>>>>>
>>>>>> select '1' from "gp_subscriptions" where to_date('2009-05-05
>>>>>> 15:40:10.000') < to_date('2005-05-05 15:40:10.000') limit 2;
>>>>>> 2 rows selected (0.033 seconds)
>>>>>>
>>>>>> Test case 2:
>>>>>> 2009-05-05 is greater than (in the future) than 1970-05-05
>>>>>>
>>>>>> The following query should return 2 rows, however, it does not return
>>>>>> any rows:
>>>>>>
>>>>>> select '1' from "gp_subscriptions" where to_date('2009-05-05') >
>>>>>> to_date('1970-05-05') limit 2;
>>>>>> No rows selected (0.024 seconds)
>>>>>>
>>>>>>
>>>>>> The following query should return no rows, however, it returns 2
rows:
>>>>>>
>>>>>> select '1' from "gp_subscriptions" where to_date('2009-05-05') <
>>>>>> to_date('1970-05-05') limit 2;
>>>>>> 2 rows selected (0.033 seconds)
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Binu Mathew*
>>>>>> Data Engineering
>>>>>>
>>>>>> 3101 Park Blvd., Palo Alto, CA 94306
>>>>>> Mobile: 630.267.5938
>>>>>> Groupon
>>>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.google.com_url-3Fq-3Dhttp-253A-252F-252Fwww.groupon.com-252F-26sa-3DD-26sntz-3D1-26usg-3DAFrqEzcC80FkwsjyolWTKAH1sZ9yU2t0xg&d=CwMFaQ&c=LNdz7nrxyGFUIUTz2qIULQ&r=kx0aYbS3d_bROblEBwnOWEEpQQo7ummi3XKbSfjcDxI&m=DYIhYC8MzfMvMlb_vB97jlBOyX5-ROGZnBR8uHjL9AM&s=ZLtx84pLqRnFvwXQsJT-IoadaGYt0KObWPttSMqW-xY&e=>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Binu Mathew*
>>>> Data Engineering
>>>>
>>>> 3101 Park Blvd., Palo Alto, CA 94306
>>>> Mobile: 630.267.5938
>>>> Groupon
>>>> <http://www.google.com/url?q=http%3A%2F%2Fwww.groupon.com%2F&sa=D&sntz=1&usg=AFrqEzcC80FkwsjyolWTKAH1sZ9yU2t0xg>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> *Binu Mathew*
>> Data Engineering
>>
>> 3101 Park Blvd., Palo Alto, CA 94306
>> Mobile: 630.267.5938
>> Groupon
>> <http://www.google.com/url?q=http%3A%2F%2Fwww.groupon.com%2F&sa=D&sntz=1&usg=AFrqEzcC80FkwsjyolWTKAH1sZ9yU2t0xg>
>>
>>
>>
>>
>


-- 
*Binu Mathew*
Data Engineering

3101 Park Blvd., Palo Alto, CA 94306
Mobile: 630.267.5938
Groupon
<http://www.google.com/url?q=http%3A%2F%2Fwww.groupon.com%2F&sa=D&sntz=1&usg=AFrqEzcC80FkwsjyolWTKAH1sZ9yU2t0xg>

Mime
View raw message