phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Creating Covering index on Phoenix
Date Sun, 23 Oct 2016 22:15:29 GMT
See http://phoenix.apache.org/ and the Features menu items.

On Sunday, October 23, 2016, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> Sorry, I forgot: you were referring to "multi tenancy"?
>
> Can you please elaborate on this?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 23 October 2016 at 22:53, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>
>> Thanks James.
>>
>> My use case is loading trade data into HDFS and then into HBase via
>> Phoenix. This is the batch layer:
>>
>>
>>    1. Every row has a UUID as its row key and is immutable (append-only)
>>    2. Source: trade data -> Kafka -> Flume -> HDFS. HDFS directories are
>>    partitioned by DtStamp (daily)
>>    3. cron from HDFS -> Phoenix -> HBase
>>    4. cron from HDFS -> Hive ORC tables with partitions
>>
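The cron bulk-load step (3) is typically invoked along these lines; a sketch only, in which the client-jar path, table name, and HDFS input directory are illustrative assumptions, not details from this thread:

```shell
# Command run by the cron job: MapReduce bulk load of the day's CSV
# files from HDFS into the Phoenix table and its indexes.
# Jar path, table name, and input path are assumptions.
HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar /usr/lib/phoenix/phoenix-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table MARKETDATA \
    --input /data/trades/2016-10-23
```

Because the load goes through Phoenix's own bulk load tool rather than raw HBase APIs, any Phoenix secondary indexes on the table are populated as part of the same job.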
>>
>> For batch data visualisation we have a choice of using
>>
>>    1. Phoenix JDBC through Zeppelin (limited, as Phoenix does not have
>>    analytic functions, though much of that can be done with the usual
>>    joins as well)
>>    2. Hive JDBC through Zeppelin with analytics support. The best choice
>>    for SQL; pretty fast with Hive on Spark as the execution engine
>>    3. Spark SQL with functional programming directly on HBase
>>    4. Spark SQL with Hive
>>    5. Spark SQL does not work on Phoenix (Spark 2 JDBC to Phoenix is
>>    broken; I believe a JIRA is open on this)
>>
>> So we have a resilient design here. Phoenix secondary indexes are also
>> very useful.
>>
>> BTW, after every new append can we run UPDATE STATISTICS on Phoenix
>> tables and indexes, as we do with Hive?
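Phoenix does expose a statement for this. As a sketch, using the quoted table name that appears later in the thread:

```sql
-- Recollect guideposts/statistics for the table and all of its indexes
UPDATE STATISTICS "marketDataHbase" ALL;
```

The `ALL` keyword covers both the base table and its indexes; `INDEX` or `COLUMNS` can be used instead to restrict the scope.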
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>>
>>
>> On 23 October 2016 at 22:29, James Taylor <jamestaylor@apache.org> wrote:
>>
>>> Keep in mind that the CsvBulkLoadTool does not handle updating data
>>> in-place. It expects the data to be unique by row, not updates to
>>> existing rows. If your data is write-once/append-only, then you'll be
>>> fine, but otherwise you should stick with using the JDBC APIs.
>>>
>>> You're free to just use HBase APIs (maybe that's better for your use
>>> case?), but you won't get:
>>> - JDBC APIs
>>> - SQL
>>> - relational data model
>>> - parallel execution for your queries
>>> - secondary indexes
>>> - cross row/cross table transactions
>>> - query optimization
>>> - views
>>> - multi tenancy
>>> - query server
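As an illustration of the multi-tenancy item in the list above: a sketch of a multi-tenant Phoenix table, in which the table and column names are assumptions, not taken from this thread:

```sql
-- A multi-tenant table: the leading primary-key column identifies the
-- tenant and must be a non-null VARCHAR (or CHAR).
CREATE TABLE trades (
    tenant_id VARCHAR NOT NULL,
    trade_id  VARCHAR NOT NULL,
    price     DECIMAL
    CONSTRAINT pk PRIMARY KEY (tenant_id, trade_id)
) MULTI_TENANT = true;
```

A connection opened with the `TenantId` JDBC property set then sees only that tenant's rows and can create tenant-specific views on the table.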
>>>
>>> HBase doesn't store data either; it relies on HDFS to do that. And HDFS
>>> eventually stores data in a file system, relying on the OS.
>>>
>>> Thanks,
>>> James
>>>
>>> On Sun, Oct 23, 2016 at 2:09 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Thanks Sergey,
>>>>
>>>> I have modified the design to load data into HBase through the Phoenix
>>>> table. That way both the table in HBase and the index in HBase are
>>>> maintained.
>>>> I assume the Phoenix bulk load tool CsvBulkLoadTool updates the
>>>> underlying table in HBase plus all the indexes there as well.
>>>>
>>>> However, I noticed some ambiguity here
>>>> <https://en.wikipedia.org/wiki/Apache_Phoenix>:
>>>>
>>>> "*Apache Phoenix* is an open source, massively parallel, relational
>>>> *database* engine supporting OLTP for Hadoop using *Apache* HBase as
>>>> its backing store."
>>>>
>>>> It is not a database itself. The underlying data store is HBase. All
>>>> Phoenix does is provide a SQL layer on top of HBase, so one can
>>>> manipulate HBase tables with DDL and DQL (data query language). It
>>>> does not store data itself.
>>>>
>>>> I trust this is the correct assessment.
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 23 October 2016 at 21:49, Sergey Soldatov <sergeysoldatov@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>> No, if you update HBase directly, the index will not be maintained.
>>>>> Actually, I would suggest ingesting data using the Phoenix CSV bulk
>>>>> load tool.
>>>>>
>>>>> Thanks,
>>>>> Sergey.
>>>>>
>>>>> On Sat, Oct 22, 2016 at 12:49 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sergey,
>>>>>>
>>>>>> In this case the Phoenix view is defined on an HBase table.
>>>>>>
>>>>>> The HBase table is updated every 15 minutes via a cron job that uses
>>>>>> org.apache.hadoop.hbase.mapreduce.ImportTsv to bulk load data into
>>>>>> the HBase table.
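For context, an ImportTsv job of this kind generally has the following shape; the column mapping, table name, and HDFS path here are illustrative assumptions, not details from this thread:

```shell
# Illustrative ImportTsv invocation: maps CSV fields to HBase columns
# and loads them into the table. Note this writes to HBase directly,
# entirely bypassing Phoenix. Column family, table, and path are assumed.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator=',' \
    '-Dimporttsv.columns=HBASE_ROW_KEY,price_info:ticker,price_info:timecreated,price_info:price' \
    marketDataHbase /data/prices/2016-10-23
```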
>>>>>>
>>>>>> So if I create an index on my view in Phoenix, will that index be
>>>>>> maintained?
>>>>>>
>>>>>> regards
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 21 October 2016 at 23:35, Sergey Soldatov <sergeysoldatov@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mich,
>>>>>>>
>>>>>>> It really depends on the query that you are going to use. If the
>>>>>>> conditions are applied only to the time column, you may create an
>>>>>>> index like
>>>>>>> create index I on "marketDataHbase" ("timecreated") include
>>>>>>> ("ticker", "price");
>>>>>>> If the conditions will be applied to other columns as well, you may
>>>>>>> use
>>>>>>> create index I on "marketDataHbase" ("timecreated", "ticker",
>>>>>>> "price");
>>>>>>>
>>>>>>> The index is updated together with the user table if you are using
>>>>>>> the Phoenix JDBC driver or the Phoenix bulk load tools to ingest
>>>>>>> the data.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sergey
>>>>>>>
>>>>>>> On Fri, Oct 21, 2016 at 4:43 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a Phoenix table on HBase as follows:
>>>>>>>>
>>>>>>>> [image: Inline images 1]
>>>>>>>>
>>>>>>>> I want to create a covered index over the three columns:
>>>>>>>> ticker, timecreated, price
>>>>>>>>
>>>>>>>> More importantly, I want the index to be maintained when new rows
>>>>>>>> are added to the HBase table.
>>>>>>>>
>>>>>>>> What is the best way of achieving this?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
