phoenix-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Hive or Phoenix
Date Wed, 10 Sep 2014 16:16:08 GMT
Hi Prakash,

Please find my reply inline.

On Tue, Sep 9, 2014 at 11:28 PM, Prakash Hosalli <
prakash.hosalli@syncoms.com> wrote:

> Hi James/Anil,
>
>
>         Regarding the questions you put forward,
>
> 1.      Yes, we will store the data in HBase.
> 2.      Hive will run over HBase.
>
Anil: I do not know enough about your use case to say how much you can do
with the OOTB (out-of-the-box) features of the Hive-HBase integration. But
when I tried to use Hive with HBase, I could not, because Hive does not
support querying a table that has composite rowkeys. In a production
environment, users have composite rowkeys most of the time. Obviously, you
can patch the Hive-HBase integration to make it better. Please keep in mind
that Hive is not designed around HBase (the HBase integration is just a
small feature of Hive). In contrast, Phoenix is designed "on top of HBase",
so you will get much better integration and optimization of HBase
queries.
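
To illustrate why composite rowkeys matter, here is a rough sketch in plain
Python (not Hive, HBase, or Phoenix code). The key layout, the separator, and
all function names are assumptions made up for illustration. It builds a
rowkey from two parts, keeps the keys sorted the way HBase stores rows, and
shows that a lookup on the leading part can seek straight to its rows, while
a lookup on the trailing part alone has to walk every row:

```python
from bisect import bisect_left

SEP = b"\x00"  # hypothetical separator between the two key parts


def make_rowkey(customer_id, day):
    """Build a composite rowkey: customer id first, then the day."""
    return customer_id.encode() + SEP + day.encode()


def prefix_scan(sorted_keys, customer_id):
    """Seek to the first key with this leading part, stop at the first miss."""
    prefix = customer_id.encode() + SEP
    start = bisect_left(sorted_keys, prefix)
    hits = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        hits.append(key)
    return hits


def full_scan(sorted_keys, day):
    """Without the leading part, every key has to be examined."""
    suffix = SEP + day.encode()
    return [key for key in sorted_keys if key.endswith(suffix)]


# Rows are kept sorted by rowkey, as in HBase.
keys = sorted(
    make_rowkey(c, d)
    for c in ("acme", "bolt", "core")
    for d in ("20140908", "20140909", "20140910")
)

by_customer = prefix_scan(keys, "bolt")   # touches only the "bolt" rows
by_day = full_scan(keys, "20140909")      # touches all rows
```

A layer that understands the individual key parts can turn a filter on the
leading part into the cheap seek above; a layer that sees the rowkey as one
opaque value cannot.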

> 3.      We will be handling a large amount of data (approximately 10 million
> rows to be processed daily).
>
Anil: What kind of processing will you be doing? If you are doing simple
aggregates, those are already supported by Phoenix. You can also have a look
at the Phoenix-Pig integration to leverage the analytical power of Pig
(Pig is a data flow language whereas Hive is declarative, but you get the
Pig integration OOTB).

> 4.      Right now we have both options open, but primarily we plan to use
> Hive tables to serve client requests/queries on aggregated data.
>
Anil: People primarily use Hive for SQL querying; the same can be achieved
in a better way with Phoenix (especially when HBase is your storage).

> 5.      We plan to run all types of queries, and we aim for very low
> latency.
>
Anil: Phoenix will provide you much better performance on HBase.

>
>         If I understand correctly phoenix will just connect to Hbase
> securely & rely on the Hbase API to extract query reply, therefore Phoenix
> will depend on security mechanisms employed by Hbase API & will not provide
> any security feature by itself.
>
Anil: Yes, that is true. At present, Phoenix does not provide a mechanism to
grant/revoke/create/add users. The same can be done using the HBase shell,
and Phoenix will honor those changes. Phoenix is open source, so a patch is
always appreciated for new features.

>
>         Kindly correct me if my understanding is wrong.
>
>
> Thanks & Regards,
> Prakash Hosalli
>
>
> -----Original Message-----
> From: James Taylor [mailto:jamestaylor@apache.org]
> Sent: Tuesday, September 09, 2014 11:56 PM
> To: user; anil gupta
> Subject: Re: Hive or Phoenix
>
> Hi Prakash,
> If possible, it'd be helpful if you could describe your use case a bit.
>
> Some questions I'd have for you: is the data over which you'd query stored
> in HBase? And if so, would the Hive run over the HBase data? Is the data
> read-only or does it mutate? How much data are we talking about
> (approximately) and what would your typical queries be: point look-ups,
> range scans, or full table scans?
>
> As far as security, HBase provides some more fine grained mechanisms as
> well which you could leverage through HBase APIs. Other than the ability to
> connect to a secure cluster through the connection URL, Phoenix doesn't yet
> provide a SQL wrapper on these HBase APIs. This is how Intuit is leveraging
> Phoenix + security in HBase. Anil Gupta can likely tell you more.
>
> Thanks,
> James
>
> On Tue, Sep 9, 2014 at 9:28 AM, Nicolas Maillard <
> nmaillard@hortonworks.com> wrote:
> > Hello Prakash
> >
> > Considering Hive or Phoenix as an either/or choice is a little
> > misleading; they serve different needs. Let me break it down as best I can.
> >
> > You mention security:
> > Phoenix and hive both work on a secured Hadoop cluster, but Hive with
> > Hive Atz has a more fine grained authorization model. So from that
> > perspective Hive has more features.
> >
> > Query performance
> > On the performance side, Phoenix has random read/write access, whereas
> > Hive is full data access, so there is no way to read a particular entry
> > unless you read the whole associated file.
> > So Hive is batch or interactive, meaning a couple of tens of seconds
> > to get your answer, whereas Phoenix can be sub-second. The response time
> > will depend greatly on whether part of the Phoenix key is in your
> > query. If you do a full table scan, response time will suffer. Granted,
> > secondary indexes could help you there.
> >
> > SQL Semantics
> > Hive currently has richer SQL semantics, with analytic functions,
> > complex types, etc.
> > Phoenix is also more limited than Hive in joins and UDFs.
> >
> > So I would use Hive for large data, random analysis and ETL, and pay
> > the price of the response time a little.
> > Phoenix, on the other hand, is great for large volumes of data where you
> > can set up your schema, and especially your keys, according to specific
> > needs and query patterns. In this situation you would get great query
> > performance.
> >
> > To sum up, in all honesty, both are needed.
> >
> > Hope this helps
> >
> > On Tue, Sep 9, 2014 at 4:19 PM, Prakash Hosalli
> > <prakash.hosalli@syncoms.com> wrote:
> >>
> >> Hi,
> >>
> >> Does Phoenix have any security layer in it, as we have in Hive?
> >>
> >> I am confused about whether to go forward with Phoenix or Hive in
> >> the production environment at my company.
> >>
> >> Thanks & Regards,
> >> Prakash Hosalli
> >> Syncoms Bangalore India.
> >>
> >
> >
> >
>



-- 
Thanks & Regards,
Anil Gupta
