phoenix-user mailing list archives

From Pedro Gandola <pedro.gand...@gmail.com>
Subject Re: Telco HBase POC
Date Fri, 15 Jan 2016 12:18:52 GMT
Hi Willem,

Just to give you my short experience as a Phoenix user.

I'm using Phoenix 4.4 on top of an HBase cluster where I keep 3 billion
entries.

In our use case Phoenix is doing very well, and it has saved us a lot of code
complexity and time. If you guys have already decided that HBase is the way
to go, then having Phoenix as a SQL layer will help a lot, not only in
terms of code simplicity: it will also help you create and maintain your
indexes and views, which can be hard and costly tasks using the plain HBase API.
Joining tables is just a simple SQL join :).
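
To illustrate (table and column names here are hypothetical, not from the
thread), this is the kind of DDL Phoenix accepts in place of raw HBase API
calls; Phoenix maps the table onto HBase and maintains the secondary index
for you:

```sql
-- Hypothetical CDR table; names are illustrative only.
CREATE TABLE IF NOT EXISTS cdr_usage (
    subscriber_id BIGINT NOT NULL,
    event_time    TIMESTAMP NOT NULL,
    site_visited  VARCHAR,
    bytes_used    BIGINT
    CONSTRAINT pk PRIMARY KEY (subscriber_id, event_time)
);

-- Secondary index to serve lookups by site instead of by subscriber.
CREATE INDEX idx_cdr_site ON cdr_usage (site_visited);
```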

And there are many more useful features that make your life with HBase
easier.

In terms of performance, it depends on the SLAs that you have and you need to
benchmark, however I think your main battles are going to be with HBase
itself: JVM GCs, network, file system, etc.
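
For what it's worth, SLA benchmarking of that kind mostly reduces to
measuring per-query latency percentiles. A minimal sketch (the `run_query`
callable is a stand-in, not a real Phoenix call):

```python
import time

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ranked = sorted(samples)
    rank = max(1, round(p / 100.0 * len(ranked)))
    return ranked[rank - 1]

def benchmark(run_query, n=1000):
    """Time n calls of run_query; report p50/p99 latency in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()  # in practice: execute a Phoenix point query here
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {"p50": percentile(latencies, 50), "p99": percentile(latencies, 99)}
```

Reporting p99 rather than the mean matters here, since GC pauses show up
precisely in the tail.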

I would say give Phoenix a try, for sure.

Cheers
Pedro

On Fri, Jan 15, 2016 at 9:12 AM, Willem Conradie <
willem.conradie@pbtgroup.co.za> wrote:

>
> Hi,
>
>
>
> I am currently consulting at a client with the following requirements.
>
>
>
> They want to make detailed data-usage CDRs available for customers to
> verify their data usage against the websites that they visited. In short,
> this can be seen as an itemised bill for data usage. The data is currently
> not loaded into an RDBMS due to the volumes of data involved. The proposed
> solution is to load the data into HBase, running on an HDP cluster, and make
> it available for querying by the subscribers. It is critical to ensure
> low-latency read access to the subscriber data, which will possibly be
> exposed to 25 million subscribers. We will first run a scaled-down version
> as a proof of concept, with the intention of it becoming an operational
> data store. Once the solution is functioning properly for the data-usage
> CDRs, other CDR types will be added, so we need to build a cost-effective,
> scalable solution.
>
>
>
> I am thinking of using Apache Phoenix for the following reasons:
>
>
>
> 1. Current data loading into the RDBMS is file based (CSV) via a
> staging server, using the RDBMS file-load drivers.
>
> 2. Use the Apache Phoenix bin/psql.py script to mimic the above
> process when loading into HBase.
>
> 3. Expected data volume: 60 000 files per day,
>    1 to 10 MB per file,
>    500 million records per day,
>    500 GB total volume per day.
>
> 4. Use the Apache Phoenix client for low-latency data retrieval.
>
>
>
> Is Apache Phoenix a suitable candidate for this specific use case?
>
>
>
> Regards,
>
> Willem
>
>
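
For reference, the volumes Willem quotes work out to the following
back-of-envelope rates (the input figures are his; only the arithmetic is
added here):

```python
# Figures quoted in the mail above; derived rates only.
files_per_day = 60_000
records_per_day = 500_000_000
bytes_per_day = 500 * 10**9  # 500 GB/day

avg_records_per_file = records_per_day / files_per_day  # ~8,333 records/file
avg_bytes_per_record = bytes_per_day / records_per_day  # 1,000 bytes/record
sustained_writes_per_sec = records_per_day / 86_400     # ~5,787 records/s
```

A sustained ingest of roughly 5,800 records per second is modest for an
HBase cluster, but it does argue for batching the loads and pre-splitting
regions rather than writing row by row.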
