phoenix-user mailing list archives

From Willem Conradie <>
Subject RE: Telco HBase POC
Date Mon, 18 Jan 2016 10:39:34 GMT
Thanks for the prompt reply.
From: Pedro Gandola []
Sent: 15 January 2016 02:19 PM
Subject: Re: Telco HBase POC

Hi Willem,

Just to share my brief experience as a Phoenix user.

I'm using Phoenix 4.4 on top of an HBase cluster holding 3 billion entries.
In our use case Phoenix is doing very well and it saved a lot of code complexity and time.
If you have already decided that HBase is the way to go, then having Phoenix as a SQL
layer will help a lot: not only in terms of code simplicity, but also in creating and
maintaining your indexes and views, which can be hard and costly tasks using the plain
HBase API. Joining tables is just a simple SQL join :).

And there are many more useful features that make your life easier with HBase.

In terms of performance, depending on the SLAs you have, you will need to benchmark; however,
I think your main battles are going to be with HBase itself: JVM GCs, network, file system, etc.

I would say to give Phoenix a try, for sure.


On Fri, Jan 15, 2016 at 9:12 AM, Willem Conradie <> wrote:


I am currently consulting at a client with the following requirements.

They want to make detailed data-usage CDRs available for customers to verify their data usage
against the websites that they visited. In short, this can be seen as an itemised bill for
data usage. The data is currently not loaded into an RDBMS due to the volumes of data involved.
The proposed solution is to load the data into HBase, running on an HDP cluster, and make it
available for querying by the subscribers. It is critical to ensure low-latency read access
to the subscriber data, which will potentially be exposed to 25 million subscribers. We will
first run a scaled-down version for a proof of concept, with the intention of it becoming
an operational data store. Once the solution is functioning properly for the data-usage CDRs,
other CDR types will be added; as such, we need to build a cost-effective, scalable solution.

I am thinking of using Apache Phoenix for the following reasons:

1. Current data loading into the RDBMS is file based (CSV), via a staging server using
the RDBMS file-load drivers.

2. Use the Apache Phoenix bin/ script to mimic the above process to load into HBase.

3. Expected data volume:
   - 60 000 files per day
   - 1 to 10 MB per file
   - 500 million records per day
   - 500 GB total volume per day

4. Use the Apache Phoenix client for low-latency data retrieval.
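For what it's worth, the volumes in point 3 translate into fairly modest sustained ingest rates. A rough back-of-envelope sketch (my assumption: load spread evenly over 24 hours, which is optimistic since CDR traffic peaks during the day):

```python
# Back-of-envelope sustained ingest rates implied by the stated daily volumes.
FILES_PER_DAY = 60_000
RECORDS_PER_DAY = 500_000_000
BYTES_PER_DAY = 500 * 10**9          # 500 GB (decimal)
SECONDS_PER_DAY = 24 * 60 * 60

records_per_sec = RECORDS_PER_DAY / SECONDS_PER_DAY
mb_per_sec = BYTES_PER_DAY / 10**6 / SECONDS_PER_DAY
avg_record_bytes = BYTES_PER_DAY / RECORDS_PER_DAY
avg_file_mb = BYTES_PER_DAY / 10**6 / FILES_PER_DAY
seconds_per_file = SECONDS_PER_DAY / FILES_PER_DAY

print(f"~{records_per_sec:,.0f} records/s sustained")  # ~5,787 records/s
print(f"~{mb_per_sec:.1f} MB/s sustained")             # ~5.8 MB/s
print(f"~{avg_record_bytes:.0f} bytes per record")     # ~1000 bytes
print(f"~{avg_file_mb:.1f} MB average file size")      # ~8.3 MB (within the 1-10 MB range)
print(f"one file arriving every ~{seconds_per_file:.1f} s")  # ~1.4 s
```

The stated figures are internally consistent (8.3 MB average file size sits in the 1-10 MB range), and the sustained write rate is well within what an HBase cluster can absorb; peak-hour multiples and the 25-million-subscriber read path are the numbers to actually benchmark.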

Is Apache Phoenix a suitable candidate for this specific use case?


