phoenix-user mailing list archives

From Matjaž Trtnik
Subject HBase + Phoenix for CDR
Date Mon, 06 Jul 2015 15:52:04 GMT
Hi fellow Phoenix users!

We are considering using HBase and Phoenix for CDR data retention. The number of records is around
50 million per day, and we should keep them for about one year. I have played around a bit
but would like to hear a second opinion from people who have more experience, so I have a few questions:
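
For a rough sense of scale, the two figures above can be turned into a total row count and an average ingest rate. This is only a back-of-the-envelope sketch: it assumes a 365-day year and evenly spread traffic, and it says nothing about row size, which the message does not give.

```python
# Back-of-the-envelope sizing from the figures quoted in the message.
RECORDS_PER_DAY = 50_000_000   # "around 50 million per day"
RETENTION_DAYS = 365           # "about one year" (assumed to mean 365 days)
SECONDS_PER_DAY = 86_400


def total_rows(records_per_day: int, retention_days: int) -> int:
    """Number of rows held once the retention window is full."""
    return records_per_day * retention_days


def average_rows_per_second(records_per_day: int) -> float:
    """Average ingest rate, assuming records arrive evenly over the day."""
    return records_per_day / SECONDS_PER_DAY


TOTAL_ROWS = total_rows(RECORDS_PER_DAY, RETENTION_DAYS)
AVG_RATE = average_rows_per_second(RECORDS_PER_DAY)

print(TOTAL_ROWS)       # 18250000000
print(round(AVG_RATE))  # 579
```

Real CDR traffic is bursty, so the peak rate is likely to be well above this average.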

  1.  Based on your experience, can anyone recommend an approximate number of nodes in the cluster and
the hardware configuration of one node (RAM)?
  2.  Regarding the row key, I was thinking of Anumber + timestamp + Bnumber + jobId + recordIndex.
Any other ideas? Do I need to use salting or not? Let’s assume that in most cases Anumber starts
with the same first 5 digits (country + operator code), followed by 6 random digits for the user.
  3.  Searches are typically done by Anumber and timestamp, but other criteria may also
apply, such as IMEI or IMSI number. Do you suggest having secondary indexes for that? I read
that when using a secondary index, all columns in the select statement should be included in the index
as well. Keeping in mind that I’m returning almost all columns, does this mean almost double the
data for each index? Any other suggestions on how to handle this?
  4.  For the timestamp, do you suggest using LONG and storing epoch time, or sticking with DATE?
  5.  Another requirement is that after some time we need to be able to efficiently delete all
CDRs that are older than, let’s say, 1 year. Is the design of the row key still good for that, as the only
argument here will be the timestamp? Is it possible to use TTL with Phoenix?
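
The row key proposed in question 2 and the idea of salting can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the field widths, the bucket count `SALT_BUCKETS`, the CRC32 hash, and the sample numbers are hypothetical, and this is a generic hash-mod-buckets scheme, not Phoenix's own encoding or hash function.

```python
import struct
import zlib

SALT_BUCKETS = 16   # hypothetical bucket count
NUMBER_WIDTH = 16   # hypothetical fixed width for phone-number fields


def row_key(a_number: str, ts_millis: int, b_number: str,
            job_id: int, record_index: int) -> bytes:
    """Concatenate the fields in the order proposed in the message.

    Fixed-width, big-endian fields keep byte-wise ordering aligned with
    (a_number, timestamp, ...) ordering for non-negative values.
    """
    return (
        a_number.encode("ascii").ljust(NUMBER_WIDTH, b"\x00")
        + struct.pack(">q", ts_millis)
        + b_number.encode("ascii").ljust(NUMBER_WIDTH, b"\x00")
        + struct.pack(">ii", job_id, record_index)
    )


def salt_bucket(key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Pick a bucket from a hash of the whole key (CRC32 is an arbitrary choice)."""
    return zlib.crc32(key) % buckets


def salted_row_key(key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix the key with one salt byte so writes spread across buckets."""
    return bytes([salt_bucket(key, buckets)]) + key


# Hypothetical sample: same A-number, two different timestamps.
early = row_key("12345000001", 1_436_198_000_000, "12345000002", 7, 0)
late = row_key("12345000001", 1_436_198_060_000, "12345000002", 7, 1)

print(early < late)                # unsalted keys sort by timestamp within an A-number
print(len(salted_row_key(early)))  # one extra byte for the salt
```

The trade-off this shows: without the salt byte, rows for one A-number are contiguous and ordered by time, which suits the typical search; with it, writes are spread out, but a range scan for one A-number has to visit every bucket.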

Any other suggestions and advice on how to design the system are more than welcome.

Thanks, Matjaz