phoenix-user mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: HBase + Phoenix for CDR
Date Tue, 07 Jul 2015 16:30:28 GMT
The Phoenix grammar reference contains usage examples, for instance for
CREATE TABLE:
https://phoenix.apache.org/language/index.html#create_table

You cannot specify a TTL per record. I suggest using a TTL of 1 year for
the whole table, plus additional logic inside your application to filter
out expired rows.
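
A minimal sketch of that approach in Python, assuming a hypothetical CDR
table whose CALL_TIME column holds the CDR's own creation time. The table
name, column names and helper functions are illustrative and not from the
thread. The TTL is given in seconds as a table property in the DDL, and the
application filters on CALL_TIME so that a late-arriving CDR stops being
returned one year after it was created, even though HBase only removes it
one year after it was inserted.

```python
from datetime import datetime, timedelta

# Retention period from the thread: a CDR expires one year after creation.
RETENTION = timedelta(days=365)

# HBase TTL is expressed in seconds and is counted from the cell timestamp
# (normally the insert time), not from the CDR's own creation time.
TTL_SECONDS = int(RETENTION.total_seconds())

# Hypothetical table and column names, for illustration only.
CREATE_TABLE_DDL = f"""
CREATE TABLE IF NOT EXISTS CDR (
    A_NUMBER     VARCHAR NOT NULL,
    CALL_TIME    DATE    NOT NULL,
    JOB_ID       BIGINT  NOT NULL,
    RECORD_INDEX INTEGER NOT NULL,
    IMEI         VARCHAR,
    CONSTRAINT PK PRIMARY KEY (A_NUMBER, CALL_TIME, JOB_ID, RECORD_INDEX)
) TTL={TTL_SECONDS}
""".strip()

# Application-side filter: bind the second parameter to retention_cutoff(now)
# so rows past retention are hidden before the table-level TTL removes them.
FILTERED_QUERY = "SELECT * FROM CDR WHERE A_NUMBER = ? AND CALL_TIME >= ?"


def retention_cutoff(now: datetime) -> datetime:
    """Oldest CALL_TIME that is still inside the retention period."""
    return now - RETENTION


def is_expired(call_time: datetime, now: datetime) -> bool:
    """True if the CDR was created more than RETENTION before `now`."""
    return call_time < retention_cutoff(now)
```

A roaming CDR that arrives one month late would then be hidden by the
filter after 11 more months and physically removed by the TTL about a
month after that.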

When you set IMMUTABLE_ROWS=true, no updates or deletes are allowed. Your
only options in this case are to rely on TTL or to drop the table entirely.


The optimal number of splits depends on the size of the cluster, the maximum
region size, and the projected data store size; you will need to do some
math here.
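
A rough sketch of that math in Python. Only the 50 million records per day
and the one-year retention come from this thread; the 500 bytes per record
and the 10 GiB maximum region size used in the example are assumptions, and
compression and replication are ignored. The helpers are hypothetical.
split_points spreads split keys evenly over the 6-digit reversed user
number that leads the row key, and split_on_clause formats them as a
Phoenix SPLIT ON clause for the CREATE TABLE statement. Split keys are plain
byte strings, so a leading prefix of the row key should be enough.

```python
def estimate_regions(records_per_day: int, retention_days: int,
                     bytes_per_record: int, max_region_bytes: int) -> int:
    """Projected store size divided by max region size, rounded up."""
    total_bytes = records_per_day * retention_days * bytes_per_record
    return -(-total_bytes // max_region_bytes)  # integer ceiling division


def split_points(num_regions: int, digits: int = 6) -> list[str]:
    """Evenly spaced, zero-padded split keys over a decimal prefix.

    num_regions regions need num_regions - 1 split keys.
    """
    key_space = 10 ** digits
    return [str(i * key_space // num_regions).zfill(digits)
            for i in range(1, num_regions)]


def split_on_clause(points: list[str]) -> str:
    """Format split keys as a Phoenix SPLIT ON clause."""
    return "SPLIT ON (" + ", ".join(f"'{p}'" for p in points) + ")"


# Assumed inputs: 500 bytes per record, 10 GiB max region size.
regions = estimate_regions(50_000_000, 365, 500, 10 * 1024 ** 3)
clause = split_on_clause(split_points(10))
```

Rounding the region count up to a multiple of the number of region servers
keeps the initial distribution even across the cluster.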

-Vlad

On Tue, Jul 7, 2015 at 1:57 AM, Matjaž Trtnik <mt@salviol.com> wrote:

>  Vlad and Eli, thanks for your answer and comments.
>
> 1. Normally I query by the whole Anumber, meaning country code + operator
> id + user number, but as you suggested I could just reverse everything, and
> it should work well if I reverse the number entered by the user.
>
>  2. What’s the suggested number of
> Regarding syntax for table splitting I haven’t found any example in
> Phoenix but only for HBase.
>
>  create 'mytable', 'mycolumnfamily', {SPLITS =>
>
> ['10000000000000000000000000000000',
> '20000000000000000000000000000000',
> '30000000000000000000000000000000',
> '40000000000000000000000000000000',
> '50000000000000000000000000000000',
> '60000000000000000000000000000000']}
>  Do I have to use the full row key when defining a split?
> For example, in my case the first 6 bytes represent the reversed user number,
> followed by operator and country code, followed by the other parts of the
> row key (timestamp, job id and record number):
>
>
> 0000000000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000010000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000020000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000030000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000040000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000050000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000060000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000070000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000080000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
> 0000090000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>
>
>  3. I was thinking about this solution too, but the problem is that for the
> same IMEI there can be multiple Anumbers. Imagine someone using the same
> phone (IMEI) but changing the SIM card (IMSI/Anumber). What I tried
> yesterday was forcing use of the index, and it worked pretty well, but I
> read this should only be used if the result set is rather small. I think in
> our case the result set is typically a few hundred records; the maximum
> could be a few thousand records, but that would happen rarely. Is it
> advisable to force the index in such a case?
>
>  4. For TTL, is there a way to set the TTL at record level? A CDR has to
> expire 1 year after it was created, not after it was inserted into the
> table. Some CDRs, like roaming ones, arrive later, so it can happen that
> today you get a CDR which is already 1 month old, and it should expire in 11
> months and not 12. I haven’t found any examples of setting TTL in Phoenix.
>
>
>  Another question I have is regarding IMMUTABLE_ROWS=true. It’s suggested
> to use this for append-only table with no updates. What about deletes? Can
> I use IMMUTABLE_ROWS=true if I delete records from table?
>
>
>
>  On 06 Jul 2015, at 20:32, Vladimir Rodionov <vladrodionov@gmail.com>
> wrote:
>
>  1. Unless you do query by Anumber prefix (country code + operator id) -
> reverse it : random 6 + operator id + country code. In this case you will
> not need salting row.
> 2. Presplit  table. Make sure you won't need to split table during normal
> operation.
> 3. Keep index between Bnumber (IMEI, IMSI?) and Anumber. Get Anumber by
> IMEI then run query by Anumber. This index is going to be much smaller.
>
>  Phoenix supports any table level configuration options, so you can
> specify TTL in your DDL statement
>
>  As for capacity planning, you can read:
>
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_system-admin-guide/content/ch_hbase_cluster_capacity_region_sizing.html
>
>  -Vlad
>
>  As for capacity planning, please read HBase book
>
>
> On Mon, Jul 6, 2015 at 8:52 AM, Matjaž Trtnik <mt@salviol.com> wrote:
>
>>  Hi fellow Phoenix users!
>>
>>  We are considering using HBase and Phoenix for CDR data retention.
>> Number of records is around 50 million per day and we should keep them for
>> about one year. I have played around a bit but would like to hear second
>> opinion from people who have more experience so I have few questions:
>>
>>
>>    1. Based on your experience can anyone recommend me approx number of
>>    nodes in cluster and hardware configuration of one node (RAM).
>>    2. Regarding row key I was thinking of Anumber + timestamp + Bnumber
>>    + jobId + recordIndex. Any other ideas? Do I need to use salting or no?
>>    Let’s assume aNumber in most cases start with first 5 digits the same
>>    (country + operator code), followed by 6 random digits for user number.
>>    3. Searches are typically done by Anumber and timestamp but also some
>>    other criteria may apply, like IMEI or IMSI number. Do you suggest having
>>    secondary indexes for that? I read that if using secondary index all
>>    columns in select statement should be included in index as well. Keeping in
>>    mind I’m returning almost all columns does this mean almost double of data
>>    for each index? Any other suggestions how to handle this?
>>    4. For time stamp, do you suggest using LONG and storing epoch time
>>    or stick with DATE format?
>>    5. Another request is that after some time we need to be able to
>>    efficiently delete all CDRs that are older than let’s say 1 year. Is design
>>    of row key still good for that as only argument here will be timestamp? Is
>>    it possible to use TTL with Phoenix?
>>
>>
>>  Any other suggestions and advice on how to design the system are more
>> than welcome.
>>
>>  Thanks, Matjaz
>>
>
>
>
