From: Willem Conradie
To: "user@phoenix.apache.org"
Subject: Telco HBase POC
Date: Fri, 15 Jan 2016 09:12:23 +0000


Hi,

 

I am currently consulting at a client with the following requirements.

They want to make detailed data usage CDRs available so that customers can verify their data usage against the websites they visited. In short, this can be seen as an itemised bill for data usage. The data is currently not loaded into an RDBMS because of the volumes involved. The proposed solution is to load the data into HBase, running on an HDP cluster, and make it available for querying by the subscribers. It is critical to ensure low-latency read access to the subscriber data, which may be exposed to 25 million subscribers. We will run a scaled-down version first as a proof of concept, with the intention of it becoming an operational data store. Once the solution is functioning properly for the data usage CDRs, other CDR types will be added, so we need to build a cost-effective, scalable solution.

 

I am thinking of using Apache Phoenix for the following reasons:

 

1. Current data loading into the RDBMS is file-based (CSV), via a staging server using the RDBMS file load drivers.

2. Use the Apache Phoenix bin/psql.py script to mimic the above process when loading into HBase.

3. Expected data volume:
       60 000 files per day
       1 to 10 MB per file
       500 million records per day
       500 GB total volume per day

4. Use the Apache Phoenix client for low-latency data retrieval.
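
As a rough sanity check on the volumes in point 3, the sketch below derives the average rates they imply. It assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and a load spread evenly over 24 hours; neither assumption is stated in the figures above, and the function and key names are purely illustrative.

```python
# Back-of-the-envelope rates implied by the stated daily volumes.
# Assumptions (not part of the original figures): decimal units and
# a load spread evenly across a 24-hour day.

FILES_PER_DAY = 60_000
RECORDS_PER_DAY = 500_000_000
BYTES_PER_DAY = 500 * 10**9          # 500 GB, decimal
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60
BYTES_PER_MB = 10**6


def daily_load_summary(files=FILES_PER_DAY,
                       records=RECORDS_PER_DAY,
                       total_bytes=BYTES_PER_DAY):
    """Return the average rates implied by one day's volume."""
    return {
        "records_per_second": records / SECONDS_PER_DAY,
        "files_per_minute": files / MINUTES_PER_DAY,
        "mb_per_second": total_bytes / SECONDS_PER_DAY / BYTES_PER_MB,
        "avg_bytes_per_record": total_bytes / records,
        "avg_mb_per_file": total_bytes / files / BYTES_PER_MB,
        "avg_records_per_file": records / files,
    }


if __name__ == "__main__":
    for name, value in daily_load_summary().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the averages come to about 1 KB per record and about 8.3 MB per file, which falls within the stated 1 to 10 MB range, with a sustained ingest of roughly 5 800 records per second. Real traffic is unlikely to be evenly spread, so peak rates would be higher.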

 

Is Apache Phoenix a suitable candidate for this specific use case?

 

Regards,

Willem

