Hi All,
In one of my projects we are considering HBase as the back end.
My use case: I have 1 TB of data that arrives as multiple files (each
file is around 40 GB, with 100 million rows and 102 columns per row).
Loading these files using Spark + Phoenix takes around 2 hours.
Can you please suggest how to fine-tune the load process, and how to
read the data back using Spark?
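For reference, this is roughly the load/read path I am using through the phoenix-spark connector (a minimal sketch; the table name, ZooKeeper URL, and input path below are placeholders, not my actual settings):

```scala
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.phoenix.spark._

val sqlContext = new SQLContext(sc)

// Load step: read the input file and save it through the
// phoenix-spark connector (which upserts via the Phoenix driver).
val input = sqlContext.read
  .format("com.databricks.spark.csv")   // spark-csv package on Spark 1.6
  .option("header", "true")
  .load("hdfs:///data/input/file1.csv") // placeholder path

input.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .options(Map("table" -> "MY_TABLE", "zkUrl" -> "zkhost:2181"))
  .save()

// Read back: load the Phoenix table into a DataFrame the same way.
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map("table" -> "MY_TABLE", "zkUrl" -> "zkhost:2181"))
  .load()
```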
Environment details
==========================
Hadoop Distribution : Hortonworks
Spark Version : 1.6
HBase Version: 1.1.2
Phoenix Version: 4.4.0
Number of nodes: 19
Please find attached the create and load scripts.
Thanks & Regards
Radha krishna