HBase does that automatically for you. The HBase balancer will redistribute regions onto the new node, and data locality will be restored after the next major compaction. Two caveats, though. First, the HBase balancer by default works at a global level (across all tables) rather than rebalancing one table at a time. Second, the HDFS balancer is a separate beast that makes its own decisions and does not care about HBase data locality. For this reason it is recommended to disable the HDFS balancer in an HBase cluster.
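
To make the redistribution concrete, here is a toy sketch in Python. It is not HBase's actual balancer algorithm, and the names (rebalance, rs1..rs4, region-00..region-11) are made up for illustration. It only shows the basic idea: regions move from the most loaded servers to the new, empty one until the region counts are even. The moved regions are the ones that read their files remotely until the next major compaction rewrites them locally.

```python
from collections import Counter


def rebalance(assignment, servers):
    """Toy balancer (illustrative only, not HBase's real algorithm).

    Moves one region at a time from the most loaded server to the least
    loaded one until region counts differ by at most one.
    Returns (new_assignment, moved_regions).
    """
    assignment = dict(assignment)
    moved = []
    while True:
        # Count regions per server, including servers that hold none yet.
        load = Counter({s: 0 for s in servers})
        load.update(assignment.values())
        busiest = max(servers, key=lambda s: load[s])
        idlest = min(servers, key=lambda s: load[s])
        if load[busiest] - load[idlest] <= 1:
            return assignment, moved
        # Pick any region on the busiest server and reassign it.
        region = next(r for r, s in sorted(assignment.items()) if s == busiest)
        assignment[region] = idlest
        moved.append(region)


# 12 regions spread over the original 3 region servers.
old_servers = ["rs1", "rs2", "rs3"]
before = {f"region-{i:02d}": old_servers[i % 3] for i in range(12)}

# Add a fourth node and rebalance.
after, moved = rebalance(before, old_servers + ["rs4"])
print(Counter(after.values()))  # regions per server after rebalancing
print(moved)                    # regions that lost locality until major compaction
```

In this toy run only the regions that moved to rs4 lose locality; the regions that stayed put are unaffected.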



On Thu, Sep 3, 2015 at 1:32 AM, James Heather <james.heather@mendeley.com> wrote:

Suppose I create a table with a billion rows, on a cluster with N nodes. Then I want to increase performance, so I add a new node to the cluster. Obviously the data is still stored on the first N nodes, and not on the new one. Is there a way of redistributing the data (online) to take advantage of the new node?

I realise the answer might depend on the configuration of the table. If there are schemas that fit this notion well, and schemas that don't, I'd be interested to know about that too.

(This will be running on CDH5, if that makes a difference.)