1400 mappers on 9 nodes is about 155 mappers per datanode, which sounds high to me. There are very few specifics in your mail. Are you using YARN? Can you provide details such as the table structure, the number of rows and columns, etc.? Do you have an error stack trace?
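
For reference, here is a rough back-of-the-envelope sketch of that arithmetic. It assumes the mappers are spread evenly across the nodes and that 1 GB = 1024 MB; the function names are only illustrative. The mappers would normally be scheduled in waves rather than all at once, so this is the total per node over the life of the job, not the number running concurrently.

```python
def mappers_per_node(total_mappers: int, nodes: int) -> float:
    """Average number of map tasks each node handles over the whole job."""
    return total_mappers / nodes


def avg_split_mb(input_gb: float, total_mappers: int) -> float:
    """Average input per mapper in MB, assuming 1 GB = 1024 MB."""
    return input_gb * 1024 / total_mappers


# Figures taken from the original mail: 200GB input, ~1400 mappers, 9 data nodes.
per_node = mappers_per_node(1400, 9)
split_mb = avg_split_mb(200, 1400)

print(f"mappers per node: {per_node:.1f}")
print(f"average split size: {split_mb:.1f} MB")
```

If those figures are right, each mapper is reading a fairly ordinary amount of input, so the table layout and the reducer side are probably where to look next.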


On Friday, September 11, 2015, Gaurav Kanade <gaurav.kanade@gmail.com> wrote:
Hi All

I am new to Apache Phoenix (and relatively new to MR in general), but I am trying a bulk insert of a 200GB tab-separated file into an HBase table. This seems to start off fine and kicks off about 1400 mappers and 9 reducers (I have 9 data nodes in my setup).

At some point the process runs into problems: the data nodes seem to run out of capacity (from what I can see, my data nodes have 400GB of local space). Certain reducers seem to eat up most of that capacity, slowing the process to a crawl and ultimately leading to the Node Managers reporting that node health is bad (log-dirs and local-dirs are bad).

Is there some setting I am missing that I need to configure for this particular job?

Any pointers would be appreciated.

Thanks

--
Gaurav Kanade,
Software Engineer
Big Data
Cloud and Enterprise Division
Microsoft