Only when doing bulk loading, and only during the mapping phase.
From: Puneet Kumar Ojha [firstname.lastname@example.org]
Received: Wednesday, 07 Jan 2015, 15:03
To: email@example.com [firstname.lastname@example.org]
Subject: RE: high CPU when using bulk loading
Is the CPU usage 100% all the time OR only while doing bulk loading?
We are tuning our system for bulk loading. We managed to load ~250M records per hour (~96G of raw input CSV data) on a cluster with 8 nodes. We use the MR bulk loading tool with a pre-split table and a salted key.
What we currently see is that while the mappers are working we have 100% CPU usage across the cluster. It was our impression that the mappers would be I/O bound and not so CPU intensive.
Any idea what else we can tune or check?
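For reference, here is a rough back-of-the-envelope breakdown of the figures above into per-node rates. It is only a sketch: it assumes "96G" means 96 * 10^9 bytes and that the load is spread evenly over the 8 nodes, neither of which is stated, and the function name is made up for illustration.

```python
# Figures quoted in the message; the unit of "G" and the even spread
# across nodes are assumptions, not measured values.
RECORDS_PER_HOUR = 250_000_000
INPUT_BYTES_PER_HOUR = 96 * 10**9  # assumes "96G" = 96 * 10^9 bytes
NODES = 8
SECONDS_PER_HOUR = 3600


def per_node_rates(records_per_hour, bytes_per_hour, nodes):
    """Return (records/s per node, MB/s per node, average bytes per record)."""
    records_per_sec = records_per_hour / SECONDS_PER_HOUR / nodes
    mb_per_sec = bytes_per_hour / SECONDS_PER_HOUR / nodes / 10**6
    avg_record_bytes = bytes_per_hour / records_per_hour
    return records_per_sec, mb_per_sec, avg_record_bytes


rec_s, mb_s, rec_bytes = per_node_rates(
    RECORDS_PER_HOUR, INPUT_BYTES_PER_HOUR, NODES
)
print(f"{rec_s:.0f} records/s per node")
print(f"{mb_s:.2f} MB/s of raw CSV per node")
print(f"{rec_bytes:.0f} bytes per record on average")
```

Under those assumptions each node reads roughly 8,700 records/s, or about 3.3 MB/s of raw CSV at about 384 bytes per record. This covers raw input only; it says nothing about the volume written by the job.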