phoenix-user mailing list archives

From Ravi Kiran <maghamraviki...@gmail.com>
Subject Re: bulk loader MR counters
Date Thu, 02 Apr 2015 21:35:16 GMT
Hi Ralph,

    I assume that when you run the MR job for the main table, you have a
larger number of columns to load than in the MR jobs for the index tables,
which is why you see so many more spilled records.
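If I read your counters right, the numbers bear this out: 531850722 map
output records / 13637198 map input records = 39 exactly, so each input row
expands into about 39 KeyValues (one per column cell) for the main table,
while the index-table job emits exactly one output record per input row.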

To tune the MR job for the main table, I would do the following first and
then re-check the counters for any improvement:
a) To reduce spilled records, try increasing *mapreduce.task.io.sort.mb* to
a value like 500 MB rather than the default 100 MB. Your map-side spilled
records (1063701444) are exactly twice your map output records (531850722),
which suggests the map output does not fit in the sort buffer and is going
through an extra merge pass.
b) Increase *mapreduce.task.io.sort.factor* (default 10) so that more
streams are merged at once when sorting the map output; see the sketch
below.
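
For example, here is a minimal sketch of how I would pass both settings on
the command line, assuming you launch the bulk load with Phoenix's
CsvBulkLoadTool (the jar name, table name, and input path are placeholders):

    # Generic -D options must come before the tool's own arguments;
    # 500 MB / 50 streams are illustrative starting values, not magic numbers.
    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.task.io.sort.mb=500 \
        -Dmapreduce.task.io.sort.factor=50 \
        --table MY_TABLE \
        --input /data/my_input.csv

You could also set the same two properties cluster-wide in mapred-site.xml,
but per-job -D options make it easier to compare counters between runs.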


Regards
Ravi

On Thu, Apr 2, 2015 at 1:21 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:

>  Hi, we recently upgraded our cluster (Phoenix 4.3 – HDP 2.2) and I’m
> seeing a significant degradation in performance.  I am going through the MR
> counters for a Phoenix CsvBulkLoad job and I am hoping you can help me
> understand some things.
>
>  There is a base table with 4 index tables, so a total of 5 MR jobs run –
> one for each table.
>
>  Here are the counters for an *index table* MR job:
>
>  Note two things: the input and output record counts are the same, as
>  expected; and there seem to be a lot of spilled records.
>  ===========================================================
> Counter                              Map            Reduce         Total
> Combine input records                0              0              0
> Combine output records               0              0              0
> CPU time spent (ms)                  1800380        156630         1957010
> Failed Shuffles                      0              0              0
> GC time elapsed (ms)                 39738          1923           41661
> Input split bytes                    690            0              690
> Map input records                    *13637198*     0              13637198
> Map output bytes                     2144112474     0              2144112474
> Map output materialized bytes        2171387170     0              2171387170
> Map output records                   *13637198*     0              13637198
> Merged Map outputs                   0              50             50
> Physical memory (bytes) snapshot     8493744128     10708692992    19202437120
> Reduce input groups                  0              13637198       13637198
> Reduce input records                 0              13637198       13637198
> Reduce output records                0              13637198       13637198
> Reduce shuffle bytes                 0              2171387170     2171387170
> Shuffled Maps                        0              50             50
> Spilled Records                      *13637198*     *13637198*     27274396
> Total committed heap usage (bytes)   11780751360    26862419968    38643171328
> Virtual memory (bytes) snapshot      25903271936    96590065664    122493337600
>  ===========================================================
>
>  Here are the counters for the *main table* MR job.
>  Please note the input record count is correct – the same as above.
>  The output records are many times the input records.
>  The output bytes are many times the output bytes from above.
>  The number of spilled records is many times the number of input records
> and twice the number of output records.
>  ===========================================================
> Counter                              Map            Reduce         Total
> Combine input records                0              0              0
> Combine output records               0              0              0
> CPU time spent (ms)                  5059340        2035910        7095250
> Failed Shuffles                      0              0              0
> GC time elapsed (ms)                 38937          13748          52685
> Input split bytes                    690            0              690
> Map input records                    *13637198*     0              13637198
> Map output bytes                     *59638106406*  0              59638106406
> Map output materialized bytes        60702718624    0              60702718624
> Map output records                   *531850722*    0              531850722
> Merged Map outputs                   0              50             50
> Physical memory (bytes) snapshot     8398745600     2756530176     11155275776
> Reduce input groups                  0              13637198       13637198
> Reduce input records                 0              531850722      531850722
> Reduce output records                0              531850722      531850722
> Reduce shuffle bytes                 0              60702718624    60702718624
> Shuffled Maps                        0              50             50
> Spilled Records                      1063701444     531850722      1595552166
> Total committed heap usage (bytes)   10136059904    19488309248    29624369152
> Virtual memory (bytes) snapshot      25926946816    96562970624    122489917440
>  ===========================================================
>
>
>  Can you help me understand why this is happening and how I can tune it?
>
>  Thanks,
> Ralph
>
>   __________________________________________________
> *Ralph Perko*
> Pacific Northwest National Laboratory
> (509) 375-2272
> ralph.perko@pnnl.gov
>
