phoenix-user mailing list archives

From Gaurav Kanade <gaurav.kan...@gmail.com>
Subject Re: Using Phoenix Bulk Upload CSV to upload 200GB data
Date Fri, 18 Sep 2015 00:29:28 GMT
Thanks so much for all the help, guys - I was able to get this scenario to
work!

Now that I have this working, I am a little bit curious to explore my
initial question about disk space utilization a bit further.

To provide more specifics: yes, I was using YARN, and my table had about a
billion rows and 22 columns.

Specifically, I would see several map tasks start and complete
successfully. The reducers (all 9 of them) would start processing the
output of these map tasks, but at a certain stage (75% or 99% map
completion, depending on my number of nodes) the non-DFS disk space used
would reach a critical level, eventually making the node unusable. This
killed off all the map and reduce tasks that were on that node and
restarted them; the job then proceeded to eat up disk space on the other
nodes one by one, to the point that progress eventually receded to 0%.

Of course, all of this disappeared and the job ran smoothly when I switched
the data node VMs from 400GB of disk space to 800GB - but the fact remains
that on this kind of workload there is a point at which a total of more
than 3.2TB of temporary space is required.

I will of course look at enabling compression of the map output - but I
just wanted to check whether this is expected behavior for workloads of
this size.
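
For reference, this is roughly the shape of what I plan to try - just a
sketch, with the client jar name, table and input path as placeholders for
my setup, using the stock Hadoop 2 property names and assuming Snappy is
available on the cluster (otherwise another codec such as GzipCodec should
work, at the cost of more CPU):

  hadoop jar phoenix-<version>-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      -Dmapreduce.map.output.compress=true \
      -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
      --table MYTABLE \
      --input /data/mytable.csv

For what it's worth, the counters quoted below show ~1.5TB of materialized
map output and ~5.6TB of local FILE bytes written in total, which seems
consistent with the local disks filling up during the shuffle/sort phase.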

Thanks
Gaurav



On 16 September 2015 at 12:21, Gaurav Kanade <gaurav.kanade@gmail.com>
wrote:

> Thanks for the pointers Gabriel! Will give it a shot now!
>
> On 16 September 2015 at 12:15, Gabriel Reid <gabriel.reid@gmail.com>
> wrote:
>
>> Yes, there is post-processing that goes on within the driver program
>> (i.e. the command line tool with which you started the import job).
>>
>> The MapReduce job actually just creates HFiles, and then the
>> post-processing simply involves telling HBase to use these HFiles. If your
>> terminal closed while running the tool, then the HFiles won't be handed
>> over to HBase, which will result in what you're seeing.
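>>
>> (As an aside: if the driver does die before that handoff, the generated
>> HFiles normally still sit in the job's output directory, so in principle
>> they can be handed to HBase manually with HBase's bulk load utility - a
>> rough sketch, with the output path and table name as placeholders:
>>
>>     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/MYTABLE MYTABLE
>>
>> In practice, simply re-running the import is usually easier.)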
>>
>> I usually start import jobs like this using screen [1] so that losing a
>> client terminal connection won't get in the way of the full job completing.
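>>
>> For example (the session name is arbitrary):
>>
>>     screen -S phoenix-load      # start a named session
>>     # ... launch the CsvBulkLoadTool command inside it ...
>>     # detach with Ctrl-a d; the driver keeps running on the server
>>     screen -r phoenix-load      # reattach later to check progress
>>
>> nohup or tmux would work just as well here.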
>>
>>
>> - Gabriel
>>
>>
>>
>> 1. https://www.gnu.org/software/screen/manual/screen.html
>>
>> On Wed, Sep 16, 2015 at 9:07 PM, Gaurav Kanade <gaurav.kanade@gmail.com>
>> wrote:
>>
>>> Sure, the job counter values are attached below. I checked the final
>>> status of the job and it said succeeded. I could not see whether the
>>> import tool exited successfully, because I ran it overnight and my
>>> machine rebooted at some point for some updates - I wonder if there is
>>> some post-processing after the MR job that might have failed because of
>>> this?
>>>
>>> Thanks for the help!
>>> ----------------
>>> Counters for job_1442389862209_0002
>>>
>>> File System Counters
>>>   FILE: Number of bytes read - Map: 1520770904675, Reduce: 2604849340144, Total: 4125620244819
>>>   FILE: Number of bytes written - Map: 3031784709196, Reduce: 2616689890216, Total: 5648474599412
>>>   FILE: Number of large read operations - Map: 0, Reduce: 0, Total: 0
>>>   FILE: Number of read operations - Map: 0, Reduce: 0, Total: 0
>>>   FILE: Number of write operations - Map: 0, Reduce: 0, Total: 0
>>>   WASB: Number of bytes read - Map: 186405294283, Reduce: 0, Total: 186405294283
>>>   WASB: Number of bytes written - Map: 0, Reduce: 363027342839, Total: 363027342839
>>>   WASB: Number of large read operations - Map: 0, Reduce: 0, Total: 0
>>>   WASB: Number of read operations - Map: 0, Reduce: 0, Total: 0
>>>   WASB: Number of write operations - Map: 0, Reduce: 0, Total: 0
>>>
>>> Job Counters
>>>   Launched map tasks - Map: 0, Reduce: 0, Total: 348
>>>   Launched reduce tasks - Map: 0, Reduce: 0, Total: 9
>>>   Rack-local map tasks - Map: 0, Reduce: 0, Total: 348
>>>   Total megabyte-seconds taken by all map tasks - Map: 0, Reduce: 0, Total: 460560315648
>>>   Total megabyte-seconds taken by all reduce tasks - Map: 0, Reduce: 0, Total: 158604449280
>>>   Total time spent by all map tasks (ms) - Map: 0, Reduce: 0, Total: 599687911
>>>   Total time spent by all maps in occupied slots (ms) - Map: 0, Reduce: 0, Total: 599687911
>>>   Total time spent by all reduce tasks (ms) - Map: 0, Reduce: 0, Total: 103258105
>>>   Total time spent by all reduces in occupied slots (ms) - Map: 0, Reduce: 0, Total: 206516210
>>>   Total vcore-seconds taken by all map tasks - Map: 0, Reduce: 0, Total: 599687911
>>>   Total vcore-seconds taken by all reduce tasks - Map: 0, Reduce: 0, Total: 103258105
>>>
>>> Map-Reduce Framework
>>>   Combine input records - Map: 0, Reduce: 0, Total: 0
>>>   Combine output records - Map: 0, Reduce: 0, Total: 0
>>>   CPU time spent (ms) - Map: 162773540, Reduce: 90154160, Total: 252927700
>>>   Failed Shuffles - Map: 0, Reduce: 0, Total: 0
>>>   GC time elapsed (ms) - Map: 7667781, Reduce: 1607188, Total: 9274969
>>>   Input split bytes - Map: 52548, Reduce: 0, Total: 52548
>>>   Map input records - Map: 861890673, Reduce: 0, Total: 861890673
>>>   Map output bytes - Map: 1488284643774, Reduce: 0, Total: 1488284643774
>>>   Map output materialized bytes - Map: 1515865164102, Reduce: 0, Total: 1515865164102
>>>   Map output records - Map: 13790250768, Reduce: 0, Total: 13790250768
>>>   Merged Map outputs - Map: 0, Reduce: 3132, Total: 3132
>>>   Physical memory (bytes) snapshot - Map: 192242380800, Reduce: 4546826240, Total: 196789207040
>>>   Reduce input groups - Map: 0, Reduce: 861890673, Total: 861890673
>>>   Reduce input records - Map: 0, Reduce: 13790250768, Total: 13790250768
>>>   Reduce output records - Map: 0, Reduce: 13790250768, Total: 13790250768
>>>   Reduce shuffle bytes - Map: 0, Reduce: 1515865164102, Total: 1515865164102
>>>   Shuffled Maps - Map: 0, Reduce: 3132, Total: 3132
>>>   Spilled Records - Map: 27580501536, Reduce: 23694179168, Total: 51274680704
>>>   Total committed heap usage (bytes) - Map: 186401685504, Reduce: 3023044608, Total: 189424730112
>>>   Virtual memory (bytes) snapshot - Map: 537370951680, Reduce: 19158048768, Total: 556529000448
>>>
>>> Phoenix MapReduce Import
>>>   Upserts Done - Map: 861890673, Reduce: 0, Total: 861890673
>>>
>>> Shuffle Errors
>>>   BAD_ID - Map: 0, Reduce: 0, Total: 0
>>>   CONNECTION - Map: 0, Reduce: 0, Total: 0
>>>   IO_ERROR - Map: 0, Reduce: 0, Total: 0
>>>   WRONG_LENGTH - Map: 0, Reduce: 0, Total: 0
>>>   WRONG_MAP - Map: 0, Reduce: 0, Total: 0
>>>   WRONG_REDUCE - Map: 0, Reduce: 0, Total: 0
>>>
>>> File Input Format Counters
>>>   Bytes Read - Map: 186395934997, Reduce: 0, Total: 186395934997
>>>
>>> File Output Format Counters
>>>   Bytes Written - Map: 0, Reduce: 363027342839, Total: 363027342839
>>>
>>> On 16 September 2015 at 11:46, Gabriel Reid <gabriel.reid@gmail.com>
>>> wrote:
>>>
>>>> Can you view (and post) the job counters values from the import job?
>>>> These should be visible in the job history server.
>>>>
>>>> Also, did you see the import tool exit successfully (in the terminal
>>>> where you started it)?
>>>>
>>>> - Gabriel
>>>>
>>>> On Wed, Sep 16, 2015 at 6:24 PM, Gaurav Kanade <gaurav.kanade@gmail.com>
>>>> wrote:
>>>> > Hi guys
>>>> >
>>>> > I was able to get this to work after using bigger VMs for data nodes;
>>>> > however now the bigger problem I am facing is after my MR job completes
>>>> > successfully I am not seeing any rows loaded in my table (count shows 0
>>>> > both via phoenix and hbase)
>>>> >
>>>> > Am I missing something simple ?
>>>> >
>>>> > Thanks
>>>> > Gaurav
>>>> >
>>>> >
>>>> > On 12 September 2015 at 11:16, Gabriel Reid <gabriel.reid@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> Around 1400 mappers sounds about normal to me -- I assume your block
>>>> >> size on HDFS is 128 MB, which works out to 1500 mappers for 200 GB of
>>>> >> input.
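>>>> >>
>>>> >> (Quick sanity check on that arithmetic, treating both as binary units:
>>>> >>
>>>> >>     200 GB / 128 MB per split = 204800 MB / 128 MB = 1600 splits
>>>> >>
>>>> >> so ~1400 mappers is in the right ballpark.)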
>>>> >>
>>>> >> To add to what Krishna asked, can you be a bit more specific on what
>>>> >> you're seeing (in log files or elsewhere) which leads you to believe
>>>> >> the data nodes are running out of capacity? Are map tasks failing?
>>>> >>
>>>> >> If this is indeed a capacity issue, one thing you should ensure is
>>>> >> that map output compression is enabled. This doc from Cloudera explains
>>>> >> this (and the same information applies whether you're using CDH or
>>>> >> not) -
>>>> >>
>>>> >> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_topic_23_3.html
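>>>> >>
>>>> >> (For MRv2/YARN the cluster-wide equivalent would be something along
>>>> >> these lines in mapred-site.xml, assuming Snappy is installed on the
>>>> >> nodes - otherwise substitute another codec:
>>>> >>
>>>> >>     <property>
>>>> >>       <name>mapreduce.map.output.compress</name>
>>>> >>       <value>true</value>
>>>> >>     </property>
>>>> >>     <property>
>>>> >>       <name>mapreduce.map.output.compress.codec</name>
>>>> >>       <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>>>> >>     </property>
>>>> >>
>>>> >> The same two properties can also be set per job as -D options.)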
>>>> >>
>>>> >> In any case, apart from that there isn't any basic thing that you're
>>>> >> probably missing, so any additional information that you can supply
>>>> >> about what you're running into would be useful.
>>>> >>
>>>> >> - Gabriel
>>>> >>
>>>> >>
>>>> >> On Sat, Sep 12, 2015 at 2:17 AM, Krishna <research800@gmail.com>
>>>> wrote:
>>>> >> > 1400 mappers on 9 nodes is about 155 mappers per datanode which
>>>> >> > sounds high to me. There are very few specifics in your mail. Are you
>>>> >> > using YARN? Can you provide details like table structure, # of rows &
>>>> >> > columns, etc. Do you have an error stack?
>>>> >> >
>>>> >> >
>>>> >> > On Friday, September 11, 2015, Gaurav Kanade <gaurav.kanade@gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Hi All
>>>> >> >>
>>>> >> >> I am new to Apache Phoenix (and relatively new to MR in general),
>>>> >> >> but I am trying a bulk insert of a 200GB tab-separated file into an
>>>> >> >> HBase table. This seems to start off fine and kicks off about ~1400
>>>> >> >> mappers and 9 reducers (I have 9 data nodes in my setup).
>>>> >> >>
>>>> >> >> At some point I seem to be running into problems with this process,
>>>> >> >> as it seems the data nodes run out of capacity (from what I can see
>>>> >> >> my data nodes have 400GB local space). It does seem that certain
>>>> >> >> reducers eat up most of the capacity on these - thus slowing down
>>>> >> >> the process to a crawl and ultimately leading to Node Managers
>>>> >> >> complaining that Node Health is bad (log-dirs and local-dirs are
>>>> >> >> bad).
>>>> >> >>
>>>> >> >> Is there some inherent setting I am missing that I need to set up
>>>> >> >> for the particular job?
>>>> >> >>
>>>> >> >> Any pointers would be appreciated
>>>> >> >>
>>>> >> >> Thanks
>>>> >> >>
>>>> >> >> --
>>>> >> >> Gaurav Kanade,
>>>> >> >> Software Engineer
>>>> >> >> Big Data
>>>> >> >> Cloud and Enterprise Division
>>>> >> >> Microsoft
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Gaurav Kanade,
>>>> > Software Engineer
>>>> > Big Data
>>>> > Cloud and Enterprise Division
>>>> > Microsoft
>>>>
>>>
>>>
>>>
>>> --
>>> Gaurav Kanade,
>>> Software Engineer
>>> Big Data
>>> Cloud and Enterprise Division
>>> Microsoft
>>>
>>
>>
>
>
> --
> Gaurav Kanade,
> Software Engineer
> Big Data
> Cloud and Enterprise Division
> Microsoft
>



-- 
Gaurav Kanade,
Software Engineer
Big Data
Cloud and Enterprise Division
Microsoft
