phoenix-user mailing list archives

From "Cox, Jonathan A" <ja...@sandia.gov>
Subject RE: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool
Date Fri, 18 Dec 2015 20:35:47 GMT
Hi Gabriel,

The Hadoop version is 2.6.2.

-Jonathan

-----Original Message-----
From: Gabriel Reid [mailto:gabriel.reid@gmail.com] 
Sent: Friday, December 18, 2015 11:58 AM
To: user@phoenix.apache.org
Subject: Re: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool

Hi Jonathan,

Which Hadoop version are you using? I'm actually wondering if
mapred.child.java.opts is still supported in Hadoop 2.x (I think it has
been replaced by mapreduce.map.java.opts and mapreduce.reduce.java.opts).
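
If that is the case, the per-task heap settings in mapred-site.xml would
look roughly like the sketch below. The values are illustrative assumptions
only: 2GB matches the standard per-task heap mentioned further down in this
thread, and the memory.mb entries assume the job runs on YARN, where the
container size should be somewhat larger than the JVM heap.

```xml
<!-- mapred-site.xml: illustrative values, not taken from a tested setup -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2g</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2g</value>
</property>
<!-- Assumption: YARN container sizes, set larger than the 2GB heap above -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2560</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2560</value>
</property>
```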

The HADOOP_CLIENT_OPTS won't make a difference if you're running in
(pseudo) distributed mode, as separate JVMs will be started up for the tasks.

- Gabriel


On Fri, Dec 18, 2015 at 7:33 PM, Cox, Jonathan A <jacox@sandia.gov> wrote:
> Gabriel,
>
> I am running the job on a single machine in pseudo distributed mode.
> I've set the max Java heap size in two different ways (just to be sure):
>
> export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx48g"
>
> and also in mapred-site.xml:
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx48g</value>
>   </property>
>
> -----Original Message-----
> From: Gabriel Reid [mailto:gabriel.reid@gmail.com]
> Sent: Friday, December 18, 2015 8:17 AM
> To: user@phoenix.apache.org
> Subject: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool
>
> Hi Jonathan,
>
> Sounds like something is very wrong here.
>
> Are you running the job on an actual cluster, or are you using the
> local job tracker (i.e. running the import job on a single computer)?
>
> Normally an import job, regardless of the size of the input, should run
> with map and reduce tasks that have a standard (e.g. 2GB) heap size per
> task (although there will typically be multiple tasks started on the
> cluster). There shouldn't be any need to have anything like a 48GB heap.
>
> If you are running this on an actual cluster, could you elaborate on
> where/how you're setting the 48GB heap size?
>
> - Gabriel
>
>
> On Fri, Dec 18, 2015 at 1:46 AM, Cox, Jonathan A <jacox@sandia.gov> wrote:
>> I am trying to ingest a 575MB CSV file with 192,444 lines using the 
>> CsvBulkLoadTool MapReduce job. When running this job, I find that I 
>> have to boost the max Java heap space to 48GB (24GB fails with Java 
>> out of memory errors).
>>
>>
>>
>> I’m concerned about scaling issues. It seems like it shouldn’t 
>> require between 24 and 48GB of memory to ingest a 575MB file. However, I 
>> am pretty new to Hadoop/HBase/Phoenix, so maybe I am off base here.
>>
>>
>>
>> Can anybody comment on this observation?
>>
>>
>>
>> Thanks,
>>
>> Jonathan