phoenix-user mailing list archives

From Vaghawan Ojha <vaghawan...@gmail.com>
Subject Re: Phoenix CSV bulk loader errors
Date Sun, 26 Nov 2017 02:00:39 GMT
Hi,

Are you sure you are pointing to the right path and file? The error says:

Caused by: java.io.FileNotFoundException: File does not exist:
hdfs://*

Please make sure the CSV file is there.
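
For example, you could verify the input before rerunning. This is only a sketch; the S3 path below is the placeholder from your command, so substitute your real bucket and key:

# Check that the CSV the job points at actually exists on S3
aws s3 ls s3://path/to/my/bucket/file.csv

# Equivalent check through the Hadoop filesystem layer (EMRFS handles the s3 scheme)
hadoop fs -ls s3://path/to/my/bucket/file.csv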

On Sunday, November 26, 2017, idosenesh <ido.ad.se@gmail.com> wrote:

> I'm trying to bulk load into Phoenix using the CsvBulkLoadTool.
> I'm running on an Amazon EMR cluster with 3 i3.2xlarge core nodes and default
> phoenix/hbase/emr configurations.
>
> I've successfully run the job 3 times (i.e. successfully inserted three CSV
> files of about 250G each), but the 4th run yields the following error:
> 2017-11-23 21:53:07,962 FATAL [IPC Server handler 7 on 39803] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1511332372804_0016_m_002760_1 - exited :
> java.lang.IllegalArgumentException: Can't read partitions file
>         at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:711)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://***************:8020/mnt/var/lib/hadoop/tmp/partitions_66f309d7-fe46-440a-99bb-fd8f3b40099e
>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1853)
>         at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
>         at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
>
>
> My HDFS utilization is not high:
> [hadoop@******** /]$ hdfs dfsadmin -report
> Configured Capacity: 5679504728064 (5.17 TB)
> Present Capacity: 5673831846248 (5.16 TB)
> DFS Remaining: 5333336719720 (4.85 TB)
> DFS Used: 340495126528 (317.11 GB)
> DFS Used%: 6.00%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> Missing blocks (with replication factor 1): 0
>
>
>
> I'm running the following command:
>
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
>   hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar \
>   org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>   -Dfs.permissions.umask-mode=000 \
>   --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors \
>   --input s3://path/to/my/bucket/file.csv
>
>
> The data in this last run is structurally the same as the data inserted in
> the previous runs.
>
> Any ideas?
>
>
>
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>
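
One more thought below your quoted message: the file the job cannot find is not your CSV but the partitions file the bulk loader writes for TotalOrderPartitioner (hdfs://***:8020/mnt/var/lib/hadoop/tmp/partitions_<uuid> in your trace). A minimal sketch of what I would check, assuming the masked host is your namenode, and assuming (not verified on EMR) that your Phoenix 4.11/HBase 1.3 setup honors hbase.fs.tmp.dir for that file:

# Does the temp directory the partitions file was written to still exist on HDFS?
hdfs dfs -ls /mnt/var/lib/hadoop/tmp

# Hedged rerun pointing the HBase staging dir at a stable HDFS path.
# hbase.fs.tmp.dir is an existing HBase property (default /user/<user>/hbase-staging);
# whether it relocates the partitions file in your version is an assumption on my part.
HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
  hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dfs.permissions.umask-mode=000 \
  -Dhbase.fs.tmp.dir=/user/hadoop/hbase-staging \
  --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors \
  --input s3://path/to/my/bucket/file.csv

Since several runs succeeded before this one, it may also be worth ruling out something cleaning up that tmp directory between job setup and the map tasks.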
