phoenix-user mailing list archives

From idosenesh <ido.ad...@gmail.com>
Subject Phoenix CSV bulk loader errors
Date Sat, 25 Nov 2017 22:30:00 GMT
I'm trying to bulk load into Phoenix using the CsvBulkLoadTool.
I'm running on an Amazon EMR cluster with 3 i3.2xlarge core nodes and default
Phoenix/HBase/EMR configurations.

I've successfully run the job 3 times (i.e., inserted three CSV files of about
250 GB each), but the 4th run yields the following error:
2017-11-23 21:53:07,962 FATAL [IPC Server handler 7 on 39803] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1511332372804_0016_m_002760_1 - exited : java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:711)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://***************:8020/mnt/var/lib/hadoop/tmp/partitions_66f309d7-fe46-440a-99bb-fd8f3b40099e
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1853)
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
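
If I read the trace correctly, the partitioner is failing on a partitions_*
file that HFileOutputFormat2 writes under a temp directory on HDFS and marks
delete-on-exit, so my working theory is that the file vanished before the
(retried) map attempt could read it. Here is what I'm checking next (a sketch;
the config keys and paths are guesses based on the trace and my EMR layout):

# Which tmp dir is in play? hadoop.tmp.dir matches the path in the trace;
# hbase.fs.tmp.dir would override it for HFileOutputFormat2 (my assumption).
hdfs getconf -confKey hadoop.tmp.dir
grep -A1 hbase.fs.tmp.dir /usr/lib/hbase/conf/hbase-site.xml
# Are any partitions_* files still there while the job runs?
hdfs dfs -ls /mnt/var/lib/hadoop/tmp/ | grep partitions_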


My HDFS utilization is not high:
[hadoop@******** /]$ hdfs dfsadmin -report
Configured Capacity: 5679504728064 (5.17 TB)
Present Capacity: 5673831846248 (5.16 TB)
DFS Remaining: 5333336719720 (4.85 TB)
DFS Used: 340495126528 (317.11 GB)
DFS Used%: 6.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
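
Since space clearly isn't the issue, my next guess is that the partitions file
gets deleted mid-job (if it really is created delete-on-exit, a dying client
JVM would take it with it). A crude watch loop I could run during the job
(sketch; the glob is the path pattern from the trace):

# Poll until the partitions file disappears, then log when that happened.
while hdfs dfs -stat '%n %y' /mnt/var/lib/hadoop/tmp/partitions_*; do
  sleep 30
done
echo "partitions file gone at $(date)"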



I'm running the following command:

HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dfs.permissions.umask-mode=000 \
  --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors \
  --input s3://path/to/my/bucket/file.csv
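
If the temp-path handling is the culprit, one thing I may try is pinning the
intermediate HFile output to an explicit HDFS directory via the tool's
-o/--output option (I don't know whether that also relocates the partitions
file; /user/hadoop/bulkload-tmp is a made-up example path):

HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dfs.permissions.umask-mode=000 \
  --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors \
  --output /user/hadoop/bulkload-tmp \
  --input s3://path/to/my/bucket/file.csv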


The data in this last run is structurally identical to what was inserted in
the previous runs.

Any ideas?



