phoenix-user mailing list archives

From idosenesh <ido.ad...@gmail.com>
Subject Re: Phoenix CSV bulk loader errors
Date Sun, 26 Nov 2017 09:59:59 GMT
Hey Vaghawan Ojha, thanks for your comment.
The missing path is not the source CSV file; the CSV is in S3. The flow is as
follows: CsvBulkLoadTool is supposed to create the HFiles in a /tmp directory
on HDFS and then, in a second phase, load them into HBase. These intermediate
files are the ones reported missing in the error.
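
For reference, this is roughly how I've been checking that those intermediate
files actually land on HDFS. Treat it as a sketch only: the
/mnt/var/lib/hadoop/tmp path is simply copied from the stack trace below, not
a guaranteed default on every cluster.

# is the partitions_* file from the stack trace still there while the mappers run?
hdfs dfs -ls /mnt/var/lib/hadoop/tmp/ | grep partitions_

# and the temporary HFile output directories the bulk loader creates under /tmp:
hdfs dfs -ls /tmp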



Vaghawan Ojha wrote
> Hi, Are you sure you are pointing to the right path and file? Because the
> error says
> Caused by: java.io.FileNotFoundException: File does not exist:
> hdfs://*
> Please make sure the csv file is there.
> 
> On Sunday, November 26, 2017, idosenesh <ido.ad.se@> wrote:
> 
>> I'm trying to bulk load into Phoenix using the CsvBulkLoadTool.
>> I'm running on an Amazon EMR cluster with 3 i3.2xlarge core nodes and default
>> phoenix/hbase/emr configurations.
>>
>> I've successfully run the job 3 times (i.e. successfully inserted three csv
>> files of roughly 250G each), but the 4th run yields the following error:
>> 2017-11-23 21:53:07,962 FATAL [IPC Server handler 7 on 39803] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1511332372804_0016_m_002760_1 - exited : java.lang.IllegalArgumentException: Can't read partitions file
>>         at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
>>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:711)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://***************:8020/mnt/var/lib/hadoop/tmp/partitions_66f309d7-fe46-440a-99bb-fd8f3b40099e
>>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
>>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1853)
>>         at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
>>         at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
>>
>>
>> My hdfs utilization is not high:
>> [hadoop@******** /]$ hdfs dfsadmin -report
>> Configured Capacity: 5679504728064 (5.17 TB)
>> Present Capacity: 5673831846248 (5.16 TB)
>> DFS Remaining: 5333336719720 (4.85 TB)
>> DFS Used: 340495126528 (317.11 GB)
>> DFS Used%: 6.00%
>> Under replicated blocks: 0
>> Blocks with corrupt replicas: 0
>> Missing blocks: 0
>> Missing blocks (with replication factor 1): 0
>>
>>
>>
>> I'm running the following command:
>> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
>>   hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar \
>>   org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=000 \
>>   --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors --input \
>>   s3://path/to/my/bucket/file.csv
>>
>>
>> The data for this last table is structurally the same as the data inserted in the previous runs.
>>
>> Any ideas?
>>
>>
>>
>> --
>> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>>
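
One thing I may try next, as a sketch only: point the tool at an explicit HDFS
directory for its intermediate HFiles instead of relying on the default tmp
location. The --output flag is the optional "output path for temporary HFiles"
option from the Phoenix bulk-load docs, and /user/hadoop/bulkload-tmp is just a
placeholder path I made up:

HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
  hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dfs.permissions.umask-mode=000 \
  --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors \
  --output /user/hadoop/bulkload-tmp \
  --input s3://path/to/my/bucket/file.csv

That would not by itself explain why the partitions file disappeared, but at
least the HFile staging location becomes explicit and easy to inspect.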





