phoenix-user mailing list archives

From "rubysina" <ru...@sina.com>
Subject Re: Error with lines ended with backslash when Bulk Data Loading
Date Fri, 09 Dec 2016 01:28:53 GMT
OK, thank you.

But there's no -e parameter on the page http://phoenix.apache.org/bulk_dataload.html.
And why doesn't the -g,--ignore-errors parameter work? If some lines end with a
backslash, why fail instead of just ignoring them?

There's always something wrong in txt files. Why not ignore it? How?

And if I use the -e parameter, what character should I use?
It seems I must find a special character, but I don't know which one is right.
Actually, I don't want to use any escape character at all.
Is there an option like "escape off" or something similar, so I can load anything
without treating any character as an escape character?
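
If I must pick one, maybe a control character that never appears in my data would be
safe? Just a guess at what that might look like, assuming -e accepts an arbitrary
single character (the jar name here is only a placeholder):

ESC=$(printf '\001')   # Ctrl-A: assuming this byte never occurs in the data
hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    -g -t a -i a.csv -e "$ESC"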

Some other products, like Greenplum, do have such a setting when bulk loading txt
files: escape: 'OFF'

-----------------------------------------------------------------
quote on http://phoenix.apache.org/bulk_dataload.html
The following parameters can be used with the MapReduce loader.
Parameter     Description
-i,--input     Input CSV path (mandatory)
-t,--table     Phoenix table name (mandatory)
-a,--array-delimiter     Array element delimiter (optional)
-c,--import-columns     Comma-separated list of columns to be imported
-d,--delimiter     Input delimiter, defaults to comma
-g,--ignore-errors     Ignore input errors
-o,--output     Output path for temporary HFiles (optional)
-s,--schema     Phoenix schema name (optional)
-z,--zookeeper     Zookeeper quorum to connect to (optional)
-it,--index-table     Index table name to load (optional)


--------------------------------

From: Gabriel Reid <g...@gmail.com>
Subject: Re: Error with lines ended with backslash when Bulk Data Loading
Date: 2016-12-09 02:06 (+0800)
List: user@phoenix.apache.org
Hi

Backslash is the default escape character used when parsing CSV data in a bulk
import, so it has a special meaning.

You can supply a different (custom) escape character with the -e or --escape flag
on the command line, so that CSV files containing backslashes like this will parse
properly.
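
For example, something along these lines (the jar name and the escape character are
placeholders; pick a character that never occurs in your data):

hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table A --input a.csv --escape '|'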

- Gabriel

----- Original Message -----
From: "rubysina" <rubik@sina.com>
To: "user" <user@phoenix.apache.org>
Subject: Error with lines ended with backslash when Bulk Data Loading
Date: 2016-12-08 16:11

Hi, I'm new to Phoenix SQL and here's a little problem.

I'm following this page: http://phoenix.apache.org/bulk_dataload.html
I just found that the MapReduce importer cannot load a file whose lines end with a
backslash, even with the -g (ignore-errors) parameter; it fails with
"java.io.IOException: EOF whilst processing escape sequence".

But it's OK if a line contains a backslash somewhere other than at the end of the line,

and there's no problem when using psql.py to load the same file.

Why? How?

thank you.



-----------------------------------------------------------------------------------------------
for example:


create table a(a char(100) primary key)

echo \\>a.csv
cat a.csv
\
hdfs dfs -put  a.csv  
...CsvBulkLoadTool -g -t a -i a.csv
-- error
16/12/08 15:44:21 INFO mapreduce.Job: Task Id : attempt_1481093434027_0052_m_000000_0, Status
: FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: EOF whilst
processing escape sequence
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:202)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:74)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
        at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
        at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
        at com.google.common.collect.Iterators.getNext(Iterators.java:890)
        at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:161)
        ... 9 more



echo \\a>a.csv
cat a.csv
\a
hdfs dfs -rm  a.csv  
hdfs dfs -put  a.csv  
...CsvBulkLoadTool -g -t a -i a.csv
-- success


echo \\>a.csv
cat a.csv
\
psql.py -t A zoo a.csv 
CSV Upsert complete. 1 rows upserted
-- success


thank you.