phoenix-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Error with lines ended with backslash when Bulk Data Loading
Date Thu, 08 Dec 2016 18:06:51 GMT
Hi,

Backslash is the default escape character that is used for parsing CSV
data when running a bulk import, so it has a special meaning.
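
For instance, with the default settings a CSV line such as

    a\,b

parses as a single field containing "a,b", because the backslash
escapes the comma. A line that ends in a lone backslash instead starts
an escape sequence that is never completed, which is exactly the "EOF
whilst processing escape sequence" error in your stack trace.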

You can supply a different (custom) escape character with the -e or
--escape flag on the command line, so that CSV files containing
backslashes like this will parse properly.
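
For example, a run along these lines (the client jar path is only a
placeholder for your installation, and '|' is an arbitrary choice of
escape character that should not appear anywhere in your data):

    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -g -t a -i a.csv -e '|'

Any single character that never occurs in your input will do.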

- Gabriel

On Thu, Dec 8, 2016 at 9:10 AM, rubysina <rubik@sina.com> wrote:
> hi, I'm new to Phoenix SQL and I've run into a small problem.
>
> I'm following this page: http://phoenix.apache.org/bulk_dataload.html
> I found that the MapReduce importer cannot load a file whose lines end
> with a backslash. Even with the -g (ignore-errors) parameter it fails
> with "java.io.IOException: EOF whilst processing escape sequence".
>
> But it's OK if a line contains a backslash that is not at the end of
> the line, and there's no problem when loading the same file with
> psql.py.
>
> Why? How?
>
> thank you.
>
>
>
> -----------------------------------------------------------------------------------------------
> for example:
>
>
> create table a(a char(100) primary key)
>
> echo \\>a.csv
> cat a.csv
> \
> hdfs dfs -put  a.csv
> ...JsonBulkLoadTool  -g -t a  -i a.csv
> -- error
> 16/12/08 15:44:21 INFO mapreduce.Job: Task Id : attempt_1481093434027_0052_m_000000_0, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:202)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:74)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
>         at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>         at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
>         at com.google.common.collect.Iterators.getNext(Iterators.java:890)
>         at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
>         at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
>         at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:161)
>         ... 9 more
>
>
>
> echo \\a>a.csv
> cat a.csv
> \a
> hdfs dfs -rm  a.csv
> hdfs dfs -put  a.csv
> ...JsonBulkLoadTool -g -t a  -i a.csv
> -- success
>
>
> echo \\>a.csv
> cat a.csv
> \
> psql.py -t A zoo a.csv
> CSV Upsert complete. 1 rows upserted
> -- success
>
>
> thank you.
