phoenix-user mailing list archives

From "Cox, Jonathan A" <ja...@sandia.gov>
Subject RE: [EXTERNAL] RE: Problem Bulk Loading CSV with Empty Value at End of Row
Date Wed, 30 Mar 2016 23:05:47 GMT
To add a little more detail on this issue: the real problem appears to be that a "\" character in the CSV is being interpreted by Phoenix as the start of an escape sequence. I happened to have a row where a "\" appeared directly before my delimiter, so the delimiter was escaped and ignored.

I'm wondering if this is desirable behavior. Should the CSV be allowed to contain escape sequences,
or should the text be interpreted literally? In other words, if you want a tab
(\t), it should just be ASCII 0x09 in the file (or whatever the latest and greatest text format
is these days).
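To illustrate the failure mode described above, here is a minimal sketch of an escape-aware field splitter. This is purely illustrative and is not Phoenix's actual parser; it just shows how a backslash immediately before the delimiter causes two intended fields to collapse into one.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeDemo {

    // Split a line on delim, treating esc as an escape character:
    // the character after esc is always taken literally.
    public static List<String> split(String line, char delim, char esc) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean escaped = false;
        for (char c : line.toCharArray()) {
            if (escaped) {            // previous char was the escape: take c literally
                cur.append(c);
                escaped = false;
            } else if (c == esc) {    // start of an escape sequence
                escaped = true;
            } else if (c == delim) {  // unescaped delimiter ends the field
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        // A value ending in "\" directly before the delimiter: the backslash
        // escapes the delimiter, so the two fields merge into one.
        System.out.println(split("C:\\,blue", ',', '\\'));  // one field: [C:,blue]

        // With escaping effectively disabled, the same line yields two fields.
        System.out.println(split("C:\\,blue", ',', '\0'));  // two fields: [C:\, blue]
    }
}
```

This also explains why a row that visually has the right number of delimiters can still come up one column short after parsing.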

From: Cox, Jonathan A [mailto:jacox@sandia.gov]
Sent: Wednesday, March 30, 2016 4:41 PM
To: user@phoenix.apache.org
Subject: [EXTERNAL] RE: Problem Bulk Loading CSV with Empty Value at End of Row

Actually, it seems that the line causing my problem really was missing a column. I checked
StringToArrayConverter in org.apache.phoenix.util.csv, and it does not exhibit
this behavior.

So the fault is on my end.

Thanks

From: Cox, Jonathan A
Sent: Wednesday, March 30, 2016 3:36 PM
To: 'user@phoenix.apache.org'
Subject: Problem Bulk Loading CSV with Empty Value at End of Row

I am using the CsvBulkLoadTool to ingest a tab-separated file that can contain empty columns.
The problem is that the loader incorrectly interprets an empty last column as a non-existent
column (instead of as a null entry).

For example, imagine I have a comma separated CSV with the following format:
key,username,password,gender,position,age,school,favorite_color

Now, let's say my CSV file contains the following row, where the gender field is missing.
This will load correctly:
*#Ssj289,joeblow,sk29ssh, ,CEO,102,MIT,blue<new line>

However, if the missing field happens to be the last entry (favorite_color), it complains
that there are only 7 of 8 required columns present:
*#Ssj289,joeblow,sk29ssh,female ,CEO,102,MIT, <new line>

This throws an error and the entire CSV file fails to load. Any pointers on how
I can modify the source to have Phoenix interpret <delimiter><newline> as an empty/null
last column?
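For what it's worth, this trailing-column symptom matches a common Java pitfall: String.split drops trailing empty strings unless it is given a negative limit. A short sketch (hypothetical, not Phoenix code) showing the difference:

```java
public class TrailingFieldDemo {
    public static void main(String[] args) {
        // A row whose last field (favorite_color) is empty.
        String row = "*#Ssj289,joeblow,sk29ssh,female,CEO,102,MIT,";

        // Default split() silently discards trailing empty strings,
        // so an 8-column row parses as only 7 fields.
        String[] dropped = row.split(",");
        System.out.println(dropped.length);  // 7

        // A negative limit preserves trailing empties: 8 fields,
        // the last of which is the empty string.
        String[] kept = row.split(",", -1);
        System.out.println(kept.length);     // 8
    }
}
```

If the parser in use behaves like the default split(), a row ending in `<delimiter><newline>` would come up one column short, exactly as the error below reports.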

Thanks,
Jon
(actual error is pasted below)


java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record
does not have enough values (has 26, but needs 27)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does
not have enough values (has 26, but needs 27)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:197)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:72)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: CSV record does not have enough values (has
26, but needs 27)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:74)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
        at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:166)
        ... 10 more
16/03/30 15:01:01 INFO mapreduce.Job: Job job_local1507432235_0
