phoenix-user mailing list archives

From "Cox, Jonathan A" <ja...@sandia.gov>
Subject RE: Problem Bulk Loading CSV with Empty Value at End of Row
Date Wed, 30 Mar 2016 22:41:23 GMT
Actually, it seems that the line causing my problem really was missing a column. I checked
StringToArrayConverter in org.apache.phoenix.util.csv, and it does not exhibit the behavior
I described below.

So the fault is on my end.

Thanks

From: Cox, Jonathan A
Sent: Wednesday, March 30, 2016 3:36 PM
To: 'user@phoenix.apache.org'
Subject: Problem Bulk Loading CSV with Empty Value at End of Row

I am using the CsvBulkLoadTool to ingest a tab-separated file that can contain empty columns.
The problem is that the loader interprets an empty last column as a missing column
(instead of as a null entry).

For example, imagine I have a comma separated CSV with the following format:
key,username,password,gender,position,age,school,favorite_color

Now, let's say my CSV file contains the following row, where the gender field is empty.
This will load correctly:
*#Ssj289,joeblow,sk29ssh, ,CEO,102,MIT,blue<new line>

However, if the missing field happens to be the last entry (favorite_color), it complains
that there are only 7 of 8 required columns present:
*#Ssj289,joeblow,sk29ssh,female ,CEO,102,MIT, <new line>

This error causes the entire CSV file to fail to load. Any pointers on how
I can modify the source to have Phoenix interpret <delimiter><newline> as an empty/null
last column?
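(As an aside, this kind of field loss is a classic pitfall when a parser splits rows
naively. A minimal, self-contained sketch in plain Java — not Phoenix's actual parser,
which uses a proper CSV library — showing how a trailing delimiter can silently drop the
last empty field depending on how the split is done:)

```java
public class TrailingFieldDemo {
    public static void main(String[] args) {
        // A row whose last field (favorite_color) is empty, like the failing case:
        String row = "*#Ssj289,joeblow,sk29ssh,female,CEO,102,MIT,";

        // String.split(regex) discards trailing empty strings,
        // so the empty last column vanishes:
        String[] lossy = row.split(",");
        System.out.println(lossy.length);   // prints 7

        // With a negative limit, trailing empty strings are kept,
        // and the empty last column survives as "":
        String[] kept = row.split(",", -1);
        System.out.println(kept.length);    // prints 8
    }
}
```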

Thanks,
Jon
(actual error is pasted below)


java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record
does not have enough values (has 26, but needs 27)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does
not have enough values (has 26, but needs 27)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:197)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:72)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: CSV record does not have enough values (has
26, but needs 27)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:74)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
        at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:166)
        ... 10 more
16/03/30 15:01:01 INFO mapreduce.Job: Job job_local1507432235_0
