phoenix-user mailing list archives

From Sergey Soldatov <sergeysolda...@gmail.com>
Subject Re: [EXTERNAL] RE: Problem Bulk Loading CSV with Empty Value at End of Row
Date Wed, 30 Mar 2016 23:28:31 GMT
Hi Jon,

One of the parameters for CsvBulkLoadTool is -e, which specifies the escape
character. If you don't need support for escape sequences, you can set it to a
character that is not expected to appear in the data. Please note that escaped
characters are not yet supported on the command line (they will be once
PHOENIX-1523 is accepted).
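
For example, a minimal invocation might look like the following (a sketch
only: the jar path, table name, input path, and ZooKeeper quorum are
placeholders, and '|' stands in for any character that never appears in your
data):

  hadoop jar phoenix-<version>-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      -t EXAMPLE_TABLE -i /tmp/data.csv -z zk-host:2181 -e '|'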

Thanks,
Sergey

On Wed, Mar 30, 2016 at 4:05 PM, Cox, Jonathan A <jacox@sandia.gov> wrote:
> To add a little more detail on this issue, the real problem appears to be
> that the “\” character in a CSV is interpreted as an escape character by
> Phoenix (java.lang.String). I happened to have a row where a “\” appeared
> directly before my delimiter, so the delimiter was escaped and ignored.
>
>
>
> I’m wondering if this is desirable behavior. Should the CSV be allowed to
> contain escape sequences, or should the ASCII text be interpreted directly
> as it is? In other words, if you want a tab (\t), it should just be ASCII
> 0x09 in the file (or whatever the latest and greatest text format is these
> days).
>
>
>
> From: Cox, Jonathan A [mailto:jacox@sandia.gov]
> Sent: Wednesday, March 30, 2016 4:41 PM
> To: user@phoenix.apache.org
> Subject: [EXTERNAL] RE: Problem Bulk Loading CSV with Empty Value at End of
> Row
>
>
>
> Actually, it seems that the line causing my problem really was missing a
> column. I checked StringToArrayConverter in org.apache.phoenix.util.csv, and
> it does not exhibit such behavior.
>
>
>
> So the fault is on my end.
>
>
>
> Thanks
>
>
>
> From: Cox, Jonathan A
> Sent: Wednesday, March 30, 2016 3:36 PM
> To: 'user@phoenix.apache.org'
> Subject: Problem Bulk Loading CSV with Empty Value at End of Row
>
>
>
> I am using the CsvBulkLoadTool to ingest a tab-separated file that can
> contain empty columns. The problem is that the loader incorrectly interprets
> an empty last column as a non-existent column (instead of as a null entry).
>
>
>
> For example, imagine I have a comma-separated CSV with the following format:
>
> key,username,password,gender,position,age,school,favorite_color
>
>
>
> Now, let’s say my CSV file contains the following row, where the gender
> field is missing. This will load correctly:
>
> *#Ssj289,joeblow,sk29ssh, ,CEO,102,MIT,blue<new line>
>
>
>
> However, if the missing field happens to be the last entry (favorite_color),
> it complains that there are only 7 of 8 required columns present:
>
> *#Ssj289,joeblow,sk29ssh,female ,CEO,102,MIT, <new line>
>
>
>
> This error causes the load of the entire CSV file to fail. Any pointers on
> how I can modify the source so that Phoenix interprets <delimiter><newline>
> as an empty/null last column?
>
>
>
> Thanks,
>
> Jon
>
> (actual error is pasted below)
>
>
>
>
>
> java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:197)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:72)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
>         at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:74)
>         at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
>         at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:166)
>         ... 10 more
>
> 16/03/30 15:01:01 INFO mapreduce.Job: Job job_local1507432235_0
