Subject: Re: [EXTERNAL] RE: Problem Bulk Loading CSV with Empty Value at End of Row
From: Sergey Soldatov <sergey.soldatov@gmail.com>
To: user@phoenix.apache.org
Date: Wed, 30 Mar 2016 16:28:31 -0700
Content-Type: text/plain; charset=UTF-8

Hi Jon,

One of the parameters for CsvBulkLoadTool is -e, which specifies the escape
character. If you don't need support for escape sequences, you may specify a
character that is not expected to appear in the data. Please note that escaped
characters are not supported on the command line (they will be as soon as
PHOENIX-1523 is accepted).

Thanks,
Sergey

On Wed, Mar 30, 2016 at 4:05 PM, Cox, Jonathan A wrote:
> To add a little more detail on this issue, the real problem appears to be
> that a CSV containing the "\" character is being interpreted as an escape
> sequence by Phoenix (java.lang.String). I happened to have a row where a
> "\" appeared directly before my delimiter, so my delimiter was escaped and
> ignored.
>
> I'm wondering if this is desirable behavior. Should the CSV be allowed to
> contain escape sequences, or should the ASCII text be interpreted directly
> as it is? In other words, if you want a tab (\t), it should just be ASCII
> 0x09 in the file (or whatever the latest and greatest text format is these
> days).
>
> From: Cox, Jonathan A [mailto:jacox@sandia.gov]
> Sent: Wednesday, March 30, 2016 4:41 PM
> To: user@phoenix.apache.org
> Subject: [EXTERNAL] RE: Problem Bulk Loading CSV with Empty Value at End of Row
>
> Actually, it seems that the line causing my problem really was missing a
> column. I checked the behavior of StringToArrayConverter in
> org.apache.phoenix.util.csv, and it does not exhibit such behavior.
>
> So the fault is on my end.
> Thanks
>
> From: Cox, Jonathan A
> Sent: Wednesday, March 30, 2016 3:36 PM
> To: 'user@phoenix.apache.org'
> Subject: Problem Bulk Loading CSV with Empty Value at End of Row
>
> I am using the CsvBulkLoadTool to ingest a tab-separated file that can
> contain empty columns. The problem is that the loader incorrectly interprets
> an empty last column as a non-existent column (instead of as a null entry).
>
> For example, imagine I have a comma-separated CSV with the following format:
>
> key,username,password,gender,position,age,school,favorite_color
>
> Now, let's say my CSV file contains the following row, where the gender
> field is missing. This will load correctly:
>
> *#Ssj289,joeblow,sk29ssh, ,CEO,102,MIT,blue
>
> However, if the missing field happens to be the last entry (favorite_color),
> it complains that there are only 7 of 8 required columns present:
>
> *#Ssj289,joeblow,sk29ssh,female ,CEO,102,MIT,
>
> This behavior throws an error and fails to load the entire CSV file. Any
> pointers on how I can modify the source to have Phoenix interpret this
> as an empty/null last column?
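[The trailing-empty-column behavior described above is easy to reproduce with
plain java.lang.String.split, which silently drops trailing empty strings
unless given a negative limit. This is a minimal illustration of the failure
mode, not necessarily the exact code path Phoenix takes:]

```java
public class TrailingEmptyDemo {
    public static void main(String[] args) {
        // The problematic row: 7 commas, so 8 fields, the last one empty.
        String row = "*#Ssj289,joeblow,sk29ssh,female,CEO,102,MIT,";

        // Default split: trailing empty strings are discarded, so the
        // empty favorite_color field vanishes -> only 7 columns seen.
        System.out.println(row.split(",").length);      // prints 7

        // With a negative limit, trailing empty strings are preserved,
        // so the empty last field survives as "" -> all 8 columns.
        System.out.println(row.split(",", -1).length);  // prints 8
    }
}
```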
> Thanks,
>
> Jon
>
> (actual error is pasted below)
>
> java.lang.Exception: java.lang.RuntimeException:
> java.lang.IllegalArgumentException: CSV record does not have enough values
> (has 26, but needs 27)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException:
> CSV record does not have enough values (has 26, but needs 27)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:197)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:72)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: CSV record does not have
> enough values (has 26, but needs 27)
>         at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:74)
>         at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
>         at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
>         at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:166)
>         ... 10 more
>
> 16/03/30 15:01:01 INFO mapreduce.Job: Job job_local1507432235_0
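[Sergey's point about the -e escape parameter, and Jon's observation that a
literal "\" directly before the delimiter swallows it, can be sketched with a
minimal field splitter. This is a hypothetical stand-in, not Phoenix's actual
parser; it only shows why picking an escape character that never occurs in the
data sidesteps the problem:]

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeDemo {
    // Minimal delimiter/escape splitter illustrating the behavior in the
    // thread above. NOT Phoenix's real parser -- just a sketch.
    static List<String> split(String line, char delim, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                cur.append(line.charAt(++i)); // next char taken literally
            } else if (c == delim) {
                fields.add(cur.toString());   // end of field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());           // final field
        return fields;
    }

    public static void main(String[] args) {
        // A row with a literal backslash right before a tab delimiter.
        String row = "sk29ssh\\\tCEO\t102";

        // With '\' as the escape char, the backslash consumes the tab,
        // merging two fields into one: 2 fields instead of 3.
        System.out.println(split(row, '\t', '\\').size());     // prints 2

        // With an escape char that never appears in the data (Sergey's -e
        // suggestion), all three fields survive.
        System.out.println(split(row, '\t', '\u0001').size()); // prints 3
    }
}
```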