phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Import Delimiter
Date Fri, 07 Feb 2014 01:10:47 GMT
Might be a bug. Take a look at the CSVLoaderTest, as it has some testing
around custom delimiters. Maybe add a test case with a sample line from
your table to isolate the issue.

Patches welcome, of course.

Thanks,
James
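
One thing worth checking in the command quoted below: if it was typed exactly as shown, the unquoted | never reaches psql.sh. The shell treats it as a pipeline operator, so psql.sh sees -d with no delimiter arguments and the rest of the line is run as a separate command. The sketch below uses a hypothetical stand-in function, show_args (not part of Phoenix), to show what a quoted invocation hands over. The quote and escape characters chosen here are assumptions for illustration; the usage text only says that three single characters are expected.

```shell
#!/bin/sh
# Hypothetical stand-in for psql.sh (NOT part of Phoenix): prints each
# argument it receives on its own line, wrapped in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Single quotes keep the shell from interpreting |, " and \ itself, so each
# one arrives as a literal one-character argument after -d.
# The choice of " as quote char and \ as escape char is an assumption.
show_args -d '|' '"' '\'

# The corresponding psql.sh call would then look like this (untested sketch):
# ./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT \
#   -d '|' '"' '\' localhost:2181 customer.csv
```

Whether quoting resolves the wrong format error is not confirmed in this thread. Note also that the sample line ends with a trailing |, which a CSV parser may read as an extra empty field.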


On Thu, Feb 6, 2014 at 6:12 AM, Devin Pinkston
<dpinkston@technicacorp.com> wrote:

> James,
>
> Looks like I'm on the right track; however, I'm not sure why it is not
> accepting my delimiters.  I am using the TPC-H data set, so for instance
> here is what a line from customer.csv looks like:
>
> 6967|Customer#000006967|uMPce8nER9v3PCIcsZmNlSrCKcau6tJd4qe|13|23-816-949-8373|7865.21|MACHINERY|r
> pinto beans. regular multipliers detect carefully. carefully final
> instructions affix quickly. packages boost af|
>
> When I try to import the CSV file into my table "CUSTOMER", it looks like
> psql does not like the delimiters I pass in.  If I use the 3 numbers as
> in the usage below, I just get a wrong format error, but it at least
> attempts to import the data.  Any thoughts?
>
> ./psql.sh -t CUSTOMER -h
> C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT
> -d | localhost:2181 customer.csv
>
> Usage: psql [-t table-name] [-h comma-separated-column-names | in-line]
> [-d field-delimiter-char quote-char escape-char]<zookeeper>
> <path-to-sql-or-csv-file>...
>   By default, the name of the CSV file is used to determine the Phoenix
>   table into which the CSV data is loaded
>   and the ordinal value of the columns determines the mapping.
>   -t overrides the table into which the CSV data is loaded
>   -h overrides the column names to which the CSV data maps
>      A special value of in-line indicating that the first line of the CSV file
>      determines the column to which the data maps.
>   -s uses strict mode by throwing an exception if a column name doesn't
>   match during CSV loading.
>   -d uses custom delimiters for CSV loader, need to specify single char
>   for field delimiter, phrase delimiter, and escape char.
>      number is NOT usually a delimiter and shall be taken as 1 -> ctrl A,
>      2 -> ctrl B ... 9 -> ctrl I.
>
> Examples:
>   psql localhost my_ddl.sql
>   psql localhost my_ddl.sql my_table.csv
>   psql -t my_table my_cluster:1825 my_table2012-Q3.csv
>   psql -t my_table -h col1,col2,col3 my_cluster:1825 my_table2012-Q3.csv
>   psql -t my_table -h col1,col2,col3 -d 1 2 3 my_cluster:1825 my_table2012-Q3.csv
>
> Thanks
>
> *From:* Devin Pinkston [mailto:dpinkston@technicacorp.com]
> *Sent:* Thursday, February 06, 2014 8:41 AM
> *To:* user@phoenix.incubator.apache.org
> *Subject:* RE: Import Delimiter
>
> James,
>
> Interesting, thanks for the info.  So if I were to import data containing
> pipe delimiters, I would have to use the non map-reduce bulk loader.  Are
> you saying that sqlline would have to be used?
>
> Sorry, I am trying to figure out how I can import these large flat files
> this way.
>
> Thank you.
>
> *From:* James Taylor [mailto:jamestaylor@apache.org]
> *Sent:* Wednesday, February 05, 2014 8:25 PM
> *To:* user@phoenix.incubator.apache.org
> *Subject:* Re: Import Delimiter
>
> You're right. It was added to the non map-reduce bulk loader. This is the
> loader that loads local CSV files through the bin/psql.sh script. There's a
> -d option that was added in this pull request[1]. It would be nice to add
> this same functionality to our csv map-reduce bulk loader too if anyone is
> interested.
>
> Thanks,
> James
>
>
> [1] https://github.com/forcedotcom/phoenix/pull/514
>
> On Wed, Feb 5, 2014 at 9:35 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>
> Hi James,
>
> I'm looking through the bulkload job, and it looks to me like this isn't
> configurable at the moment. Have a look at
> https://github.com/apache/incubator-phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/map/reduce/MapReduceJob.java#L136
>
> Is there something I'm missing? Perhaps I'm looking in the wrong place?
>
> Thanks,
> Nick
>
> On Wed, Feb 5, 2014 at 10:16 AM, Devin Pinkston
> <dpinkston@technicacorp.com> wrote:
>
> James,
>
> Thanks for the quick response.  Do you know what the argument or command
> is to pass in?
>
> For instance ./csv-bulk-loader.sh -delimiter '|'
>
> Thanks
>
> *From:* James Taylor [mailto:jamestaylor@apache.org]
> *Sent:* Wednesday, February 05, 2014 11:51 AM
> *To:* user@phoenix.incubator.apache.org
> *Subject:* Re: Import Delimiter
>
> Hello,
>
> The CSV map-reduce based bulk loader supports custom delimiters. Might
> need to be doc-ed, though.
>
> Thanks,
> James
>
> On Wednesday, February 5, 2014, Devin Pinkston <dpinkston@technicacorp.com>
> wrote:
>
> Hello,
>
> I am trying to import data into HBase; however, I have '|' (pipe)
> delimiters in my file instead of commas.  I don't see a way to pass in a
> different separator/delimiter with the jar.  What would be the best way to
> import data like this?
>
> Thanks
>
> The information contained in this transmission may contain privileged and
> confidential information.
> It is intended only for the use of the person(s) named above.
> If you are not the intended recipient, you are hereby notified that any
> review, dissemination, distribution or duplication of this communication is
> strictly prohibited.
> If you are not the intended recipient, please contact the sender by reply
> e-mail and destroy all copies of the original message.
> Technica Corporation does not represent this e-mail to be free from any
> virus, fault or defect and it is therefore the responsibility of the
> recipient to first scan it for viruses, faults and defects.
> To reply to our e-mail administrator directly, please send an e-mail to
> postmaster@technicacorp.com. Thank you.
