phoenix-user mailing list archives

From Devin Pinkston <dpinks...@technicacorp.com>
Subject RE: Import Delimiter
Date Thu, 06 Feb 2014 14:12:21 GMT
James,

Looks like I'm on the right track; however, I'm not sure why it isn't accepting my delimiters. I am using the TPC-H data set, so for instance here is what a line from customer.csv looks like:


6967|Customer#000006967|uMPce8nER9v3PCIcsZmNlSrCKcau6tJd4qe|13|23-816-949-8373|7865.21|MACHINERY|r pinto beans. regular multipliers detect carefully. carefully final instructions affix quickly. packages boost af|

When I try to import the csv file into my table "CUSTOMER", it looks like psql doesn't like the delimiters I pass in.  If I use the three numbers as in the usage below, I just get a wrong-format error, but at least it attempts to import the data.  Any thoughts?


./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT -d | localhost:2181 customer.csv
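
One thing worth ruling out first: in the command above the | after -d is unquoted, so the shell treats it as a pipe operator and psql.sh never receives it as an argument. A minimal sketch with the delimiter quoted and everything else unchanged; whether psql then accepts a literal pipe here, or still expects the quote and escape characters as well, is an assumption to check against the usage below:

  ./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT -d '|' localhost:2181 customer.csv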

Usage: psql [-t table-name] [-h comma-separated-column-names | in-line] [-d field-delimiter-char quote-char escape-char] <zookeeper> <path-to-sql-or-csv-file>...

  By default, the name of the CSV file is used to determine the Phoenix table into which the CSV data is loaded, and the ordinal value of the columns determines the mapping.

  -t overrides the table into which the CSV data is loaded
  -h overrides the column names to which the CSV data maps
     A special value of in-line indicates that the first line of the CSV file determines the columns to which the data map.
  -s uses strict mode, throwing an exception if a column name doesn't match during CSV loading.
  -d uses custom delimiters for the CSV loader; specify a single char each for the field delimiter, phrase delimiter, and escape char.
     A digit is not taken as a literal delimiter but as a control character: 1 -> Ctrl-A, 2 -> Ctrl-B ... 9 -> Ctrl-I.

Examples:

  psql localhost my_ddl.sql
  psql localhost my_ddl.sql my_table.csv
  psql -t my_table my_cluster:1825 my_table2012-Q3.csv
  psql -t my_table -h col1,col2,col3 my_cluster:1825 my_table2012-Q3.csv
  psql -t my_table -h col1,col2,col3 -d 1 2 3 my_cluster:1825 my_table2012-Q3.csv
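
Per the -d note above, a digit is read as a control character rather than a literal delimiter, so the last example loads a file whose fields are separated by Ctrl-A. If a literal '|' cannot be passed through, one hedged workaround is to translate the pipes to Ctrl-A with tr and then use the documented numeric form; the output file name is just a placeholder, and the 2 and 3 simply mirror the documented example for the quote and escape characters:

  tr '|' '\001' < customer.csv > customer_ctrlA.csv
  ./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT -d 1 2 3 localhost:2181 customer_ctrlA.csv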





Thanks



From: Devin Pinkston [mailto:dpinkston@technicacorp.com]
Sent: Thursday, February 06, 2014 8:41 AM
To: user@phoenix.incubator.apache.org
Subject: RE: Import Delimiter

James,

Interesting, thanks for the info.  So if I were to import data containing pipe delimiters, I would have to use the non map-reduce bulk loader.  Are you saying that sqlline would have to be used?

Sorry, I am just trying to figure out how to import these large flat files this way.

Thank you.

From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Wednesday, February 05, 2014 8:25 PM
To: user@phoenix.incubator.apache.org
Subject: Re: Import Delimiter

You're right. It was added to the non map-reduce bulk loader. This is the loader that loads
local CSV files through the bin/psql.sh script. There's a -d option that was added in this
pull request[1]. It would be nice to add this same functionality to our csv map-reduce bulk
loader too if anyone is interested.
Thanks,
James

[1] https://github.com/forcedotcom/phoenix/pull/514
On Wed, Feb 5, 2014 at 9:35 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
Hi James,

I'm looking through the bulkload job, and it looks to me like this isn't configurable at the moment. Have a look at https://github.com/apache/incubator-phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/map/reduce/MapReduceJob.java#L136

Is there something I'm missing? Perhaps I'm looking in the wrong place?

Thanks,
Nick

On Wed, Feb 5, 2014 at 10:16 AM, Devin Pinkston <dpinkston@technicacorp.com> wrote:
James,

Thanks for the quick response.  Do you know what the argument or command is to pass in?

For instance ./csv-bulk-loader.sh -delimiter '|'

Thanks

From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Wednesday, February 05, 2014 11:51 AM
To: user@phoenix.incubator.apache.org
Subject: Re: Import Delimiter

Hello,
The CSV map-reduce based bulk loader supports custom delimiters. Might need to be doc-ed,
though.
Thanks,
James

On Wednesday, February 5, 2014, Devin Pinkston <dpinkston@technicacorp.com> wrote:

Hello,



I am trying to import data into HBase; however, I have '|' or pipe delimiters in my file instead of commas.  I don't see a way to pass in a different separator/delimiter with the jar.  What would be the best way to import data like this?



Thanks

