phoenix-user mailing list archives

From Siva <sbhavan...@gmail.com>
Subject Re: Phoenix bulk loading
Date Thu, 12 Feb 2015 18:32:08 GMT
Hi Gabriel,

Thanks for your response.

Your understanding is correct. Our use case is that we get data in CSV
format from different sources, whose table structure (in terms of columns)
differs by client type. If a column is not available in the source, we
don't have the option of even adding an empty field (an extra comma) in its
place. HBase, however, simply ignores a column if it doesn't find data for
it in a record.

If I have a set of records like the ones below and I specify 4 columns
(excluding the row key), then it inserts data for 3 columns for the first
record, 2 columns for the second, and 4 columns for the third. It just
ignores a column if it doesn't find data for it.

r1,c1,c2,c3
r2,c1,c2
r3,c1,c2,c3,c4
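
One possible workaround for ragged rows like these, given only as a minimal sketch: pad each short row with trailing empty fields before handing the file to the Phoenix loader, so every record has the same number of fields. This assumes the missing columns are always the trailing ones (as in the sample above, with the third row key written as r3) and that an empty field is loaded as null, which per PHOENIX-1277 may not hold for every datatype. The function name pad_rows and the width of 5 (row key plus 4 columns) are illustrative choices, not part of Phoenix.

```python
import csv
import io


def pad_rows(src, dst, width):
    """Copy CSV rows from src to dst, padding short rows with empty fields.

    Hypothetical helper: assumes missing columns are always trailing ones.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) > width:
            raise ValueError(
                f"row has {len(row)} fields, expected at most {width}"
            )
        # Append one empty field per missing trailing column.
        writer.writerow(row + [""] * (width - len(row)))


# Sample modeled on the records above: row key plus up to 4 columns.
ragged = "r1,c1,c2,c3\nr2,c1,c2\nr3,c1,c2,c3,c4\n"
out = io.StringIO()
pad_rows(io.StringIO(ragged), out, width=5)
print(out.getvalue(), end="")
```

The padded file could then be passed to the bulk loader in place of the original. Columns missing from the middle of a row cannot be recovered this way, since nothing in the record says which column was dropped.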


Since Phoenix doesn't have this capability, we have to create the tables in
HBase and load them through HBase, and use Phoenix just for SQL queries.

I think we should enhance the Phoenix data loader to behave the same way as
the HBase loader. What do you say? Any thoughts on this?

Thanks,
Siva.

On Wed, Feb 11, 2015 at 11:34 PM, Gabriel Reid <gabriel.reid@gmail.com>
wrote:

> Hi Siva,
>
> If I understand correctly, you want to explicitly supply null values
> in a CSV file for some fields. In general, this should work by just
> leaving the field empty in your CSV file. For example, if you have
> three fields (id, first_name, last_name) in your CSV file, then a
> record like "1,,Reid" should create a record with first_name left as
> null.
>
> Note that there is still an open bug, PHOENIX-1277 [1] that will
> prevent inserting null values via the bulk loader or psql, so for some
> datatypes there currently isn't a way to explicitly supply null
> values.
>
> - Gabriel
>
>
> 1. https://issues.apache.org/jira/browse/PHOENIX-1277
>
> On Thu, Feb 12, 2015 at 1:28 AM, Siva <sbhavanari@gmail.com> wrote:
> > Hello all,
> >
> > Is there a way to keep NULL values for columns that are missing from the
> > CSV file during bulk loading?
> >
> > My requirement is that a few rows in the CSV file contain all the columns,
> > but the other rows contain only a few columns.
> >
> > In HBase, if a given record doesn't have the desired columns, ImportTsv
> > just ignores those columns and moves on to the next record while loading
> > the data.
> >
> >
> > HADOOP_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/conf
> > hadoop jar
> > /usr/hdp/2.2.0.0-2041/phoenix/phoenix-4.2.0.2.2.0.0-2041-client.jar
> > org.apache.phoenix.mapreduce.CsvBulkLoadTool --table P_TEST_2_COLS --input
> > /user/sbhavanari/p_h_test_2_cols_less.csv --import-columns NAME,LEADID,D
> > --zookeeper 172.31.45.176:2181:/hbase
> >
> > Thanks,
> > Siva.
>
