phoenix-user mailing list archives

From Siva <sbhavan...@gmail.com>
Subject Re: Phoenix bulk loading
Date Thu, 12 Feb 2015 22:05:13 GMT
Thanks for your inputs, Gabriel. I'm thinking of it like this: keep the data
loader's consistency check as is, and add a new parameter that extends the
logic to populate null values for missing columns. Let's see how it goes :)
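
For what it's worth, a rough sketch of the behavior I have in mind (the
parameter name and function are hypothetical, not existing Phoenix code):

def normalize_record(fields, expected_count, fill_missing_columns=False):
    # With the proposed flag on, pad short records with empty fields so
    # the loader parses the missing values as NULL; with it off, keep
    # the current strict behavior and reject such records.
    if fill_missing_columns and len(fields) < expected_count:
        fields = fields + [''] * (expected_count - len(fields))
    return fields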


On Thu, Feb 12, 2015 at 12:17 PM, Gabriel Reid <gabriel.reid@gmail.com>
wrote:

> Hi Siva,
>
> If I'm not mistaken, the bulk loader will currently just drop rows
> that don't have the correct number of fields.
>
> I'm actually in favor of this behavior (i.e. enforcing the consistency
> of records in an input file). A CSV file is generally considered a
> table with a given set of columns, so I think it's pretty reasonable
> to consider a CSV record corrupt if it doesn't include the correct
> number of columns.
>
> I think the workaround here (I think you already realized this) would
> be to encode your input data as
>
> r1,c1,c2,c3,
> r2,c1,c2,,
> r3,c1,c2,c3,c4
>
> I realize that this makes your use case more complex, because it
> requires processing the file before loading it into Phoenix, but I
> think that keeping the Phoenix loader relatively strict (and simple)
> in this case is the better choice.
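>
> Something like this (untested) Python sketch would do the padding; the
> expected field count is hard-coded here for illustration:
>
> import csv, sys
>
> EXPECTED_FIELDS = 5  # row key plus four columns; adjust for your table
>
> # Read CSV records on stdin, pad short records with empty fields
> # (which load as nulls), and write fixed-width records to stdout.
> reader = csv.reader(sys.stdin)
> writer = csv.writer(sys.stdout)
> for record in reader:
>     record += [''] * (EXPECTED_FIELDS - len(record))
>     writer.writerow(record)
>
> You'd run it as: python pad_csv.py < input.csv > padded.csv. Records
> that already have all the fields pass through unchanged.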
>
> - Gabriel
>
> On Thu, Feb 12, 2015 at 7:32 PM, Siva <sbhavanari@gmail.com> wrote:
> > Hi Gabriel,
> >
> > Thanks for your response.
> >
> > Your understanding is correct. The use case we have is that we get data
> > from different sources (with different table structures, in terms of
> > columns, based on the client type) in CSV format. If a column is not
> > available in the source, we don't have the option of even appending a
> > blank comma (,) in its place. But HBase ignores a column if it doesn't
> > find the data within a record.
> >
> > If I have a set of records like the ones below, and I specify 4 columns
> > (excluding the row key), then it inserts data for 3 columns for the first
> > record, 2 columns for the 2nd record, and 4 columns for the 3rd; it just
> > ignores a column if it doesn't find the data.
> >
> > r1,c1,c2,c3
> > r2,c1,c2
> > r3,c1,c2,c3,c4
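> >
> > To illustrate, this is roughly the behavior (sketched here with the
> > happybase Python client; the table and column names are made up, not
> > what we actually run):
> >
> > import happybase
> >
> > connection = happybase.Connection('localhost')
> > table = connection.table('p_h_test')
> > columns = ['cf:c1', 'cf:c2', 'cf:c3', 'cf:c4']
> >
> > for line in open('data.csv'):
> >     fields = line.strip().split(',')
> >     rowkey, values = fields[0], fields[1:]
> >     # zip() stops at the shorter list, so cells for missing trailing
> >     # columns are simply never written -- HBase stores sparse rows.
> >     table.put(rowkey.encode(),
> >               {col.encode(): val.encode()
> >                for col, val in zip(columns, values)})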
> >
> >
> > Since Phoenix doesn't have this capability, we have to create the tables
> > in HBase and load them through it, and use Phoenix just for SQL queries.
> >
> > I think we should enhance the Phoenix data loader to work the same way as
> > the HBase loader. What do you say, any thoughts on this?
> >
> > Thanks,
> > Siva.
> >
> > On Wed, Feb 11, 2015 at 11:34 PM, Gabriel Reid <gabriel.reid@gmail.com>
> > wrote:
> >>
> >> Hi Siva,
> >>
> >> If I understand correctly, you want to explicitly supply null values
> >> in a CSV file for some fields. In general, this should work by just
> >> leaving the field empty in your CSV file. For example, if you have
> >> three fields (id, first_name, last_name) in your CSV file, then a
> >> record like "1,,Reid" should create a record with first_name left as
> >> null.
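> >>
> >> So for a table with columns (id, first_name, last_name), the line
> >> "1,,Reid" should load as id=1, first_name=NULL, last_name='Reid'.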
> >>
> >> Note that there is still an open bug, PHOENIX-1277 [1], that prevents
> >> inserting null values via the bulk loader or psql, so for some data
> >> types there currently isn't a way to explicitly supply null values.
> >>
> >> - Gabriel
> >>
> >>
> >> 1. https://issues.apache.org/jira/browse/PHOENIX-1277
> >>
> >> On Thu, Feb 12, 2015 at 1:28 AM, Siva <sbhavanari@gmail.com> wrote:
> >> > Hello all,
> >> >
> >> > Is there a way to specify, as part of bulk loading, that NULL values be
> >> > kept for columns that are not present in the CSV file?
> >> >
> >> > The requirement I have is that some rows in the CSV file contain all
> >> > the columns, but other rows contain only a few columns.
> >> >
> >> > In HBase, if a given record doesn't have the desired columns, ImportTsv
> >> > just ignores those columns and moves on to the next record while
> >> > loading the data.
> >> >
> >> > HADOOP_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/conf \
> >> > hadoop jar /usr/hdp/2.2.0.0-2041/phoenix/phoenix-4.2.0.2.2.0.0-2041-client.jar \
> >> > org.apache.phoenix.mapreduce.CsvBulkLoadTool --table P_TEST_2_COLS \
> >> > --input /user/sbhavanari/p_h_test_2_cols_less.csv \
> >> > --import-columns NAME,LEADID,D \
> >> > --zookeeper 172.31.45.176:2181:/hbase
> >> >
> >> > Thanks,
> >> > Siva.
> >
> >
>
