phoenix-user mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: Line separator option in Bulk loader
Date Thu, 12 Feb 2015 19:10:12 GMT
Custom line separator is a reasonable request. Please open JIRAs for HBase
and/or Phoenix import tools -- and provide a patch, if you're feeling
generous ;)

On Thu, Feb 12, 2015 at 10:39 AM, Siva <sbhavanari@gmail.com> wrote:

> Hi Gabriel,
>
> Using a special character other than \n as the line separator does not
> work even with HBase ImportTsv. But I found a project called RichImportTsv
> on GitHub.
>
> https://github.com/kawaa/RichImportTsv
>
> But it is 3 years old and was implemented using the old APIs. We should
> rewrite it with the new API.
>
> Thanks,
> Siva.
>
> On Wed, Feb 11, 2015 at 11:40 PM, Gabriel Reid <gabriel.reid@gmail.com>
> wrote:
>
>> Hi Siva,
>>
>> The Bulk CSV Loader (i.e. the MapReduce-based loader) definitely won't
>> support records split over multiple input lines. It could be that
>> loading via PSQL (as described
>> on http://phoenix.apache.org/bulk_dataload.html) will allow multi-line
>> records, as this might be supported by the underlying CSV parsing
>> library (commons-csv), although I'm not sure. In any case, I can't
>> really give you any advice on how to make it work there if it isn't
>> working right now.
>>
>> I assume this also won't work in HBase's ImportTsv.
>>
>> - Gabriel
>>
>>
>> On Thu, Feb 5, 2015 at 10:28 PM, Siva <sbhavanari@gmail.com> wrote:
>> > We have a table that contains a NOTE column; this column holds lines of
>> > text separated by newlines. When I load the data from a .csv file through
>> > the bulk loader, Phoenix fails with an error, and HBase cuts the text off
>> > at the first newline and treats the rest of the NOTE as a new record.
>> >
>> >
>> >
>> > Is there a way to specify the line separator in an HBase or Phoenix bulk
>> > load?
>> >
>> >
>> >
>> > With phoenix:
>> >
>> >
>> >
>> >
>> > HADOOP_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/conf \
>> >   hadoop jar /usr/hdp/2.2.0.0-2041/phoenix/phoenix-4.2.0.2.2.0.0-2041-client.jar \
>> >   org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>> >   --table test_leadwarehouse \
>> >   --input /user/sbhavanari/test_leadwarehouse.csv \
>> >   --zookeeper <zookeeper Ip>:2181:/hbase
>> >
>> >
>> >
>> > With hbase importtsv:
>> >
>> >
>> >
>> > hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
>> >   '-Dimporttsv.separator=,' \
>> >   -Dimporttsv.columns=<col_list> \
>> >   test_leadwarehouse /user/data/test_leadwarehouse.csv
>>
>
>
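
A possible workaround for the multi-line NOTE problem described in this
thread, offered as a rough sketch and not as anything the posters proposed:
since both loaders treat every input line as one record, the file could be
flattened before loading. The sketch assumes the embedded newlines sit inside
double-quoted fields (RFC 4180 style), which the thread does not confirm. The
function name and the placeholder value are hypothetical.

```python
import csv
import io


def flatten_multiline_csv(src, dst, placeholder=" "):
    """Copy CSV rows from src to dst, replacing line breaks inside fields.

    Assumption: multi-line values are enclosed in double quotes, so the
    csv module can tell a record boundary from a newline inside a field.
    Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        cleaned = [
            field.replace("\r\n", placeholder)
            .replace("\n", placeholder)
            .replace("\r", placeholder)
            for field in row
        ]
        writer.writerow(cleaned)
        count += 1
    return count


if __name__ == "__main__":
    # Small in-memory example; real files should be opened with newline="".
    sample = 'id,note\n1,"first line\nsecond line"\n2,plain\n'
    out = io.StringIO()
    flatten_multiline_csv(io.StringIO(sample, newline=""), out)
    print(out.getvalue())
```

The flattened file has exactly one record per line, so it could then be
passed to CsvBulkLoadTool or ImportTsv as in the commands quoted above. The
original line breaks are lost unless the placeholder is something that can be
mapped back after loading.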
