phoenix-user mailing list archives

From "Bulvik, Noam" <Noam.Bul...@teoco.com>
Subject RE: replace CsvToKeyValueMapper with my implementation
Date Thu, 29 Oct 2015 19:38:06 GMT
This is exactly what I need, i.e. the ability to change the content of each row rather than to use a different
input format.
The use case is loading a large amount of data from files where each row needs to be
handled before it is processed by the CSV parser. Examples include changing the date format,
fixing the encoding, escaping delimiters, and more. Of course, this can be done in a separate MapReduce
job, but since we are already processing each row, it would be nice if we could do it there.
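
For illustration, a minimal sketch of the kind of per-row hook described above. It assumes a hypothetical `LinePreprocessor` interface (not part of Phoenix 4.5.2) that would be applied to each raw line before the CSV parser sees it; the date pattern, column index, and BOM cleanup are illustrative assumptions only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LinePreprocessingSketch {

    // Hypothetical per-line hook, called on each raw input line before CSV parsing.
    // Not an existing Phoenix API; the name is illustrative only.
    @FunctionalInterface
    interface LinePreprocessor {
        String process(String line);
    }

    // Assumed input date layout, e.g. 29/10/2015.
    private static final DateTimeFormatter INPUT_DATE = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Removes a leading UTF-8 byte order mark, a common encoding artifact in exported files.
    static final LinePreprocessor STRIP_BOM =
            line -> line.startsWith("\uFEFF") ? line.substring(1) : line;

    // Rewrites one column from dd/MM/yyyy to ISO yyyy-MM-dd.
    // Assumes the delimiter never appears inside a field (no quote handling).
    static LinePreprocessor reformatDate(int column, char delimiter) {
        String splitRegex = Pattern.quote(String.valueOf(delimiter));
        return line -> {
            String[] fields = line.split(splitRegex, -1); // -1 keeps trailing empty fields
            if (column < fields.length && !fields[column].isEmpty()) {
                LocalDate date = LocalDate.parse(fields[column], INPUT_DATE);
                fields[column] = date.toString(); // LocalDate.toString() is ISO yyyy-MM-dd
            }
            return String.join(String.valueOf(delimiter), fields);
        };
    }

    // Applies the steps in order, feeding each step's output into the next.
    static LinePreprocessor chain(List<LinePreprocessor> steps) {
        return line -> {
            String result = line;
            for (LinePreprocessor step : steps) {
                result = step.process(result);
            }
            return result;
        };
    }

    public static void main(String[] args) {
        LinePreprocessor pipeline = chain(Arrays.asList(STRIP_BOM, reformatDate(1, ',')));
        System.out.println(pipeline.process("\uFEFF42,29/10/2015,ok")); // prints 42,2015-10-29,ok
    }
}
```

Keeping each step a small line-to-line function means the steps can be chained and unit-tested independently of the MapReduce job.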


From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Thursday, October 29, 2015 7:33 PM
To: user <user@phoenix.apache.org>
Subject: Re: replace CsvToKeyValueMapper with my implementation

I seem to remember you starting down that path, Gabriel: a kind of pluggable transformation
for each row. It wasn't pluggable on the input format, but that's a nice idea too, Ravi. I'm
not sure if this is what Noam needs or if it's something else.

It's probably good to discuss this a bit more at the use-case level to understand the specifics.

On Thu, Oct 29, 2015 at 9:17 AM, Ravi Kiran <maghamravikiran@gmail.com> wrote:
It would be great if we could provide an API and let end users supply their own implementation of how
to parse each record. This way, we can move beyond bulk loading only CSV and also bulk load JSON
and other input formats into Phoenix tables.

I can take that one up. Would the community like this as a feature?
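
A rough sketch of what such a per-record parsing API could look like, assuming a hypothetical `RecordParser` interface that returns column values in table column order. Neither the interface nor the two toy implementations exist in Phoenix 4.5.2; the key=value parser merely stands in for a JSON or other structured-format parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RecordParserSketch {

    // Hypothetical plug-in point: converts one input record into column values,
    // in table column order. Not an existing Phoenix 4.5.2 interface.
    interface RecordParser {
        List<String> parse(String record);
    }

    // Naive delimited-text parser standing in for the CSV case.
    // Assumes the delimiter never appears inside a field (no quote handling).
    static class DelimitedParser implements RecordParser {
        private final String splitRegex;

        DelimitedParser(char delimiter) {
            this.splitRegex = Pattern.quote(String.valueOf(delimiter));
        }

        @Override
        public List<String> parse(String record) {
            // -1 keeps trailing empty fields, so "1,," yields three values
            return Arrays.asList(record.split(splitRegex, -1));
        }
    }

    // Toy parser for "KEY=value;KEY=value" records, standing in for a JSON parser.
    // Emits values in the configured column order; missing keys become null.
    static class KeyValueParser implements RecordParser {
        private final List<String> columns;

        KeyValueParser(List<String> columns) {
            this.columns = columns;
        }

        @Override
        public List<String> parse(String record) {
            Map<String, String> byKey = new HashMap<>();
            for (String pair : record.split(";")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    byKey.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
            List<String> values = new ArrayList<>();
            for (String column : columns) {
                values.add(byKey.get(column));
            }
            return values;
        }
    }

    public static void main(String[] args) {
        RecordParser csv = new DelimitedParser(',');
        RecordParser kv = new KeyValueParser(Arrays.asList("ID", "NAME"));
        System.out.println(csv.parse("1,alice"));        // prints [1, alice]
        System.out.println(kv.parse("NAME=alice;ID=1")); // prints [1, alice]
    }
}
```

With a split like this, the bulk load tool would only need to know which `RecordParser` to use; the rest of the pipeline (converting values to Phoenix column types and emitting KeyValues) could stay format-agnostic.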





On Thu, Oct 29, 2015 at 8:10 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
Hi Noam,

That specific piece of code in CsvBulkLoadTool that you referred to
allows packaging the CsvBulkLoadTool within a different job jar file,
but won't allow setting a different mapper class. The actual setting
of the mapper class is done further down in the submitJob method,
specifically the following piece:

   job.setMapperClass(CsvToKeyValueMapper.class);

There isn't currently a way to load a custom mapper in the
CsvBulkLoadTool, so the only (current) option is to create a fully new
custom implementation of the bulk load tool (probably copying or
reusing most of the existing tool). However, I can certainly imagine
this being a useful feature to have in some situations.
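
As a rough illustration of what such a feature might look like, the sketch below resolves the mapper class from a configuration property and falls back to the default, similar in spirit to the `job.getJar() == null` check in CsvBulkLoadTool. The property name `phoenix.bulkload.mapper.class` and the stand-in classes are hypothetical, and `java.util.Properties` stands in for the Hadoop `Configuration`.

```java
import java.util.Properties;

public class MapperClassResolverSketch {

    // Stand-ins for the real classes; in Phoenix the default would be CsvToKeyValueMapper.
    static class BaseMapper { }
    static class DefaultMapper extends BaseMapper { }
    public static class MyKeyValueMapper extends DefaultMapper { }

    // Hypothetical property name; Phoenix 4.5.2 does not read such a setting.
    static final String MAPPER_CLASS_KEY = "phoenix.bulkload.mapper.class";

    // Returns the configured mapper class, or the default when none is set.
    // Fails fast if the class is missing or does not extend BaseMapper.
    static Class<? extends BaseMapper> resolveMapperClass(Properties conf) {
        String name = conf.getProperty(MAPPER_CLASS_KEY);
        if (name == null) {
            return DefaultMapper.class;
        }
        try {
            // asSubclass throws ClassCastException if the class does not extend BaseMapper
            return Class.forName(name).asSubclass(BaseMapper.class);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Mapper class not found: " + name, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveMapperClass(conf).getSimpleName()); // prints DefaultMapper
        conf.setProperty(MAPPER_CLASS_KEY, MyKeyValueMapper.class.getName());
        System.out.println(resolveMapperClass(conf).getSimpleName()); // prints MyKeyValueMapper
    }
}
```

In a real tool the resolved class would then be passed to `job.setMapperClass(...)` in place of the hard-coded `CsvToKeyValueMapper.class`; Hadoop's `Configuration.getClass(name, defaultValue, xface)` performs the same lookup-with-default and subtype check.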

Could you log this request in JIRA? It would also be really good to
have some more detail on your specific use case. Even better would be a
patch that implements it :-)

- Gabriel


On Thu, Oct 29, 2015 at 3:22 PM, Bulvik, Noam <Noam.Bulvik@teoco.com> wrote:
> Hi,
>
> We have proprietary logic that must be executed when parsing each line, before
> it is uploaded to Phoenix. I saw the following in the CsvBulkLoadTool code:
>
> // Allow overriding the job jar setting by using a -D system property at startup
> if (job.getJar() == null) {
>     job.setJarByClass(CsvToKeyValueMapper.class);
> }
>
> Assuming I have an implementation called MyKeyValueMapper, how can I make sure
> it will be loaded instead of the standard one?
>
> Also, the CsvToKeyValueMapper class has some private members, such as:
>
> ·  private PhoenixConnection conn;
> ·  private byte[] tableName;
>
> Could you add a way to access these members, or make them protected, so that
> the class we create that extends CsvToKeyValueMapper can use them without
> duplicating them and the code that initializes them?
>
> We are using Phoenix 4.5.2 on CDH.
>
> Thanks,
>
> Noam
>
> Noam Bulvik
> R&D Manager
>
> TEOCO CORPORATION
> c: +972 54 5507984
> p: +972 3 9269145
> Noam.Bulvik@teoco.com
> www.teoco.com
>
> ________________________________
>
> PRIVILEGED AND CONFIDENTIAL
> PLEASE NOTE: The information contained in this message is privileged and
> confidential, and is intended only for the use of the individual to whom it
> is addressed and others who have been specifically authorized to receive it.
> If you are not the intended recipient, you are hereby notified that any
> dissemination, distribution or copying of this communication is strictly
> prohibited. If you have received this communication in error, or if any
> problems occur with transmission, please contact sender. Thank you.


