phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Importing Postgres-generated CSVs into Phoenix
Date Fri, 08 Jan 2016 18:13:57 GMT
Hi Ken,
PHOENIX-2434 improved our CSV handling of booleans and will appear in our
upcoming 4.7.0 release. It'd be good if you can confirm whether or not this
is what you need. We definitely want to support ingest of CSVs from other
RDBMSs.

There are a couple of other avenues of ingest into Phoenix:
- using Flyway DB (http://flywaydb.org/) where Phoenix support has been
added (not sure if this has appeared in a release yet or not, though)
- through a pull request to add support for Phoenix in Scoop (
http://sqoop.apache.org/). Unfortunately, this isn't getting much attention
in that community (if this is useful for you, you may want to chime in
on SQOOP-2649 or on their mailing list).

Thanks,
James

On Thu, Jan 7, 2016 at 5:28 PM, Ken Hampson <hampsonk@gmail.com> wrote:

> Hi, all:
>
> I am in the process of importing some data from Postgres into Phoenix as
> part of a larger project. I noticed after doing so and cross-checking the
> results of a fairly complex JOIN query (of 6 tables of so) that the number
> of rows returned by Postgres and Phoenix differed -- Phoenix had a
> significant number more rows. I investigated this further, and ultimately
> determined that the reason was that all *boolean *values were imported as
> being false, regardless of their actual value.
>
> When I investigated that aspect further, I discovered that Phoenix appears
> to require the full strings "true" and "false" in the CSV in order to
> properly register their actual values. As a workaround, I am currently
> using a full SELECT in the "\copy" statement in psql to select each column
> individually, and for each boolean column, I am using a CASE statement to
> emit the full strings. I have since also learned that casting the boolean
> to the "text" datatype will also cause Postgres to emit the full string.
>
> In either case, though, the pain point is that this discrepancy requires
> that I examine each table closely for boolean columns, and if there are
> any, I then need to enumerate all the columns out explicitly and use either
> the CASE statement or the cast method on the boolean columns in order to
> get the values Phoenix expects. In my case, I didn't need to transform any
> of the data otherwise, so this forced a much more complex procedure.
>
> While psql provides many options to change various values -- delimiters
> and what not -- there is no option to change the boolean values. Likewise,
> as near as I can tell, there are no options to change what Phoenix expects,
> either.
>
> I am wondering:
>
> - Is there any way from within Phoenix to change the boolean values that
> it expects in the CSV?
> - Are there any other methods from a Phoenix perspective that you can
> think of that would obviate the need for the column-by-column enumeration
> procedure I outlined above?
> - Since Postgres is a major open source RDBMS which I suspect many people
> -- myself included -- will want to export data from into Phoenix, could
> there be an enhancement to Phoenix to allow an option to specify a default
> boolean value, or perhaps an option to specify the CSV source, from which
> the proper value could be defined and utilized automatically?
>
> Thanks,
> - Ken
>

Mime
View raw message