phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ken Hampson <hamps...@gmail.com>
Subject Importing Postgres-generated CSVs into Phoenix
Date Fri, 08 Jan 2016 01:28:38 GMT
Hi, all:

I am in the process of importing some data from Postgres into Phoenix as
part of a larger project. I noticed after doing so and cross-checking the
results of a fairly complex JOIN query (of 6 tables of so) that the number
of rows returned by Postgres and Phoenix differed -- Phoenix had a
significant number more rows. I investigated this further, and ultimately
determined that the reason was that all *boolean *values were imported as
being false, regardless of their actual value.

When I investigated that aspect further, I discovered that Phoenix appears
to require the full strings "true" and "false" in the CSV in order to
properly register their actual values. As a workaround, I am currently
using a full SELECT in the "\copy" statement in psql to select each column
individually, and for each boolean column, I am using a CASE statement to
emit the full strings. I have since also learned that casting the boolean
to the "text" datatype will also cause Postgres to emit the full string.

In either case, though, the pain point is that this discrepancy requires
that I examine each table closely for boolean columns, and if there are
any, I then need to enumerate all the columns out explicitly and use either
the CASE statement or the cast method on the boolean columns in order to
get the values Phoenix expects. In my case, I didn't need to transform any
of the data otherwise, so this forced a much more complex procedure.

While psql provides many options to change various values -- delimiters and
what not -- there is no option to change the boolean values. Likewise, as
near as I can tell, there are no options to change what Phoenix expects,
either.

I am wondering:

- Is there any way from within Phoenix to change the boolean values that it
expects in the CSV?
- Are there any other methods from a Phoenix perspective that you can think
of that would obviate the need for the column-by-column enumeration
procedure I outlined above?
- Since Postgres is a major open source RDBMS which I suspect many people
-- myself included -- will want to export data from into Phoenix, could
there be an enhancement to Phoenix to allow an option to specify a default
boolean value, or perhaps an option to specify the CSV source, from which
the proper value could be defined and utilized automatically?

Thanks,
- Ken

Mime
View raw message