phoenix-user mailing list archives

From "Long, Xindian" <Xindian.L...@sensus.com>
Subject RE: Phoenix-Spark plug in cannot select by column family name
Date Wed, 23 Nov 2016 17:55:51 GMT
Thanks, I just filed a Jira Issue

https://issues.apache.org/jira/browse/PHOENIX-3506

Xindian


From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Thursday, November 10, 2016 3:08 PM
To: user
Subject: Re: Phoenix-Spark plug in cannot select by column family name

Please file a JIRA, though, Xindian. It's a reasonable request to add the ability to prefix
column references with the column family name just like you can do in JDBC.

On Thu, Nov 10, 2016 at 12:05 PM, Chris Tarnas <cft@biotiquesystems.com> wrote:
In my experience you will need to make sure that the column names are unique, even across
families; otherwise Spark will throw errors.

Chris Tarnas
Biotique Systems, Inc
cft@biotiquesystems.com

On Nov 10, 2016, at 10:14 AM, Long, Xindian <Xindian.Long@sensus.com> wrote:

It works without the column family name, but I expect that I should not need to make sure
column names are unique across different column families.

Xindian


From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Tuesday, November 08, 2016 5:46 PM
To: user
Subject: Re: Phoenix-Spark plug in cannot select by column family name

Have you tried without the column family name? As long as the column names are unique across
all column families, you don't need to include the column family name.
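
For illustration, the advice above amounts to loading the table through phoenix-spark and selecting with unqualified column names. This is only a sketch: the ZooKeeper URL, the table name, and the columns CI/FA are assumptions taken from the thread, and running it requires a live Phoenix/HBase cluster plus the phoenix-spark jars on the classpath (Spark 1.x API, matching the snippet later in this thread).

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class UnqualifiedSelectSketch {
    public static void main(String[] args) {
        // Assumed values for illustration only.
        String zkUrl = "localhost:2181";       // ZooKeeper quorum of the HBase cluster
        String table = "TESTING.ENDPOINTS";    // table name from the log excerpt below

        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("phoenix-select").setMaster("local"));
        SQLContext sqlContext = new SQLContext(sc);

        Map<String, String> options = new HashMap<String, String>();
        options.put("zkUrl", zkUrl);
        options.put("table", table);

        DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .options(options)
                .load();

        // Unqualified names: phoenix-spark exposes "I"."CI" simply as "CI",
        // so this select resolves as long as CI and FA do not occur in more
        // than one column family.
        df.select("CI", "FA").show();

        sc.stop();
    }
}
```

If the same column name does exist in two families, the only workaround along these lines is renaming the columns in the Phoenix schema so they are unique.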

Thanks,
James

On Tue, Nov 8, 2016 at 2:19 PM, Long, Xindian <Xindian.Long@sensus.com> wrote:
I have a table with multiple column families that may share column names.
I want to use the phoenix-spark plugin to select some of the fields, but it throws an
AnalysisException (details in the attached file).

public void testSpark(JavaSparkContext sc, String tableStr, String dataSrcUrl) {
    //SparkContextBuilder.buildSparkContext("Simple Application", "local");

    // One JVM can only have one Spark Context now
    Map<String, String> options = new HashMap<String, String>();
    SQLContext sqlContext = new SQLContext(sc);

    options.put("zkUrl", dataSrcUrl);
    options.put("table", tableStr);
    log.info("Phoenix DB URL: " + dataSrcUrl + " tableStr: " + tableStr);

    DataFrame df = null;
    try {
        df = sqlContext.read().format("org.apache.phoenix.spark").options(options).load();
        df.explain(true);
        df.show();

        df = df.select("I.CI", "I.FA");

        //df = df.select("\"I\".\"CI\"", "\"I\".\"FA\""); // This gives the same exception too

    } catch (Exception ex) {
        log.error("sql error: ", ex);
    }

    try {
        log.info("Count By phoenix spark plugin: " + df.count());
    } catch (Exception ex) {
        log.error("dataframe error: ", ex);
    }
}


I can see in the log that there is something like

10728 [INFO] main  org.apache.phoenix.mapreduce.PhoenixInputFormat  - Select Statement: SELECT
"RID","I"."CI","I"."FA","I"."FPR","I"."FPT","I"."FR","I"."LAT","I"."LNG","I"."NCG","I"."NGPD","I"."VE","I"."VMJ","I"."VMR","I"."VP","I"."CSRE","I"."VIB","I"."IIICS","I"."LICSCD","I"."LEDC","I"."ARM","I"."FBM","I"."FTB","I"."NA2FR","I"."NA2PT","S"."AHDM","S"."ARTJ","S"."ATBM","S"."ATBMR","S"."ATBR","S"."ATBRR","S"."CS","S"."LAMT","S"."LTFCT","S"."LBMT","S"."LDTI","S"."LMT","S"."LMTN","S"."LMTR","S"."LPET","S"."LPORET","S"."LRMT","S"."LRMTP","S"."LRMTR","S"."LSRT","S"."LSST","S"."MHDMS0","S"."MHDMS1","S"."RFD","S"."RRN","S"."RRR","S"."TD","S"."TSM","S"."TC","S"."TPM","S"."LRMCT","S"."SS13FSK34","S"."LERMT","S"."LEMDMT","S"."AGTBRE","S"."SRM","S"."LTET","S"."TPMS","S"."TPMSM","S"."TM","S"."TMF","S"."TMFM","S"."NA2TLS","S"."NA2IT","S"."CWR","S"."BPR","S"."LR","S"."HLB","S"."NA2UFTBFR","S"."DT","S"."NA28ARE","S"."RM","S"."LMTB","S"."LRMTB","S"."RRB","P"."BADUC","P"."UAN","P"."BAPS","P"."BAS","P"."UAS","P"."BATBBR","P"."BBRI","P"."BLBR","P"."ULHT","P"."BLPST","P"."BLPT","P"."UTI","P"."UUC"
FROM TESTING.ENDPOINTS

But obviously the column family is left out of the DataFrame column names somewhere in the
process.
Is there a fix for this problem?

Thanks

Xindian



