phoenix-user mailing list archives

From Josh Mahonin <>
Subject Re: Use Phoenix hints with Spark Integration [main use case: block cache disable]
Date Thu, 31 Aug 2017 17:06:04 GMT
Hi Roberto,

At present, I don't believe there's any way to pass a query hint
explicitly, since the SELECT statement is built from just the table name
and columns, down in this method:

However, the Hive integration does seem to have this built in, though it
doesn't exist in the rest of the Phoenix MR codebase:

Would you mind filing a JIRA ticket? As always, patches are welcome as
well. I suspect we should be disabling the block cache for phoenix-spark by
default as Hive does.
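For context, what the Hive integration does amounts to switching off block
caching on the underlying HBase Scan. A minimal sketch of that setting,
using the standard HBase client API (the class and helper name here are
illustrative; where this Scan is actually constructed in the phoenix-spark
code path is exactly the missing piece this thread is about):

```java
import org.apache.hadoop.hbase.client.Scan;

public class NoBlockCacheScan {
    // Sketch only: build a Scan that bypasses the region server block
    // cache, as the Hive integration does for its full-table reads.
    static Scan noCacheScan() {
        Scan scan = new Scan();
        scan.setCacheBlocks(false); // don't populate the block cache with scanned blocks
        scan.setCaching(1000);      // scanner batching; independent of the block cache
        return scan;
    }
}
```

This is the kind of change a phoenix-spark patch would presumably make,
applied wherever the MR input format builds its scans.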



On Wed, Aug 30, 2017 at 7:11 AM, Roberto Coluccio <> wrote:

> Hello folks,
> I'm trying to prevent records selected by my Spark application from being
> added to the block cache when reading them as a DataFrame (e.g.
> sqlContext.phoenixTableAsDataFrame(myTable, myColumns, myPredicate,
> myZkUrl, myConf)).
> I know I can force no caching on a per-query basis when issuing SQL
> queries, leveraging the /*+ NO_CACHE */ hint.
> I know I can disable caching on a table- or column-family-specific basis
> through an ALTER TABLE command in the HBase shell.
> What I don't know is how to do so when leveraging the Phoenix-Spark APIs.
> I think my problem can be stated as a more general-purpose question:
> *how can Phoenix hints be specified when using the Phoenix-Spark APIs?*
> For my specific use case, I tried pushing the property
> *hfile.block.cache.size=0* into a Configuration object before creating
> the DataFrame, but I realized records resulting from the underlying scan
> were still cached.
> Thank you in advance for your help.
> Best regards,
> Roberto
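For concreteness, the per-query workaround described above looks like this
(the table and column names are placeholders):

```sql
-- The NO_CACHE hint tells Phoenix not to populate the HBase block
-- cache with the blocks read by this statement.
SELECT /*+ NO_CACHE */ col1, col2 FROM MY_TABLE WHERE col1 = 'x';
```

The table-level alternative is an HBase shell `alter` that sets
`BLOCKCACHE => 'false'` on the relevant column family; neither approach is
currently reachable through the Phoenix-Spark DataFrame API, which is the
gap discussed above.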
