phoenix-user mailing list archives

From Dominic Egger <>
Subject Write to Disk SQLLine vs Spark with secondary indexing
Date Mon, 26 Mar 2018 10:33:37 GMT
Hello Phoenix user group,
I have a query on a table of about 170 million rows that selects around 700k of them.
The query retrieves row-key fields, fields covered by the index, as well as one field
that occurs only in the table itself. We also use index hinting. This works very
quickly when using SQLLine and dumping the results to a file (46 s). However, when
running the same query from Spark and materializing the result in driver memory, it
takes much longer (10 min). I suspect the issue is the index hinting, but I cannot
find out how to get Spark to use the correct index. Does anyone know how to do that?
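For reference, this is roughly the shape of the query that runs fast in SQLLine (table, index, and column names here are placeholders, not our actual schema):

```sql
-- Hypothetical names for illustration only.
-- The /*+ INDEX(...) */ hint forces Phoenix to use the named secondary index.
SELECT /*+ INDEX(my_table my_table_idx) */
       pk_col,        -- part of the row key
       covered_col,   -- covered by my_table_idx
       uncovered_col  -- exists only in the base table
FROM my_table
WHERE covered_col = 'some_value';
```

When loading via the phoenix-spark connector we specify the table name rather than a full SQL statement, so there is no obvious place to put such a hint.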

Looking at I/O usage and the HBase overview, I suspect the Spark approach leads
to a full table scan; the HBase read rate and the disk I/O rate at least
suggest it.
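To check this suspicion, one sketch (again with placeholder names) is to compare the Phoenix query plans with and without the hint via EXPLAIN, which shows whether Phoenix plans a full scan over the base table or a range scan over the index:

```sql
-- Hypothetical names for illustration only.
-- With the hint: the plan should reference my_table_idx (a range scan).
EXPLAIN SELECT /*+ INDEX(my_table my_table_idx) */ pk_col, covered_col, uncovered_col
FROM my_table WHERE covered_col = 'some_value';

-- Without the hint: if the plan shows FULL SCAN OVER my_table,
-- that would match what Spark appears to be doing.
EXPLAIN SELECT pk_col, covered_col, uncovered_col
FROM my_table WHERE covered_col = 'some_value';
```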

Best Regards
Dominic Egger
