phoenix-user mailing list archives

From: Antonio Murgia <antonio.mur...@eng.it>
Subject: sc.phoenixTableAsRDD number of initial partitions
Date: Thu, 13 Oct 2016 16:16:22 GMT
Hello everyone,

I'm trying to read data from a Phoenix table using Apache Spark. I use 
the suggested method, sc.phoenixTableAsRDD, without issuing any query 
(i.e. reading the whole table), and I noticed that the number of 
partitions Spark creates is equal to the number of RegionServers. Is 
there a way to use a custom number of partitions?
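
For reference, this is roughly how we load the table (just a sketch; 
the table name, column list and ZooKeeper URL below are placeholders 
for our actual setup):

    import org.apache.phoenix.spark._

    // Load the whole table; phoenix-spark decides the partitioning itself
    val rdd = sc.phoenixTableAsRDD(
      "MY_TABLE",                   // placeholder table name
      Seq("ID", "COL1"),            // placeholder columns
      zkUrl = Some("zkhost:2181"))  // placeholder ZooKeeper quorum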

The problem we actually face is that if a region is bigger than the 
available memory of the Spark executor, the executor goes OOM. If we 
could tune the number of partitions, we could use a higher count to 
reduce the memory footprint of the processing (while also slowing it 
down, I know :( ).
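
So far the only workaround I see is to repartition right after loading, 
something like this (the partition count is arbitrary, just for 
illustration):

    // Forces a shuffle, but caps the size of each partition
    val smallerPartitions = rdd.repartition(200)

This works, but it still has to read each region into a single 
partition first, so it doesn't really solve the initial scan.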

Thank you in advance

#A.M.

