phoenix-user mailing list archives

From "Fustes, Diego"
Subject Phoenix-Spark: Number of partitions in PhoenixRDD
Date Mon, 18 Apr 2016 08:37:13 GMT
Hi all,

I'm using the Phoenix Spark plugin to process a very large table. The table is salted into
100 buckets and split into 400 regions. When I read it with phoenixTableAsRDD, I get an RDD
with 150 partitions. These partitions are so large that I am getting OutOfMemory errors, so
I would like to get smaller partitions. I could just call repartition, but that would shuffle
the whole dataset. So my question is: is there a way to modify PhoenixInputFormat to get more
partitions in the resulting RDD?
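
A minimal sketch of the read path described above, assuming the phoenix-spark implicits are
on the classpath. The table name, column names, ZooKeeper quorum and the target partition
count of 600 are hypothetical placeholders, not values from the actual cluster:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.phoenix.spark._ // adds phoenixTableAsRDD to SparkContext

object PhoenixPartitionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-partition-check"))

    // Hypothetical table, columns and ZooKeeper quorum.
    val rdd = sc.phoenixTableAsRDD(
      "BIG_TABLE",
      Seq("ID", "PAYLOAD"),
      zkUrl = Some("zk-host:2181")
    )

    // Number of partitions produced by the Phoenix input splits.
    println(s"partitions from Phoenix: ${rdd.partitions.length}")

    // Yields smaller partitions, but performs a full shuffle of the dataset.
    val smaller = rdd.repartition(600)
    println(s"partitions after repartition: ${smaller.partitions.length}")

    sc.stop()
  }
}
```

The repartition call is the workaround to avoid here, since it moves every row across the
network before any processing starts.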

Thanks and regards,


Diego Fustes, Big Data and Machine Learning Expert
Gran Vía de les Corts Catalanes 130, 11th floor
08038 Barcelona, Spain
Phone: +34 93 43 255 27
