phoenix-user mailing list archives

From Ankit Singhal <ankitsingha...@gmail.com>
Subject Re: split count for mapreduce jobs with PhoenixInputFormat
Date Wed, 30 Jan 2019 22:42:27 GMT
As Thomas said, the number of splits will be equal to the number of
guideposts available for the table, or to the subset of guideposts needed
to cover the filter. If you are seeing one split per region, then either
stats are disabled or the guidepost width is set higher than the size of
the region. Try reducing the guidepost width and re-running UPDATE
STATISTICS to rebuild the stats; check after some time that the number of
guideposts has increased by querying the SYSTEM.STATS table, and then run
the MR job.
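
For example, a minimal sketch of the two steps (MY_TABLE and the 10 MB
width are placeholders; "phoenix.stats.guidepost.width" is the per-table
override of the guidepost width):

    -- Rebuild stats with a smaller guidepost width (in bytes), so that
    -- more guideposts, and therefore more splits, are generated.
    UPDATE STATISTICS MY_TABLE SET "phoenix.stats.guidepost.width" = 10000000;

    -- Once stats collection completes, confirm the guidepost count grew.
    SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_TABLE';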

On Wed, Jan 30, 2019 at 2:33 PM venkata subbarayudu <avsrit2005@gmail.com>
wrote:

> You may recreate the table with the SALT_BUCKETS table option to get a
> reasonable number of regions, and you may try adding a secondary index to
> make the query run faster in case your MapReduce job applies specific
> filters (see the example after this thread).
>
> On Thu 31 Jan, 2019, 12:09 AM Thomas D'Silva <tdsilva@salesforce.com>
> wrote:
>
>> If stats are enabled, PhoenixInputFormat will generate a split per
>> guidepost.
>>
>> On Wed, Jan 30, 2019 at 7:31 AM Josh Elser <elserj@apache.org> wrote:
>>
>>> You can extend/customize the PhoenixInputFormat with your own code to
>>> increase the number of InputSplits and Mappers (a sketch follows after
>>> this thread).
>>>
>>> On 1/30/19 6:43 AM, Edwin Litterst wrote:
>>> > Hi,
>>> > I am using PhoenixInputFormat as the input source for MapReduce jobs.
>>> > The split count (which determines how many mappers are used for the
>>> > job) is always equal to the number of regions of the table from which
>>> > I select the input.
>>> > Is there a way to increase the number of splits? My job is running too
>>> > slowly with only one mapper for every region.
>>> > (Increasing the number of regions is not an option.)
>>> > Regards,
>>> > Eddie
>>>
>>
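
To make venkata's suggestion concrete, a minimal sketch (table and column
names are illustrative; SALT_BUCKETS can only be set at creation time,
hence the recreate):

    -- Salting pre-splits the table into a fixed number of regions.
    CREATE TABLE MY_TABLE (
        ID BIGINT NOT NULL PRIMARY KEY,
        FILTER_COL VARCHAR,
        VAL VARCHAR
    ) SALT_BUCKETS = 16;

    -- A covered secondary index lets filtered jobs avoid full-table scans.
    CREATE INDEX MY_IDX ON MY_TABLE (FILTER_COL) INCLUDE (VAL);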

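And for Josh's suggestion, a rough Java sketch (untested; it assumes
PhoenixInputSplit exposes getScans() and a List<Scan> constructor, which
may differ across Phoenix versions):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;
    import org.apache.phoenix.mapreduce.PhoenixInputFormat;
    import org.apache.phoenix.mapreduce.PhoenixInputSplit;

    // Emits one InputSplit per underlying HBase scan instead of one per
    // guidepost group, raising mapper parallelism without changing stats.
    public class FinerGrainedPhoenixInputFormat<T extends DBWritable>
            extends PhoenixInputFormat<T> {

        @Override
        public List<InputSplit> getSplits(JobContext context)
                throws IOException, InterruptedException {
            List<InputSplit> finer = new ArrayList<>();
            for (InputSplit split : super.getSplits(context)) {
                // Each PhoenixInputSplit wraps one or more scans; break
                // them apart so each scan gets its own mapper.
                for (Scan scan : ((PhoenixInputSplit) split).getScans()) {
                    finer.add(new PhoenixInputSplit(
                            Collections.singletonList(scan)));
                }
            }
            return finer;
        }
    }

After the usual PhoenixMapReduceUtil.setInput(...) call, the job can be
pointed at the subclass with
job.setInputFormatClass(FinerGrainedPhoenixInputFormat.class).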