phoenix-user mailing list archives

From Eddie <>
Subject Re: split count for mapreduce jobs with PhoenixInputFormat
Date Thu, 31 Jan 2019 16:40:51 GMT
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Maybe there are differences in default
      values on clusters running different Hadoop versions.<br>
      With Hadoop 3.1.x and Phoenix 5.0, useStatsForParallelization is
      true by default, and the number of splits equals the guidepost
      count plus the number of regions.<br>
      I changed GUIDE_POSTS_WIDTH to another value:<br>
      ALTER TABLE &lt;tablename&gt; SET GUIDE_POSTS_WIDTH = 10240000<br>
      UPDATE STATISTICS &lt;tablename&gt; ALL<br>
      Unfortunately, this changed neither the guidepost count nor
      the split count. Am I missing something here?</div>
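The relationship described above (number of splits = guidepost count + number of regions) can be illustrated with a small toy model in plain Java. It is not Phoenix code and does not reproduce Phoenix's implementation. It assumes string keys, sorted inputs, and guideposts that never coincide with a region boundary. It cuts each region's key range at every guidepost inside it, so r regions and g guideposts yield r + g ranges. The estimatedSplits helper further assumes roughly one guidepost per GUIDE_POSTS_WIDTH bytes of region data. The class and method names and the 1 GiB region sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Phoenix code) of guidepost-based split generation. */
public class GuidepostSplitModel {

    /** Half-open key range [start, end); an empty string means "unbounded". */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Cuts every region at each guidepost that lies strictly inside it.
     * regionStarts: sorted region start keys, the first being "" (table start).
     * guideposts: sorted guidepost keys (hypothetical inputs for this model).
     */
    static List<KeyRange> splits(List<String> regionStarts, List<String> guideposts) {
        List<KeyRange> result = new ArrayList<>();
        for (int i = 0; i < regionStarts.size(); i++) {
            String regionStart = regionStarts.get(i);
            String regionEnd = (i + 1 < regionStarts.size()) ? regionStarts.get(i + 1) : "";
            String current = regionStart;
            for (String gp : guideposts) {
                boolean afterStart = gp.compareTo(regionStart) > 0;
                boolean beforeEnd = regionEnd.isEmpty() || gp.compareTo(regionEnd) < 0;
                if (afterStart && beforeEnd) {
                    result.add(new KeyRange(current, gp)); // close the chunk at this guidepost
                    current = gp;
                }
            }
            result.add(new KeyRange(current, regionEnd)); // last chunk runs to the region end
        }
        return result;
    }

    /**
     * Rough estimate, assuming about one guidepost per guidePostsWidth bytes
     * in each region, plus one extra chunk per region.
     */
    static long estimatedSplits(long[] regionSizesBytes, long guidePostsWidth) {
        long total = 0;
        for (long size : regionSizesBytes) {
            total += size / guidePostsWidth + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical table: regions ["", "m") and ["m", ""), guideposts d, h, r.
        List<KeyRange> s = splits(List.of("", "m"), List.of("d", "h", "r"));
        System.out.println(s.size() + " splits: " + s); // 5 = 3 guideposts + 2 regions

        // Hypothetical sizes: two 1 GiB regions with GUIDE_POSTS_WIDTH = 10240000.
        long gib = 1L << 30;
        System.out.println(estimatedSplits(new long[] {gib, gib}, 10_240_000L)); // 210
    }
}
```

Under this model, lowering the width should raise the split count once new guideposts are actually collected and used.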
    <div class="moz-cite-prefix"><br>
    <div class="moz-cite-prefix"><br>
    <div class="moz-cite-prefix">On 30.01.2019 at 19:38, Thomas wrote:
    <blockquote type="cite">
      <div dir="ltr">
        <div dir="ltr">If stats are enabled, PhoenixInputFormat will
          generate a split per guidepost.</div>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, Jan 30, 2019 at 7:31
          AM Josh Elser &lt;<a href=""
            moz-do-not-send="true"></a>&gt; wrote:<br>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You
          can extend/customize the PhoenixInputFormat with your own code
          to <br>
          increase the number of InputSplits and Mappers.<br>
          On 1/30/19 6:43 AM, Edwin Litterst wrote:<br>
          &gt; Hi,<br>
          &gt; I am using PhoenixInputFormat as input source for
          mapreduce jobs.<br>
          &gt; The split count (which determines how many mappers are
          used for the job) <br>
          &gt; is always equal to the number of regions of the table
          from which I <br>
          &gt; select the input.<br>
          &gt; Is there a way to increase the number of splits? My job
          is running too <br>
          &gt; slowly with only one mapper per region.<br>
          &gt; (Increasing the number of regions is not an option.)<br>
          &gt; regards,<br>
          &gt; Eddie<br>
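Josh's suggestion to extend PhoenixInputFormat is not spelled out in the thread. One possible approach, which is an assumption and not Phoenix's own API, is to cut each split's row-key range into n evenly spaced sub-ranges. The sketch below covers only that key arithmetic in plain Java. Keys are treated as unsigned big-endian numbers and zero-padded to a common width plus one extra byte, so that there is room between adjacent keys. The class and method names are hypothetical, and wiring the boundaries into a custom getSplits() is not shown. Even spacing in key space also does not guarantee even data volume when keys are skewed.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (hypothetical helper): evenly subdivide a row-key range [start, end). */
public class KeyRangeSubdivider {

    /**
     * Returns n + 1 boundary keys, beginning with start and ending with end,
     * that cut [start, end) into n sub-ranges of roughly equal key-space width.
     * An empty start key is fine; end must be a concrete (non-empty) key.
     * If n exceeds the available key space, some boundaries may repeat.
     */
    static List<byte[]> subdivide(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // One extra byte leaves room to interpolate between adjacent keys.
        int width = Math.max(start.length, end.length) + 1;
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width)); // zero-padded on the right
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> bounds = new ArrayList<>();
        bounds.add(start.clone());
        for (int i = 1; i < n; i++) {
            // Interpolate the i-th boundary: lo + span * i / n.
            BigInteger k = lo.add(span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            bounds.add(toFixedWidth(k, width));
        }
        bounds.add(end.clone());
        return bounds;
    }

    /** Converts a non-negative BigInteger to exactly `width` big-endian bytes. */
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading 0x00 sign byte
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    /** Hex rendering of a key, for printing. */
    static String hex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical range [0x00, 0x80) cut into 4 sub-ranges.
        for (byte[] key : subdivide(new byte[] {0x00}, new byte[] {(byte) 0x80}, 4)) {
            System.out.println(hex(key)); // 00, 2000, 4000, 6000, 80 (one per line)
        }
    }
}
```

Because Phoenix already splits on guideposts when statistics are used, tuning statistics may be the simpler lever. A custom subdivision like this one is the fallback if that does not help.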
