phoenix-user mailing list archives

From Eddie <e...@gmx.net>
Subject Re: split count for mapreduce jobs with PhoenixInputFormat
Date Thu, 31 Jan 2019 16:40:51 GMT
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Maybe the default value of
      USE_STATS_FOR_PARALLELIZATION differs between clusters running
      different Hadoop versions.<br>
      With Hadoop 3.1.x and Phoenix 5.0, useStatsForParallelization is
      true by default, and the number of splits = guidepost count +
      number of regions.<br>
      I changed GUIDE_POSTS_WIDTH to another value:<br>
      ALTER TABLE &lt;tablename&gt; SET GUIDE_POSTS_WIDTH = 10240000<br>
      UPDATE STATISTICS &lt;tablename&gt; ALL<br>
      Unfortunately, this changed neither the guidepost count nor
      the split count. Am I missing something here?</div>
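    <div class="moz-cite-prefix"><br>
      For reference, a rough sketch of the arithmetic I would expect.
      It assumes roughly one guidepost per GUIDE_POSTS_WIDTH bytes of
      table data and the relation observed above (splits = guidepost
      count + number of regions). The class name, table size and
      region count are hypothetical, chosen only for illustration;
      this is not Phoenix code.</div>
    <pre>
```java
// Rough estimate only. Assumption: about one guidepost per
// GUIDE_POSTS_WIDTH bytes of table data, and splits = guideposts + regions
// (the relation observed in this thread, not taken from Phoenix source).
public class SplitEstimate {

    // Approximate number of guideposts for a table of the given size.
    public static long estimateGuideposts(long tableBytes, long guidePostsWidthBytes) {
        return tableBytes / guidePostsWidthBytes;
    }

    // Estimated split count: guideposts plus regions.
    public static long estimateSplits(long tableBytes, long guidePostsWidthBytes, long regions) {
        return estimateGuideposts(tableBytes, guidePostsWidthBytes) + regions;
    }

    public static void main(String[] args) {
        long tableBytes = 10_240_000_000L; // hypothetical table size in bytes
        long width = 10_240_000L;          // width used in the ALTER TABLE statement
        long regions = 8L;                 // hypothetical region count
        System.out.println(estimateSplits(tableBytes, width, regions));
    }
}
```
    </pre>
    <div class="moz-cite-prefix">With these made-up numbers the estimate
      is 1000 guideposts and 1008 splits, so a changed width should
      visibly change the split count if the new statistics are
      actually used.</div>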
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">On 30 Jan 2019 at 19:38, Thomas
      D'Silva wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAMJjmcx+Kn-=c+ZjAqT1q+omKYJBcwR-uTjcCUAaoh2xCT3ouA@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">If stats are enabled, PhoenixInputFormat will
          generate a split per guidepost.</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, Jan 30, 2019 at 7:31
          AM Josh Elser &lt;<a href="mailto:elserj@apache.org"
            moz-do-not-send="true">elserj@apache.org</a>&gt; wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You
          can extend/customize the PhoenixInputFormat with your own code
          to <br>
          increase the number of InputSplits and Mappers.<br>
          <br>
          On 1/30/19 6:43 AM, Edwin Litterst wrote:<br>
          &gt; Hi,<br>
          &gt; I am using PhoenixInputFormat as input source for
          mapreduce jobs.<br>
          &gt; The split count (which determines how many mappers are
          used for the job) <br>
          &gt; is always equal to the number of regions of the table
          from where I <br>
          &gt; select the input.<br>
          &gt; Is there a way to increase the number of splits? My job
          is running too <br>
          &gt; slow with only one mapper for every region.<br>
          &gt; (Increasing the number of regions is no option.)<br>
          &gt; regards,<br>
          &gt; Eddie<br>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>
