phoenix-user mailing list archives

From "Thangamani, Arun" <than...@cobalt.com>
Subject Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)
Date Mon, 07 Dec 2015 23:46:00 GMT
I bounced the region servers with phoenix.stats.guidepost.width = 10737418240 (which is the
max file size set from Ambari).

Like Matt, I am seeing entries created in the SYSTEM.STATS table as well. Any other suggestions,
James?

From: Matt Kowalczyk <mattk@cloudability.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Monday, December 7, 2015 at 2:52 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X
(Phoenix 4.4)

Sorry, I communicated that poorly in the previous e-mail. I meant to list the things
I did: I bounced, then performed a major compaction, and then ran the select count(*)
query.

On Mon, Dec 7, 2015 at 2:49 PM, James Taylor <jamestaylor@apache.org> wrote:
You need to bounce the cluster *before* major compaction or the region server will continue
to use the old guideposts setting during compaction.

On Mon, Dec 7, 2015 at 2:45 PM, Matt Kowalczyk <mattk@cloudability.com> wrote:
Bounced just after major compaction, with the setting as indicated above. I'm still unable
to disable the stats table.

select count(*) from system.stats where physical_name = 'XXXXX';
+------------------------------------------+
|                 COUNT(1)                 |
+------------------------------------------+
| 653                                      |
+------------------------------------------+
1 row selected (0.036 seconds)


On Mon, Dec 7, 2015 at 2:41 PM, James Taylor <jamestaylor@apache.org> wrote:
Yes, setting that property is another way to disable stats. You'll need to bounce your cluster
after setting either of these, and stats won't be updated until a major compaction occurs.
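For reference, a major compaction can be triggered manually from the HBase shell after the bounce; the table name below is just a placeholder:

```
hbase(main):001:0> major_compact 'MY_PHOENIX_TABLE'
```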


On Monday, December 7, 2015, Matt Kowalczyk <mattk@cloudability.com> wrote:
I've set phoenix.stats.guidepost.per.region to 1 and continue to see entries added to the
SYSTEM.STATS table. I believe this should have the same effect? I'll try setting the guidepost
width though.


On Mon, Dec 7, 2015 at 12:11 PM, James Taylor <jamestaylor@apache.org> wrote:
You can disable stats by setting the phoenix.stats.guidepost.width config parameter to
a larger value in the server-side hbase-site.xml. The default is 104857600 (100MB). If you
set it to your MAX_FILESIZE (the size you allow a region to grow to before it splits - default
10GB), then you're essentially disabling it. You could also try increasing it to somewhere in
between, maybe 5 or 10GB.
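As a sketch, the setting would go into the region servers' hbase-site.xml like this; the value shown matches a 10GB MAX_FILESIZE and is an example, not a recommendation:

```xml
<property>
  <name>phoenix.stats.guidepost.width</name>
  <!-- 10737418240 bytes = 10 GB; set this to >= your region max file size
       to effectively disable guidepost (stats) collection -->
  <value>10737418240</value>
</property>
```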

Thanks,
James

On Mon, Dec 7, 2015 at 10:25 AM, Matt Kowalczyk <mattk@cloudability.com> wrote:
We're also encountering slowdowns after bulk MR inserts. I've only measured slowdowns in
the query path (since our bulk insert workloads vary in size, it hasn't been clear that we
see slowdowns there, but I'll now measure this as well). The subject of my reported issue was
titled "stats table causing slow queries".

The stats table seems to be re-built during compactions, and I have to actively purge the
table to regain sane query times. It would be sweet if the stats feature could be disabled.
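The purge can be scoped to one table rather than truncating all of SYSTEM.STATS; something like the following from sqlline, where the table name is a placeholder:

```sql
DELETE FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_PHOENIX_TABLE';
```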

On Mon, Dec 7, 2015 at 9:53 AM, Thangamani, Arun <thangar@cobalt.com> wrote:
This is on hbase-1.1.1.2.3.0.0-2557, in case that makes any difference in the analysis. Thanks

From: Arun Thangamani <thangar@cobalt.com>
Date: Monday, December 7, 2015 at 12:13 AM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix
4.4)

Hello, I noticed an issue with bulk inserts through MapReduce in Phoenix 4.4.0.2.3.0.0-2557,
using the outline of the code below.

Normally the inserts of about 25 million rows complete in about 5 mins; there are 5 region
servers and the Phoenix table has 32 buckets. But sometimes (maybe after major compactions
or region movement?), writes simply slow down to 90 mins. When I truncate the SYSTEM.STATS
HBase table, the inserts get a little faster (60 mins), but when I truncate both
SYSTEM.CATALOG and SYSTEM.STATS and recreate the Phoenix table def(s), the inserts go back
to 5 mins. The workaround of truncating SYSTEM tables is not sustainable for long. Can
someone help and let me know if there is a patch available for this? Thanks in advance.

Job job = Job.getInstance(conf, NAME);
// Set the target Phoenix table and the columns
PhoenixMapReduceUtil.setOutput(job, tableName,
        "WEB_ID,WEB_PAGE_LABEL,DEVICE_TYPE," +
        "WIDGET_INSTANCE_ID,WIDGET_TYPE,WIDGET_VERSION,WIDGET_CONTEXT," +
        "TOTAL_CLICKS,TOTAL_CLICK_VIEWS,TOTAL_HOVER_TIME_MS,TOTAL_TIME_ON_PAGE_MS,TOTAL_VIEWABLE_TIME_MS," +
        "VIEW_COUNT,USER_SEGMENT,DIM_DATE_KEY,VIEW_DATE,VIEW_DATE_TIMESTAMP,ROW_NUMBER");
FileInputFormat.setInputPaths(job, inputPath);
job.setMapperClass(WidgetPhoenixMapper.class);
job.setMapOutputKeyClass(NullWritable.class);
job.setMapOutputValueClass(WidgetPagesStatsWritable.class);
job.setOutputFormatClass(PhoenixOutputFormat.class);
TableMapReduceUtil.addDependencyJars(job);
job.setNumReduceTasks(0);
job.waitForCompletion(true);

public static class WidgetPhoenixMapper
        extends Mapper<LongWritable, Text, NullWritable, WidgetPagesStatsWritable> {
    @Override
    public void map(LongWritable longWritable, Text text, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        String rundateString = conf.get("rundate");
        PagesSegmentWidgetLineParser parser = new PagesSegmentWidgetLineParser();
        try {
            PagesSegmentWidget pagesSegmentWidget = parser.parse(text.toString());

            if (pagesSegmentWidget != null) {
                WidgetPagesStatsWritable widgetPagesStatsWritable = new WidgetPagesStatsWritable();
                WidgetPagesStats widgetPagesStats = new WidgetPagesStats();

                widgetPagesStats.setWebId(pagesSegmentWidget.getWebId());
                widgetPagesStats.setWebPageLabel(pagesSegmentWidget.getWebPageLabel());
                widgetPagesStats.setWidgetInstanceId(pagesSegmentWidget.getWidgetInstanceId());
                …..

                widgetPagesStatsWritable.setWidgetPagesStats(widgetPagesStats);
                context.write(NullWritable.get(), widgetPagesStatsWritable);
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

public final class WidgetPagesStats {
    private String webId;
    private String webPageLabel;
    private long widgetInstanceId;
    private String widgetType;

        …
    @Override
    public boolean equals(Object o) {

        ..
    }
    @Override
    public int hashCode() {

        ..
    }
    @Override
    public String toString() {
        return "WidgetPhoenix{" + ….
                '}';
    }
}

public class WidgetPagesStatsWritable implements DBWritable, Writable {

    private WidgetPagesStats widgetPagesStats;

    public void readFields(DataInput input) throws IOException {
        // Instantiate the wrapped object and use readUTF to match writeUTF below;
        // DataInput.readLine() is deprecated and does not pair with writeBytes().
        widgetPagesStats = new WidgetPagesStats();
        widgetPagesStats.setWebId(input.readUTF());
        widgetPagesStats.setWebPageLabel(input.readUTF());
        widgetPagesStats.setWidgetInstanceId(input.readLong());
        widgetPagesStats.setWidgetType(input.readUTF());

        …
    }

    public void write(DataOutput output) throws IOException {
        // writeUTF length-prefixes each string so readFields can decode it reliably
        output.writeUTF(widgetPagesStats.getWebId());
        output.writeUTF(widgetPagesStats.getWebPageLabel());
        output.writeLong(widgetPagesStats.getWidgetInstanceId());
        output.writeUTF(widgetPagesStats.getWidgetType());

        ..
    }

    public void readFields(ResultSet rs) throws SQLException {
        widgetPagesStats = new WidgetPagesStats();
        widgetPagesStats.setWebId(rs.getString("WEB_ID"));
        widgetPagesStats.setWebPageLabel(rs.getString("WEB_PAGE_LABEL"));
        widgetPagesStats.setWidgetInstanceId(rs.getLong("WIDGET_INSTANCE_ID"));
        widgetPagesStats.setWidgetType(rs.getString("WIDGET_TYPE"));

        …
    }

    public void write(PreparedStatement pstmt) throws SQLException {
        Connection connection = pstmt.getConnection();
        PhoenixConnection phoenixConnection = (PhoenixConnection) connection;
        //connection.getClientInfo().setProperty("scn", Long.toString(widgetPhoenix.getViewDateTimestamp()));

        pstmt.setString(1, widgetPagesStats.getWebId());
        pstmt.setString(2, widgetPagesStats.getWebPageLabel());
        pstmt.setString(3, widgetPagesStats.getDeviceType());

        pstmt.setLong(4, widgetPagesStats.getWidgetInstanceId());

        …
    }

    public WidgetPagesStats getWidgetPagesStats() {
        return widgetPagesStats;
    }

    public void setWidgetPagesStats(WidgetPagesStats widgetPagesStats) {
        this.widgetPagesStats = widgetPagesStats;
    }
}








