community-commits mailing list archives

From "Sebb (JIRA)" <>
Subject [jira] [Resolved] (COMDEV-161) may count a message multiple times
Date Sat, 26 Sep 2015 00:27:04 GMT


Sebb resolved COMDEV-161.
    Resolution: Fixed

COMDEV-161 may count a message multiple times
Changed the regular expression to match "From " at the start of a line.
Also changed the code to read the mailbox line by line rather than slurping the entire mailbox into memory.
Added some timestamp traces to check performance.
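
The fix described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Reporter Tool code; the language and the names FROM_LINE, count_messages and count_messages_in_file are assumptions made for the example.

```python
import re
from typing import Iterable

# Assumption: an mbox separator is any line that begins with "From "
# (no colon). Body lines that start with "From " are stored as ">From ".
FROM_LINE = re.compile(r"^From ")


def count_messages(lines: Iterable[str]) -> int:
    """Count messages by counting separator lines, one line at a time."""
    return sum(1 for line in lines if FROM_LINE.match(line))


def count_messages_in_file(path: str) -> int:
    """Count messages in an mbox file without loading it all into memory."""
    # Iterating over the file object yields one line at a time.
    with open(path, "r", encoding="utf-8", errors="replace") as mbox:
        return count_messages(mbox)
```

Because only one line is held at a time, memory use should stay roughly constant regardless of mailbox size.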


> may count a message multiple times
> -------------------------------------------------
>                 Key: COMDEV-161
>                 URL:
>             Project: Community Development
>          Issue Type: Bug
>          Components: Reporter Tool
>            Reporter: Sebb
> The script counts messages by matching /Date: (.*)/.
> It is looking to match header lines of the form:
> Date: Thu, 01 May 2008 05:06:51 +0000
> However such lines are not guaranteed to be unique within a message.
> In particular SVN commit messages have a "Date:" line which matches, and the parsed timestamp
> will be much the same as the header date. For example:
> Author: cml
> Date: Wed Sep 16 19:06:03 2015
> New Revision: 1703436
> The mailbox format currently used by the ASF guarantees that each message is prefixed
> with a line in the format:
> From Thu May 01 05:10:32 2008
> [Lines in the message body starting "From " are prefixed as ">From "; the prefix is
> removed when messages are extracted]
> Only lines starting "From " are guaranteed not to occur in message bodies.
> The problem is trivial to fix, but it will change the generated statistics, particularly
> for mailboxes that receive SVN commit messages (Git commits use a different prefix for the
> timestamp). SVN mails will generally be counted twice.
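
The double counting described in the issue can be reproduced with a small sketch. The following Python snippet is illustrative only: the SAMPLE message is assembled from the fragments quoted above (the "Subject:" line is made up), and the variable names are assumptions, not taken from the script.

```python
import re

# A single message assembled from the fragments quoted in the issue:
# one separator line, one "Date:" header, and one "Date:" line in the
# body of an SVN commit mail.
SAMPLE = """\
From Thu May 01 05:10:32 2008
Date: Thu, 01 May 2008 05:06:51 +0000
Subject: svn commit

Author: cml
Date: Wed Sep 16 19:06:03 2015
New Revision: 1703436
"""

# The original pattern matches both the header and the body line.
date_matches = re.findall(r"Date: (.*)", SAMPLE)

# Anchoring on "From " at the start of a line matches only the separator.
from_matches = re.findall(r"^From ", SAMPLE, flags=re.MULTILINE)

print(len(date_matches), len(from_matches))
```

With this sample the "Date:" pattern finds two matches for one message, while the anchored "From " pattern finds one.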

This message was sent by Atlassian JIRA
