mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chun-Hung Hsiao <chhs...@apache.org>
Subject Re: Review Request 65665: Added operation state metrics in SLRP.
Date Thu, 12 Apr 2018 18:37:23 GMT


> On March 21, 2018, 1:12 p.m., Jan Schlicht wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 2713 (patched)
> > <https://reviews.apache.org/r/65665/diff/3/?file=1982876#file1982876line2717>
> >
> >     `statusUpdateManager.update` below might still fail to drop the message. In
that case we would count the operation as dropped in metrics when we increment here. Increment
in an `onReady` handler instead.
> 
> Chun-Hung Hsiao wrote:
>     If the `update()` call fails, the SLRP will immediately die anyway, so at least there
will be no double counting. But your suggestion makes sense so I'll do the change, and also
change where we increment the `operations_finished` or `operations_failed` metrics.

Got a question. We update the latest status of the operations from PENDING to FINISHED/FAILED/DROPPED
and checkpoint it before passing the status to the status update manager. If we increment
the metrics after the status update manager returns ready, then if an operation that becomes
terminal but the status update manager fails to send the update, this operation won't be counted
in any of the metrics.

So I prefer making the metrics reflect the view of SLRP not the status update manager. Thoughts?


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65665/#review199657
-----------------------------------------------------------


On April 12, 2018, 5:15 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65665/
> -----------------------------------------------------------
> 
> (Updated April 12, 2018, 5:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8383
>     https://issues.apache.org/jira/browse/MESOS-8383
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds `operations_pending`, `operations_finished`,
> `operations_failed`, `operations_error` (currently unused), and
> `operations_dropped` metrics to count the occurances of these operation
> states.
> 
> Additionally, An error log in `_applyOperation()` is removed because the
> error is already logged at the call site.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp a07620d1c4bf618f669575b3e183bf598da905a6

> 
> 
> Diff: https://reviews.apache.org/r/65665/diff/4/
> 
> 
> Testing
> -------
> 
> sudo make check
> A unit test is added in the next patch in chain.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message