mesos-reviews mailing list archives

From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 45905: Added metrics to the balloon framework.
Date Tue, 07 Jun 2016 00:33:35 GMT


> On April 21, 2016, 2:40 p.m., Vinod Kone wrote:
> > src/examples/balloon_framework.cpp, line 389
> > <https://reviews.apache.org/r/45905/diff/4/?file=1351768#file1351768line389>
> >
> >     Why do you need to store this?

With the process split, we don't need to store it anymore.


> On April 21, 2016, 2:40 p.m., Vinod Kone wrote:
> > src/examples/balloon_framework.cpp, line 297
> > <https://reviews.apache.org/r/45905/diff/4/?file=1351768#file1351768line297>
> >
> >     is failed to fetch the URI the only reason when we get REASON_CONTAINER_LAUNCH_FAILED?

Not necessarily, but for a framework that constantly launches (and OOMs) tasks, this is the
most common "unexpected" failure condition.
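
As a rough illustration of how terminal status updates could be sorted into the
counters listed in the review request, here is a minimal, self-contained sketch.
It is not the code from the diff: the `State` and `Reason` enums are hypothetical
stand-ins that only mirror the Mesos names, `Counters`, `isAllowed` and `classify`
are made-up names, and treating SLAVE_REMOVED as an "allowed" infrastructure
reason is an assumption chosen purely for the example.

```cpp
#include <cstdint>

// Hypothetical stand-ins; the real framework would use the Mesos
// TaskState and TaskStatus::Reason protobuf enums instead.
enum class State { FINISHED, FAILED, KILLED, LOST };
enum class Reason {
  NONE,
  CONTAINER_LIMITATION_MEMORY,  // Task exceeded its memory limit (OOM).
  CONTAINER_LAUNCH_FAILED,      // E.g. the executor URI could not be fetched.
  SLAVE_REMOVED                 // Assumed example of an infrastructure reason.
};

// Counters named after the metrics in the review request.
struct Counters {
  std::uint64_t tasks_finished = 0;
  std::uint64_t tasks_oomed = 0;
  std::uint64_t allowed_terminations = 0;
  std::uint64_t abnormal_terminations = 0;
};

// Assumption: only SLAVE_REMOVED counts as an acceptable infrastructure reason.
bool isAllowed(Reason reason) {
  return reason == Reason::SLAVE_REMOVED;
}

// Sorts one terminal status update into exactly one counter.
void classify(Counters& c, State state, Reason reason) {
  if (state == State::FINISHED) {
    ++c.tasks_finished;
  } else if (state == State::FAILED &&
             reason == Reason::CONTAINER_LIMITATION_MEMORY) {
    ++c.tasks_oomed;
  } else if (isAllowed(reason)) {
    ++c.allowed_terminations;
  } else {
    // Everything else, including a failed container launch, is "unexpected".
    ++c.abnormal_terminations;
  }
}
```

Under these assumptions, a `FAILED` update with `CONTAINER_LAUNCH_FAILED` lands in
`abnormal_terminations`, which matches the "unexpected" failure case described above.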


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45905/#review129966
-----------------------------------------------------------


On June 6, 2016, 5:30 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45905/
> -----------------------------------------------------------
> 
> (Updated June 6, 2016, 5:30 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Artem Harutyunyan, Kevin Klues, and Vinod Kone.
> 
> 
> Bugs: MESOS-5174
>     https://issues.apache.org/jira/browse/MESOS-5174
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Adds metrics to gauge the health of the framework.  This includes:
> 
> * uptime_secs = How long the framework has been running.
> * registered = If the framework is registered.
> * tasks_finished = Number of tasks finished (successfully).
> * tasks_oomed = Number of tasks that were OOM killed.
> * allowed_terminations = Number of terminal status updates which
>   are acceptable due to infrastructure reasons.
> * abnormal_terminations = Number of terminal status updates which
>   were neither `TASK_FINISHED` nor `TASK_FAILED` due to OOM.
> 
> 
> Diffs
> -----
> 
>   src/examples/balloon_framework.cpp 739fb504e93154bf032b4c621151fa3c99b60037 
> 
> Diff: https://reviews.apache.org/r/45905/diff/
> 
> 
> Testing
> -------
> 
> ```
> make check
> 
> sudo bin/mesos-tests.sh --gtest_filter="*ROOT_CGROUPS_BalloonFramework"
> 
> # Also launched two instances on a cluster.
> # This one OOMs:
> ./balloon-framework --master=zk://localhost:2181/mesos --checkpoint --balloon_limit=256MB \
>   --task_memory=128MB --executor_uri="https://s3.amazonaws.com/url/to/balloon-executor" \
>   --executor_command="LD_LIBRARY_PATH=/path/to/libmesos && ./balloon-executor"
> 
> # This one does not OOM:
> ./balloon-framework --master=zk://localhost:2181/mesos --checkpoint --balloon_limit=256MB \
>   --task_memory=256MB --executor_uri="https://s3.amazonaws.com/url/to/balloon-executor" \
>   --executor_command="LD_LIBRARY_PATH=/path/to/libmesos && ./balloon-executor"
> ```
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>

