mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vinod Kone <vinodk...@gmail.com>
Subject Re: Review Request 55157: Fixed a bug in the default executor around not committing suicide.
Date Mon, 09 Jan 2017 03:26:47 GMT


> On Jan. 9, 2017, 2:10 a.m., Vinod Kone wrote:
> > src/launcher/default_executor.cpp, lines 663-683
> > <https://reviews.apache.org/r/55157/diff/1/?file=1596482#file1596482line663>
> >
> >     How about
> >     
> >     ```
> >     // Check to see if the executor needs to shutdown.
> >     
> >     // If the executor is already in the process of shutting down, return.
> >     if (shuttingDown) {
> >       return;
> >     }
> >     
> >     // If there are no active containers, shutdown the executor.
> >     if (containers.empty()) {
> >       shutdown();
> >       return;
> >     }
> >     
> >     // If this container exited with non-zero status, kill all the other containers,
> >     // per the default policy.
> >     if (taskState == TASK_FAILED) {
> >     
> >       // TODO(Anand): ...
> >       shutdown();
> >     }
> >     
> >     
> >     ```
> 
> Anand Mazumdar wrote:
>     hmm,  I had refrained from splitting the no active containers check and moving it
above was because I still wanted to log why the single task pod failed. With the above suggested
snippet, we would need to do the logging for both the cases where we return early if the task
failed? Another alternative can be that I return early when the executor is shutting down
while keeping the rest of the logic same:
>     
>     ```cpp
>         // Ignore if the executor is already in the process of shutting down.
>         if (shuttingDown) {
>           return;
>         }
>     
>         // The default restart policy for a task group is to kill all the
>         // remaining child containers if one of them terminated with a
>         // non-zero exit code.
>         if (taskState == TASK_FAILED) {
>           LOG(ERROR)
>             << "Child container " << containerId << " terminated with
status "
>             << (status.isSome() ? WSTRINGIFY(status.get()) : "unknown");
>     
>           // Kill all the other active containers.
>           //
>           // TODO(anand): Invoke `kill()` once per active container
>           // instead of directly invoking `shutdown()`.
>           if (!containers.empty()) {
>             shutdown();
>             return;
>           }
>         }
>     
>         // Shutdown the executor if all the active child containers have terminated.
>         if (containers.empty()) {
>           __shutdown();
>         }
>     ```
>     
>     What do you think?

I think we can print the exit code/reason in the LOG(INFO) message at #656. It is useful irrespective
of task status?

I moved up `if (containers.empty())` check because it was not very intuitive to me why you
had a `!containers.empty()` check inside that if statement. I had to read the `shutdown()`
code to understand that. As a side note, it sounds like `shutdown()` should have the logic
to call `__shutdown()` if containers is empty.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55157/#review160849
-----------------------------------------------------------


On Jan. 4, 2017, 12:38 a.m., Anand Mazumdar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55157/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2017, 12:38 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6848
>     https://issues.apache.org/jira/browse/MESOS-6848
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This bug is only observed when the task group contains a single task.
> The default executor was not committing suicide when this single task
> used to exit with a non-zero status code as per the default restart
> policy.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp fd9963996c82461b60888397989e309d51b60aaa 
>   src/tests/default_executor_tests.cpp 340e8c8b36a6a3cc6e5bae021e19dfa7a54852c3 
> 
> Diff: https://reviews.apache.org/r/55157/diff/
> 
> 
> Testing
> -------
> 
> make check + added a test
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message