mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 68495: Made command check always waits before removing the nested container.
Date Tue, 28 Aug 2018 14:18:55 GMT


> On Aug. 28, 2018, 1:16 a.m., Alexander Rukletsov wrote:
> > src/checks/checker_process.cpp
> > Lines 878-889 (original), 890-901 (patched)
> > <https://reviews.apache.org/r/68495/diff/1/?file=2077041#file2077041line890>
> >
> >     It looks like we should always call `waitNestedContainer()` after we said `previousCheckContainerId
= checkContainerId;`. For example here.
> >     
> >     Maybe it makes sense to call `waitNestedContainer()` right in the beginning?
We can end up calling it twice, but I think it's fine?
> 
> Qian Zhang wrote:
>     > Maybe it makes sense to call waitNestedContainer() right in the beginning? We
can end up calling it twice, but I think it's fine?
>     
>     That means we will call agent API `WAIT_NESTED_CONTAINER` twice for each successful
launch of check container, I think that might be a burden for agent in a large scale env.
So I'd still prefer to call it only in the places where we have to do it.

And for the case (L879:L888, i.e., the connection to agent failed) that you pointed out above,
I think when the connection to agent is back (e.g., agent starts up again), the check container
will be treated as orphan container and destroyed by agent, and then we will remove it here:
https://github.com/apache/mesos/blob/1.6.1/src/checks/checker_process.cpp#L616:L638. However
I am going to post another patch to change these codes (https://github.com/apache/mesos/blob/1.6.1/src/checks/checker_process.cpp#L660:L664)
to something like:
```
          promise->discard();
        } else {
          previousCheckContainerId = None();
          _nestedCommandCheck(promise, cmd, nested);
        }
```
In this way, if we fail to remove the check container (e.g., due to agent has not finished
recovery, or the check container is still in `DESTROYING` state), we will try to remove it
again.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68495/#review207984
-----------------------------------------------------------


On Aug. 24, 2018, 5:54 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68495/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2018, 5:54 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Alexander Rukletsov, Gastón Kleiman, and Gilbert
Song.
> 
> 
> Bugs: MESOS-8568
>     https://issues.apache.org/jira/browse/MESOS-8568
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Made command check always waits before removing the nested container.
> 
> 
> Diffs
> -----
> 
>   src/checks/checker_process.cpp 77a76f465fe57eab89f027b5acb74c2339551678 
> 
> 
> Diff: https://reviews.apache.org/r/68495/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message