mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joseph Wu" <jos...@mesosphere.io>
Subject Re: Review Request 40501: Cleanup a leaked reference to a test process living in the stack.
Date Fri, 20 Nov 2015 18:46:05 GMT


> On Nov. 19, 2015, 6:16 p.m., Neil Conway wrote:
> > Good find. I wonder:
> > 
> > (a) is there some general advice we should give to people implementing Processes
(e.g., "always provide a destructor that does terminate/wait" -- that is probably too broad
though). Would be nice to add this to the libprocess README.
> > (b) does this problem occur anywhere else? and/or is there a way to detect it?
> 
> Bernd Mathiske wrote:
>     [~neilc] a) I agree that we should provide documentation in this regard. This kind
of pattern is too easily overlooked, also by reviewers, as exemplified when the bug got introduced:
https://reviews.apache.org/r/27483/
>     b) For test code (such as is the case here), we could put something into the inherited
fixture that scans for orphaned processes at TearDown.
> 
> Benjamin Bannier wrote:
>     Note that there is some assymmetry in the API: when *not giving* a `manage` arg to
`spawn`. you get the default value `false`; but you then *do need to add code* to terminate
and wait for the process.
>     
>     I personally would naively have expected an API to either (i) require me to be explicitly
in both places (explictly set `manage=false`, and to manual `terminate` and `wait`), or (ii)
it just do The Right Thing if I was brief (I called `spawn` with default args, and do not
have to worry about cleanup).
>     
>     I.e., wouldn't it in the long run make more sense the change `spawn` to `PID<T>
spawn(T& t, bool manage = true)` than to add more documentation?

[~neilc] 
a) I agree that documentation is a good direction.   (I'll draft something.)

[~bernd-mesos]
b) We already have an orphaned process detector in `src/tests/environment.cpp` `Environment::TearDown`.
 However, this only catches **orphaned** processes (whose parents have been killed :( ). 
This HTTP process's parent is the test process.
There is an endpoint `/__processes__` which we might be able to use to check for extraneous
processes.  But this might be messy, since we have a great deal of global processes.

[~bbannier]
The `manage` argument will only change ownership of the process pointer.  If you dig into
the `GarbageCollector`, you'll see that all it does is `delete` the pointer when the process
terminates.  The original process is still in charge of its own termination.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40501/#review107290
-----------------------------------------------------------


On Nov. 19, 2015, 1:51 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40501/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2015, 1:51 p.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Artem Harutyunyan, and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-3753
>     https://issues.apache.org/jira/browse/MESOS-3753
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Two of the fetcher tests will spawn a process which is stored in the stack (i.e. local
variable in the test).  `spawn` will store a pointer to the process in libprocess's `ProcessManager`.
 When the test finishes, the process goes out of scope and is therefore lost.  However, the
process is **not** terminated.
> 
> Failing to terminate this process will lead to an infinite loop in `~ProcessManager`,
which is called in `process::finalize`.  In `ProcessManager` 's destructor, we will loop and
try to kill all processes.  The process spawned in the test will be running.  However, since
the pointer lives in the stack, the `ProcessManager` will be unable to find the process and
will thereby be stuck trying to kill a process it cannot find.
> 
> 
> Diffs
> -----
> 
>   src/tests/fetcher_tests.cpp 04079964b3539f555351d1444f3635c64700a1a8 
> 
> Diff: https://reviews.apache.org/r/40501/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Additional testing:
> 
> Insert a `process::finalize` in `src/test/main.cpp`.  i.e.
> ```
>   // Replace `return RUN_ALL_TESTS();` with this:
>   int ret = RUN_ALL_TESTS();
>   process::finalize();
>   return ret;
> ```
> 
> Then `make check GTEST_FILTER="*FetcherTest.OSNetUriTest*:*FetcherTest.OSNetUriSpaceTest*"`.
> The test program should not stall or segfault or abort in some weird way.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message