mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 52181: Added synchronization in link logic to prevent relinking races.
Date Wed, 28 Sep 2016 18:48:52 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52181/#review150751
-----------------------------------------------------------


Ship it!




Ship It!

- Benjamin Mahler


On Sept. 24, 2016, 1:04 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52181/
> -----------------------------------------------------------
> 
> (Updated Sept. 24, 2016, 1:04 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Artem Harutyunyan, and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-6234
>     https://issues.apache.org/jira/browse/MESOS-6234
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> There is a general pattern in the `SocketManager` in which most methods
> will grab the mutex and then check if the socket to manage exists in
> the `SocketManager`s mapping.  If the socket does not exist, the
> `SocketManager` silently returns.
> 
> This adds similar logic inside two critical sections of the `link`
> codepath.  If there are multiple calls to `link` in-flight at once,
> this prevents sockets from being leaked into unmanaged callback loops.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp 02a192529e53479d5a163fa6a20873674b51ee2c 
> 
> Diff: https://reviews.apache.org/r/52181/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> ---
> 
> This test is not included in the review diff because it cannot be run in a loop (i.e.
`--gtest_repeat`).  The system will run out of emphemeral sockets after ~80 iterations and
the test will begin to fail due to failing socket connections.
> 
> 3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteRelinkLoop"
--gtest_repeat=50 --gtest_break_on_failure
> 
> ```
> // Verifies that repeated relinking of a remote process
> // maintains monitoring of the remote process.
> TEST_F(ProcessRemoteLinkTest, RemoteRelinkLoop)
> {
>   RemoteLinkTestProcess relinker(pid);
>   Future<UPID> relinkerExitedPid;
> 
>   EXPECT_CALL(relinker, exited(pid))
>     .WillOnce(FutureArg<0>(&relinkerExitedPid));
> 
>   spawn(relinker);
> 
>   // NOTE: The remote process we are linking to never closes sockets.
>   // Therefore, we can only make a limited number of connections before
>   // the remote process runs out of file descriptors.
>   for (int i = 0; i < 200; i++) {
>     relinker.relink();
>   }
> 
>   os::killtree(linkee->pid(), SIGKILL);
>   reap_linkee();
>   linkee = None();
> 
>   AWAIT_ASSERT_EQ(pid, relinkerExitedPid);
> 
>   terminate(relinker);
>   wait(relinker);
> }
> ```
> 
> ---
> 
> In order to test the SSL-downgrade path, you need to tweak the way we launch `test-linkee`
in `ProcessRemoteLinkTest::SetUp`:
> ```
>     std::map<std::string, std::string> empty_environment;
>     Try<Subprocess> s = process::subprocess(
>         path::join(BUILD_DIR, "test-linkee") +
>           " '" + stringify(coordinator.self()) + "'",
>         process::Subprocess::FD(STDIN_FILENO),
>         process::Subprocess::FD(STDOUT_FILENO),
>         process::Subprocess::FD(STDERR_FILENO),
>         empty_environment);
> ```
> 
> LIBPROCESS_SSL_ENABLED=true LIBPROCESS_SSL_SUPPORT_DOWNGRADE=true LIBPROCESS_SSL_KEY_FILE=key.pem
LIBPROCESS_SSL_CERT_FILE=cert.pem 3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteRelinkLoop"
--gtest_repeat=50 --gtest_break_on_failure
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message