mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 52181: Added synchronization in link logic to prevent relinking races.
Date Sat, 24 Sep 2016 01:04:34 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52181/
-----------------------------------------------------------

(Updated Sept. 23, 2016, 6:04 p.m.)


Review request for mesos, Benjamin Mahler, Artem Harutyunyan, and Joris Van Remoortere.


Changes
-------

Moved test into the "Testing Done" as it has some repeatability issues.


Summary (updated)
-----------------

Added synchronization in link logic to prevent relinking races.


Bugs: MESOS-6234
    https://issues.apache.org/jira/browse/MESOS-6234


Repository: mesos


Description
-------

There is a general pattern in the `SocketManager` in which most methods
will grab the mutex and then check if the socket to manage exists in
the `SocketManager`s mapping.  If the socket does not exist, the
`SocketManager` silently returns.

This adds similar logic inside two critical sections of the `link`
codepath.  If there are multiple calls to `link` in-flight at once,
this prevents sockets from being leaked into unmanaged callback loops.


Diffs (updated)
-----

  3rdparty/libprocess/src/process.cpp 02a192529e53479d5a163fa6a20873674b51ee2c 

Diff: https://reviews.apache.org/r/52181/diff/


Testing (updated)
-------

make check

---

This test is not included in the review diff because it cannot be run in a loop (i.e. `--gtest_repeat`).
 The system will run out of emphemeral sockets after ~80 iterations and the test will begin
to fail due to failing socket connections.

3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteRelinkLoop"
--gtest_repeat=50 --gtest_break_on_failure

```
// Verifies that repeated relinking of a remote process
// maintains monitoring of the remote process.
TEST_F(ProcessRemoteLinkTest, RemoteRelinkLoop)
{
  RemoteLinkTestProcess relinker(pid);
  Future<UPID> relinkerExitedPid;

  EXPECT_CALL(relinker, exited(pid))
    .WillOnce(FutureArg<0>(&relinkerExitedPid));

  spawn(relinker);

  // NOTE: The remote process we are linking to never closes sockets.
  // Therefore, we can only make a limited number of connections before
  // the remote process runs out of file descriptors.
  for (int i = 0; i < 200; i++) {
    relinker.relink();
  }

  os::killtree(linkee->pid(), SIGKILL);
  reap_linkee();
  linkee = None();

  AWAIT_ASSERT_EQ(pid, relinkerExitedPid);

  terminate(relinker);
  wait(relinker);
}
```

---

In order to test the SSL-downgrade path, you need to tweak the way we launch `test-linkee`
in `ProcessRemoteLinkTest::SetUp`:
```
    std::map<std::string, std::string> empty_environment;
    Try<Subprocess> s = process::subprocess(
        path::join(BUILD_DIR, "test-linkee") +
          " '" + stringify(coordinator.self()) + "'",
        process::Subprocess::FD(STDIN_FILENO),
        process::Subprocess::FD(STDOUT_FILENO),
        process::Subprocess::FD(STDERR_FILENO),
        empty_environment);
```

LIBPROCESS_SSL_ENABLED=true LIBPROCESS_SSL_SUPPORT_DOWNGRADE=true LIBPROCESS_SSL_KEY_FILE=key.pem
LIBPROCESS_SSL_CERT_FILE=cert.pem 3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteRelinkLoop"
--gtest_repeat=50 --gtest_break_on_failure


Thanks,

Joseph Wu


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message