mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lawrence Wu <lawren...@twitter.com>
Subject Re: Review Request 50540: Add systemd watchdog support.
Date Fri, 12 Aug 2016 20:19:04 GMT


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.hpp, line 36
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455903#file1455903line36>
> >
> >     const reference?
> >     
> >     Does the period need to change for different calls? Why not make it a static
instance variable?

yes, let's make this an instance variable.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/Makefile.am, line 2145
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455902#file1455902line2145>
> >
> >     Formatting should be 8 ts which I don't think this is. Use "set ts=8" in Vim.

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, lines 122-123
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line122>
> >
> >     Does this validate that the service has actually been stopped or just that systemd
accepted the stop command?

This just asserts that systemd has accepted the stop command and does not ensure that the
service was actually stopped. I think it is safe to assume that systemd will correctly stop
the service.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 126
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line126>
> >
> >     What happens if there's an assertion during the test!? Because this is outside
the test sandbox it'll leak. See earlier comment about writing and referencing this from the
sandbox.

see above, using systemctl link and disable now.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 44
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line44>
> >
> >     Is there any way to create the service files outside of the system directory
to avoid polluting it (especially if the test crashes)?
> >     
> >     And, systemd-test-helper is a pretty generic name...

I did originally try creating the service file in the test sandbox because the I thought I
could run systemctl with SYSTEMD_UNIT_PATH set to the sandbox directory and then pick up my
service file. Turns out you can't set SYSTEMD_UNIT_PATH on runtime -- it has to get set when
compiling ("Unit files are loaded from a set of paths determined during compilation"). So
this seemed like the only option.

I just investigated this some more, and you _can_ create the service file in the sandbox and
use `systemctl link $file`, which will create a symlink in /etc/systemd/system/...which seems
like almost as much pollution (symlink vs file), but is maybe a little better since if the
test crashes and the user sees a symlinked file that points to nowhere, she can feel a little
safer deleting the symlink. So let's go with this way.

Also renamed systemd-test-helper to mesos-systemd-test-helper.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_test_helper.cpp, line 34
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455907#file1455907line34>
> >
> >     Is this universally true across all distributions?

Actually, we don't use any of the flags in the test, so let's just not set them.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_test_helper.cpp, line 35
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455907#file1455907line35>
> >
> >     Can't assume cgroups are mounted here. Tests use TEST_CGROUPS_HIERARCHY?

not setting this anymore


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 196
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line196>
> >
> >     Is there any behavior defined if the variable is set but with no value? As opposed
to being unset?

systemd shouldn't set the variable improperly (with no value), but it's still possible that
happens (or someone runs mesos with this envvar)...
I added some error handling here to catch unparseable variables.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > configure.ac, lines 615-616
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455901#file1455901line615>
> >
> >     What are the implications of linking this in for people that have systemd but
don't want to manage Mesos with systemd? Should this be a configure option?

If users don't want to manage Mesos with systemd, they can set the systemd_enable_support
flag to false. I don't think they should have to pass in another configure option for whether
they want to use systemd or not -- let's just link the library if it exists and let the user
disable systemd with the flag.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 87
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line87>
> >
> >     no, no! bad... what can you set up to wait on to determine when it's been started?

Unfortunately, I think we have to sleep here because we are waiting on systemd (which is running
a separate event loop) to do stuff. If you can think of another solution I'd love to know,
since I agree that this isn't ideal.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 31
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line31>
> >
> >     Do you need this conditional compilation *within* the linux/systemd.cpp file,
or is inclusion of the entire file conditional?

We need this within the file, since linux/systemd.cpp gets imported from slave/main.cpp even
if systemd does not exist on the machine.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 201
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line201>
> >
> >     Watchdog* watchdog
> >     
> >     What's the lifetime of this object!? Who (and when) releases it?

the watchdog needs to live until the process ends, so it will get freed on exit (i think).
that's why nothing explicitly releases it.


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/Makefile.am, line 2193
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455902#file1455902line2193>
> >
> >     Ditto, formatting.

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.hpp, lines 20-21
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455903#file1455903line20>
> >
> >     We alphabetize includes so process.hpp comes before subprocess.hpp. See the
style guide.

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.hpp, line 38
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455903#file1455903line38>
> >
> >     Check style, I think it's to include comment on close of #ifdef, e.g.,
> >     
> >     #endif // HAVE_LIBSYSTEMD

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 195
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line195>
> >
> >     ditto

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 199
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line199>
> >
> >     use os::getenv()

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 200
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line200>
> >
> >     use numify<>()

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 212
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line212>
> >
> >     ditto

fixed


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/linux/systemd.cpp, line 213
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455904#file1455904line213>
> >
> >     This would be const& but see earlier.

using instance variable


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_test_helper.cpp, lines 19-21
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455907#file1455907line19>
> >
> >     split these

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, lines 17-22
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line17>
> >
> >     alphabetize please.

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 45
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line45>
> >
> >     import this and std::cout etc.

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 47
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line47>
> >
> >     std::endl on preceeding line?

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, lines 65-68
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line65>
> >
> >     Use stout's os::write(path, message).

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 71
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line71>
> >
> >     Don't use informational logging, use assertions with logging:
> >     ASSERT(false) << "Ooops!";

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 79
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line79>
> >
> >     ditto

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 89
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line89>
> >
> >     Use path::join()

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 90
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line90>
> >
> >     ditto

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, lines 101-109
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line101>
> >
> >     Use os::read(tmpout) to get a Result<string> and avoid all of this.

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, lines 112-113
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line112>
> >
> >     Use stout's `bool strings::contains()`

done


> On Aug. 9, 2016, 6:33 p.m., Ian Downes wrote:
> > src/tests/linux/systemd_tests.cpp, line 116
> > <https://reviews.apache.org/r/50540/diff/2/?file=1455909#file1455909line116>
> >
> >     ditto

done


- Lawrence


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50540/#review144525
-----------------------------------------------------------


On Aug. 12, 2016, 8:18 p.m., Lawrence Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50540/
> -----------------------------------------------------------
> 
> (Updated Aug. 12, 2016, 8:18 p.m.)
> 
> 
> Review request for mesos, David Robinson and Ian Downes.
> 
> 
> Bugs: MESOS-5376
>     https://issues.apache.org/jira/browse/MESOS-5376
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Add systemd watchdog support.
> 
> 
> Diffs
> -----
> 
>   configure.ac d2136909b7305498ae901a5ea00133142b77f9e6 
>   src/Makefile.am 599ebbef6d164fb2a530b55427ddabb5cd607634 
>   src/linux/systemd.hpp 91134f1d4b100759e45931bd09ca4e1e1aeaaf8a 
>   src/linux/systemd.cpp 619aa2778da5f99d3a078a8e1208bdaa9dc77581 
>   src/slave/main.cpp 4624392d30cf391015dcd63f447fe2414a47a16a 
>   src/tests/linux/systemd_test_helper.hpp PRE-CREATION 
>   src/tests/linux/systemd_test_helper.cpp PRE-CREATION 
>   src/tests/linux/systemd_test_helper_main.cpp PRE-CREATION 
>   src/tests/linux/systemd_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50540/diff/
> 
> 
> Testing
> -------
> 
> Tested by sending SIGSTOP to running mesos and verifying via journalctl that it was killed
by the watchdog.
> 
> The test I wrote for this does the following:
> - build up a unit file as a string and create a unit file in /etc/systemd/system/systemd-test-helper.service
> - reload the systemd daemon and start the newly discovered helper service
> - wait a bit (30s) to make sure the watchdog has had a chance to kill the service
> - use systemctl status systemd-test-helper to check that the service is still running
> - clean up the unit file.
> 
> TODO: create a similar test, but send a SIGSTOP to the service and ensure that it has
been killed by watchdog.
> 
> 
> Thanks,
> 
> Lawrence Wu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message