mesos-reviews mailing list archives

From "Ben Mahler" <benjamin.mah...@gmail.com>
Subject Re: Review Request 41178: Fixed a message dropping bug in the health checker.
Date Thu, 10 Dec 2015 02:42:33 GMT


> On Dec. 10, 2015, 2:35 a.m., Artem Harutyunyan wrote:
> > src/health-check/main.cpp, line 120
> > <https://reviews.apache.org/r/41178/diff/1/?file=1157969#file1157969line120>
> >
> >     Do we need to create a JIRA for eventually getting rid of the hack?

Good idea, I filed MESOS-4111 and will reference it in a TODO. Will also add a reference in
the command executor sleep.
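
For readers without the diff at hand, the workaround under discussion (sleep after the final send so the process does not exit before the message is flushed) can be sketched roughly as follows. This is a minimal standalone illustration, not the code in `src/health-check/main.cpp`: `Channel`, `asyncSend` and `sendFinalMessage` are hypothetical names, the detached thread only stands in for an asynchronous I/O layer, and no libprocess API is used.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for the receiving side: counts flushed messages.
struct Channel {
  std::atomic<int> delivered{0};
};

// Hypothetical asynchronous send: returns immediately, and the message is
// only "delivered" on a background thread after `latency` has elapsed.
void asyncSend(std::shared_ptr<Channel> channel,
               std::chrono::milliseconds latency)
{
  std::thread([channel, latency]() {
    std::this_thread::sleep_for(latency);
    channel->delivered.fetch_add(1);
  }).detach();
}

// Sends the final message, then sleeps for `grace` so the background
// delivery can finish before the caller goes on to exit the process.
// This is the hack tracked for eventual removal (MESOS-4111 in the thread).
void sendFinalMessage(std::shared_ptr<Channel> channel,
                      std::chrono::milliseconds latency,
                      std::chrono::milliseconds grace)
{
  assert(channel != nullptr);
  asyncSend(channel, latency);
  std::this_thread::sleep_for(grace);
}
```

A caller would invoke `sendFinalMessage` immediately before returning from `main`. The sleep only makes a lost message unlikely when `grace` comfortably exceeds the delivery latency; it does not guarantee delivery, which is why it is treated as a hack.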


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41178/#review109667
-----------------------------------------------------------


On Dec. 10, 2015, 2:01 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41178/
> -----------------------------------------------------------
> 
> (Updated Dec. 10, 2015, 2:01 a.m.)
> 
> 
> Review request for mesos, Artem Harutyunyan and Timothy Chen.
> 
> 
> Bugs: MESOS-1613 and MESOS-4106
>     https://issues.apache.org/jira/browse/MESOS-1613
>     https://issues.apache.org/jira/browse/MESOS-4106
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Much like in the command executor, we need to sleep after we send
> the final message in the health checker. Otherwise, we may exit
> before libprocess is able to finish sending the message over the
> local network.
> 
> This led to the following issues:
> https://issues.apache.org/jira/browse/MESOS-1613
> https://issues.apache.org/jira/browse/MESOS-4106
> 
> 
> Diffs
> -----
> 
>   src/health-check/main.cpp 83ee38cd853325b3adc7cb6bc2d1d67b343037f5 
>   src/tests/health_check_tests.cpp b1454b085b36bb7c4d8ef012c764cd8466b4fb02 
> 
> Diff: https://reviews.apache.org/r/41178/diff/
> 
> 
> Testing
> -------
> 
> Running the `HealthCheckTest.DISABLED_ConsecutiveFailures` test in repetition on a machine
> loaded with many `openssl speed` commands in the background reproduces the flakiness. After
> this patch it is no longer flaky in this setup.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>

