mesos-reviews mailing list archives

From "Michael Park" <mcyp...@gmail.com>
Subject Re: Review Request 35433: CHECK that checkpointed resources exist on the slave.
Date Wed, 17 Jun 2015 00:21:56 GMT


> On June 14, 2015, 10:46 a.m., Benjamin Hindman wrote:
> > Just so I understand, does this mean that if we happen to get into the unfortunate situation
> > where a slave has neglected to get the dynamic reservation because it was just starting up
> > and then it gets the task launch, it will shut down the slave because the CHECK will fail? I
> > would expect the slave to simply send a TASK_LOST. Said another way, this is not an assertion
> > our code guarantees. If instead we were waiting for some kind of an ack from the slave that
> > it received the dynamic reservation before sending the task launch, then a CHECK would make
> > sense.
> 
> Jie Yu wrote:
>     We don't expect this to happen because we always send a CheckpointResourcesMessage
>     before sending the task to the slave, and TCP ensures in-order delivery. (Out-of-order
>     delivery is possible if two sockets are used, which can happen because of the way we
>     create ephemeral connections, but this is very unlikely.) The master won't send the task
>     to the slave if the slave hasn't registered.
>     
>     I would rather keep the CHECK here unless we find that this is a real issue (and
>     then we can change it to send a status update).

So currently it is possible for this to happen, but only with a very small probability. Your
proposal is to keep the `CHECK` and put in the effort to eliminate the possibility once we
observe it as a real problem, correct? The part I don't quite understand is the motivation
for waiting until a real problem occurs. We already know it's possible to run into this
issue (even if only with a small probability), and the effort to change the `CHECK` to
sending `TASK_LOST` seems small.
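
To make the comparison concrete, here is a rough sketch of the two options. It is not the
code in `src/slave/slave.cpp`: `Task`, `StatusUpdate`, `firstMissing`, `launchWithCheck`, and
`launchOrLose` are hypothetical stand-ins, resources are modeled as plain strings, and the
`CHECK` is approximated with `std::abort`.

```cpp
#include <cstdlib>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the Mesos types.
enum class TaskState { TASK_RUNNING, TASK_LOST };

struct Task {
  std::string id;
  std::vector<std::string> resources;  // checkpointed resources the task uses
};

struct StatusUpdate {
  std::string taskId;
  TaskState state;
  std::string message;
};

// Returns a pointer to the first resource the task uses that the slave has
// not checkpointed, or nullptr if every resource is known.
const std::string* firstMissing(
    const Task& task,
    const std::set<std::string>& checkpointed) {
  for (const std::string& resource : task.resources) {
    if (checkpointed.count(resource) == 0) {
      return &resource;
    }
  }
  return nullptr;
}

// Option 1 (CHECK-style): treat a missing resource as a violated invariant
// and take the whole process down.
StatusUpdate launchWithCheck(
    const Task& task,
    const std::set<std::string>& checkpointed) {
  const std::string* missing = firstMissing(task, checkpointed);
  if (missing != nullptr) {
    std::cerr << "Check failed: unknown checkpointed resource: "
              << *missing << std::endl;
    std::abort();
  }
  return StatusUpdate{task.id, TaskState::TASK_RUNNING, ""};
}

// Option 2: treat a missing resource as a recoverable condition and report
// the task as lost while the process keeps running.
StatusUpdate launchOrLose(
    const Task& task,
    const std::set<std::string>& checkpointed) {
  const std::string* missing = firstMissing(task, checkpointed);
  if (missing != nullptr) {
    return StatusUpdate{
        task.id,
        TaskState::TASK_LOST,
        "Unknown checkpointed resource: " + *missing};
  }
  return StatusUpdate{task.id, TaskState::TASK_RUNNING, ""};
}
```

With option 2, a task that arrives before its checkpoint message only costs that one task;
with option 1, the same race takes down the slave process.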


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35433/#review87857
-----------------------------------------------------------


On June 15, 2015, 12:39 p.m., Michael Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35433/
> -----------------------------------------------------------
> 
> (Updated June 15, 2015, 12:39 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> No bug was observed (yet), but I realized I forgot about this in the dynamic reservations
> patches.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 67732a40ef683cb0188425f0bba8fe8ab83e461c 
> 
> Diff: https://reviews.apache.org/r/35433/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Michael Park
> 
>

