mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yu <yujie....@gmail.com>
Subject Re: Review Request 63001: Updated protobuf definitions related to offer operations.
Date Fri, 27 Oct 2017 11:56:25 GMT


> On Oct. 16, 2017, 6:33 p.m., Greg Mann wrote:
> > src/messages/messages.proto
> > Lines 637-638 (patched)
> > <https://reviews.apache.org/r/63001/diff/1-2/?file=1856406#file1856406line637>
> >
> >     I think that this list should include pending operations, but should _not_ include
terminal operations with unacknowledged updates.
> >     
> >     When the master receives an update, it needs to decide whether or not it should
update the allocator. If the operation was pending when `UpdateSlaveMessage` was sent, then
the master should update the allocator. If the operation was not pending but its update had
not been acknowledged, then the result of that operation was already reflected in the last
`UpdateSlaveMessage` from the agent, and the master should not update the allocator.
> >     
> >     If the agent includes only pending operations in this field, then the master
will be able to make this distinction.
> 
> Jie Yu wrote:
>     The reason we need to send terminal but unacked operations is for reconciliation.
Master always keeps a cache about all the known non-completed operations (either pending or
terminal but unacked). This information can be used to answer reconciliation requests.
>     
>     The master can distinguish between a pending operation vs. a terminal but unacked
operation by looking at latest state of the operation. The master update the allocator if
the latest state if terminal, irrespective whether it's acked or not.
>     
>     That does bring up an issue. Consider the following two cases:
>     
>     1) Master knows about an operation that the agent does not know about. This can happen
if the operation gets lost on its way to the agent. This will trigger an agent re-registration.
The master in this case will need to generate a reconcile message (similar to that in `reconcileKnownSlave`)
so that the agent or LRP can reply with a terminal status update. This will guarantee that
the resources allocated for this operation are properly recovered to allocator.
>     
>     This case is also possible if there is a speculation failure on old operations and
result in an `UpdateSlaveMessage` being sent to the master. The operation that master knows
about (but agent does not) will eventually reach agent or LRP, and will be rejected by the
LRP or the agent. The master will properly recovered the resources to the allocator when a
failed update is received.
>     
>     2) Agent knows about an operation that the master does not know about. For instance,
This is a new master that just fails over, and an LRP re-registers with the agent. In that
case, the master needs to update the 'allocated' (i.e., 'used') resources for those new operations.
This is similar to what we do when slave registers. However, the tricky part is that the allocator
does not have an interface currently allowing us to add more allocation to a given agent.
We likely need to add that (or make it part of `Allocator::updateSlave` interface).
> 
> Greg Mann wrote:
>     OK, I see why we need to include unacked updates as well. Thanks Jie!
> 
> Greg Mann wrote:
>     Hey Jie, I have another question related to this issue so I re-opened it for discussion.
>     
>     Previously you wrote "The master update the allocator if the latest state if terminal,
irrespective whether it's acked or not."
>     This implies that subsequent retries of the update will _not_ have the `latest_state`
set; otherwise, the master would not be able to distinguish between the first terminal status
update and subsequent retries. Was it your intention to suggest that behavior? Looking at
the `Slave::forward` helper that we use when intitializing the status update manager, it looks
like we _always_ set `latest_state` for task status updates: https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/slave/slave.cpp#L4867-L4870
>     
>     I suppose we could set the latest state the first time an operation update is sent
to master, and then _not_ set it in subsequent retries. Then the value of `.has_latest_state()`
would be a direct indication of whether or not an update is a retry.
> 
> Greg Mann wrote:
>     Ah, I see that `latest_status` is optional, so I suspect this was indeed your intention.
Just want to confirm :)

Greg, the allocator will be updated only if the state is transitioning from non terminal to
terminal:
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L9122

No need to strip latest_state when sending the update.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63001/#review188162
-----------------------------------------------------------


On Oct. 19, 2017, 12:08 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63001/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 12:08 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Gaston Kleiman, Greg Mann,
Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated protobuf definitions related to offer operations.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 859fdff4d9a0604bc506b08af79075084ae23466 
>   include/mesos/resource_provider/resource_provider.proto f5a9073075327019fd133bd51265f695ef464845

>   include/mesos/v1/mesos.proto cfd4abd3af1d8c9fbd31659161eada9ec9f92282 
>   include/mesos/v1/resource_provider/resource_provider.proto e5cbede5b6e57a8641fca1ebfee5454f292cc24c

>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
>   src/resource_provider/manager.cpp 31fcb789f5ab907511e868c374c49f7457a33ed3 
>   src/resource_provider/validation.cpp d2927227f60ab0d4ae2481ad73a31ee444b48ee0 
>   src/tests/resource_provider_validation_tests.cpp f182bff4670318e9de22c2915c5dbb423a74ad6c

> 
> 
> Diff: https://reviews.apache.org/r/63001/diff/9/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message