mesos-reviews mailing list archives

From Jie Yu <yujie....@gmail.com>
Subject Re: Review Request 63732: Reconciled offer operations between agent and master.
Date Thu, 16 Nov 2017 01:05:47 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63732/#review191135
-----------------------------------------------------------




src/master/master.cpp
Line 7044 (original), 7044 (patched)
<https://reviews.apache.org/r/63732/#comment268748>

    Do you need to do that for resources in the operations?



src/master/master.cpp
Lines 7103-7120 (patched)
<https://reviews.apache.org/r/63732/#comment268746>

    I feel that, long term, removing everything and then adding it back will
    probably not work.

    Removing an offer operation means we'll need to send a status update (if
    the current state is not terminal). I'd suggest we only remove those that
    are not in the new list, and add those that are not in the old list.

    The same comment applies to the agent change.
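
    A minimal sketch of the diff-based update suggested above, assuming
    operations can be identified by a string ID. `OperationDiff` and
    `diffOperations` are hypothetical names used for illustration only; they
    are not existing Mesos APIs.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: operation IDs to remove and IDs to add.
struct OperationDiff
{
  std::vector<std::string> removed;
  std::vector<std::string> added;
};

// Computes which IDs appear only in `oldIds` (to be removed) and which
// appear only in `newIds` (to be added). IDs present in both sets are
// left untouched, so no status update is triggered for them.
OperationDiff diffOperations(
    const std::set<std::string>& oldIds,
    const std::set<std::string>& newIds)
{
  OperationDiff diff;

  // std::set iterates in sorted order, as std::set_difference requires.
  std::set_difference(
      oldIds.begin(), oldIds.end(),
      newIds.begin(), newIds.end(),
      std::back_inserter(diff.removed));

  std::set_difference(
      newIds.begin(), newIds.end(),
      oldIds.begin(), oldIds.end(),
      std::back_inserter(diff.added));

  return diff;
}
```

    Only the IDs in `removed` would then need a status update (when their
    state is not terminal), rather than every known operation.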



src/master/master.cpp
Lines 7107-7121 (patched)
<https://reviews.apache.org/r/63732/#comment268751>

    Do you also need to update the allocator for added or removed operations?

    For instance, the allocator currently thinks the new operation A uses
    2 cpus. Now, if A is removed (because it was dropped), do we need to tell
    the allocator that the 2 cpus are no longer used and can be allocated to
    others?
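
    To make the accounting question concrete, here is a toy model, assuming
    the allocator tracks cpus per operation. `ToyAllocator` and its methods
    are hypothetical and do not correspond to the Mesos allocator interface.

```cpp
#include <map>
#include <string>

// Toy stand-in for the allocator's view of a single agent.
// Hypothetical; not the Mesos allocator interface.
class ToyAllocator
{
public:
  explicit ToyAllocator(double totalCpus) : total(totalCpus) {}

  // Records that operation `id` consumes `cpus`.
  void addOperation(const std::string& id, double cpus)
  {
    used[id] = cpus;
  }

  // Forgets operation `id`, returning its cpus to the pool.
  void removeOperation(const std::string& id)
  {
    used.erase(id);
  }

  // Cpus not consumed by any tracked operation.
  double availableCpus() const
  {
    double sum = 0.0;
    for (const auto& entry : used) {
      sum += entry.second;
    }
    return total - sum;
  }

private:
  double total;
  std::map<std::string, double> used;
};
```

    In this model, if the master drops operation A without also calling
    `removeOperation`, the 2 cpus stay counted as used and are never offered
    to anyone else.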



src/master/master.cpp
Lines 7108 (patched)
<https://reviews.apache.org/r/63732/#comment268753>

    Think about the case where the agent crashes and restarts, and not all
    RPs have re-registered yet. In that case, operations from an RP that has
    not yet re-registered will not be part of this operation list.
    
    I don't think we want to remove those operations just yet. I think we
    should remove them only if the corresponding RP has re-registered with
    the agent and shows up in the offer operation list (or total resources).
    
    This is similar to how we don't remove tasks when an agent disconnects.
    In fact, you should follow the similar pattern in `reconcileKnownSlave`
    here. If an operation is unknown, instead of calling
    `removeOfferOperation` directly, we should probably send a
    `reconcileOfferOperationMessage` to the agent to ask the RP to generate a
    status update, and rely on the status update handler to properly handle
    the resource accounting.
    
    I also realized that we should probably do the same in the agent code.
    Instead of directly calling `removeOfferOperation` and
    `addOfferOperation`, send a `RECONCILE` message to the RP asking it to
    generate a status update. If the operation is unknown to the RP, the RP
    will send `OFFER_OPERATION_DROPPED`, which is terminal.
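
    The decision described above could be sketched roughly as follows,
    assuming the master knows which RPs have re-registered. `Action`,
    `KnownOperation` and `decide` are hypothetical names for illustration
    and are not part of the patch or of Mesos.

```cpp
#include <set>
#include <string>

// Hypothetical outcome for an operation the master already knows about.
enum class Action
{
  KEEP,      // The agent reported the operation; nothing to do.
  WAIT,      // Its RP has not re-registered yet; do not remove it.
  RECONCILE  // RP is back but the operation is missing; ask for a
             // status update instead of removing it directly.
};

// Hypothetical minimal view of an operation tracked by the master.
struct KnownOperation
{
  std::string id;
  std::string resourceProviderId;
};

Action decide(
    const KnownOperation& operation,
    const std::set<std::string>& reportedOperationIds,
    const std::set<std::string>& reregisteredProviderIds)
{
  if (reportedOperationIds.count(operation.id) > 0) {
    return Action::KEEP;
  }

  if (reregisteredProviderIds.count(operation.resourceProviderId) == 0) {
    return Action::WAIT;
  }

  return Action::RECONCILE;
}
```

    In the `RECONCILE` case the removal itself would be left to the status
    update handler, once the RP answers with a terminal status.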


- Jie Yu


On Nov. 15, 2017, 5:31 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63732/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2017, 5:31 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8207
>     https://issues.apache.org/jira/browse/MESOS-8207
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 59a533940736f5cfd5ec31e0ed924f0b2ab13f9c 
> 
> 
> Diff: https://reviews.apache.org/r/63732/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`, still need to implement dedicated tests.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

