mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 51317: Documented a bug with the use of `pendingTasks` in the master.
Date Wed, 24 Aug 2016 01:28:29 GMT


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3362
> > <https://reviews.apache.org/r/51317/diff/1/?file=1481628#file1481628line3362>
> >
> >     The current logic is that the task with duplicate ID will be ignored and will
not be launched, so which task will trigger and where do we send out the TASK_ERROR message?
> >     
> >     I saw that the TASK_ERROR message will only be sent out if the task is killed
in `pendingTasks` and invalid.

Hm.. I'm not sure I follow your comment. Maybe this will clarify things:

There are two reasons a task will not be contained in `pendingTasks` when we arrive in `_accept()`:

1. The task was killed while in `pendingTasks`.
2. The task had a duplicate TaskID with another task in `pendingTasks`.

So if a task is missing in `pendingTasks` we still have to validate it to catch (2). This
means we could have sent TASK_ERROR after TASK_KILLED if the task was killed while in `pendingTasks`
but is invalid or unauthorized.

There are a few ways to fix this, but they're all a bit clunky so I decided on just documenting
this a bit better for now.


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3364
> > <https://reviews.apache.org/r/51317/diff/1/?file=1481628#file1481628line3364>
> >
> >     Just a question here: Why do you want to distinguish the difference between
duplicate TaskID and getting killed while pending? What is the relatinonship of those two?

See my comment above about the 2 cases, hopefully that clarifies things?


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3366
> > <https://reviews.apache.org/r/51317/diff/1/?file=1481628#file1481628line3366>
> >
> >     I saw that this only occur if the task authorization failed for now, shall we
highlight this in the comments here?

Hm.. it can occur if authorization fails or if validation fails, why did you mention only
authorization failure?

We could update the code to avoid sending TASK_ERROR REASON_UNAUTHORIZED when the task is
not in the `pendingTasks` map, since we know that it is either killed or invalid, but this
seemed not worth it since it only half-way fixes the issue.


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, lines 3765-3767
> > <https://reviews.apache.org/r/51317/diff/1/?file=1481628#file1481628line3765>
> >
> >     What about put more detail around line 3807 for `invalid` task case by adding
a `TODO`?
> >     
> >     ```
> >     // TODO(bmahler): Validate the task. We may send TASK_ERROR
> >     // after a TASK_KILLED if a task was killed (removed from
> >     // `pendingTasks`) *and* the task is invalid here.
> >     ```

Hm.. this looks like a duplicate of the TODO here? Are you suggesting moving it down? Since
it covers both authorization and validation I opted to keep it up higher.


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3767
> > <https://reviews.apache.org/r/51317/diff/1/?file=1481628#file1481628line3767>
> >
> >     Do we need to put `unauthorized` in `TODO`? I saw it aws already handled in
line 3774 and 3793.

Yes, we may send TASK_ERROR for an unauthorized task after a TASK_KILLED was already sent,
much like the validation case.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51317/#review146492
-----------------------------------------------------------


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51317/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The use of `pendingTasks` cannot distinguish between a duplicate
> TaskID and a task that has been killed while pending. This means
> that if an invalid or unauthorized task is killed while pending,
> TASK_KILLED is sent, and once in Master::_accept, we will also
> send TASK_ERROR.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
> 
> Diff: https://reviews.apache.org/r/51317/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message