mesos-reviews mailing list archives

From "Qian Zhang" <zhang...@cn.ibm.com>
Subject Re: Review Request 42355: Removed the timeout from the filter.
Date Thu, 21 Jan 2016 03:49:40 GMT


> On Jan. 20, 2016, 9:49 a.m., Qian Zhang wrote:
> > One question: say the allocation interval is 10s and, at time 5s, a framework sets a
> > filter with a 3s timeout. With this patch, the filter expires 10s (max(10, 3)) later,
> > i.e., at time 15s. At time 10s (the next allocation cycle), the allocator will not
> > allocate any resources to the framework because of the 10s filter, which is good and is
> > the issue we intend to fix. Then, at time 12s, a new slave joins. At that moment the
> > allocator will again not allocate any resources to the framework because of the 10s
> > filter, even though the new slave may have the resources the framework needs. So my
> > question is whether this is reasonable behavior: do we filter too much for the
> > framework in this case?
> 
> Qian Zhang wrote:
>     In this case, do we need to cancel the filter once it has taken effect at least once
>     and has lasted long enough?
> 
> Alexander Rukletsov wrote:
>     Your concern is valid and we may indeed filter too much. I wonder how probable your
>     scenario is in real-world setups.
>     
>     Our intention is "filter for X seconds, but at least for one allocation touching the
>     filtered agent". What we have here is more of a hack, and I'd rather remove
>     `std::max()` in favor of a proper fix, which is allocating on resource recovery
>     (MESOS-3078). Does the TODO I left in the code explain it?
> 
> Alexander Rukletsov wrote:
>     To clarify Qian's concern and my answer: filters are set on a per-agent basis, so a
>     new agent joining the cluster won't be affected by any existing filters. However, we
>     may indeed filter longer than a framework asked for, but I think being precise about
>     the filter duration is less important than making the refused resources available to
>     other frameworks.
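
For readers following the arithmetic in the scenario above, here is a minimal sketch of the two points being discussed: the filter duration is clamped to at least one allocation interval, and filters are tracked per agent. Every name here (`FilterTable`, `refuse`, `isFiltered`, the agent ids) is hypothetical and chosen for illustration; this is not the code in `hierarchical.cpp`, and times are plain seconds rather than the allocator's own duration types.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical illustration only; not the Mesos allocator implementation.
struct FilterTable {
  double allocationInterval;             // seconds between allocation cycles
  std::map<std::string, double> expiry;  // agent id -> filter expiry time (s)

  // Record a refusal filter for one agent. The requested duration is
  // clamped so the filter lasts at least one allocation interval.
  void refuse(const std::string& agent, double now, double requested) {
    expiry[agent] = now + std::max(allocationInterval, requested);
  }

  // An agent is filtered only if it has its own unexpired entry, so an
  // agent that joined after the refusal is never filtered by it.
  bool isFiltered(const std::string& agent, double now) const {
    auto it = expiry.find(agent);
    return it != expiry.end() && now < it->second;
  }
};
```

With `allocationInterval = 10` and `refuse("agent1", 5.0, 3.0)`, the entry for `agent1` expires at 15s, so `agent1` is still filtered in the allocation cycle at 10s, while a newly joined `agent2` is not filtered at 12s.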

Yes, I agree it does make sense, thanks for the clarification!


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42355/#review115315
-----------------------------------------------------------


On Jan. 20, 2016, 7:32 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42355/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2016, 7:32 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-4302
>     https://issues.apache.org/jira/browse/MESOS-4302
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Without the timeout, we rely on filter expiration only. This guarantees
> that filter removal is scheduled after `allocate()` if the allocator is
> backlogged, provided the default parameters are used. Additionally, we
> ensure the filter timeout is at least as long as the allocation interval.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp 48acde69b1a2f305b568a7e322a58708063dd30a
>   src/tests/hierarchical_allocator_tests.cpp 9362dd306497ba01e0f387c3862456cdcac6f863
> 
> Diff: https://reviews.apache.org/r/42355/diff/
> 
> 
> Testing
> -------
> 
> On Mac OS 10.10.4:
> 
> `make check`
> 
> `GTEST_FILTER="HierarchicalAllocatorTest.FilterTimeout" ./bin/mesos-tests.sh --gtest_repeat=100 --gtest_break_on_failure` passes with the patch and fails without.
> 
> `GTEST_FILTER="HierarchicalAllocatorTest.*" ./bin/mesos-tests.sh --gtest_repeat=100 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

