mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jie Yu" <yujie....@gmail.com>
Subject Re: Review Request 31505: Add flow classifiers for fq_codel on egress
Date Wed, 27 May 2015 18:18:54 GMT


> On May 22, 2015, 10:09 p.m., Jie Yu wrote:
> > Two high level comments:
> > 
> > 1) We need a slave flag to control if turning on this feature or not
> > 2) Do you think we should at least add some unit test for this? For example, verify
that the filters are properly set up?
> 
> Jie Yu wrote:
>     Ping. Did you see these two comments? Seems that the new diff hasn't resolved these
two issues.
> 
> Cong Wang wrote:
>     Yes. 1) I think we all agree on not adding a new flag in case of inconsistent behavior;
2) We already have a test case, FqCodelClassifier.
> 
> Jie Yu wrote:
>     For 1), I think we need a flag so that we can slowly roll out this feature. Otherwise,
if there's a bug, all slave needs to be rebooted.
>     
>     For 2), I mean we need to add an integration test in port_mapping_tests.cpp to exercise
the code path in the isolator. The test you mentioned is a unit test for the routing library
stuff.
> 
> Cong Wang wrote:
>     1) That depends on what is our fallback solution in case of bugs, potentially every
line of new code needs a flag if not reverting to an old binary. :) So generally speaking,
what is our solution when we hit some bug during deloyment? If it is reverting to the old
version, then adding a flag just for deloyment is useless. Or if it is turning off some flag,
then as I said every new code would potentially need a flag and some new code can't have a
flag, that means this way doesn't scale.
>     
>     2) In src/tests/port_mapping_tests.cpp, it calls PortMappingIsolatorProcess::create()
which should cover these new code too, no?

For 1), if the fallback solution is to just to restart the slave, then we usually don't require
a new flag. However, in this case, we will leave filters on host eth0, and a simple restart
of the slave won't clean up those filters.

For 2), Yes, we exercise the code of creating those filters, but do we have ASSERT or EXPECT
to check if those filters are properly setup? AFAICT, even if the filters are not properly
setup, the existing tests will still finish, no?


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31505/#review84998
-----------------------------------------------------------


On May 26, 2015, 8:41 p.m., Cong Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31505/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 8:41 p.m.)
> 
> 
> Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-2422
>     https://issues.apache.org/jira/browse/MESOS-2422
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we do nothing on the host egress side. By default, kernel uses its own hash
function to classify the packets to different TX queues (if the hardware interface supports
multiqueue). So packets coming out of different containers could end up queueing in the same
TX queue, in this case we saw buffer bloat on some TX queue caused packet drops.
> 
> We need to isolation the egress traffic so that containers will not have interference
with each other. The number of hardware TX queues is limited by hardware interface, usually
not enough to map our container in 1:1 way, therefore we need some software solution. We choose
fq_codel and use tc filters to classify packets coming out of different containers to different
fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer
bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this
is however (almost) beyond what we can control, we have to rely on the kernel behavior.
> 
> TODO: get some performance numbers
> 
> 
> Diffs
> -----
> 
>   src/linux/routing/filter/basic.cpp 755be7de5c43c2f3b336af6cb47af9c97a21218c 
>   src/linux/routing/filter/filter.hpp 024582cf8274da2e5bcccc4ebd00fcf83d6930e2 
>   src/linux/routing/filter/icmp.cpp 60703e727ecab9557911ab0562773771db51db90 
>   src/linux/routing/filter/ip.hpp 62bb5f89d35fc9622fca303aa01a789fcfbf2f76 
>   src/linux/routing/filter/ip.cpp 0d25e2d4f0ac4c3f151ad40e796dbdbc54dba9fa 
>   src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e

>   src/slave/containerizer/isolators/network/port_mapping.cpp 49e983edab598e2ac487bb488fdd12840a9e7dfc

> 
> Diff: https://reviews.apache.org/r/31505/diff/
> 
> 
> Testing
> -------
> 
> Manually start two mesos containers with netperf running side.
> 
> 
> Thanks,
> 
> Cong Wang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message