mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Neil Conway <neil.con...@gmail.com>
Subject Re: Review Request 43635: Changed scalar resources to use fixed-point internally.
Date Fri, 19 Feb 2016 22:29:50 GMT


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 59
> > <https://reviews.apache.org/r/43635/diff/1/?file=1252182#file1252182line59>
> >
> >     Replace 1000 with const integer

I refactored the code to use functions rather than repeating `1000` in a bunch of places,
which I think addresses this concern.


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 60
> > <https://reviews.apache.org/r/43635/diff/1/?file=1252182#file1252182line60>
> >
> >     Regarnding `lround`, it maybe overcommit resources, if framework keep launching
task with smaller CPU usage; I'd suggest to discard the additional precission directly (e.g.
1.2345 -> 1.234). If there's any idle resources because of discarding, oversubscription
will help.
> >     
> >     And we need to reject operation request with small resources which is handled
in MESOS-1807.
> 
> Neil Conway wrote:
>     Truncation rather than rounding might also be reasonable behavior; I'm curious what
other people think.

Chatting with Joris, our feeling was that it is hard to do something that is right for everyone,
but that the risk of overcommitting resources due to rounding is probably minimal. Overall
rounding probably seems like a better policy than either taking the floor or the ceiling.


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
-----------------------------------------------------------


On Feb. 19, 2016, 10:27 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2016, 10:27 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
>     https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Scalar resource values are represented using floating point. As a result, users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -----
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 
>   include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d

>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Manually verified that some of the floating point oddities in https://issues.apache.org/jira/browse/MESOS-4071
do not occur when this patch is applied, although I wasn't able to reproduce the crash described
in that ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of precision from
input scalar resource values. Should we? That would require identifying all the points where
a resource value is seemed to be "user-provided", and also runs the risk of generating a ton
of log messages when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything to it, we
won't discard any additional precision that appears in the value -- the precision only gets
discarded when we apply an operator like `+` or `-`. Unclear if we should trim additional
precision from all scalar resource values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR applied (optimized
build on my laptop):
> 
> ```
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took 2.224045secs to make 200 offers
> round 4 allocate took 2.232822secs to make 200 offers
> round 5 allocate took 2.264807secs to make 200 offers
> round 6 allocate took 2.224853secs to make 200 offers
> round 7 allocate took 2.224829secs to make 200 offers
> round 8 allocate took 2.24862secs to make 200 offers
> round 9 allocate took 2.2556secs to make 200 offers
> round 10 allocate took 2.256616secs to make 200 offers
> ```
> 
> And after applying this RR:
> 
> ```
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.267919secs to make 200 offers
> round 1 allocate took 2.202843secs to make 200 offers
> round 2 allocate took 2.20426secs to make 200 offers
> round 3 allocate took 2.263887secs to make 200 offers
> round 4 allocate took 2.266237secs to make 200 offers
> round 5 allocate took 2.276957secs to make 200 offers
> round 6 allocate took 2.291821secs to make 200 offers
> round 7 allocate took 2.261839secs to make 200 offers
> round 8 allocate took 2.325696secs to make 200 offers
> round 9 allocate took 2.310469secs to make 200 offers
> round 10 allocate took 2.21802secs to make 200 offers
> ```
> 
> Which suggests to me that any performance hit is pretty minimal.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message