mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrei Sekretenko <asekrete...@mesosphere.io>
Subject Re: Review Request 71750: Improved performance of v1 operator API GetAgents call.
Date Thu, 21 Nov 2019 18:38:51 GMT


> On Nov. 19, 2019, 6:12 p.m., Andrei Sekretenko wrote:
> > src/master/http.cpp
> > Lines 2242 (patched)
> > <https://reviews.apache.org/r/71750/diff/1/?file=2172061#file2172061line2242>
> >
> >     Did you consider allocating the `agent` temporaries outside of the loops? In
my experience, protobuf construction used to be much more expensive than modification.
> >     
> >     (Tried this, observed a ~10% speedup for `v1 'master::call::GetState' application/x-protobuf`
in my setup.)
> 
> Benjamin Mahler wrote:
>     Hm.. that's interesting, could certainly look into this for the GetAgents and GetFrameworks
code which both use temporaries. The temporaries aren't necessary though, we could eliminate
the need to copy entirely, but allocating them outside the loops seems like an easier win.
>     
>     Unfortunately the most expensive part in practice (non-pending tasks) are not using
temporaries like this. I'll look into a patch at the end of this chain to allocate outside
the loops.
> 
> Benjamin Mahler wrote:
>     > (Tried this, observed a ~10% speedup for v1 'master::call::GetState' application/x-protobuf
in my setup.)
>     
>     Curious to see your setup (did you just use a lot of agents instead of a lot of tasks?)
and your diff.
> 
> Benjamin Mahler wrote:
>     Also did you run with optimizations on? How many runs did you do? Because there's
quite a high variance. Just did 10 runs of AgentFrameworkTaskCountContentType/MasterStateQuery_BENCHMARK_Test.GetState/3:
>     
>     before: median 5.29 from [4.41, 4.51, 5.07, 5.10, 5.27, 5.32, 5.49, 5.51, 5.70, 6.46]
>     after:  median 5.11 from [4.68, 4.79, 4.80, 4.84, 5.01, 5.21, 5.25, 5.42, 6.42, 7.43]
>     
>     So based on medians it's 96.6% of the time without the loop change?
>     
>     With the following diff applied to this chain to produce 'after':
>     
>     ```
>     diff --git a/src/master/http.cpp b/src/master/http.cpp
>     index 0ca25b6d7..c77788cdf 100644
>     --- a/src/master/http.cpp
>     +++ b/src/master/http.cpp
>     @@ -1480,6 +1480,7 @@ string Master::Http::serializeGetFrameworks(
>        // expect a large number of pending tasks, we currently don't
>        // bother with the more efficient approach.
>      
>     +  mesos::master::Response::GetFrameworks::Framework f;
>        foreachvalue (const Framework* framework,
>                      master->frameworks.registered) {
>          // Skip unauthorized frameworks.
>     @@ -1487,7 +1488,7 @@ string Master::Http::serializeGetFrameworks(
>            continue;
>          }
>      
>     -    mesos::master::Response::GetFrameworks::Framework f = model(*framework);
>     +    f = model(*framework);
>          writer.WriteTag(
>              WireFormatLite::MakeTag(
>                  mesos::master::Response::GetFrameworks
>     @@ -1504,7 +1505,7 @@ string Master::Http::serializeGetFrameworks(
>            continue;
>          }
>      
>     -    mesos::master::Response::GetFrameworks::Framework f = model(*framework);
>     +    f = model(*framework);
>          writer.WriteTag(
>              WireFormatLite::MakeTag(
>                  mesos::master::Response::GetFrameworks
>     @@ -2970,15 +2971,15 @@ string Master::Http::serializeGetAgents(
>        google::protobuf::io::StringOutputStream stream(&output);
>        google::protobuf::io::CodedOutputStream writer(&stream);
>      
>     +  mesos::master::Response::GetAgents::Agent agent;
>        foreachvalue (const Slave* slave, master->slaves.registered) {
>          // TODO(bmahler): Consider not constructing the temporary
>          // agent object and instead serialize directly.
>     -    mesos::master::Response::GetAgents::Agent agent =
>     -      protobuf::master::event::createAgentResponse(
>     -          *slave,
>     -          master->slaves.draining.get(slave->id),
>     -          master->slaves.deactivated.contains(slave->id),
>     -          approvers);
>     +    agent = protobuf::master::event::createAgentResponse(
>     +        *slave,
>     +        master->slaves.draining.get(slave->id),
>     +        master->slaves.deactivated.contains(slave->id),
>     +        approvers);
>      
>          writer.WriteTag(
>              WireFormatLite::MakeTag(
>     @@ -2988,14 +2989,15 @@ string Master::Http::serializeGetAgents(
>          agent.SerializeToCodedStream(&writer);
>        }
>      
>     +  SlaveInfo recoveredAgent;
>        foreachvalue (const SlaveInfo& slaveInfo, master->slaves.recovered) {
>          // TODO(bmahler): Consider not constructing the temporary
>          // SlaveInfo object and instead serialize directly.
>     -    SlaveInfo agent = slaveInfo;
>     -    agent.clear_resources();
>     +    recoveredAgent = slaveInfo;
>     +    recoveredAgent.clear_resources();
>          foreach (const Resource& resource, slaveInfo.resources()) {
>            if (approvers->approved<VIEW_ROLE>(resource)) {
>     -        *agent.add_resources() = resource;
>     +        *recoveredAgent.add_resources() = resource;
>            }
>          }
>      
>     @@ -3003,8 +3005,8 @@ string Master::Http::serializeGetAgents(
>              WireFormatLite::MakeTag(
>                  mesos::master::Response::GetAgents::kRecoveredAgentsFieldNumber,
>                  WireFormatLite::WIRETYPE_LENGTH_DELIMITED));
>     -    writer.WriteVarint32(agent.ByteSizeLong());
>     -    agent.SerializeToCodedStream(&writer);
>     +    writer.WriteVarint32(recoveredAgent.ByteSizeLong());
>     +    recoveredAgent.SerializeToCodedStream(&writer);
>        }
>      
>        // While an explicit Trim() isn't necessary (since the coded
>     
>     ```

That was a release build, but a short run (3x) with a similar patch applied only to `getAgents`
on top of r71750. I didn't realize that this benchamrk is THAT noisy, and somehow got "lucky".
Repeated that with two longer interleaving runs (a/b/a/b) - obtained 3% difference by any
conceivable metric (median/mean/Hodghes-Lehmann difference/...)

And these patches seem to increase the relative noise: running `GetState/3` with the whole
chain applied leaves me with a standard deviation of 10%. So I'd not bother measuring the
potential impact of this.


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71750/#review218678
-----------------------------------------------------------


On Nov. 12, 2019, 12:03 a.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71750/
> -----------------------------------------------------------
> 
> (Updated Nov. 12, 2019, 12:03 a.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko, Greg Mann, and Meng Zhu.
> 
> 
> Bugs: MESOS-10026
>     https://issues.apache.org/jira/browse/MESOS-10026
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This updates the handling to serialize directly to protobuf or json
> from the in-memory v0 state, bypassing expensive intermediate
> serialization / de-serialization / object construction / object
> destruction.
> 
> This initial patch shows the approach that will be used for the
> other expensive calls. Note that this type of manual writing is
> more brittle and complex, but it can be mostly eliminated if we
> keep an up-to-date v1 GetState in memory in the future.
> 
> When this approach is applied fully to GetState, it leads to the
> following improvement:
> 
> Before:
> v0 '/state' response took 6.55 secs
> v1 'GetState' application/x-protobuf response took 24.08 secs
> v1 'GetState' application/json response took 22.76 secs
> 
> After:
> v0 '/state' response took 8.00 secs
> v1 'GetState' application/x-protobuf response took 5.73 secs
> v1 'GetState' application/json response took 9.62 secs
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 60765c9b9d6903f6ed94fa8c614055698caad0da 
>   src/master/master.hpp dc45028d2ecfb61bf9ea82d90d2393af648a6023 
> 
> 
> Diff: https://reviews.apache.org/r/71750/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message