mesos-reviews mailing list archives

From Anindya Sinha <anindya_si...@apple.com>
Subject Re: Review Request 48313: Consistency in persistent volumes between master and agent on failure.
Date Tue, 12 Jul 2016 20:25:38 GMT


> On July 12, 2016, 5:29 p.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp, line 4803
> > <https://reviews.apache.org/r/48313/diff/7/?file=1441668#file1441668line4803>
> >
> >     "or no target resources are present": We are inside the 
> >     
> >     ```
> >     if (resourcesState.get().target.isSome()) {
> >     }
> >     ```
> >     
> >     block, so we are certain that the target exists, right?
> >     
> >     ```
> >     CHECK(os::exists(paths::getResourcesTargetPath(metaDir)));
> >     ```
> >     
> >     instead?

I fixed the comment but did not add the `CHECK()`. Although a missing target should never happen, crashing
the agent does not seem necessary, especially because we do a `LOG(ERROR)` if `os::rm()` fails
on the target resources file.
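Here is a minimal sketch of the non-fatal cleanup described above. It assumes C++17 `std::filesystem` in place of Mesos's `os::rm()` and uses `std::cerr` in place of glog's `LOG(ERROR)`. `removeTargetCheckpoint` is a hypothetical helper, not a function from the patch.

```cpp
#include <cassert>
#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Mesos): removes the target resources
// checkpoint after it has been committed. A failed removal is logged
// instead of aborting the agent, i.e. LOG(ERROR) rather than CHECK().
// std::cerr stands in for glog's LOG(ERROR) in this sketch.
// Returns true if no target checkpoint remains afterwards.
bool removeTargetCheckpoint(const fs::path& targetPath)
{
  std::error_code ec;

  // Unlike CHECK(os::exists(...)), a missing file is not fatal here:
  // fs::remove() returns false without setting `ec` if nothing exists.
  fs::remove(targetPath, ec);
  if (ec) {
    std::cerr << "ERROR: Failed to remove target checkpoint '"
              << targetPath.string() << "': " << ec.message() << std::endl;
    return false;
  }

  return true;
}
```

The trade-off is that a `CHECK()` stops the agent immediately on an unexpected state. Logging instead reports the leftover file and lets the agent keep running.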


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48313/#review141912
-----------------------------------------------------------


On July 11, 2016, 9:42 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48313/
> -----------------------------------------------------------
> 
> (Updated July 11, 2016, 9:42 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5448
>     https://issues.apache.org/jira/browse/MESOS-5448
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Consistency in persistent volumes between master and agent on failure.
> 
> When the agent receives CheckpointedResourcesMessage, we store the
> target checkpoint on disk. On successful create and destroy of
> persistent volumes as a part of handling this message, we commit
> the checkpoint on disk and clear the target checkpoint.
> 
> However, in case of any failure, we do not commit the checkpoint to
> disk, and we exit the agent. When the agent restarts and finds a
> target checkpoint on disk that differs from the committed
> checkpoint, we retry syncing the target and committed checkpoints.
> On success, we reregister the agent with the master; on failure,
> we do not commit the checkpoint and the agent exits.
> 
> 
> Diffs
> -----
> 
>   src/slave/paths.hpp 339e539863c678b6ed4d4670d75c7ff4c54daa79 
>   src/slave/paths.cpp 03157f93b1e703006f95ef6d0a30afae375dcdb5 
>   src/slave/slave.hpp 42afa9e2ebe5cf8e35802c8d169f52879d6073ac 
>   src/slave/slave.cpp 02982d542c9e6b5b5f7fc8b3c73db6f5bac01358 
>   src/slave/state.hpp 0de2a4ee4fabaad612c4526166157b001c380bdb 
>   src/slave/state.cpp 9cec0868b1187ed3ccac7f065e8a21c2f52178d9 
> 
> Diff: https://reviews.apache.org/r/48313/diff/
> 
> 
> Testing
> -------
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>
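The commit protocol in the description can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the patch itself. Checkpoints are modeled as plain files holding opaque strings, and applying persistent volume creates/destroys is abstracted as a caller-supplied callback. `CheckpointPaths`, `ApplyFn`, `handleCheckpointedResources`, and `recoverCheckpoint` are hypothetical names rather than Mesos's `paths::` or `state::` functions, and `std::cerr` stands in for `LOG(ERROR)`.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iostream>
#include <iterator>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical on-disk layout: the committed checkpoint, plus the target
// checkpoint that is persisted while it is being applied.
struct CheckpointPaths
{
  fs::path committed;
  fs::path target;
};

// Hypothetical callback that applies persistent volume creates/destroys
// for a checkpoint; returns false on failure.
using ApplyFn = std::function<bool(const std::string&)>;

std::optional<std::string> readFile(const fs::path& path)
{
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return std::nullopt;  // Missing or unreadable.
  }
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Not atomic: a real agent should write a temporary file and rename it.
bool writeFile(const fs::path& path, const std::string& data)
{
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << data;
  out.close();
  return !out.fail();
}

void removeTarget(const fs::path& target)
{
  std::error_code ec;
  fs::remove(target, ec);
  if (ec) {
    // Logged, not fatal (stand-in for LOG(ERROR)).
    std::cerr << "ERROR: Failed to remove target checkpoint: "
              << ec.message() << std::endl;
  }
}

// On a new checkpoint message: persist the target, apply it, then commit
// it and clear the target. Returns false if the agent should exit without
// committing, which leaves the target on disk for recovery.
bool handleCheckpointedResources(
    const CheckpointPaths& paths,
    const std::string& target,
    const ApplyFn& applyTarget)
{
  if (!writeFile(paths.target, target) || !applyTarget(target)) {
    return false;
  }
  if (!writeFile(paths.committed, target)) {
    return false;
  }
  removeTarget(paths.target);
  return true;
}

// On restart: if a target exists and differs from the committed checkpoint,
// retry applying it. Returns true if the agent may reregister with the
// master, false if it should exit again without committing.
bool recoverCheckpoint(const CheckpointPaths& paths, const ApplyFn& applyTarget)
{
  const std::optional<std::string> target = readFile(paths.target);
  if (!target.has_value()) {
    return true;  // No pending target: nothing to sync.
  }
  if (readFile(paths.committed) != target) {
    if (!applyTarget(*target) || !writeFile(paths.committed, *target)) {
      return false;
    }
  }
  removeTarget(paths.target);
  return true;
}
```

The target is written before any volume is touched and removed only after the committed checkpoint is updated. Assuming each file write is atomic (which this sketch omits), a failure at any point leaves either no target or a target that recovery can retry.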

