cf-natali opened a new pull request #355: Handle EBUSY when destroying a cgroup.
URL: https://github.com/apache/mesos/pull/355

It's a workaround for kernel bugs
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/cgroup/cgroup.c?id=9c974c77246460fa6a92c18554c3311c8c83c160
and
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/cgroup/cgroup.c?id=c03cd7738a83b13739f00546166969342c8ff014

Fixes MESOS-10107.

@abudnik

> Does the workaround work reliably after changing the initial delay and retry count to the values taken from libcontainerd (10ms and 5)?

Yes, however I chose 1ms and 10 for two reasons:
- this possibly yields lower latency
- more importantly, while running strace I can see that it can sometimes take over 100-200ms for rmdir to succeed:

```
[pid 1965] 13:22:36.021260 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = -1 EBUSY (Périphérique ou ressource occupé) <0.000017>
[pid 1965] 13:22:36.022604 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = -1 EBUSY (Périphérique ou ressource occupé) <0.000018>
[pid 1965] 13:22:36.024807 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = -1 EBUSY (Périphérique ou ressource occupé) <0.000080>
[pid 1965] 13:22:36.029116 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = -1 EBUSY (Périphérique ou ressource occupé) <0.000466>
[pid 1965] 13:22:36.037990 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = -1 EBUSY (Périphérique ou ressource occupé) <0.000190>
[pid 1965] 13:22:36.054528 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = -1 EBUSY (Périphérique ou ressource occupé) <0.000038>
[pid 1965] 13:22:36.086874 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = -1 EBUSY (Périphérique ou ressource occupé) <0.000029>
[pid 3225] 13:22:36.127365 +++ killed by SIGKILL +++
[pid 1965] 13:22:36.151151 rmdir("/sys/fs/cgroup/freezer/mesos/b99efad6-b9eb-43bd-8242-29a2b321dd07") = 0 <0.000114>
```

And 10ms with 5 retries gives only 320ms (10 * 2**5), so I'd rather have a bit more margin.

> Should we retry only if `::rmdir()` returns EBUSY errno error?

Definitely - I wanted to do that, but I'm not sure what the best way to do it is: is there a way to access `errno` from `Try rmdir`, or can I just assume that the global `errno` is preserved and access it directly?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services
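
For reference, the retry scheme discussed in the description above (retry only on EBUSY, doubling the delay each time) could be sketched roughly as follows. This is an illustration, not the code from the pull request: `totalBackoffMs` and `retryOnEbusy` are hypothetical names, and the operation is assumed to follow the `::rmdir()` convention of returning 0 on success or -1 with `errno` set. Summing the individual sleeps gives 310ms for 10ms with 5 retries (slightly under the 320ms upper bound quoted above) and 1023ms for 1ms with 10 retries.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: total time slept by a backoff that starts at
// `initialMs` and doubles after every failed attempt, over `retries` retries.
long totalBackoffMs(long initialMs, int retries)
{
  assert(initialMs >= 0 && retries >= 0);
  long total = 0;
  long delay = initialMs;
  for (int i = 0; i < retries; ++i) {
    total += delay;
    delay *= 2;
  }
  return total;
}

// Hypothetical helper: runs `op` (assumed to return 0 on success, or -1 with
// errno set) and retries with exponential backoff only while it fails with
// EBUSY. Any other error, or exhausting the retries, returns -1 immediately.
int retryOnEbusy(const std::function<int()>& op, long initialMs, int retries)
{
  long delay = initialMs;
  for (int attempt = 0; ; ++attempt) {
    if (op() == 0) {
      return 0;
    }

    // Capture errno right away: later library calls may overwrite it.
    const int error = errno;
    if (error != EBUSY || attempt == retries) {
      errno = error;
      return -1;
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(delay));
    delay *= 2;
  }
}
```

With `<unistd.h>` included, a call might look like `retryOnEbusy([&]() { return ::rmdir(path.c_str()); }, 1, 10)`, where `path` is whatever cgroup directory is being destroyed. Copying `errno` into a local immediately after the failing call avoids relying on the global value surviving any intermediate calls.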