mesos-reviews mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 68018: Added `SeccompFilter` class.
Date Wed, 16 Jan 2019 03:16:58 GMT


> On Jan. 14, 2019, 4:31 p.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> >     Will this affect the task run by Mesos? E.g., a task may want to run a program that has the `set-user-ID` bit set.
> 
> Andrei Budnik wrote:
>     Yes, the `no_new_privs` flag affects a task that wants to run a program which has the `set-user-ID` bit set.
>     E.g., launching `ping -c 3 8.8.8.8` fails with seccomp. You'll see a message in the executor logs:
>     ```
>     I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
>     ping: socket: Operation not permitted
>     I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 2 (pid: 13276)
>     ```
>     
>     Also, see my previous comment https://reviews.apache.org/r/68018/#comment297000
> 
> Qian Zhang wrote:
>     In your previous comment, you mentioned that the Docker daemon launches its containers with the `SCMP_FLTATR_CTL_NNP` flag set by default. Does that mean containers launched by the Docker daemon cannot run any program which has the set-user-ID bit?
>     
>     This seems unfortunate since it might break some use cases or applications that we already support. And can you please elaborate a bit on `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"`? How would the Seccomp filter be reverted? Do you mean the task launched by Mesos can call the libseccomp API to revert the filter itself?
>     
>     If we have to live with this limitation (i.e., tasks cannot run a program which has the set-user-ID bit), then we need to highlight it in the documentation.
> 
> Gilbert Song wrote:
>     Seems like we asked the same question.
>     
>     Andrei, let's align on this thread? Thanks :)
> 
> Andrei Budnik wrote:
>     > does that mean any containers launched by Docker daemon cannot run program which has set-user-ID bit?
>     
>     The Docker daemon cannot be used to run arbitrary programs (as opposed to the Mesos containerizer). So, when one launches a Docker container, the Docker daemon launches a container process with the `NNP` bit set, which means that a container process (and its descendants) can't gain more privileges **outside** its container. The Mesos containerizer has exactly the same behaviour:
>     
>     1) Run system-provided `/bin/ping` (*outside* its container) as a non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FAILED for task 'a'
>       message: 'Command exited with status 2'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     2) Run system-provided `/bin/ping` (*outside* its container) as a privileged user:
>     ```
>     $ sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     3) Run container image provided `ping` (*inside* its image/container) as a non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --docker_image="fedora:latest" --command="yum -y install iputils;ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     
>     $ cat /path/to/container/stdout
>     ...
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
>     ```
>     
>     > This seems unfortunate since it might break some use cases or applications that we already supported.
>     
>     It's very unlikely that the agent launches tasks whose binary has the `setuid`/`setgid` bit set. Because... what's the point?
>     I doubt that any of the following programs are launched as a Mesos container:
>     ```
>     $ sudo find /bin/ -perm -u=s -type f 2>/dev/null
>     /bin/newgrp
>     /bin/pkexec
>     /bin/mount
>     /bin/umount
>     /bin/newuidmap
>     /bin/newgidmap
>     /bin/sudo
>     /bin/crontab
>     /bin/su
>     /bin/gpasswd
>     /bin/chage
>     /bin/passwd
>     /bin/staprun
>     /bin/fusermount
>     /bin/fusermount-glusterfs
>     /bin/chfn
>     /bin/chsh
>     /bin/at
>     ```
>     
>     > And can you please elaborate a bit about "Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"? How will the Seccomp filter be reverted? Do you mean the task launched by Mesos can call libseccomp API to revert the filter itself?
>     
>     Yes, without the `NNP` (`no_new_privs`) bit set, a privileged task might call the `seccomp` Linux syscall to install an empty Seccomp filter.

> Run system-provided /bin/ping (outside its container) as a non-privileged user:

As you mentioned in the above comment, this task fails, but only **after** your seccomp
patches are applied. Before they are applied (e.g., with the latest code on the Mesos master
branch), it succeeds:
```
$ ./src/mesos-execute --master=192.168.56.5:5050 --name=test --command="ping -c 3 8.8.8.8" --checkpoint
I0116 10:15:02.699398 14271 scheduler.cpp:189] Version: 1.8.0
I0116 10:15:02.977327 14287 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
I0116 10:15:02.979837 14285 scheduler.cpp:538] New master detected at master@192.168.56.5:5050
Subscribed with ID ea9488e1-a171-423f-8eb5-4d70187349fb-0001
Submitted task 'test' to agent '12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0'
Received status update TASK_STARTING for task 'test'
  source: SOURCE_EXECUTOR
Received status update TASK_RUNNING for task 'test'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'test'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR
```
To me, this is a regression: some previously supported use cases or applications
will fail after your seccomp patches are applied.
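
To gauge how many binaries on an agent host could be affected, one could list the files that
carry the set-user-ID bit, in the spirit of the `find /bin/ -perm -u=s -type f` command quoted
above. This is only a rough sketch in Python and not part of the patch; the helper names
`has_setuid_bit` and `find_setuid_files` are made up for illustration:
```python
import os
import stat


def has_setuid_bit(mode):
    """Return True if the given st_mode value has the set-user-ID bit."""
    return bool(mode & stat.S_ISUID)


def find_setuid_files(root):
    """Recursively list regular files under `root` that are set-user-ID.

    Hypothetical helper; files that cannot be stat'ed are skipped.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode) and has_setuid_bit(st.st_mode):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # "/bin" is just the directory used in the quoted example.
    for path in find_setuid_files("/bin"):
        print(path)
```
Any task whose command ends up executing one of the listed files as a non-root user is a
candidate for the failure shown above.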

> when one launches a Docker container, Docker daemon launches a container process with NNP bit set, which means that a container process (and its descendants) can't gain more privileges outside its container.

This is not what I see with Docker. I created a Docker image with ping installed and
a non-root user added:
```
FROM ubuntu:18.04

RUN apt-get update && apt-get install -y iputils-ping
RUN adduser --disabled-password --gecos "" stack
```

Then I created a Docker container from that image as the non-root user, and ping worked:
```
docker run --rm -it --user stack ubuntu:stack sh   
$ id 
uid=1000(stack) gid=1000(stack) groups=1000(stack)
$ ls -la /bin/ping 
-rwsr-xr-x. 1 root root 64424 Mar  9  2017 /bin/ping
$ ping 8.8.8.8 
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=3.25 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=3.20 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=3.48 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 3.200/3.312/3.481/0.121 ms
```
So the Docker daemon actually can create a container that runs a program which has the
set-user-ID bit. I am a bit confused about the impact of the `SCMP_FLTATR_CTL_NNP` flag
which, as you mentioned, the Docker daemon sets for its containers.
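
One way to check this empirically might be to read the `NoNewPrivs` and `Seccomp` fields from
`/proc/<pid>/status` for a process inside the container (the `NoNewPrivs` field is only present
on Linux 4.10 or later). Below is a minimal sketch in Python; the helper names are hypothetical
and this is not code from the patch:
```python
def parse_status_fields(status_text, names=("NoNewPrivs", "Seccomp")):
    """Extract integer fields from the text of /proc/<pid>/status.

    Returns a dict mapping each requested field name to its value,
    or to None when the kernel does not report that field.
    """
    result = {name: None for name in names}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in result:
            result[key] = int(value.strip())
    return result


def read_status_fields(pid="self"):
    """Read the fields for a process (assumption: /proc is mounted)."""
    with open("/proc/{}/status".format(pid)) as f:
        return parse_status_fields(f.read())


if __name__ == "__main__":
    try:
        fields = read_status_fields()
    except OSError:
        fields = None  # not on Linux, or /proc is unavailable
    # NoNewPrivs: 1 means no_new_privs is set for the process.
    # Seccomp: 0 = disabled, 1 = strict mode, 2 = filter mode.
    print(fields)
```
If a process in the container reports `Seccomp: 2` together with `NoNewPrivs: 0`, that would
suggest the filter was installed without the `no_new_privs` bit, which would be consistent with
set-user-ID `ping` working there.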


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Nov. 8, 2018, 11:24 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2018, 11:24 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
> 
> 
> Bugs: MESOS-9034
>     https://issues.apache.org/jira/browse/MESOS-9034
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> `SeccompFilter` class is a wrapper for `libseccomp` API. Its main
> purpose is to provide a translation of the `ContainerSeccompProfile`
> message into calls of `libseccomp` API.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf 
>   src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132 
>   src/linux/seccomp/seccomp.hpp PRE-CREATION 
>   src/linux/seccomp/seccomp.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68018/diff/16/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

