mesos-reviews mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 68018: Added `SeccompFilter` class.
Date Thu, 17 Jan 2019 03:10:57 GMT


> On Jan. 14, 2019, 4:31 p.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> >     Will this affect tasks run by Mesos? E.g., a task may want to run a program which has the `set-user-ID` bit set.
> 
> Andrei Budnik wrote:
>     Yes, the `no_new_privs` flag affects a task that wants to run a program which has the `set-user-ID` bit set.
>     E.g., launching `ping -c 3 8.8.8.8` fails with seccomp enabled. You'll see a message in the executor logs:
>     ```
>     I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
>     ping: socket: Operation not permitted
>     I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 2 (pid: 13276)
>     ```
>     
>     Also, see my previous comment https://reviews.apache.org/r/68018/#comment297000
> 
> Qian Zhang wrote:
>     In your previous comment, you mentioned that Docker daemon launches its containers with `SCMP_FLTATR_CTL_NNP` flag set by default, does that mean any containers launched by Docker daemon cannot run program which has set-user-ID bit?
>     
>     This seems unfortunate since it might break some use cases or applications that we already supported. And can you please elaborate a bit about `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"`? How will the Seccomp filter be reverted? Do you mean the task launched by Mesos can call libseccomp API to revert the filter itself?
>     
>     If we have to live with this limitation (i.e., tasks cannot run a program which has the set-user-ID bit set), then we need to highlight it in the documentation.
> 
> Gilbert Song wrote:
>     Seems like we asked the same question.
>     
>     Andrei, let's align on this thread. Thanks :)
> 
> Andrei Budnik wrote:
>     > does that mean any containers launched by Docker daemon cannot run program which has set-user-ID bit?
>     
>     The Docker daemon cannot be used to run arbitrary programs (as opposed to the Mesos containerizer). So, when one launches a Docker container, Docker daemon launches a container process with `NNP` bit set, which means that a container process (and its descendants) can't gain more privileges **outside** its container. The Mesos containerizer has exactly the same behaviour:
>     
>     1) Run system-provided `/bin/ping` (*outside* its container) as a non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FAILED for task 'a'
>       message: 'Command exited with status 2'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     2) Run system-provided `/bin/ping` (*outside* its container) as a privileged user:
>     ```
>     $ sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     3) Run container image provided `ping` (*inside* its image/container) as a non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --docker_image="fedora:latest" --command="yum -y install iputils;ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     
>     $ cat /path/to/container/stdout
>     ...
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
>     ```
>     
>     > This seems unfortunate since it might break some use cases or applications that we already supported.
>     
>     It's very unlikely that the agent launches tasks whose binary has the `setuid`/`setgid` bit set. After all, what would be the point?
>     I doubt that any of the following programs are launched as a Mesos container:
>     ```
>     $ sudo find /bin/ -perm -u=s -type f 2>/dev/null
>     /bin/newgrp
>     /bin/pkexec
>     /bin/mount
>     /bin/umount
>     /bin/newuidmap
>     /bin/newgidmap
>     /bin/sudo
>     /bin/crontab
>     /bin/su
>     /bin/gpasswd
>     /bin/chage
>     /bin/passwd
>     /bin/staprun
>     /bin/fusermount
>     /bin/fusermount-glusterfs
>     /bin/chfn
>     /bin/chsh
>     /bin/at
>     ```
>     
>     > And can you please elaborate a bit about "Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"? How will the Seccomp filter be reverted? Do you mean the task launched by Mesos can call libseccomp API to revert the filter itself?
>     
>     Yes, without the `NNP` (`no_new_privs`) bit set, a privileged task might call the `seccomp` Linux syscall to install an empty Seccomp filter.
> 
> Qian Zhang wrote:
>     > Run system-provided /bin/ping (outside its container) as a non-privileged user:
>     
>     As you mentioned in the above comment, this task will fail, but only **after** your seccomp patches are applied. Before your seccomp patches are applied (I am using the latest code in the Mesos master branch), it succeeds:
>     ```
>     $ ./src/mesos-execute --master=192.168.56.5:5050 --name=test --command="ping -c 3 8.8.8.8" --checkpoint
>     I0116 10:15:02.699398 14271 scheduler.cpp:189] Version: 1.8.0
>     I0116 10:15:02.977327 14287 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
>     I0116 10:15:02.979837 14285 scheduler.cpp:538] New master detected at master@192.168.56.5:5050
>     Subscribed with ID ea9488e1-a171-423f-8eb5-4d70187349fb-0001
>     Submitted task 'test' to agent '12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0'
>     Received status update TASK_STARTING for task 'test'
>       source: SOURCE_EXECUTOR
>     Received status update TASK_RUNNING for task 'test'
>       source: SOURCE_EXECUTOR
>     Received status update TASK_FINISHED for task 'test'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     ```
>     To me, this is kind of a broken feature, i.e., some previously supported use cases or applications will fail after your seccomp patches are applied.
>     
>     > when one launches a Docker container, Docker daemon launches a container process with NNP bit set, which means that a container process (and its descendants) can't gain more privileges outside its container.
>     
>     This is not what I found with Docker. I created a Docker image with ping installed and a non-root user added:
>     ```
>     FROM ubuntu:18.04
>     
>     RUN apt-get update && apt-get install -y iputils-ping
>     RUN adduser --disabled-password --gecos "" stack
>     ```
>     
>     Then I created a Docker container from that image as the non-root user, and found that ping worked:
>     ```
>     docker run --rm -it --user stack ubuntu:stack sh   
>     $ id 
>     uid=1000(stack) gid=1000(stack) groups=1000(stack)
>     $ ls -la /bin/ping 
>     -rwsr-xr-x. 1 root root 64424 Mar  9  2017 /bin/ping
>     $ ping 8.8.8.8 
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=3.25 ms
>     64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=3.20 ms
>     64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=3.48 ms
>     ^C
>     --- 8.8.8.8 ping statistics ---
>     3 packets transmitted, 3 received, 0% packet loss, time 2002ms
>     rtt min/avg/max/mdev = 3.200/3.312/3.481/0.121 ms
>     ```
>     So the Docker daemon actually can create a container to run a program which has the set-user-ID bit. I am a bit confused about the impact of the `SCMP_FLTATR_CTL_NNP` flag which, as you mentioned, the Docker daemon sets for its containers.
> 
> Andrei Budnik wrote:
>     The example you have provided with Docker daemon is identical to the 3rd case from my previous comment:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --docker_image="fedora:latest" --command="yum -y install iputils;ping -c 3 8.8.8.8"
>     ```
>     We behave in this case exactly as Docker does.
>     
>     At the same time, the first two cases are not supported by Docker, but are supported by the Mesos containerizer. Hence the difference in behaviour.
>     
>     Anyway, we need to set `NNP` bit both for a non-privileged user (otherwise, we have no permissions to install Seccomp filter - more details in seccomp man page) and for privileged user (otherwise, it does not make sense to install a Seccomp filter as it can be easily reverted later).
> 
> Andrei Budnik wrote:
>     I will highlight this nuance in Seccomp documentation.
> 
> Andrei Budnik wrote:
>     Added a note in the Seccomp doc: https://reviews.apache.org/r/69493/diff/4-5/

> The example you have provided with Docker daemon is identical to the 3rd case from my previous comment:

I think they are different: the `ping` binary I installed in the `ubuntu` image has the set-user-ID bit, but the `ping` binary you installed in the `fedora` image has **no** set-user-ID bit. So my example proves that the Docker daemon actually can create a container with a non-root user to run a program which has the set-user-ID bit. Can you please try your 3rd case with the `ubuntu` image? If it fails, then I think that's not acceptable, since the same use case is supported by Docker but not by us.

> Added a note in the Seccomp doc: https://reviews.apache.org/r/69493/diff/4-5/

I see you added the statement below in the doc:
```
So, when a framework wants to launch an OS-provided `ping` task as a non-privileged user, the task will fail.
```
My concern is: when a framework wants to launch an image-provided (e.g., ubuntu image) `ping` task as a non-privileged user, will the task fail too? And why do we need to distinguish between OS-provided and image-provided? I think the point should be whether the binary that the task will execute (no matter whether it is OS-provided or image-provided) has the set-user-ID bit or not, right?
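A minimal C sketch of that check, assuming a Linux host: it tests whether a path is a regular file with the set-user-ID bit set, which is the same condition `find -perm -u=s -type f` selects in the listing earlier in this thread. The name `is_setuid_regular_file` is hypothetical and not part of Mesos.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: returns true if `path` names a regular file whose
 * set-user-ID bit is set. It uses stat(2), so symlinks are followed (as
 * execve(2) would); plain `find` without -L skips symlinks instead.
 * Returns false if `path` cannot be stat'ed. */
static bool is_setuid_regular_file(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        return false;
    }
    return S_ISREG(st.st_mode) && (st.st_mode & S_ISUID) != 0;
}
```

Applied inside the Ubuntu 18.04 image shown above, this would report `/bin/ping` as set-user-ID, matching the `-rwsr-xr-x` mode in the `ls -la` output.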

> So, when one launches a Docker container, Docker daemon launches a container process with NNP bit set

This is not what I found with Docker:
```
$ docker run --rm --user operator alpine sleep 1000
$ ps -ef | grep sleep 
stack    25409 23826  0 10:49 pts/0    00:00:00 docker run --rm --user operator alpine sleep 1000
11       25478 25455  0 10:49 ?        00:00:00 sleep 1000
$ cat /proc/25478/status | grep NoNewPrivs
NoNewPrivs:     0
```
So as you can see, the NNP bit is **not** set for the container process. I think it will only be set when one specifies `--security-opt="no-new-privileges:true"` when launching a Docker container.
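The same check can be made programmatically. A minimal C sketch, assuming Linux 4.10 or later (where `/proc/<pid>/status` reports a `NoNewPrivs` field): it returns the bit for a given PID, or for the calling process when passed `"self"`. The helper name `read_no_new_privs` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns the NoNewPrivs value (0 or 1) reported in
 * /proc/<pid>/status, or -1 if the file cannot be opened or contains no
 * NoNewPrivs line (e.g. on kernels older than 4.10). */
static int read_no_new_privs(const char *pid) {
    char path[64];
    char line[256];
    int value = -1;

    snprintf(path, sizeof(path), "/proc/%s/status", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    /* Scan line by line; only a line starting with "NoNewPrivs:" converts. */
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "NoNewPrivs: %d", &value) == 1) {
            break;
        }
    }
    fclose(f);
    return value;
}
```

On the host from the `ps` output above, `read_no_new_privs("25478")` would have returned 0, matching the `grep` result. For the calling thread alone, `prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0)` returns the same bit directly.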


> we need to set NNP bit both for a non-privileged user (otherwise, we have no permissions to install Seccomp filter - more details in seccomp man page)

Can you please elaborate a bit on why a Seccomp filter cannot be installed for a non-privileged user if the NNP bit is not set? That does not seem to be true for Docker: the Docker daemon can install the Seccomp filter defined in the default Seccomp profile without the NNP bit set. I can create a Docker container successfully with a command like `docker run --rm -it --user operator --security-opt seccomp=/home/stack/workspace/mesos/build/default.json --security-opt="no-new-privileges:false" alpine sh`.
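For reference, seccomp(2) states that installing a filter requires either `CAP_SYS_ADMIN` in the caller's user namespace or the `no_new_privs` bit already being set; otherwise the call fails with `EACCES`. A caller holding `CAP_SYS_ADMIN` may therefore install a filter without NNP. A minimal C sketch of the unprivileged path, using the raw `prctl(2)` interface rather than `libseccomp`; the helper names are hypothetical and the filter is a trivial allow-all program:

```c
#include <errno.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: sets no_new_privs for the calling thread.
 * Returns 0 on success or the errno value from prctl(2). */
static int enable_no_new_privs(void) {
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        return errno;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if no_new_privs is set, 0 if not, -1 on error. */
static int no_new_privs_enabled(void) {
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/* Hypothetical helper: installs a one-instruction filter that allows every
 * syscall. Without CAP_SYS_ADMIN and without no_new_privs this is expected
 * to fail with EACCES. Returns 0 on success or the errno value. */
static int install_allow_all_filter(void) {
    struct sock_filter insns[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) != 0) {
        return errno;
    }
    return 0;
}
```

Once set, the bit cannot be cleared: `prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0)` fails with `EINVAL`, and the bit is inherited by children and preserved across `execve(2)`.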


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Nov. 8, 2018, 11:24 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2018, 11:24 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
> 
> 
> Bugs: MESOS-9034
>     https://issues.apache.org/jira/browse/MESOS-9034
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `SeccompFilter` class is a wrapper for the `libseccomp` API. Its main
> purpose is to provide a translation of the `ContainerSeccompProfile`
> message into calls of the `libseccomp` API.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf 
>   src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132 
>   src/linux/seccomp/seccomp.hpp PRE-CREATION 
>   src/linux/seccomp/seccomp.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68018/diff/16/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
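The description says `SeccompFilter` translates a `ContainerSeccompProfile` into `libseccomp` calls. As a rough illustration of what such a translation produces underneath, here is a minimal C sketch that builds the classic-BPF program directly instead of calling `libseccomp`. The `simple_profile` struct and `translate_profile` function are hypothetical, heavily simplified stand-ins (allow everything, fail a list of syscall numbers with `EPERM`), not the real protobuf message or Mesos code, and the sketch skips details a real translation must handle, such as the x86-64 x32 ABI and per-argument conditions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical profile: every syscall is allowed except the listed numbers. */
struct simple_profile {
    const uint32_t *denied_syscalls;
    size_t num_denied;
};

/* Writes a seccomp BPF program for audit architecture `arch` into `out`:
 *   [0]        load seccomp_data.arch
 *   [1]        if arch matches, skip the kill instruction
 *   [2]        kill (foreign architecture)
 *   [3]        load seccomp_data.nr
 *   [4..4+n)   one equality check per denied syscall, jumping to [5+n]
 *   [4+n]      allow
 *   [5+n]      fail the syscall with EPERM
 * Returns the instruction count, or -1 if `cap` is too small or the list
 * is longer than an 8-bit jump offset can span. */
static int translate_profile(const struct simple_profile *profile,
                             uint32_t arch,
                             struct sock_filter *out,
                             size_t cap) {
    size_t n = profile->num_denied;
    if (n > 255 || cap < n + 6) {
        return -1;
    }

    size_t i = 0;
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, arch));
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, arch, 1, 0);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, nr));
    for (size_t k = 0; k < n; k++) {
        /* On a match, skip the remaining checks and the allow instruction. */
        out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                profile->denied_syscalls[k],
                                                (uint8_t)(n - k), 0);
    }
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                                            SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA));
    return (int)i;
}
```

The resulting array could be loaded with `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog)` after setting `no_new_privs`, where `fprog` is a `struct sock_fprog` whose `len` is the returned count. With `libseccomp`, a similar effect comes from `seccomp_init(SCMP_ACT_ALLOW)`, one `seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), nr, 0)` per denied syscall, and `seccomp_load(ctx)`; the library also emits the architecture check itself.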

