Bug 1839065

Summary: Privileged containers should be unconfined_t
Product: Red Hat Enterprise Linux 8 Reporter: Olimp Bockowski <obockows>
Component: container-selinux Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA QA Contact: atomic-bugs <atomic-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 8.3 CC: bbaude, bbreard, cglombek, dornelas, dwalsh, imcleod, jligon, jnovy, kanderso, kchamart, lsm5, mheon, nijoshi, nstielau, travier, tsweeney, walters, ypu
Target Milestone: rc   
Target Release: 8.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: container-selinux-2.135.0-1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-21 15:32:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186913, 1793607    
Attachments:
Description Flags
strace with timeout none

Description Olimp Bockowski 2020-05-22 12:28:13 UTC
Description of problem:
When inside an OCP node (worker or master) using the command "oc debug node/<NODE>", the `hostnamectl` command is blocked by SELinux and hangs until the connection times out.

Version-Release number of selected component (if applicable):
any RHCOS (all related versions up to 4.4)

How reproducible:
always

Steps to Reproduce:
1. oc debug node/$someNode
2. chroot /host
3. hostnamectl

Actual results:
timeout

Expected results:
working

Additional info:
Other systemd tools that rely on the D-Bus API, such as timedatectl, are NOT blocked by SELinux.

AVC:
type=SERVICE_START msg=audit(1581591934.786:182): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"
type=USER_AVC msg=audit(1581591934.788:183): pid=20361 uid=81 auid=4294967295 ses=4294967295 subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.1386 spid=2831744 tpid=2831743 scontext=system_u:system_r:systemd_hostnamed_t:s0 tcontext=system_u:system_r:spc_t:s0 tclass=dbus permissive=0  exe="/usr/bin/dbus-daemon" sauid=81 hostname=? addr=? terminal=?'UID="dbus" AUID="unset" SAUID="dbus"
type=SERVICE_STOP msg=audit(1581591964.829:184): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"
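
The USER_AVC record above carries the fields that identify the denial. As a rough sketch (not part of the original report, and not a substitute for tools such as ausearch), the following Python pulls out the permission, the object class, and the source and target types. The helper name `parse_avc` is hypothetical, and the record is abbreviated to the fields that matter here.

```python
import re

# Abbreviated copy of the USER_AVC record from this report.
AVC = (
    "type=USER_AVC msg=audit(1581591934.788:183): pid=20361 uid=81 "
    "msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.1386 "
    "scontext=system_u:system_r:systemd_hostnamed_t:s0 "
    "tcontext=system_u:system_r:spc_t:s0 tclass=dbus permissive=0'"
)


def parse_avc(record):
    """Return permissions, class, and source/target SELinux types of a denial."""
    perms = re.search(r"denied\s+\{([^}]*)\}", record)
    fields = dict(re.findall(r"\b(scontext|tcontext|tclass)=(\S+)", record))
    if perms is None or len(fields) != 3:
        raise ValueError("not an AVC denial record")
    return {
        "perms": perms.group(1).split(),
        # A context is user:role:type:level, so the type is the third field.
        "source_type": fields["scontext"].split(":")[2],
        "target_type": fields["tcontext"].split(":")[2],
        "tclass": fields["tclass"].rstrip("'"),
    }


print(parse_avc(AVC))
```

Read this way, the denied operation is a D-Bus `send_msg` of type `method_return` from `systemd_hostnamed_t` to `spc_t`, the type shown for the caller.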

Comment 1 Olimp Bockowski 2020-05-22 12:31:56 UTC
Created attachment 1691053 [details]
strace with timeout

Comment 2 Olimp Bockowski 2020-05-22 12:33:45 UTC
The D-Bus daemon fails to forward a reply message from systemd to hostnamectl, which would explain why hostnamectl times out.

Comment 3 Micah Abbott 2020-05-22 13:44:43 UTC
Setting as low priority as this doesn't appear to impede functionality of the cluster.

Comment 4 Derrick Ornelas 2020-05-29 15:11:56 UTC
Is this really expected to work?  Either way it seems like a SELinux issue, unrelated to RHCOS.

I can reproduce it on RHEL 8:

# podman run --rm -ti --privileged -v /:/host registry.access.redhat.com/ubi8

[root@70b2ffa3f3a7 /]# chroot /host/

sh-4.4# time hostnamectl
Failed to query system properties: Connection timed out

real	0m25.021s
user	0m0.004s
sys	0m0.006s

Comment 5 Colin Walters 2020-05-29 17:08:01 UTC
Yes, it's a base podman/container-selinux problem.

I basically don't understand the rationale behind spc_t.  Anything that's spc_t can easily "escape" to unconfined_t by e.g. `systemd-run` or destroy the system via rm -rf /host or whatever.

We just keep hitting random "unconfined_t can do it but spc_t can't" problems that impede us managing and debugging the operating system via containers.

I tried this but something seems to be denying it:

```
$ podman run --privileged --security-opt=label=type:unconfined_t --rm -ti registry.access.redhat.com/ubi8/ubi:latest
{"msg":"exec container process `/bin/bash`: Permission denied","level":"error","time":"2020-05-29T17:03:20.000206344Z"}
```

Got a bit farther with this:
```
podman run --privileged --security-opt=label=user:unconfined_u,role:unconfined_r,type:unconfined_t --rm -ti registry.access.redhat.com/ubi8/ubi:latest
Error: failed to mount shm tmpfs "/var/lib/containers/storage/overlay-containers/b4fa6193d72291cf30de73d639c8ecaa6af5367ad3d359bb69e0aa72fc253dc3/userdata/shm": invalid argument
```

And the real problem there seems to be:
```
[251727.698188] SELinux: security_context_str_to_sid(unconfined_u,role:unconfined_r,type:unconfined_t:object_r:container_file_t:s0:c32,c492) failed for (dev tmpfs, type tmpfs) errno=-22
```

So I think the RFE here is: add an option to the container stack that lets us be unconfined_t - or stated more generally, have a container image act as closely as possible as if it's run from a root shell.

(This is a bit of a debatable topic whether it should be unconfined_t or init_t; we might need both)

Comment 6 Colin Walters 2020-05-29 17:34:30 UTC
Another case for example is https://github.com/openshift/cluster-network-operator/pull/477
where we were actively fighting spc_t because we want a container image to manage openvswitch on the host, and
it turned into a mess of spc_t and openvswitch_t, whereas it's completely fine to manage openvswitch via the usual unconfined_t.

Comment 7 Daniel Walsh 2020-05-29 17:41:48 UTC
Fixed in https://github.com/containers/container-selinux/releases/tag/v2.135.0

This allows you to specify

# podman run -ti  --security-opt label=type:unconfined_t fedora cat /proc/self/attr/current 
Trying to pull registry.fedoraproject.org/fedora...
Getting image source signatures
Copying blob 1657ffead824 [--------------------------------------] 0.0b / 0.0b
Copying config eb7134a03c done  
Writing manifest to image destination
Storing signatures
system_u:system_r:unconfined_t:s0:c290,c962

Or to define it in the kubernetes yaml.
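
For the Kubernetes case, here is a minimal sketch of what such a pod definition might look like, assuming the standard `securityContext.seLinuxOptions` field is how the type is requested (the comment above does not spell out the exact YAML). The pod name, image, and command are placeholders. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f -` accepts as well as YAML.

```python
import json

# Hypothetical pod: name, image, and command are placeholders, not from this bug.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "unconfined-debug"},
    "spec": {
        "containers": [
            {
                "name": "debug",
                "image": "registry.access.redhat.com/ubi8/ubi:latest",
                "command": ["sleep", "infinity"],
                "securityContext": {
                    "privileged": True,
                    # Assumed equivalent of --security-opt label=type:unconfined_t
                    "seLinuxOptions": {"type": "unconfined_t"},
                },
            }
        ]
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Whether a given runtime honors the requested type for a privileged container depends on the runtime and on the installed container-selinux version, so treat this as a starting point to verify rather than a confirmed configuration.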

Comment 8 Colin Walters 2020-05-29 17:58:59 UTC
Awesome, thank you!  Can we get this backported to 8.3?

Comment 9 Tom Sweeney 2020-05-29 18:17:43 UTC
Assigning to Jindrich for packaging needs.  Jindrich can you also answer Colin's question in https://bugzilla.redhat.com/show_bug.cgi?id=1839065#c8 please?

Comment 10 Daniel Walsh 2020-05-29 18:33:03 UTC
Yes, we should be able to get this into 8.3. I have no problem with shipping it in 8.2.1 as well.

Comment 11 Jindrich Novy 2020-05-31 05:48:59 UTC
(In reply to Colin Walters from comment #8)
> Awesome, thank you!  Can we get this backported to 8.3?

Yes, it's now in 8.3.0 as well.

Comment 12 Jindrich Novy 2020-05-31 05:49:55 UTC
Can we get QA ack on this one please?

Comment 14 Jindrich Novy 2020-06-01 06:21:45 UTC
Dan, can you please answer the three questions in comment #13? It is now required for 8.2.1 exception+. Thanks.

Comment 15 Colin Walters 2020-06-01 14:14:11 UTC
I think we could probably live with having this just in 8.3.

Comment 20 Joy Pu 2020-06-03 11:21:25 UTC
Tested with podman-1.9.3-1.module+el8.2.1+6750+e53a300c.x86_64 and container-selinux-2.135.0-1.module+el8.2.1+6849+893e4f4a.noarch using --security-opt label=type:unconfined_t; it now works as expected. Setting this to verified.
Details:
#  podman run --rm -ti --privileged --security-opt label=type:unconfined_t  -v /:/host registry.access.redhat.com/ubi8
[root@83ef1574f3c2 /]# chroot /host
sh-4.4# time hostnamectl
   Static hostname: ibm-x3250m6-05.lab.eng.pek2.redhat.com
         Icon name: computer-server
           Chassis: server
        Machine ID: 28183d0c95ce4e2da2c950a11992c7e1
           Boot ID: e2b076d01d2c49fb946142f3a9aab571
  Operating System: Red Hat Enterprise Linux 8.2 (Ootpa)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:8.2:GA
            Kernel: Linux 4.18.0-193.1.2.el8_2.x86_64
      Architecture: x86-64

real	0m0.270s
user	0m0.005s
sys	0m0.000s
sh-4.4# exit
exit
[root@83ef1574f3c2 /]# exit
exit
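
The check above compares a run that finishes in well under a second with the roughly 25-second timeout seen in comment 4. A minimal sketch of automating that comparison follows; the helper name `timed_run` and the 10-second limit are assumptions, not part of the QA procedure used here.

```python
import subprocess
import sys
import time


def timed_run(cmd, timeout_s=10.0):
    """Run cmd; return (finished_ok, seconds). A timeout counts as failure."""
    start = time.monotonic()
    try:
        done = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        ok = done.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False
    return ok, time.monotonic() - start


if __name__ == "__main__":
    # Inside the chroot from the steps above, the command would be ["hostnamectl"].
    print(timed_run([sys.executable, "-c", "pass"]))
```

A limit below the D-Bus timeout lets the affected case be reported as a failure without waiting for the full 25 seconds.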

Comment 23 Colin Walters 2020-07-07 14:01:40 UTC
In the meantime, we're hitting issues like this:

type=AVC msg=audit(1594130128.905:155): avc: denied { write } for pid=523559 comm="test2" name="yum.repos.d" dev="dm-0" ino=220215514 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:system_conf_t:s0 tclass=dir permissive=0

Here, in order to "escape" spc_t, we're using systemd-run to run code as init_t, but that domain is itself denied some operations.

Ultimately we need a fully privileged domain capable of *all* systems management - that is unconfined_t and not, e.g., init_t, right?
Compare with how e.g. Ansible works by logging in over ssh - that code hence runs as unconfined_t, which means it works differently than doing systems management via a systemd unit or a container.

If we're agreed on using unconfined_t, let's make sure we have well-defined mechanisms to transition to that domain from the other cases (e.g. a systemd unit).

Comment 27 errata-xmlrpc 2020-07-21 15:32:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3053

Comment 28 Colin Walters 2020-11-10 18:55:46 UTC
*** Bug 1896369 has been marked as a duplicate of this bug. ***

Comment 29 Red Hat Bugzilla 2023-09-14 06:01:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days