Bug 1841822 - SELinux blocks 'qemu-kvm' running in a container (running in a VM) [NEEDINFO]
Summary: SELinux blocks 'qemu-kvm' running in a container (running in a VM)
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Cédric Jeanneret
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On: 1846364 1846403
Blocks:
 
Reported: 2020-05-29 14:49 UTC by Kashyap Chamarthy
Modified: 2020-12-21 19:23 UTC (History)
CC List: 22 users

Fixed In Version: openstack-tripleo-heat-templates-10.6.3-0.20200113185561.cf467ea
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-20 09:23:01 UTC
Target Upstream Version:
cjeanner: needinfo? (green)


Attachments
audit.log from the Compute host (a VM) (581.38 KB, text/plain)
2020-05-29 14:55 UTC, Kashyap Chamarthy
no flags Details
QEMU instance log; here QEMU is launched by libvirtd in the container (27.69 KB, text/plain)
2020-05-29 14:57 UTC, Kashyap Chamarthy
no flags Details
libvirtd log (with filters); this was captured when libvirtd launched QEMU in the container (6.71 MB, text/plain)
2020-05-29 14:59 UTC, Kashyap Chamarthy
no flags Details
SELinux policy to fix further sVirt-related denials (this is only partial; and does not fix the problem) (4.53 KB, text/plain)
2020-06-05 14:00 UTC, Kashyap Chamarthy
no flags Details


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 736173 None MERGED Modify how libvirt related containers use SELinux 2020-12-18 01:12:53 UTC

Description Kashyap Chamarthy 2020-05-29 14:49:33 UTC
Description of problem
----------------------

[This problem was discovered in a RHOS setup.]

SELinux is blocking 'qemu-kvm' in a container (whose sole purpose is to
launch QEMU processes), which is in turn running in a VM on a baremetal
host.  This prevents launching any of the OpenStack Nova VMs.


Setup
-----

At a high level, the setup where the bug is triggered is as follows:

  - L0 (level-0; baremetal host), which is running OpenStack setup in
    several VMs; one of them being a Compute VM.

  - L1 (level-1 guest; called Compute node): this runs the container
    'nova_libvirt', and other containers that are required for running
    guests.

  - And finally, in the 'nova_libvirt' container: this container runs
    the libvirt daemon, which launches Nova instances, which are QEMU
    processes.


After fixing the first SELinux AVC, 'qemu-kvm' still gets blocked
-----------------------------------------------------------------

The first SELinux failure was the following:

    2020-05-28 09:07:06.831 6 ERROR nova.compute.manager [instance:
    f817f3f7-fd57-4abb-8f7f-4a4ba77be144] libvirt.libvirtError: internal
    error: process exited while connecting to monitor: libvirt:  error :
    cannot execute binary /usr/libexec/qemu-kvm: Permission denied

I fixed the above by generating a reference policy this way:

    # (1) Set SELinux to permissive
    $ setenforce 0

    # (2) Clear your audit log
    $ > /var/log/audit/audit.log

    # (3) Start a VM 

    # (4) Show a reference policy                                                                                                                         
    $ cat /var/log/audit/audit.log | audit2allow -R
    [...]

    # (5) Generate an SELinux loadable module package                                                                                                     
    $ audit2allow -a -M fix_sVirt_for_qemu-kvm

    # (6) Install the Policy Package
    $ semodule -i fix_sVirt_for_qemu-kvm.pp

    # (7) Set SELinux back to enforcing
    $ setenforce 1

    # (8) Start a VM, again
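
(A hedged aside on steps 3 and 8: in this environment the VM is
normally started through Nova; from inside the container an equivalent
check is simply starting any defined libvirt domain, where '<domain>'
below is a placeholder, for example:)

    $ virsh list --all
    $ virsh start <domain>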

That seemingly solved the "permission denied" error, but it threw a
different error:

    [...]
    type=AVC msg=audit(1590744222.416:21096): avc:  denied  { read execute } for  pid=143151 comm="qemu-kvm" path="/usr/libexec/qemu-kvm" dev="overlay" ino=603985 scontext=system_u:system_r:svirt_t:s0:c485,c705 tcontext=system_u:object_r:container_file_t:s0:c400,c876 tclass=file permissive=1
    [...]
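
The denial above is an MCS category mismatch: the QEMU process runs as
svirt_t:s0:c485,c705 while the binary it execs is labelled
container_file_t:s0:c400,c876.  A quick, hedged way to pull the
conflicting label pairs out of the audit log (assuming 'ausearch' is
available on the Compute host):

    $ ausearch -m AVC -c qemu-kvm --raw | \
        grep -o 'scontext=[^ ]*\|tcontext=[^ ]*' | sort -u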


Version-Release number of selected components
---------------------------------------------
openstack-selinux-0.8.20-0.20200403124624.386b429.el8ost.noarch
container-selinux-2.124.0-1.module+el8.1.1+5259+bcdd613a.noarch
selinux-policy-3.14.3-20.el8.noarch
4.18.0-147.8.1.el8_1.x86_64


How reproducible
----------------

Consistently.


Steps to reproduce
------------------

[This was reproduced in a RHOS-16 setup, with SELinux enabled in the
'nova_libvirt' container.  Nova is the OpenStack Compute project.  I'm
trying to describe as close a 'reproducer' environment as I can:]

(1) Launch a RHEL-8 VM on a baremetal host.

(2) Create a container with the same parameters as RHOS launches
    its 'nova_libvirt' container:

-----------------------------------------------------------------------
[root@compute-0 ~]# ps -ef | grep nova_libvirt
root      149793       1  0 09:51 ?        00:00:00 /usr/bin/conmon --api-version 1 -s -c 62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9 -u 62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9/userdata -p /var/run/containers/storage/overlay-containers/62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9/userdata/pidfile -l k8s-file:/var/log/containers/stdouts/nova_libvirt.log --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/overlay-containers/62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9/userdata/oci-log --conmon-pidfile /var/run/nova_libvirt.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9
-----------------------------------------------------------------------

(3) Then, in the container:

    (a) make sure it has 'qemu-kvm' installed, obtained from the RHEL-AV
        (Advanced Virtualization) module
    (b) make sure it has 'libvirtd' running

(4) Launch a VM via libvirt -- it _has_ to be via libvirt, to trigger the
    sVirt issue

PS: This is all probably easier if you can get hold of a RHOS-16 setup.
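
For anyone without a RHOS deployment, a rough, hedged stand-in for step
(2) might look like the following 'podman run'; the image name is a
placeholder and the mount list is trimmed down from the full
'nova_libvirt' invocation shown elsewhere in this bug:

    $ podman run -d --name libvirt-test --privileged --net=host --pid=host \
        --volume=/run:/run \
        --volume=/dev:/dev \
        --volume=/sys/fs/cgroup:/sys/fs/cgroup \
        --volume=/sys/fs/selinux:/sys/fs/selinux \
        --volume=/var/run/libvirt:/var/run/libvirt:shared \
        --volume=/var/lib/libvirt:/var/lib/libvirt:shared \
        <libvirt-capable-image> /usr/sbin/libvirtd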


Actual results
--------------

In the audit.log, 'qemu-kvm' is still blocked:

[...]
type=AVC msg=audit(1590748328.949:22805): avc:  denied  { read execute } for  pid=157970 comm="qemu-kvm" path="/usr/libexec/qemu-kvm" dev="overlay" ino=3060706 scontext=unconfined_u:system_r:svirt_t:s0:c250,c556 tcontext=system_u:object_r:container_file_t:s0:c400,c876 tclass=file permissive=0
[...]
type=AVC msg=audit(1590744222.416:21096): avc:  denied  { read execute } for  pid=143151 comm="qemu-kvm" path="/usr/libexec/qemu-kvm" dev="overlay" ino=603985 scontext=system_u:system_r:svirt_t:s0:c485,c705 tcontext=system_u:object_r:container_file_t:s0:c400,c876 tclass=file permissive=1
[...]


Below is the reference policy output (after fixing the first SELinux
blockage of 'qemu-kvm'):

-----------------------------------------------------------------------
[root@compute-0 ~]# cat /var/log/audit/audit.log | audit2allow -R 

require {
        type svirt_t;
        type container_file_t;
        class file { entrypoint execute open read };
        class lnk_file read;
        class dir read;
}

#============= svirt_t ==============

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#       mlsconstrain dir { ioctl read lock search } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#       Possible cause is the source level (s0:c485,c705) and target level (s0:c400,c876) are different.
allow svirt_t container_file_t:dir read;

#!!!! This avc is allowed in the current policy
allow svirt_t container_file_t:file { entrypoint open };

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#       mlsconstrain file { ioctl read lock execute execute_no_trans } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
#       mlsconstrain file { write setattr append unlink link rename } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#       Possible cause is the source level (s0:c485,c705) and target level (s0:c400,c876) are different.
allow svirt_t container_file_t:file { execute read };

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#       mlsconstrain lnk_file { ioctl read getattr } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#       Possible cause is the source level (s0:c485,c705) and target level (s0:c400,c876) are different.
allow svirt_t container_file_t:lnk_file read;
-----------------------------------------------------------------------


Expected results
----------------

SELinux should not block 'qemu-kvm' in the container that is running in
a VM.

Comment 1 Kashyap Chamarthy 2020-05-29 14:55:54 UTC
Created attachment 1693402 [details]
audit.log from the Compute host (a VM)

Comment 2 Kashyap Chamarthy 2020-05-29 14:57:16 UTC
Created attachment 1693403 [details]
QEMU instance log; here QEMU is launched by libvirtd in the container

Comment 3 Kashyap Chamarthy 2020-05-29 14:59:32 UTC
Created attachment 1693404 [details]
libvirtd log (with filters); this was captured when libvirtd launched QEMU in the container

Comment 4 Daniel Walsh 2020-05-29 18:18:56 UTC
This looks like somehow you launched a VM with svirt_t, and then it is attempting to execute qemu within the container?
Why and how did you do this?

Comment 5 Kashyap Chamarthy 2020-06-02 09:56:31 UTC
This launch-a-QEMU-process-by-libvirt-in-a-container is how RHOS (Red
Hat OpenStack) 16 (and 15) deploys VMs on its Compute nodes.  

The structure is this:

    Compute host (running the 'nova_libvirt' container)
    |
    '-- 'nova_libvirt' container (running 'libvirtd')
        |
        '-- QEMU processes

From the same environment, some outputs:

SELinux labels of 'libvirtd' file and the process:

    ()[root@compute-0 /]# ps -eZ `pgrep libvirtd` | grep /usr/bin/libvirtd
    unconfined_u:system_r:spc_t:s0  1039196 pts/3    S+     0:00 grep --color=auto /usr/bin/libvirtd

    ()[root@compute-0 /]# ls -lZ /usr/sbin/libvirtd
    -rwxr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c400,c876 618304 Dec 20 01:11 /usr/sbin/libvirtd

SELinux label of the QEMU binary (can't get the label for the process,
as that's the failure in question — QEMU process fails to launch in the
container):

    ()[root@compute-0 /]# ls -lZ /usr/libexec/qemu-kvm
    -rwxr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c400,c876 16356640 Mar  4 23:01 /usr/libexec/qemu-kvm

SELinux label of the 'nova_libvirt' container, which is where the QEMU
processes are supposed to run:

    [root@compute-0 ~]# ps -eZ `pgrep conmon` | grep nova_libvirt
    unconfined_u:system_r:container_runtime_t:s0 149793 ? Ssl   0:00 /usr/bin/conmon --api-version 1 -s -c 62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9 -u 62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9/userdata -p /var/run/containers/storage/overlay-containers/62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9/userdata/pidfile -l k8s-file:/var/log/containers/stdouts/nova_libvirt.log --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/overlay-containers/62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9/userdata/oci-log --conmon-pidfile /var/run/nova_libvirt.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 62e2f80e5e031198fb487010a7d5e97319294c8d6992180168823c166f843da9
    unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 1035176 pts/0 Sl+   0:00 podman exec -it nova_libvirt bash
    unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 1040710 pts/1 R+   0:00 grep --color=auto nova_libvirt

                     - - -

From the above: in the OSP container ('nova_libvirt'), the
/usr/sbin/libvirtd file has the label 'container_file_t' and the
libvirtd process runs as 'spc_t'.

However, Dan Berrangé tells me on IRC that normally the libvirtd binary
would have the label 'virtd_exec_t' and the libvirtd process would run
as 'virtd_t'.  So the OSP container seems to have failed on both
counts.
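
For comparison, the label the policy expects for the libvirtd binary on
a plain host (outside the container) can be queried with matchpathcon;
a hedged check:

    $ matchpathcon /usr/sbin/libvirtd
    # expected on a regular host: system_u:object_r:virtd_exec_t:s0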

Comment 7 Kashyap Chamarthy 2020-06-04 11:33:24 UTC
The bind mounts:

[root@compute-0 ~]# podman inspect nova_libvirt | grep Binds -A26
            "Binds": [
                "/var/lib/vhost_sockets:/var/lib/vhost_sockets:rw,rprivate,rbind",
                "/run:/run:rw,rprivate,nosuid,nodev,rbind",
                "/dev/log:/dev/log:rw,rprivate,nosuid,rbind",
                "/etc/hosts:/etc/hosts:ro,rprivate,rbind",
                "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro,rprivate,rbind",
                "/sys/fs/selinux:/sys/fs/selinux:rw,rprivate,rbind",
                "/etc/libvirt:/etc/libvirt:rw,rprivate,rbind",
                "/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro,rprivate,rbind",
                "/var/run/libvirt:/var/run/libvirt:shared,rw,rbind",
                "/var/log/containers/libvirt:/var/log/libvirt:rw,rprivate,rbind",
                "/var/lib/config-data/puppet-generated/nova_libvirt:/var/lib/kolla/config_files/src:ro,rprivate,rbind",
                "/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro,rprivate,rbind",
                "/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind",
                "/lib/modules:/lib/modules:ro,rprivate,rbind",
                "/etc/puppet:/etc/puppet:ro,rprivate,rbind",
                "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro,rprivate,rbind",
                "/var/lib/libvirt:/var/lib/libvirt:shared,rw,rbind",
                "/etc/localtime:/etc/localtime:ro,rprivate,rbind",
                "/var/lib/nova:/var/lib/nova:shared,rw,rbind",
                "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro,rprivate,rbind",
                "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro,rprivate,rbind",
                "/dev:/dev:rw,rprivate,nosuid,rbind",
                "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro,rprivate,rbind",
                "/sys/fs/cgroup:/sys/fs/cgroup:rw,rprivate,noexec,nosuid,nodev,rbind",
                "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro,rprivate,rbind"
            ],

Comment 8 Ondrej Mosnacek 2020-06-04 11:58:19 UTC
Trying to summarize the conclusions from the call we had regarding this:
* the issue is (almost certainly) not in the context of the container processes (it is superprivileged, which should be more than enough to do all that it needs to)
* we don't want to turn off libvirt's VM process isolation (running VMs as svirt_t:s0:cX,cY), since the container is privileged
* the root of the issue seems to lie in how podman sets up the overlayfs mount that contains the files from the container image (which include the qemu-kvm binary), specifically that it labels them as container_file_t:s0:cA,cB, which the VM doesn't have access to due to the mismatching category sets

So, the question is: Is it possible to get podman to label the overlayfs mount with a different context? If so how? (This is probably easy, but I have too little knowledge of podman...) It seems the best context to keep compatibility with the old policy would be "system_u:object_r:container_share_t:s0" (no categories), but maybe some other one would work better.

Comment 9 Daniel Berrangé 2020-06-04 12:09:03 UTC
On further investigation I don't think we need to mess with the overlayfs labelling. That only affects labelling of the QEMU binary we're execing. A custom policy rule to allow libvirtd to transition to svirt_t when exec'ing container_file_t is sufficient for that.

The places where libvirt needs to set per-QEMU file labels are in

  /var/run/libvirt
  /var/lib/libvirt
  /var/log/libvirt
  /var/cache/libvirt
  ...the dir where nova wants disk images stored...

These are already being bind mounted from the host FS.

IIUC, we should just make sure that these bind mounts do NOT have the ":z" flag added. That flag would force their labels to appear as container_file_t. By omitting :z, then libvirtd should be able to do its regular "chcon" logic to set per-QEMU labels on files.
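
To illustrate what ":z" does to a host path, a hedged sketch using a
throw-away directory and a stock image as stand-ins:

    $ mkdir -p /tmp/ztest
    $ ls -dZ /tmp/ztest                                  # original host label
    $ podman run --rm -v /tmp/ztest:/mnt:z fedora true   # ":z" relabels the host dir
    $ ls -dZ /tmp/ztest                                  # now container_file_t (shared, no categories)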

Comment 10 Julie Pichon 2020-06-04 12:30:16 UTC
(In reply to Daniel Berrangé from comment #9)
> The places where libvirt needs to set per-QEMU file labels are in
> 
>   /var/run/libvirt
>   /var/lib/libvirt
>   /var/log/libvirt
>   /var/cache/libvirt
>   ...the dir where nova wants disk images stored...
> 
> These are already being bind mounted from the host FS.
> 
> IIUC, we should just make sure that these bind mounts do NOT have the ":z"
> flag added. That flag would force their labels to appear as
> container_file_t. By omitting :z, then libvirtd should be able to do its
> regular "chcon" logic to set per-QEMU labels on files.

Right. From the link Cédric shared, we can see a few of the bind mounts do have the :z flag:

https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/nova/nova-libvirt-container-puppet.yaml#L690-L784

                  - /var/run/libvirt:/var/run/libvirt:shared,z
                  - /var/lib/libvirt:/var/lib/libvirt:shared,z
                  - /var/lib/vhost_sockets:/var/lib/vhost_sockets:z

Comment 11 Cédric Jeanneret 2020-06-04 16:03:22 UTC
Wondering if the "shared" flag is also needed... Any thoughts?

Comment 12 Daniel Walsh 2020-06-04 17:42:20 UTC
Shared definitely is not needed, Unless you are mounting content on /var/lib/libvirt and /var/run/libvirt that you want to be seen by the host and other containers that have these mountpoints.
I have no problem in adding container_file_t as an entrypoint for svirt_t.

Comment 13 Daniel Walsh 2020-06-04 17:44:04 UTC
These rules should be added to the selinux-policy package.
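
Until such a rule lands in selinux-policy, a local module could carry it
in the interim; a hedged sketch (the module name is arbitrary) using the
standard checkmodule/semodule_package/semodule flow:

cat > svirt_container_entrypoint.te <<'EOF'
module svirt_container_entrypoint 1.0;

require {
        type svirt_t;
        type container_file_t;
        class file entrypoint;
}

allow svirt_t container_file_t:file entrypoint;
EOF
checkmodule -M -m -o svirt_container_entrypoint.mod svirt_container_entrypoint.te
semodule_package -o svirt_container_entrypoint.pp -m svirt_container_entrypoint.mod
semodule -i svirt_container_entrypoint.pp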

Comment 14 Ondrej Mosnacek 2020-06-04 18:25:24 UTC
(In reply to Daniel Berrangé from comment #9)
> On further investigation I don't think we need to mess with the overlayfs
> labelling. That only affects labelling of the QEMU binary we're execing. A
> custom policy rule to allow libvirtd to transition to svirt_t when exec'ing
> container_file_t is sufficient for that.

(In reply to Daniel Walsh from comment #12)
> I have no problem in adding container_file_t as an entrypoint for svirt_t.

But that won't help you with the category set mismatch issue.

Comment 15 Daniel Walsh 2020-06-04 18:45:27 UTC
Can you launch the container with a level of s0?

podman run -ti --security-opt label=type:svirt_t --security-opt label=level:s0 fedora

Comment 16 Cédric Jeanneret 2020-06-05 07:37:46 UTC
After some digging:

- /var/run/libvirt is apparently mounted in 3 containers. Two of them mount it with write access. Both are privileged, so we might be able to drop the :z here. The third one is ceilometer; iirc it's deprecated anyway (and read-only)

- /var/lib/libvirt is apparently mounted in 2 containers. Both are running as privileged, so we might be able to drop the :z here.

- /var/lib/vhost_sockets seems to be used by some Neutron thing, but I couldn't find what container actually uses it :/. We might still need the :z if the other containers aren't privileged.

The "shared" needs some deeper investigations, but I *think* we should be able to drop it.

Comment 18 Cédric Jeanneret 2020-06-05 09:04:06 UTC
Since this issue isn't directly RHEL related, nor a SELinux component problem, I'm moving it to the right product (OSP) and component (openstack-tripleo-heat-templates).
I set the version to OSP-15, but clones will probably be needed for osp-16.0, 16.1, 16.2 and 17, given the content in upstream "master".

Comment 19 Kashyap Chamarthy 2020-06-05 14:00:49 UTC
Created attachment 1695442 [details]
SELinux policy to fix further sVirt-related denials (this is only partial; and does not fix the problem)

Along with the attempt I described in comment #0 (Description), these are the other AVCs we addressed; but as the notes show, despite applying the policy, there are further denials for sVirt:

    [root@compute-0 ~]# cat /var/log/audit/audit.log | audit2allow -R 
    
    require {
            type svirt_t;
            type container_file_t;
            class file { entrypoint execute open read };
            class lnk_file read;
            class dir read;
    }
    
    #============= svirt_t ==============
    
    #!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
    #Constraint rule: 
    #       mlsconstrain dir { ioctl read lock search } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
    
    #       Possible cause is the source level (s0:c485,c705) and target level (s0:c400,c876) are different.
    allow svirt_t container_file_t:dir read;
    
    #!!!! This avc is allowed in the current policy
    allow svirt_t container_file_t:file { entrypoint open };
    
    #!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
    #Constraint rule: 
    #       mlsconstrain file { ioctl read lock execute execute_no_trans } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
    #       mlsconstrain file { write setattr append unlink link rename } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
    
    #       Possible cause is the source level (s0:c485,c705) and target level (s0:c400,c876) are different.
    allow svirt_t container_file_t:file { execute read };
    
    #!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
    #Constraint rule: 
    #       mlsconstrain lnk_file { ioctl read getattr } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
    
    #       Possible cause is the source level (s0:c485,c705) and target level (s0:c400,c876) are different.
    allow svirt_t container_file_t:lnk_file read;

Comment 25 Kashyap Chamarthy 2020-06-10 09:04:27 UTC
Interim Update (no good news; with MLS disabled for 'nova_libvirt')

Evidence that MLS is disabled (see the "s0-s0") for the 'nova_libvirt' container:

    ()[root@overcloud-0-novacompute-0 /]# ps -faZ | grep [p]odman
    unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 root 45866 40730  0 08:48 pts/0 00:00:00 podman exec -ti -u root nova_libvirt bash
    unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 root 46188 45163  0 08:49 pts/3 00:00:00 podman exec -ti -u root nova_libvirt bash


These are the mounts for 'nova_libvirt' container:

[root@overcloud-0-novacompute-0 ~]# podman inspect nova_libvirt | grep Binds -A28
            "Binds": [
                "/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro,rprivate,rbind",
                "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro,rprivate,rbind",
                "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro,rprivate,rbind",
                "/var/run/libvirt:/var/run/libvirt:shared,rw,rbind",
                "/sys/fs/selinux:/sys/fs/selinux:rw,rprivate,rbind",
                "/run:/run:rw,rprivate,nosuid,nodev,rbind",
                "/etc/hosts:/etc/hosts:ro,rprivate,rbind",
                "/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro,rprivate,rbind",
                "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro,rprivate,rbind",
                "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro,rprivate,rbind",
                "/var/lib/config-data/puppet-generated/nova_libvirt:/var/lib/kolla/config_files/src:ro,rprivate,rbind",
                "/sys/fs/cgroup:/sys/fs/cgroup:rw,rprivate,noexec,nosuid,nodev,rbind",
                "/etc/libvirt:/etc/libvirt:rw,rprivate,rbind",
                "/var/lib/libvirt:/var/lib/libvirt:shared,rw,rbind",
                "/var/lib/nova:/var/lib/nova:shared,rw,rbind",
                "/etc/puppet:/etc/puppet:ro,rprivate,rbind",
                "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro,rprivate,rbind",
                "/var/log/containers/libvirt:/var/log/libvirt:rw,rprivate,rbind",
                "/etc/selinux/config:/etc/selinux/config:ro,rprivate,rbind",
                "/dev/log:/dev/log:rw,rprivate,nosuid,rbind",
                "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro,rprivate,rbind",
                "/etc/localtime:/etc/localtime:ro,rprivate,rbind",
                "/var/lib/vhost_sockets:/var/lib/vhost_sockets:rw,rprivate,rbind",
                "/dev:/dev:rw,rprivate,nosuid,rbind",
                "/lib/modules:/lib/modules:ro,rprivate,rbind",
                "/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind",
                "/var/cache/libvirt:/var/cache/libvirt:shared,rw,rbind"
            ],



Even with the above, starting an instance results in the following 
denial for 'qemu-kvm':

    type=AVC msg=audit(1591778291.438:11091): avc:  denied  { read 
    execute } for  pid=43435 comm="qemu-kvm"
    path="/usr/libexec/qemu-kvm" dev="overlay" ino=173491
    scontext=unconfined_u:system_r:svirt_tcg_t:s0:c34,c865
    tcontext=system_u:object_r:container_file_t:s0:c389,c923 tclass=file
    permissive=0

Comment 26 Daniel Berrangé 2020-06-10 09:12:45 UTC
(In reply to Kashyap Chamarthy from comment #25)
> Interim Update (no good news; with MLS disabled for 'nova_libvirt')
> 
> Evidence that MLS is disabled (see the "s0-s0") for the 'nova_libvirt'
> container:
> 
>     ()[root@overcloud-0-novacompute-0 /]# ps -faZ | grep [p]odman
>     unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 root 45866
> 40730  0 08:48 pts/0 00:00:00 podman exec -ti -u root nova_libvirt bash
>     unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 root 46188
> 45163  0 08:49 pts/3 00:00:00 podman exec -ti -u root nova_libvirt bash

Those podman processes are not the ones actually launching the container. You need to look for a "conmon" process, and then find its first child, which I presume will be libvirtd.

Look in /proc/mounts for "container_file_t" to see what label is being used for the filesystem

> Even with the above, starting an instance results in the following 
> denial for 'qemu-kvm':
> 
>     type=AVC msg=audit(1591778291.438:11091): avc:  denied  { read 
>     execute } for  pid=43435 comm="qemu-kvm"
>     path="/usr/libexec/qemu-kvm" dev="overlay" ino=173491
>     scontext=unconfined_u:system_r:svirt_tcg_t:s0:c34,c865
>     tcontext=system_u:object_r:container_file_t:s0:c389,c923 tclass=file
>     permissive=0

The existence of "system_u:object_r:container_file_t:s0:c389,c923" rather suggests the security opts have not been used when launching the container.  I presume /proc/mounts will show the same container_file_t MLS.
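
A hedged way to follow up on both suggestions above (PIDs are
illustrative; 'label' is the procps output column showing the SELinux
context):

pgrep -x conmon                                            # conmon process(es) on the Compute host
ps --ppid "$(pgrep -x conmon | head -1)" -o pid,label,cmd  # its first child, presumably libvirtd
grep container_file_t /proc/mounts                         # label used for the container filesystem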

Comment 27 Cédric Jeanneret 2020-06-10 16:34:05 UTC
Hello there,

So. some updates :).

TL;DR: it's not working with label=level:s0

Here's the actual command used to run the container:

podman run --name nova_libvirt-4kqf94c7 --conmon-pidfile=/var/run/nova_libvirt-4kqf94c7.pid --detach=true --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --pid=host --ulimit=nofile=131072 --ulimit=nproc=126960 --privileged=true --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/log/containers/libvirt:/var/log/libvirt:z --volume=/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro --volume=/var/lib/config-data/puppet-generated/nova_libvirt:/var/lib/kolla/config_files/src:ro --volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro --volume=/lib/modules:/lib/modules:ro --volume=/dev:/dev --volume=/run:/run --volume=/sys/fs/cgroup:/sys/fs/cgroup --volume=/etc/libvirt:/etc/libvirt --volume=/var/run/libvirt:/var/run/libvirt:shared --volume=/var/lib/libvirt:/var/lib/libvirt:shared --volume=/var/cache/libvirt:/var/cache/libvirt:shared --volume=/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro --volume=/var/lib/vhost_sockets:/var/lib/vhost_sockets:z --volume=/var/lib/nova:/var/lib/nova:shared --volume=/sys/fs/selinux:/sys/fs/selinux --volume=/etc/selinux/config:/etc/selinux/config:ro --security-opt=label=level:s0 undercloud.ctlplane:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200520.1

(for the record, on osp-16, we can use the following to get this command: paunch debug --file /var/lib/tripleo-config/container-startup-config/step_3/nova_libvirt.json --action print-cmd --container nova_libvirt)

As we can see, there's this near the end of the command:
--security-opt=label=level:s0

This should disable the MLS level in the container, if I understand dwalsh's proposal and other IRC discussions correctly.

But, according to danpb's last comment, I think it's not working as expected:

podman ps | grep nova_libvirt
87d730c690f9  undercloud.ctlplane:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200520.1                kolla_start           10 minutes ago  Up 10 minutes ago         nova_libvirt

grep 87d730c690f9 /proc/mounts 
shm /var/lib/containers/storage/overlay-containers/87d730c690f9fcbc96d5bb36a52a28616e660c2c648e0586c3231809a5458d65/userdata/shm tmpfs rw,context="system_u:object_r:container_file_t:s0:c544,c561",nosuid,nodev,noexec,relatime,size=64000k 0 0

Here, we can see context="system_u:object_r:container_file_t:s0:c544,c561" - sooo... that's wrong, right?

If I inspect the container like this: podman inspect 87d730c690f9 | jq -r '.[0]["HostConfig"]["SecurityOpt"]'
[
  "label=level:s0"
]

So it looks like it has been applied, hasn't it?

Here are the actual SELinux denials on the host, in Permissive mode, without any crafted SELinux policy:
type=AVC msg=audit(1591805692.426:7608): avc:  denied  { entrypoint } for  pid=34243 comm="libvirtd" path="/usr/libexec/qemu-kvm" dev="overlay" ino=172355 scontext=system_u:system_r:svirt_t:s0:c653,c843 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=file permissive=1
type=AVC msg=audit(1591805692.426:7608): avc:  denied  { read execute } for  pid=34243 comm="qemu-kvm" path="/usr/libexec/qemu-kvm" dev="overlay" ino=172355 scontext=system_u:system_r:svirt_t:s0:c653,c843 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=file permissive=1
type=AVC msg=audit(1591805692.429:7609): avc:  denied  { open } for  pid=34243 comm="qemu-kvm" path="/etc/ld.so.cache" dev="overlay" ino=169249 scontext=system_u:system_r:svirt_t:s0:c653,c843 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=file permissive=1
type=AVC msg=audit(1591805692.429:7610): avc:  denied  { read } for  pid=34243 comm="qemu-kvm" name="lib64" dev="overlay" ino=169245 scontext=system_u:system_r:svirt_t:s0:c653,c843 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=lnk_file permissive=1
type=AVC msg=audit(1591805692.462:7611): avc:  denied  { read } for  pid=34243 comm="qemu-kvm" name="qemu-kvm" dev="overlay" ino=170589 scontext=system_u:system_r:svirt_t:s0:c653,c843 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=dir permissive=1

As a first step, I've produced a policy, "just to see what would be needed" and applied it. The policy is:

module nova-libvirt-podman 1.0;

require {
        type svirt_t;
        type container_file_t;
        class file { entrypoint execute open read };
        class lnk_file read;
        class dir read;
}

#============= svirt_t ==============
allow svirt_t container_file_t:dir read;
allow svirt_t container_file_t:file { entrypoint execute open read };
allow svirt_t container_file_t:lnk_file read;

Then I've started a new VM, still with a Permissive host. Here are the denials:

type=AVC msg=audit(1591805958.653:7734): avc:  denied  { read execute } for  pid=35448 comm="qemu-kvm" path="/usr/libexec/qemu-kvm" dev="overlay" ino=172355 scontext=system_u:system_r:svirt_t:s0:c223,c656 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=file permissive=1
type=AVC msg=audit(1591805958.655:7735): avc:  denied  { read } for  pid=35448 comm="qemu-kvm" name="lib64" dev="overlay" ino=169245 scontext=system_u:system_r:svirt_t:s0:c223,c656 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=lnk_file permissive=1
type=AVC msg=audit(1591805958.684:7736): avc:  denied  { read } for  pid=35448 comm="qemu-kvm" name="qemu-kvm" dev="overlay" ino=170589 scontext=system_u:system_r:svirt_t:s0:c223,c656 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=dir permissive=1
type=AVC msg=audit(1591805958.696:7737): avc:  denied  { read } for  pid=35448 comm="qemu-kvm" name="bios-256k.bin" dev="overlay" ino=278247 scontext=system_u:system_r:svirt_t:s0:c223,c656 tcontext=system_u:object_r:container_file_t:s0:c544,c561 tclass=lnk_file permissive=1

And if we want to re-run audit2allow with those denials:
module bad-mls 1.0;

require {
        type container_file_t;
        type svirt_t;
        class file { execute read };
        class lnk_file read;
        class dir read;
}

#============= svirt_t ==============

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#       mlsconstrain dir { ioctl read lock search } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#       Possible cause is the source level (s0:c223,c656) and target level (s0:c544,c561) are different.
allow svirt_t container_file_t:dir read;

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#       mlsconstrain file { ioctl read lock execute execute_no_trans } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
#       mlsconstrain file { write setattr append unlink link rename } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#       Possible cause is the source level (s0:c223,c656) and target level (s0:c544,c561) are different.
allow svirt_t container_file_t:file { execute read };

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#       mlsconstrain lnk_file { ioctl read getattr } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#       Possible cause is the source level (s0:c223,c656) and target level (s0:c544,c561) are different.
allow svirt_t container_file_t:lnk_file read;


In addition, here's the first VM startup log, with an Enforcing system:

2020-06-10 16:13:29.062+0000: starting up libvirt version: 5.6.0, package: 10.module+el8.1.1+5309+6d656f05 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2019-12-20-01:08:55, ), qemu version: 4.1.0qemu-kvm-4.1.0-23.module+el8.1.1+6238+f5d69f68.3, kernel: 4.18.0-147.8.1.el8_1.x86_64, hostname: overcloud-0-novacompute-0.localdomain
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-1-instance-00000001 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-instance-00000001/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-instance-00000001/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-instance-00000001/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=instance-00000001,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-00000001/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu EPYC-IBPB,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,arch-capabilities=on,cmp-legacy=on,perfctr-core=on,virt-ssbd=on,skip-l1dfl-vmentry=on,monitor=off,svm=off \
-m 512 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid ad468e3a-ef5f-4bd2-a88b-ef791da2bf54 \
-smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.1.2-0.20200401205214.28324e6.el8ost,serial=ad468e3a-ef5f-4bd2-a88b-ef791da2bf54,uuid=ad468e3a-ef5f-4bd2-a88b-ef791da2bf54,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-drive file=/var/lib/nova/instances/ad468e3a-ef5f-4bd2-a88b-ef791da2bf54/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=36 \
-device virtio-net-pci,rx_queue_size=512,host_mtu=1442,netdev=hostnet0,id=net0,mac=fa:16:3e:d5:93:ca,bus=pci.0,addr=0x3 \
-add-fd set=3,fd=38 \
-chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 172.16.13.58:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-06-10 16:13:29.120+0000: shutting down, reason=failed

Here's the SELinux context for the qemu-kvm in the container:
[root@overcloud-0-novacompute-0 qemu]# podman exec -u root nova_libvirt ls -lZ /usr/libexec/qemu-kvm
-rwxr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c544,c561 16356584 Apr  6 20:47 /usr/libexec/qemu-kvm

From the outside and the inside of the nova_libvirt container, libvirt is running like this:
system_u:system_r:spc_t:s0        26133 ?        Sl     0:00  |   \_ /usr/sbin/libvirtd


Am I missing something? Or is it a podman bug?

Cheers,

C.

Comment 28 Ondrej Mosnacek 2020-06-10 17:21:17 UTC
(In reply to Cédric Jeanneret from comment #27)
> As we can see, there's this near the end of the command:
> --security-opt=label=level:s0
> 
> This should disable the MLS level in the container, if I understand dwalsh
> proposal and other IRC discussions.
> 
> But, according to danpb last comment, I think it's not working as expected:

Not an expert on podman, but reading podman-run(1) I think it is, in fact, working as expected:
"--security-opt=option
[...]
· label=level:LEVEL: Set the label level for the container processes
[...]
· label=filetype:TYPE: Set the label file type for the container files"

So the user/role/type/level options apply to _processes_, but not _files_. For files, there is only the "filetype" option, which allows overriding only the type. Unfortunately, there is no option to disable/override the setting of the level part of file labels. So it would seem your use case depends on podman adding support for something like --security-opt=label=filelevel:s0 :(

Comment 29 Cédric Jeanneret 2020-06-11 07:06:42 UTC
Thank you Ondrej.
Maybe we can tweak things to run in a better context than container_file_t. Though, to do that, we'll need to update paunch to allow passing multiple security_opt values (for now it's a single string in our tool (paunch)). I'll try to tweak my env to pre-test this feature and, hopefully, get somewhere, using either dwalsh's "label=type:svirt_t" proposal, or something affecting the filesystem in the container.

For the record, the patch I'm proposing against paunch is here: https://review.opendev.org/735063 - it will need to be backported down to train (osp-16) or even stein (osp-15).

Comment 30 Cédric Jeanneret 2020-06-11 08:45:12 UTC
update: I think I just hit a podman bug, since I'm apparently unable to pass multiple "--security-opt" to the "podman run" command - for instance this one:
podman run --net host --name selinux-test --rm -ti --security-opt=label=level:s0 --security-opt=label=type:svirt_t --privileged centos:8 sh

doesn't show both options in podman inspect:
[                                                                                                                                                                                                                                             
  "label=type:svirt_t"
]


I just opened an issue upstream: https://github.com/containers/libpod/issues/6567

In the meanwhile, we're stuck for good on that issue, imho.

Comment 31 Daniel Berrangé 2020-06-11 09:40:30 UTC
(In reply to Cédric Jeanneret from comment #27)
> Here's the actual command used to run the container:
> 
> podman run --name nova_libvirt-4kqf94c7
> --conmon-pidfile=/var/run/nova_libvirt-4kqf94c7.pid --detach=true
> --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --pid=host
> --ulimit=nofile=131072 --ulimit=nproc=126960 --privileged=true
> --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro
> --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro
> --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:
> ro
> --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:
> ro
> --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.
> trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro
> --volume=/dev/log:/dev/log
> --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro
> --volume=/etc/puppet:/etc/puppet:ro
> --volume=/var/log/containers/libvirt:/var/log/libvirt:z
> --volume=/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/
> config_files/config.json:ro
> --volume=/var/lib/config-data/puppet-generated/nova_libvirt:/var/lib/kolla/
> config_files/src:ro
> --volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro
> --volume=/lib/modules:/lib/modules:ro --volume=/dev:/dev --volume=/run:/run
> --volume=/sys/fs/cgroup:/sys/fs/cgroup --volume=/etc/libvirt:/etc/libvirt
> --volume=/var/run/libvirt:/var/run/libvirt:shared
> --volume=/var/lib/libvirt:/var/lib/libvirt:shared
> --volume=/var/cache/libvirt:/var/cache/libvirt:shared
> --volume=/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro
> --volume=/var/lib/vhost_sockets:/var/lib/vhost_sockets:z
> --volume=/var/lib/nova:/var/lib/nova:shared
> --volume=/sys/fs/selinux:/sys/fs/selinux
> --volume=/etc/selinux/config:/etc/selinux/config:ro
> --security-opt=label=level:s0
> undercloud.ctlplane:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200520.1
> 
> (for the records, on osp-16, we can use the following to get this command:
> paunch debug --file
> /var/lib/tripleo-config/container-startup-config/step_3/nova_libvirt.json
> --action print-cmd --container nova_libvirt)
> 
> As we can see, there's this near the end of the command:
> --security-opt=label=level:s0
> 
> This should disable the MLS level in the container, if I understand dwalsh
> proposal and other IRC discussions.
> 
> But, according to danpb last comment, I think it's not working as expected:
> 
> podman ps | grep nova_libvirt
> 87d730c690f9 
> undercloud.ctlplane:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200520.1  
> kolla_start           10 minutes ago  Up 10 minutes ago         nova_libvirt
> 
> grep 87d730c690f9 /proc/mounts 
> shm
> /var/lib/containers/storage/overlay-containers/
> 87d730c690f9fcbc96d5bb36a52a28616e660c2c648e0586c3231809a5458d65/userdata/
> shm tmpfs
> rw,context="system_u:object_r:container_file_t:s0:c544,c561",nosuid,nodev,
> noexec,relatime,size=64000k 0 0
> 
> Here, we can see context="system_u:object_r:container_file_t:s0:c544,c561" -
> sooo... that's wrong, right?

That does not match expected behaviour that I see when using podman in a test scenario, both latest 1.9.3 and older 1.6.4

$ rpm -q podman container-selinux selinux-policy-targeted
podman-1.6.4-4.module+el8.1.1+5885+44006e55.x86_64
container-selinux-2.124.0-1.module+el8.1.1+5259+bcdd613a.noarch
selinux-policy-targeted-3.14.3-20.el8.noarch

$ podman run --security-opt=label=level:s0 -t -i fedora  /bin/sh
sh-5.0# cat /proc/mounts | grep container_file_t | head -1
overlay / overlay rw,context=system_u:object_r:container_file_t:s0,relatime,lowerdir=/var/lib/containers/storage/overlay/l/ZVVMG37KKMQDLMX2NKZHSG2D6I,upperdir=/var/lib/containers/storage/overlay/5f437dc609b0547b28173247061310bfc969f72ca810e48b556fcaa648b45da8/diff,workdir=/var/lib/containers/storage/overlay/5f437dc609b0547b28173247061310bfc969f72ca810e48b556fcaa648b45da8/work 0 0

In another shell

$ podman ps
CONTAINER ID  IMAGE                            COMMAND  CREATED        STATUS            PORTS  NAMES
a1af6f4bf136  docker.io/library/fedora:latest  /bin/sh  4 seconds ago  Up 3 seconds ago         jovial_clarke

$ grep a1af6f4bf136 /proc/mounts 
shm /var/lib/containers/storage/overlay-containers/a1af6f4bf136d68679d086accb51babac0a5bc218e4d98b5557ba369de4d5ca4/userdata/shm tmpfs rw,context=system_u:object_r:container_file_t:s0,nosuid,nodev,noexec,relatime,size=64000k 0 0

So security-opt is honoured as expected.

Thus there must be something incorrect about the way nova-libvirt is being launched.

Comment 32 Daniel Berrangé 2020-06-11 12:58:10 UTC
This turns out to be a bug in podman 1.6.4

When --privileged is passed, it no longer honours the --security-opt when configuring the filesystem mount context. The process context is unaffected.

This flaw is fixed with libpod commit 58cbbbc56e9f1cee4992ae4f4d3971c0e336ecd2 which was 1.8.1 timeframe

See https://bugzilla.redhat.com/show_bug.cgi?id=1846364

If the podman bug is fixed, then it should be possible to configure nova-libvirt container to avoid triggering  MLS category constraint violations.
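
Once a fixed podman is in place, a hedged way to verify that
--security-opt is honoured together with --privileged (the image name is
just an example):

podman run --rm --privileged --security-opt label=level:s0 fedora \
    grep -m1 container_file_t /proc/mounts
# expected: context=...container_file_t:s0 (no category pair)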

Comment 33 Daniel Walsh 2020-06-11 13:26:26 UTC
Podman 1.9.3 should be released in RHEL 8.2.1.  But this might need to be backported to podman 1.6.4.

Comment 34 Tom Sweeney 2020-06-11 13:29:23 UTC
Adding Matt Heon so he can be aware of the potential backport to v1.6.4

Comment 36 Cédric Jeanneret 2020-06-17 12:09:37 UTC
Finally got a patch (this is for master). There are other dependencies, especially one on podman...

Comment 37 Kashyap Chamarthy 2020-06-17 13:19:45 UTC
To summarize, the following is the working setup (it requires 'podman' 1.9.3 plus the tripleo-heat-templates change linked above):


OSP-16 sVirt labels for QEMU / libvirt
======================================

tl;dr — the four key SELinux labels on QEMU and libvirt that allow
'qemu-kvm' processes to launch successfully:

  - 'spc_t' for the libvirtd process; 
  - 'container_ro_file_t' for the /usr/sbin/libvirtd file; 
  - 'svirt_t' for the QEMU process; and finally 
  - 'container_ro_file_t' for /usr/libexec/qemu-kvm file

            - - -

Exec into the 'nova_libvirt' container:

    [root@overcloud-0-novacompute-0 ~]# podman exec -it nova_libvirt /bin/bash

The versions of podman (we need 1.9.3) / container-selinux:

    [root@overcloud-0-novacompute-0 ~]# rpm -q podman container-selinux
    openstack-selinux
    podman-1.9.3-2.module+el8.2.1+6867+366c07d6.x86_64
    container-selinux-2.124.0-1.module+el8.1.1+5259+bcdd613a.noarch
    openstack-selinux-0.8.20-0.20200403124624.386b429.el8ost.noarch

SELinux info:

    ()[root@overcloud-0-novacompute-0 /]# getenforce 
    Enforcing

    ()[root@overcloud-0-novacompute-0 /]# sestatus 
    SELinux status:                 enabled
    SELinuxfs mount:                /sys/fs/selinux
    SELinux root directory:         /etc/selinux
    Loaded policy name:             targeted
    Current mode:                   enforcing
    Mode from config file:          enforcing
    Policy MLS status:              enabled
    Policy deny_unknown status:     allowed
    Memory protection checking:     actual (secure)
    Max kernel policy version:      31

SELinux labels of the running QEMU processes:

    ()[root@overcloud-0-novacompute-0 /]# ps -eZ | grep qemu
    system_u:system_r:svirt_t:s0:c496,c549 216371 ?  00:01:30 qemu-kvm
    system_u:system_r:svirt_t:s0:c190,c890 230724 ?  00:00:18 qemu-kvm

SELinux label for the QEMU binary file:

    ()[root@overcloud-0-novacompute-0 /]# ls -lZ /usr/libexec/qemu-kvm 
    -rwxr-xr-x. 1 root root system_u:object_r:container_ro_file_t:s0 16356584 Apr  6 20:47 /usr/libexec/qemu-kvm

SELinux label for the libvirtd process:

    ()[root@overcloud-0-novacompute-0 /]# ps -eZ | grep libvirtd
    system_u:system_r:spc_t:s0       209874 ?        00:00:01 libvirtd

SELinux label for the libvirtd binary file:

    ()[root@overcloud-0-novacompute-0 /]# ls -lZ /usr/sbin/libvirtd
    -rwxr-xr-x. 1 root root system_u:object_r:container_ro_file_t:s0 618304 Dec 20 01:11 /usr/sbin/libvirtd

Comment 38 Cédric Jeanneret 2020-09-11 08:13:27 UTC
Probably even CURRENT_RELEASE.

Comment 39 Anthony Green 2020-10-07 19:05:50 UTC
For what it's worth, I had this problem and was able to solve it by pulling in the newer podman.
OSP 16.1 appears to be incompatible with RHEL 8.2, so I force 8.1 and then cherry-pick the 8.2 podman.  Here's what my personal all-in-one installer playbook has:


    - name: We need podman from 8.2, as suggested https://bugzilla.redhat.com/show_bug.cgi?id=1841822
      command: subscription-manager release --set=8.2

    - name: Pull podman from 8.2
      dnf:
        name: "podman"
        state: latest

    - name: Go back to 8.1
      command: subscription-manager release --set=8.1

Comment 40 Cédric Jeanneret 2020-10-08 05:13:03 UTC
Hello Anthony,

Well, osp-16.1 is to be shipped on rhel-8.2 - I do hope it is compatible... Care to create a relevant BZ for the issues you're facing?

Cheers,

C.

Comment 41 Cédric Jeanneret 2020-10-20 09:23:01 UTC
osp-15 is EOL (and apparently has no ELS either), so this issue isn't relevant anymore:
https://access.redhat.com/support/policy/updates/openstack/platform/

