Bug 2279464

Summary: Instance creation with vTPM fails after restarting nova_virtqemud due to SELinux permission issue
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 17.1 (Wallaby)
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: z4
Target Release: 17.1
Hardware: All
OS: All
Reporter: yatanaka
Assignee: OSP Team <rhos-maint>
QA Contact: James Parker <jparker>
CC: alifshit, astupnik, bdobreli, bshephar, dhill, jparker, jpichon, kchamart, mariel, mburns, mrunge, parthee, pgodwin
Keywords: Triaged, ZStream
Flags: bdobreli: needinfo-
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20240919130751.e7c7ce3.el9ost
Doc Type: No Doc Update
Last Closed: 2024-11-21 09:30:22 UTC
Type: Bug

Description yatanaka 2024-05-07 05:49:20 UTC
Description of problem:

I deployed the overcloud according to the following document:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-instance-security_vgpu#assembly_configuring-compute-nodes-to-provide-emulated-TPM-devices-for-instances_TPM

Then I rebooted the compute node.
After the reboot, I tried to create an instance with vTPM, but it failed due to an SELinux issue.

~~~
(central) [stack@undercloud ~]$  openstack server create --network yatanaka_network0 --image cirros-0.6.2 --flavor vtpm-flavor cirros_vtpm --host central-novacompute-1.yatanaka.example.com --os-compute-api-version 2.74

[root@central-novacompute-1 ~]# vi /var/log/containers/nova/nova-compute.log
2024-05-07 13:51:43.079 2 ERROR nova.compute.manager [req-e48b5e0f-2ced-41eb-9f14-58c8f19444d3 7dbca1b3b5d54daf96000e422f3acfda 7309ecd94e5245be928ef9e4c4ea83dc - default default] [instance: c4f226d7-03fa-4a12-bb89-22b2140c9983] Failed to build and run instance: libvirt.libvirtError: operation failed: swtpm died and reported:

  ====> Instance creation fails because swtpm is not running.

[root@central-novacompute-1 ~]# grep swtpm /var/log/audit/audit.log|grep AVC
type=AVC msg=audit(1715057384.359:465): avc:  denied  { write } for  pid=4861 comm="swtpm" path="/run/libvirt/qemu/swtpm/1-instance-00000054-swtpm.pid" dev="tmpfs" ino=2705 scontext=system_u:system_r:svirt_t:s0:c135,c269 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0
type=AVC msg=audit(1715057384.360:466): avc:  denied  { write } for  pid=4861 comm="swtpm" name="swtpm" dev="tmpfs" ino=2704 scontext=system_u:system_r:svirt_t:s0:c135,c269 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=dir permissive=0

  ====> The reason why swtpm couldn't start was the above SELinux error.
  ====> Because /run/libvirt/qemu/swtpm/1-instance-00000054-swtpm.pid is container_ro_file_t, swtpm cannot write to the file.

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  mount |grep run
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
none on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
none on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
none on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3228348k,nr_inodes=807087,mode=700,inode64)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3228348k,nr_inodes=807087,mode=700,uid=1000,gid=1000,inode64)
tmpfs on /run/systemd/journal/dev-log type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/libvirt type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/secrets type tmpfs (rw,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -lZd /run/libvirt/qemu/
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_ro_file_t:s0 180 May  7 13:51 /run/libvirt/qemu/

  ===> I can see that the SELinux label of /run/libvirt/qemu/ is container_ro_file_t
~~~
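
When many denials are logged, it can help to reduce each AVC line to the SELinux type it was denied against. The following is only a sketch: `avc_target_type` is a hypothetical helper name, and it assumes the `tcontext=` field has the `user:role:type:level` form seen in the audit lines above.

```shell
# Sketch: print the target SELinux type named in one AVC denial line.
# avc_target_type is a hypothetical helper, not part of any existing tool.
avc_target_type() {
    # Keep only the value of the tcontext= field, then take the third
    # colon-separated field (user:role:type:level), which is the type.
    printf '%s\n' "$1" \
        | sed -n 's/.*tcontext=\([^ ]*\).*/\1/p' \
        | cut -d: -f3
}
```

Applied to either of the two denials quoted above, this should print `container_ro_file_t`. On a compute node, the lines could be fed in from the same `grep swtpm /var/log/audit/audit.log | grep AVC` pipeline, one line per call.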

If I restart tripleo_nova_virtlogd_wrapper.service, the SELinux context of /run/libvirt/qemu/ is changed to container_file_t and instance creation works.

~~~
[root@central-novacompute-1 ~]# systemctl restart tripleo_nova_virtlogd_wrapper.service

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -lZd /run/libvirt/qemu/
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_file_t:s0 180 May  7 13:51 /run/libvirt/qemu/

(central) [stack@undercloud ~]$  openstack server create --network yatanaka_network0 --image cirros-0.6.2 --flavor vtpm-flavor cirros_vtpm --host central-novacompute-1.yatanaka.example.com --os-compute-api-version 2.74

(central) [stack@undercloud ~]$ openstack server list 
+--------------------------------------+-----------------------------------+---------+----------------------------------------------------+--------------------------+----------------+
| ID                                   | Name                              | Status  | Networks                                           | Image                    | Flavor         |
+--------------------------------------+-----------------------------------+---------+----------------------------------------------------+--------------------------+----------------+
| a28a531d-7ea2-4e03-ad8b-c81cde5f3a46 | cirros_vtpm                       | ACTIVE  | yatanaka_network0=192.168.0.235                    | cirros-0.6.2             | vtpm-flavor    |

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -laZ /run/libvirt/qemu/swtpm/
total 4
drwxrwx---. 2 qemu tss  system_u:object_r:container_file_t:s0         80 May  7 14:46 .
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_file_t:s0        220 May  7 14:46 ..
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0          4 May  7 14:46 1-instance-00000060-swtpm.pid
srw-------. 1 qemu qemu system_u:object_r:svirt_image_t:s0:c728,c755   0 May  7 14:46 1-instance-00000060-swtpm.sock
~~~

If we restart nova_virtqemud, the SELinux context becomes container_ro_file_t again and instance creation fails.

~~~
[root@central-novacompute-1 ~]# systemctl restart tripleo_nova_virtqemud.service

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -laZ /run/libvirt/qemu/swtpm/
total 4
drwxrwx---. 2 qemu tss  system_u:object_r:container_ro_file_t:s0  80 May  7 13:58 .
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_ro_file_t:s0 220 May  7 14:04 ..
-rw-r--r--. 1 root root system_u:object_r:container_ro_file_t:s0   4 May  7 13:58 2-instance-0000005a-swtpm.pid
srw-------. 1 qemu qemu system_u:object_r:container_ro_file_t:s0   0 May  7 13:58 2-instance-0000005a-swtpm.sock
~~~
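
To check quickly whether the directory is in the failing state, the `ls -laZ` output can be filtered for entries that carry the read-only label. This is a rough sketch rather than a supported tool: `mislabeled_entries` is a hypothetical helper name, and it assumes the SELinux context is the fifth whitespace-separated column and that file names contain no spaces, as in the listings above.

```shell
# Sketch: read `ls -laZ` output on stdin and print the names of entries
# whose SELinux type is container_ro_file_t.
# mislabeled_entries is a hypothetical helper, not part of any existing tool.
mislabeled_entries() {
    # Field 5 is the SELinux context; the last field is the entry name.
    # The "total N" line has fewer than five fields and never matches.
    awk '$5 ~ /:container_ro_file_t:/ { print $NF }'
}
```

A possible invocation on the compute node is `podman exec nova_virtqemud ls -laZ /run/libvirt/qemu/swtpm/ | mislabeled_entries` (without `-it`, since the output is piped). Empty output would mean no entry in that directory has the `container_ro_file_t` label.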

Starting the nova_virtqemud container appears to trigger the SELinux label change, but I couldn't find the root cause.
I'm wondering whether this is a RHEL (kernel, podman) issue or an RHOSP issue.


Version-Release number of selected component (if applicable):
RHOSP 17.1.2

How reproducible:
Steps to Reproduce:
1. Deploy overcloud according to https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-instance-security_vgpu#assembly_configuring-compute-nodes-to-provide-emulated-TPM-devices-for-instances_TPM
2. Create an instance with vTPM. This succeeds.
3. Reboot the compute node and create a new instance with vTPM on the compute node. This fails.
4. Restart tripleo_nova_virtlogd_wrapper.service and create an instance with vTPM. This succeeds.
5. Restart tripleo_nova_virtqemud.service and create an instance with vTPM. This fails.


Actual results:
Instance creation fails.
The SELinux label of /run/libvirt/qemu/swtpm/X-instance-XXXXXXXXXX-swtpm.pid changes to container_ro_file_t when nova_virtqemud starts.


Expected results:
Instance creation succeeds.
The SELinux label of /run/libvirt/qemu/swtpm/X-instance-XXXXXXXXXX-swtpm.pid is always container_file_t.


Additional info:
I found the following BZs that mention the container_ro_file_t label, but they sound a bit different from this issue.
- https://bugzilla.redhat.com/show_bug.cgi?id=2122239
- https://bugzilla.redhat.com/show_bug.cgi?id=2219795

Comment 1 Kashyap Chamarthy 2024-05-08 13:35:05 UTC
Hi,

As you can see, the problem is that the /var/log/swtpm location inside the container is incorrectly labelled "container_ro_file_t".

It should be "container_file_t".

I think this is already solved in some update. I'm CCing a couple of colleagues (Julie and Bogdan) who might know which update it is.


Also, this was previously discussed extensively in this bug that I filed in the past:

https://bugzilla.redhat.com/show_bug.cgi?id=2093956 -- 'swtpm' binary is denied write/"append" permissions to log files under /var/log/swtpm/

Comment 2 Julie Pichon 2024-05-08 14:26:31 UTC
From the SELinux side, as investigated in the other bug Kashyap linked, there isn't much that can be done since svirt_t can already write to container_file_t [1] and as indicated in the description here, once the label is correct the instance can start. It looks like an ordering issue based on which container starts first... A deployment SME should be able to confirm what is going on, and if there was a related patch.

[1] https://github.com/redhat-openstack/openstack-selinux/commit/61b604b10af6315bb570b71776b8ccdec884222

Comment 8 Bogdan Dobrelya 2024-05-09 11:59:38 UTC
Please try the pushed fix 451875

Comment 20 parthee 2024-08-08 09:19:10 UTC
Hello Kashyap & Bogdan,

Thanks for the follow-up.

I am assuming that my customer should be fine if the fix is included in the 17.1.4 release, because they haven't responded since 5th July and the support case was auto-closed on 15th July.

Regards,
Partheeban

Comment 21 Yadnesh Kulkarni 2024-08-21 05:44:11 UTC
*** Bug 2250047 has been marked as a duplicate of this bug. ***

Comment 33 errata-xmlrpc 2024-11-21 09:30:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:9978

Comment 34 Bogdan Dobrelya 2024-12-11 13:15:10 UTC
*** Bug 2331316 has been marked as a duplicate of this bug. ***