2279464 – Instance creation with vTPM fails after restarting nova_virtqemud due to SELinux permission issue

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	17.1 (Wallaby)
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	z4
Target Release:	17.1
Assignee:	OSP Team
QA Contact:	James Parker
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	2250047 2331316
Depends On:
Blocks:

Reported:	2024-05-07 05:49 UTC by yatanaka
Modified:	2024-12-11 13:15 UTC
CC List:	13 users
Fixed In Version:	openstack-tripleo-heat-templates-14.3.1-17.1.20240919130751.e7c7ce3.el9ost
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-11-21 09:30:22 UTC
Target Upstream Version:
Embargoed:
Flags:	bdobreli: needinfo-

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OSP-32037	None	None	None	2024-05-07 05:52:51 UTC
Red Hat Knowledge Base (Solution)	7068629	None	None	None	2024-06-28 13:04:51 UTC
Red Hat Product Errata	RHSA-2024:9978	None	None	None	2024-11-21 09:30:25 UTC

Description yatanaka 2024-05-07 05:49:20 UTC

Description of problem:

I deployed the overcloud according to the following document:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-instance-security_vgpu#assembly_configuring-compute-nodes-to-provide-emulated-TPM-devices-for-instances_TPM

Then I rebooted the compute node.
After the reboot, I tried to create an instance with vTPM, but it failed because of an SELinux denial.

~~~
(central) [stack@undercloud ~]$  openstack server create --network yatanaka_network0 --image cirros-0.6.2 --flavor vtpm-flavor cirros_vtpm --host central-novacompute-1.yatanaka.example.com --os-compute-api-version 2.74

[root@central-novacompute-1 ~]# vi /var/log/containers/nova/nova-compute.log
2024-05-07 13:51:43.079 2 ERROR nova.compute.manager [req-e48b5e0f-2ced-41eb-9f14-58c8f19444d3 7dbca1b3b5d54daf96000e422f3acfda 7309ecd94e5245be928ef9e4c4ea83dc - default default] [instance: c4f226d7-03fa-4a12-bb89-22b2140c9983] Failed to build and run instance: libvirt.libvirtError: operation failed: swtpm died and reported:

  ====> Instance creation fails because swtpm is not running.

[root@central-novacompute-1 ~]# grep swtpm /var/log/audit/audit.log|grep AVC
type=AVC msg=audit(1715057384.359:465): avc:  denied  { write } for  pid=4861 comm="swtpm" path="/run/libvirt/qemu/swtpm/1-instance-00000054-swtpm.pid" dev="tmpfs" ino=2705 scontext=system_u:system_r:svirt_t:s0:c135,c269 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0
type=AVC msg=audit(1715057384.360:466): avc:  denied  { write } for  pid=4861 comm="swtpm" name="swtpm" dev="tmpfs" ino=2704 scontext=system_u:system_r:svirt_t:s0:c135,c269 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=dir permissive=0

  ====> The reason why swtpm couldn't start was the above SELinux error.
  ====> Because /run/libvirt/qemu/swtpm/1-instance-00000054-swtpm.pid is container_ro_file_t, swtpm cannot write to the file.

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  mount |grep run
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
none on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
none on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
none on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3228348k,nr_inodes=807087,mode=700,inode64)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3228348k,nr_inodes=807087,mode=700,uid=1000,gid=1000,inode64)
tmpfs on /run/systemd/journal/dev-log type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/libvirt type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/secrets type tmpfs (rw,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -lZd /run/libvirt/qemu/
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_ro_file_t:s0 180 May  7 13:51 /run/libvirt/qemu/

  ===> I can see that the SELinux label of /run/libvirt/qemu/ is container_ro_file_t
~~~
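
The two AVC records above can be condensed mechanically when there are many of them. The following is a minimal sketch, assuming audit records in the same format as shown in this report; `avc_summary` is a hypothetical helper name, not part of any existing tool. It reads audit records on standard input and prints the command, permission, source type, target type, and object class of each denial.

```shell
# avc_summary: hypothetical helper. Reads audit records on stdin and prints
# "comm perm source_type target_type tclass" for every AVC denial it finds.
avc_summary() {
  awk '
    /avc: +denied/ {
      comm = perm = stype = ttype = tclass = "?"
      for (i = 1; i <= NF; i++) {
        # The denied permission is the token that follows the opening brace.
        if ($i == "{") perm = $(i + 1)
        if ($i ~ /^comm=/)     { comm = $i; sub(/^comm=/, "", comm); gsub(/"/, "", comm) }
        # A context is user:role:type:level, so the type is the third part.
        if ($i ~ /^scontext=/) { split($i, s, ":"); stype = s[3] }
        if ($i ~ /^tcontext=/) { split($i, t, ":"); ttype = t[3] }
        if ($i ~ /^tclass=/)   { tclass = $i; sub(/^tclass=/, "", tclass) }
      }
      print comm, perm, stype, ttype, tclass
    }'
}
```

One possible use is `grep swtpm /var/log/audit/audit.log | avc_summary`, which makes it easier to see that every denial has svirt_t as the source type and container_ro_file_t as the target type.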

If I restart tripleo_nova_virtlogd_wrapper.service, the SELinux context of /run/libvirt/qemu/ changes to container_file_t and instance creation succeeds.

~~~
[root@central-novacompute-1 ~]# systemctl restart tripleo_nova_virtlogd_wrapper.service

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -lZd /run/libvirt/qemu/
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_file_t:s0 180 May  7 13:51 /run/libvirt/qemu/

(central) [stack@undercloud ~]$  openstack server create --network yatanaka_network0 --image cirros-0.6.2 --flavor vtpm-flavor cirros_vtpm --host central-novacompute-1.yatanaka.example.com --os-compute-api-version 2.74

(central) [stack@undercloud ~]$ openstack server list 
+--------------------------------------+-----------------------------------+---------+----------------------------------------------------+--------------------------+----------------+
| ID                                   | Name                              | Status  | Networks                                           | Image                    | Flavor         |
+--------------------------------------+-----------------------------------+---------+----------------------------------------------------+--------------------------+----------------+
| a28a531d-7ea2-4e03-ad8b-c81cde5f3a46 | cirros_vtpm                       | ACTIVE  | yatanaka_network0=192.168.0.235                    | cirros-0.6.2             | vtpm-flavor    |

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -laZ /run/libvirt/qemu/swtpm/
total 4
drwxrwx---. 2 qemu tss  system_u:object_r:container_file_t:s0         80 May  7 14:46 .
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_file_t:s0        220 May  7 14:46 ..
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0          4 May  7 14:46 1-instance-00000060-swtpm.pid
srw-------. 1 qemu qemu system_u:object_r:svirt_image_t:s0:c728,c755   0 May  7 14:46 1-instance-00000060-swtpm.sock
~~~

If we restart nova_virtqemud, the SELinux context becomes container_ro_file_t again and instance creation fails.

~~~
[root@central-novacompute-1 ~]# systemctl restart tripleo_nova_virtqemud.service

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -laZ /run/libvirt/qemu/swtpm/
total 4
drwxrwx---. 2 qemu tss  system_u:object_r:container_ro_file_t:s0  80 May  7 13:58 .
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_ro_file_t:s0 220 May  7 14:04 ..
-rw-r--r--. 1 root root system_u:object_r:container_ro_file_t:s0   4 May  7 13:58 2-instance-0000005a-swtpm.pid
srw-------. 1 qemu qemu system_u:object_r:container_ro_file_t:s0   0 May  7 13:58 2-instance-0000005a-swtpm.sock
~~~
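
The label flip shown above could be detected with a small check before creating an instance. This is only a sketch: `selinux_type` and `check_swtpm_dir` are hypothetical helper names, and the parsing assumes the `ls -lZd` output format seen in this report, with the SELinux context in the fifth column.

```shell
# selinux_type: print the SELinux type from one line of `ls -lZd` output.
# Assumes the context (user:role:type:level) is the fifth column.
selinux_type() {
  awk '{ split($5, ctx, ":"); print ctx[3] }'
}

# check_swtpm_dir: hypothetical wrapper. Reports whether the directory that
# swtpm writes to carries the label observed to work in this report.
check_swtpm_dir() {
  t=$(podman exec nova_virtqemud ls -lZd /run/libvirt/qemu/swtpm/ | selinux_type)
  case "$t" in
    container_file_t)    echo "OK: $t" ;;
    container_ro_file_t) echo "BAD: $t (swtpm writes will be denied)"; return 1 ;;
    *)                   echo "UNKNOWN: $t"; return 2 ;;
  esac
}
```

`podman exec` is called without `-it` here so that the output can be piped cleanly. Run `check_swtpm_dir` as root on the compute node; a non-zero exit status means the label is not container_file_t.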

Starting the nova_virtqemud container appears to trigger the SELinux label change, but I couldn't find the root cause.
I'm not sure whether this is a RHEL (kernel, podman) issue or an RHOSP issue.


Version-Release number of selected component (if applicable):
RHOSP 17.1.2

How reproducible:
Steps to Reproduce:
1. Deploy overcloud according to https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-instance-security_vgpu#assembly_configuring-compute-nodes-to-provide-emulated-TPM-devices-for-instances_TPM
2. Create an instance with vTPM. This succeeds.
3. Reboot the compute node and create a new instance with vTPM on the compute node. This fails.
4. Restart tripleo_nova_virtlogd_wrapper.service and create an instance with vTPM. This succeeds.
5. Restart tripleo_nova_virtqemud.service and create an instance with vTPM. This fails.
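
The label side of steps 4 and 5 can be scripted roughly as follows. This is a sketch under stated assumptions: it is meant to run as root on the compute node, it only restarts the two services and prints the resulting label (instance creation from the undercloud is left out), and the names `run`, `label_after_restart`, and `repro_label_flip` are hypothetical.

```shell
# run: execute a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# label_after_restart: restart one systemd unit, then show the SELinux
# context of /run/libvirt/qemu/ as seen inside the nova_virtqemud container.
label_after_restart() {
  unit=$1
  run systemctl restart "$unit"
  run podman exec nova_virtqemud ls -lZd /run/libvirt/qemu/
}

# repro_label_flip: per this report, container_file_t is expected after the
# first restart and container_ro_file_t after the second.
repro_label_flip() {
  label_after_restart tripleo_nova_virtlogd_wrapper.service
  label_after_restart tripleo_nova_virtqemud.service
}
```

Setting `DRY_RUN=1` before calling `repro_label_flip` prints the commands without restarting anything, which is useful for reviewing the sequence first.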


Actual results:
Instance creation fails.
SELinux label of /run/libvirt/qemu/swtpm/X-instance-XXXXXXXXXX-swtpm.pid is changed to container_ro_file_t when nova_virtqemud starts.


Expected results:
Instance creation succeeds.
SELinux label of /run/libvirt/qemu/swtpm/X-instance-XXXXXXXXXX-swtpm.pid is always container_file_t.


Additional info:
I found the following BZs, which mention the container_ro_file_t label, but they seem somewhat different from this issue.
- https://bugzilla.redhat.com/show_bug.cgi?id=2122239
- https://bugzilla.redhat.com/show_bug.cgi?id=2219795

Comment 1 Kashyap Chamarthy 2024-05-08 13:35:05 UTC

Hi,

So, as you can see, it comes down to the /var/log/swtpm location inside the container being incorrectly labelled "container_ro_file_t".

It should be: "container_file_t"

I think this is already solved in some update.  I'm CCing a couple of colleagues (Julie and Bogdan) who might know which update it is.


Also, this was previously discussed extensively in this bug that I filed in the past:

https://bugzilla.redhat.com/show_bug.cgi?id=2093956 -- 'swtpm' binary is denied write/"append" permissions to log files under /var/log/swtpm/

Comment 2 Julie Pichon 2024-05-08 14:26:31 UTC

From the SELinux side, as investigated in the other bug Kashyap linked, there isn't much that can be done since svirt_t can already write to container_file_t [1] and as indicated in the description here, once the label is correct the instance can start. It looks like an ordering issue based on which container starts first... A deployment SME should be able to confirm what is going on, and if there was a related patch.

[1] https://github.com/redhat-openstack/openstack-selinux/commit/61b604b10af6315bb570b71776b8ccdec884222
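
Whether the loaded policy allows a given access can be checked on a compute node with `sesearch` (shipped in the setools-console package on RHEL). The following is a minimal sketch; `check_allow_rule` is a hypothetical wrapper name, and what it prints depends entirely on the policy installed on the host.

```shell
# check_allow_rule: hypothetical wrapper around sesearch. Lists the allow
# rules that grant <perm> on <class> objects from <source> to <target>.
check_allow_rule() {
  src=$1; tgt=$2; class=$3; perm=$4
  if ! command -v sesearch >/dev/null 2>&1; then
    echo "sesearch not found; is setools-console installed?" >&2
    return 2
  fi
  sesearch -A -s "$src" -t "$tgt" -c "$class" -p "$perm"
}
```

For example, `check_allow_rule svirt_t container_file_t file write` can be compared with `check_allow_rule svirt_t container_ro_file_t file write`. Based on the comment above, the first query is expected to return a rule on a node with openstack-selinux installed.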

Comment 8 Bogdan Dobrelya 2024-05-09 11:59:38 UTC

Please try the pushed fix 451875

Comment 20 parthee 2024-08-08 09:19:10 UTC

Hello Kashyap & Bogdan,

Thanks for the follow-up.

I am assuming that my customer will be fine if the fix is included in the 17.1.4 release, because they haven't responded since 5th July and the support case was auto-closed on 15th July.

Regards,
Partheeban

Comment 21 Yadnesh Kulkarni 2024-08-21 05:44:11 UTC

*** Bug 2250047 has been marked as a duplicate of this bug. ***

Comment 33 errata-xmlrpc 2024-11-21 09:30:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:9978

Comment 34 Bogdan Dobrelya 2024-12-11 13:15:10 UTC

*** Bug 2331316 has been marked as a duplicate of this bug. ***
