Bug 2279464 - Instance creation with vTPM fails after restarting nova_virtqemud due to SELinux permission issue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.1 (Wallaby)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: z4
Target Release: 17.1
Assignee: OSP Team
QA Contact: James Parker
URL:
Whiteboard:
Duplicates: 2250047 2331316
Depends On:
Blocks:
Reported: 2024-05-07 05:49 UTC by yatanaka
Modified: 2024-12-11 13:15 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20240919130751.e7c7ce3.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-11-21 09:30:22 UTC
Target Upstream Version:
Embargoed:
bdobreli: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-32037 | 0 | None | None | None | 2024-05-07 05:52:51 UTC
Red Hat Knowledge Base (Solution) | 7068629 | 0 | None | None | None | 2024-06-28 13:04:51 UTC
Red Hat Product Errata | RHSA-2024:9978 | 0 | None | None | None | 2024-11-21 09:30:25 UTC

Description yatanaka 2024-05-07 05:49:20 UTC
Description of problem:

I deployed the overcloud according to the following document.

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-instance-security_vgpu#assembly_configuring-compute-nodes-to-provide-emulated-TPM-devices-for-instances_TPM

Then I rebooted the compute node.
After the reboot, I tried to create an instance with vTPM, but it failed due to an SELinux issue.

~~~
(central) [stack@undercloud ~]$  openstack server create --network yatanaka_network0 --image cirros-0.6.2 --flavor vtpm-flavor cirros_vtpm --host central-novacompute-1.yatanaka.example.com --os-compute-api-version 2.74

[root@central-novacompute-1 ~]# vi /var/log/containers/nova/nova-compute.log
2024-05-07 13:51:43.079 2 ERROR nova.compute.manager [req-e48b5e0f-2ced-41eb-9f14-58c8f19444d3 7dbca1b3b5d54daf96000e422f3acfda 7309ecd94e5245be928ef9e4c4ea83dc - default default] [instance: c4f226d7-03fa-4a12-bb89-22b2140c9983] Failed to build and run instance: libvirt.libvirtError: operation failed: swtpm died and reported:

  ====> Instance creation fails because swtpm is not running.

[root@central-novacompute-1 ~]# grep swtpm /var/log/audit/audit.log|grep AVC
type=AVC msg=audit(1715057384.359:465): avc:  denied  { write } for  pid=4861 comm="swtpm" path="/run/libvirt/qemu/swtpm/1-instance-00000054-swtpm.pid" dev="tmpfs" ino=2705 scontext=system_u:system_r:svirt_t:s0:c135,c269 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0
type=AVC msg=audit(1715057384.360:466): avc:  denied  { write } for  pid=4861 comm="swtpm" name="swtpm" dev="tmpfs" ino=2704 scontext=system_u:system_r:svirt_t:s0:c135,c269 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=dir permissive=0

  ====> The reason why swtpm couldn't start was the above SELinux error.
  ====> Because /run/libvirt/qemu/swtpm/1-instance-00000054-swtpm.pid is container_ro_file_t, swtpm cannot write to the file.

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  mount |grep run
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
none on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
none on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
none on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3228348k,nr_inodes=807087,mode=700,inode64)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3228348k,nr_inodes=807087,mode=700,uid=1000,gid=1000,inode64)
tmpfs on /run/systemd/journal/dev-log type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/libvirt type tmpfs (rw,nosuid,nodev,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/secrets type tmpfs (rw,seclabel,size=9812140k,nr_inodes=819200,mode=755,inode64)

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -lZd /run/libvirt/qemu/
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_ro_file_t:s0 180 May  7 13:51 /run/libvirt/qemu/

  ===> I can see that the SELinux label of /run/libvirt/qemu/ is container_ro_file_t
~~~
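To see at a glance which target label the denials are hitting, the AVC lines can be reduced to the type field of tcontext. This is only a minimal sketch: the function name avc_target_type is made up for illustration, and it assumes the audit lines have the same layout as the ones quoted above.

```shell
# Hypothetical helper (not part of any package): read AVC lines on stdin and
# print the SELinux *type* of the target context (tcontext) for each of them.
avc_target_type() {
    # tcontext has the form user:role:type:level; keep only the third field.
    sed -n 's/.*tcontext=[^:]*:[^:]*:\([^:]*\):.*/\1/p'
}

# Example usage on the compute node (not executed here):
#   grep swtpm /var/log/audit/audit.log | grep AVC | avc_target_type | sort | uniq -c
```

With the two denials shown above, every line should reduce to container_ro_file_t, which matches the label seen on /run/libvirt/qemu/.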

If I restart tripleo_nova_virtlogd_wrapper.service, the SELinux context of /run/libvirt/qemu/ is changed to container_file_t and instance creation works.

~~~
[root@central-novacompute-1 ~]# systemctl restart tripleo_nova_virtlogd_wrapper.service

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -lZd /run/libvirt/qemu/
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_file_t:s0 180 May  7 13:51 /run/libvirt/qemu/

(central) [stack@undercloud ~]$  openstack server create --network yatanaka_network0 --image cirros-0.6.2 --flavor vtpm-flavor cirros_vtpm --host central-novacompute-1.yatanaka.example.com --os-compute-api-version 2.74

(central) [stack@undercloud ~]$ openstack server list 
+--------------------------------------+-----------------------------------+---------+----------------------------------------------------+--------------------------+----------------+
| ID                                   | Name                              | Status  | Networks                                           | Image                    | Flavor         |
+--------------------------------------+-----------------------------------+---------+----------------------------------------------------+--------------------------+----------------+
| a28a531d-7ea2-4e03-ad8b-c81cde5f3a46 | cirros_vtpm                       | ACTIVE  | yatanaka_network0=192.168.0.235                    | cirros-0.6.2             | vtpm-flavor    |

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -laZ /run/libvirt/qemu/swtpm/
total 4
drwxrwx---. 2 qemu tss  system_u:object_r:container_file_t:s0         80 May  7 14:46 .
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_file_t:s0        220 May  7 14:46 ..
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0          4 May  7 14:46 1-instance-00000060-swtpm.pid
srw-------. 1 qemu qemu system_u:object_r:svirt_image_t:s0:c728,c755   0 May  7 14:46 1-instance-00000060-swtpm.sock
~~~
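The manual check and restart above could be scripted roughly as follows. This is a sketch, not a recommended fix: the function names are hypothetical, it assumes the stat command is available inside the nova_virtqemud container, and it simply automates the restart of tripleo_nova_virtlogd_wrapper.service that worked in this environment.

```shell
# Succeed (exit 0) when a full SELinux context string carries the read-only
# container type, e.g. "system_u:object_r:container_ro_file_t:s0".
label_is_read_only() {
    local context="$1" type
    # Context layout is user:role:type:level; the type is the third field.
    type="$(printf '%s' "$context" | cut -d: -f3)"
    [ "$type" = "container_ro_file_t" ]
}

# Hypothetical wrapper around the manual steps shown above.
fix_swtpm_label_if_needed() {
    local context
    # GNU stat prints the SELinux security context with the %C format.
    context="$(podman exec nova_virtqemud stat -c %C /run/libvirt/qemu/)" || return 1
    if label_is_read_only "$context"; then
        systemctl restart tripleo_nova_virtlogd_wrapper.service
    fi
}

# Example usage as root on the compute node (not executed here):
#   fix_swtpm_label_if_needed
```

The label test is kept separate from the podman and systemctl calls so that it can be checked on its own with context strings copied from the listings above.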

If we restart nova_virtqemud, the SELinux context becomes container_ro_file_t again and instance creation fails.

~~~
[root@central-novacompute-1 ~]# systemctl restart tripleo_nova_virtqemud.service

[root@central-novacompute-1 ~]# podman exec -it nova_virtqemud  ls -laZ /run/libvirt/qemu/swtpm/
total 4
drwxrwx---. 2 qemu tss  system_u:object_r:container_ro_file_t:s0  80 May  7 13:58 .
drwxr-xr-x. 7 qemu qemu system_u:object_r:container_ro_file_t:s0 220 May  7 14:04 ..
-rw-r--r--. 1 root root system_u:object_r:container_ro_file_t:s0   4 May  7 13:58 2-instance-0000005a-swtpm.pid
srw-------. 1 qemu qemu system_u:object_r:container_ro_file_t:s0   0 May  7 13:58 2-instance-0000005a-swtpm.sock
~~~

Starting the nova_virtqemud container appears to be what triggers the SELinux label change,
but I couldn't find the root cause.
I'm not sure whether this is a RHEL (kernel, podman) issue or a RHOSP issue.


Version-Release number of selected component (if applicable):
RHOSP 17.1.2

How reproducible:
Steps to Reproduce:
1. Deploy overcloud according to https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-instance-security_vgpu#assembly_configuring-compute-nodes-to-provide-emulated-TPM-devices-for-instances_TPM
2. Create an instance with vTPM. This succeeds.
3. Reboot the compute node and create a new instance with vTPM on the compute node. This fails.
4. Restart tripleo_nova_virtlogd_wrapper.service and create an instance with vTPM. This succeeds.
5. Restart tripleo_nova_virtqemud.service and create an instance with vTPM. This fails.


Actual results:
Instance creation fails.
SELinux label of /run/libvirt/qemu/swtpm/X-instance-XXXXXXXXXX-swtpm.pid is changed to container_ro_file_t when nova_virtqemud starts.


Expected results:
Instance creation succeeds.
SELinux label of /run/libvirt/qemu/swtpm/X-instance-XXXXXXXXXX-swtpm.pid is always container_file_t.
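
The actual and expected results can be told apart by scanning an `ls -laZ` listing for entries that still carry container_ro_file_t. This is a minimal sketch under two assumptions: the listing has the same column layout as the output quoted in the description (context in the fifth column), and file names contain no spaces. The function name is hypothetical.

```shell
# Hypothetical helper: read `ls -laZ` output on stdin and print the name of
# every entry whose SELinux context has the read-only container type.
find_ro_labelled_entries() {
    # Column 5 is the SELinux context; the last column is the entry name.
    awk '$5 ~ /:container_ro_file_t:/ { print $NF }'
}

# Example usage on the compute node (not executed here):
#   podman exec nova_virtqemud ls -laZ /run/libvirt/qemu/swtpm/ | find_ro_labelled_entries
```

Empty output corresponds to the expected state. The check looks for container_ro_file_t rather than demanding container_file_t everywhere, because in the working listing the swtpm socket is labelled svirt_image_t.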


Additional info:
I found the following BZs, which mention the container_ro_file_t label, but they sound a bit different from this issue.
- https://bugzilla.redhat.com/show_bug.cgi?id=2122239
- https://bugzilla.redhat.com/show_bug.cgi?id=2219795

Comment 1 Kashyap Chamarthy 2024-05-08 13:35:05 UTC
Hi,

So as you can see, the problem is that the /var/log/swtpm location inside the container is incorrectly labelled as "container_ro_file_t".

It should be: "container_file_t"

I think this is already solved in some update.  I'm Ccing a couple of colleagues (Julie and Bogdan) who might know which update it is.


Also, this was previously discussed extensively in this bug that I filed in the past:

https://bugzilla.redhat.com/show_bug.cgi?id=2093956 -- 'swtpm' binary is denied write/"append" permissions to log files under /var/log/swtpm/

Comment 2 Julie Pichon 2024-05-08 14:26:31 UTC
From the SELinux side, as investigated in the other bug Kashyap linked, there isn't much that can be done since svirt_t can already write to container_file_t [1] and as indicated in the description here, once the label is correct the instance can start. It looks like an ordering issue based on which container starts first... A deployment SME should be able to confirm what is going on, and if there was a related patch.

[1] https://github.com/redhat-openstack/openstack-selinux/commit/61b604b10af6315bb570b71776b8ccdec884222

Comment 8 Bogdan Dobrelya 2024-05-09 11:59:38 UTC
Please try the pushed fix 451875

Comment 20 parthee 2024-08-08 09:19:10 UTC
Hello Kashyap & Bogdan,

Thanks for the follow-up.

I'm assuming that my customer should be fine if the fix is included in the 17.1.4 release, because they haven't responded since 5th July and the support case was auto-closed on 15th July.

Regards,
Partheeban

Comment 21 Yadnesh Kulkarni 2024-08-21 05:44:11 UTC
*** Bug 2250047 has been marked as a duplicate of this bug. ***

Comment 33 errata-xmlrpc 2024-11-21 09:30:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:9978

Comment 34 Bogdan Dobrelya 2024-12-11 13:15:10 UTC
*** Bug 2331316 has been marked as a duplicate of this bug. ***

