Bug 2123209

Summary: CNV runs non-root VMs by default, which removes cap_sys_nice from the launchers and causes real-time VMs to fail to boot
Product: Container Native Virtualization (CNV) Reporter: Gu Nini <ngu>
Component: Virtualization Assignee: Jordi Gil <jgil>
Status: CLOSED ERRATA QA Contact: Denys Shchedrivyi <dshchedr>
Severity: high Docs Contact:
Priority: high    
Version: 4.11.1 CC: acardace, danken, jgil, kbidarka, lijin, lpivarc, mtosatti, ocohen, sgott, vromanso
Target Milestone: ---   
Target Release: 4.13.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.13.0.rhel9-1639 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-18 02:55:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
virt-handler.log none

Description Gu Nini 2022-09-01 04:19:10 UTC
Created attachment 1908863
virt-handler.log

Description of problem:
After upgrading the environment from OCP4.10.26 to OCP4.11.1/OCP-V4.11.0, the real-time VM can't boot because it now runs as a non-root VM by default. See the attached 'virt-handler.log' for details.

To summarize what Vladik Romanovsky said about the root cause of the issue:

'''
CNV now runs non-root VMs by default, which removes cap_sys_nice from the launchers. The problem is that CNV made this switch before upstream KubeVirt did: https://github.com/kubevirt/kubevirt/blob/782b82aff8adc516d98421466ab9e43835efb89c/pkg/virt-controller/services/rendercontainer.go#L244
'''
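
The effect described above can be illustrated with a small sketch. This is not KubeVirt's code (the linked rendercontainer.go is written in Go); the function names, the parameter, and the NET_BIND_SERVICE baseline are assumptions made for illustration. It only models the reported behavior: a non-root launcher ends up without SYS_NICE, the capability Linux requires for setting real-time scheduling policies.

```python
from typing import List


# Illustrative model only; names are hypothetical, not KubeVirt's API.
def launcher_capabilities(non_root: bool) -> List[str]:
    """Return the capabilities a launcher container would request."""
    caps = ["NET_BIND_SERVICE"]  # assumed baseline capability
    if not non_root:
        # In this model, only root launchers keep SYS_NICE.
        caps.append("SYS_NICE")
    return caps


def can_use_realtime_scheduling(non_root: bool) -> bool:
    """A real-time VM needs SYS_NICE to request real-time scheduling."""
    return "SYS_NICE" in launcher_capabilities(non_root)


if __name__ == "__main__":
    print("root launcher:", can_use_realtime_scheduling(non_root=False))
    print("non-root launcher:", can_use_realtime_scheduling(non_root=True))
```

Under these assumptions, switching the default to non-root is enough to leave a real-time VM without the capability it depends on.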


Version-Release number of selected component (if applicable):
OpenShift Virtualization: 4.11.0
OpenShift: 4.11.1

How reproducible:
100%

Steps to Reproduce:
1. Upgrade env to OCP4.11.1/OCP-V4.11.0
2. Try to boot a real-time VM that was created before the upgrade

Actual results:
The VM fails to boot.

Expected results:
The VM boots without issues.

Additional info:

Comment 1 sgott 2022-09-01 12:42:13 UTC
*** Bug 2123207 has been marked as a duplicate of this bug. ***

Comment 4 Jordi Gil 2022-11-09 17:37:36 UTC
PR raised to address this issue.
https://github.com/kubevirt/kubevirt/pull/8750

Comment 5 Jordi Gil 2022-12-02 14:43:51 UTC
PR has been merged.

Comment 9 Denys Shchedrivyi 2023-03-27 22:39:48 UTC
Verified on v4.13.0.rhel9-1834: a VM with an RT kernel runs successfully.

Steps:
1) set the worker-rt label on one of the worker nodes
2) create an MCP that targets the worker-rt node
3) create a PerformanceProfile that enables the RT kernel
4) wait for the MCP to complete the update

> $ oc get mcp
> NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
> worker-rt   rendered-worker-rt-a560b74067dfdce8a670390145f51439   True      False      False      1              1                   1                     0                      129m


The node switched to the RT kernel:
> $ oc get node -o wide
> NAME                                STATUS   ROLES                  AGE     VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                         CONTAINER-RUNTIME
> virt-den-413-l7hvr-worker-0-9r8gw   Ready    worker,worker-rt       6h37m   v1.26.2+dc93b13   192.168.2.82    <none>        Red Hat Enterprise Linux CoreOS 413.92.202303221220-0 (Plow)   5.14.0-284.1.1.rt14.286.el9_2.x86_64   cri-o://1.26.1-10.rhaos4.13.gitcb86088.el9

5) create a VM with a realtime kernel (using the doc from comment #7):
> $ oc get vm -A
> NAMESPACE   NAME              AGE   STATUS    READY
> test-rt     fedora-realtime   59m   Running   True

> $ oc get pod
> NAME                                  READY   STATUS    RESTARTS   AGE
> virt-launcher-fedora-realtime-75lfr   2/2     Running   0          59m

> [fedora@fedora-realtime ~]$ uname -r
> 5.6.19-300.rt10.2.fc32.ccrma.x86_64+rt
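
The kernel check in the outputs above can be scripted. The sketch below is a heuristic, assuming that RT kernel release strings contain a `.rt<digits>` component, as both the node's KERNEL-VERSION and the guest's `uname -r` output here do; the function name is hypothetical.

```python
import re

# Heuristic marker: a ".rt" followed by digits, e.g. ".rt14" or ".rt10".
_RT_MARKER = re.compile(r"\.rt\d+")


def looks_like_rt_kernel(release: str) -> bool:
    """Return True if a kernel release string carries an RT marker."""
    return _RT_MARKER.search(release) is not None


if __name__ == "__main__":
    for release in (
        "5.14.0-284.1.1.rt14.286.el9_2.x86_64",    # node kernel from this comment
        "5.6.19-300.rt10.2.fc32.ccrma.x86_64+rt",  # guest kernel from this comment
    ):
        print(release, looks_like_rt_kernel(release))
```

Other distributions may name RT kernels differently, so the pattern may need adjusting outside this environment.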

Comment 12 errata-xmlrpc 2023-05-18 02:55:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205