Bug 2123209 - CNV runs non-root VMs by default, which removes cap_sys_nice from the launchers and causes real-time VMs to fail to boot
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.11.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.13.0
Assignee: Jordi Gil
QA Contact: Denys Shchedrivyi
Duplicates: 2123207
Reported: 2022-09-01 04:19 UTC by Gu Nini
Modified: 2023-05-18 02:56 UTC (History)

Fixed In Version: hco-bundle-registry-container-v4.13.0.rhel9-1639
Last Closed: 2023-05-18 02:55:41 UTC


Attachments
virt-handler.log (1.88 MB, text/plain), 2022-09-01 04:19 UTC, Gu Nini


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 8750 | 0 | None | Merged | Fixes BZ-2123209: Real time VMs fail to change vCPU scheduler and priority in non-root deployments | 2023-01-11 13:42:25 UTC
Red Hat Issue Tracker | CNV-20983 | 0 | None | None | None | 2022-11-01 15:48:31 UTC
Red Hat Product Errata | RHSA-2023:3205 | 0 | None | None | None | 2023-05-18 02:56:30 UTC

Description Gu Nini 2022-09-01 04:19:10 UTC
Created attachment 1908863: virt-handler.log

Description of problem:
After upgrading the environment from OCP 4.10.26 to OCP 4.11.1 / OpenShift Virtualization 4.11.0, the real-time VM can no longer boot because VMs now run as non-root by default. See the attached 'virt-handler.log' for details.

Vladik Romanovsky summarized the root cause of the issue as follows:

'''
CNV runs non-root VMs by default now, this removes cap_sys_nice from the launchers. The problem is that CNV makes this switch before upstream KubeVirt did: https://github.com/kubevirt/kubevirt/blob/782b82aff8adc516d98421466ab9e43835efb89c/pkg/virt-controller/services/rendercontainer.go#L244
'''
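
For readers unfamiliar with the mechanism: on Linux, moving a thread to a real-time scheduling policy or raising its priority generally requires the CAP_SYS_NICE capability, which is consistent with the failure described here once the launcher loses it. The following is a minimal sketch, not KubeVirt code, of how one might check whether a process holds that capability by decoding the CapEff bitmask in /proc/<pid>/status. The helper names are hypothetical; the bit number 23 for CAP_SYS_NICE is taken from linux/capability.h.

```python
from pathlib import Path

# Bit number of CAP_SYS_NICE as defined in linux/capability.h.
CAP_SYS_NICE = 23


def has_capability(cap_mask_hex: str, cap_bit: int) -> bool:
    """Return True if bit `cap_bit` is set in a hex capability mask."""
    return bool((int(cap_mask_hex, 16) >> cap_bit) & 1)


def effective_caps(pid: str = "self") -> str:
    """Read the CapEff field (a hex string) from /proc/<pid>/status.

    Hypothetical helper for illustration; works only on Linux.
    """
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise RuntimeError("CapEff field not found")
```

Calling `has_capability(effective_caps(), CAP_SYS_NICE)` inside the process of interest (or passing its PID to `effective_caps`) reports whether the capability is currently effective.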


Version-Release number of selected component (if applicable):
OpenShift Virtualization: 4.11.0
OpenShift: 4.11.1

How reproducible:
100%

Steps to Reproduce:
1. Upgrade the environment to OCP 4.11.1 / OpenShift Virtualization 4.11.0.
2. Try to boot a real-time VM that was created before the upgrade.

Actual results:
The VM fails to boot.

Expected results:
The VM boots without issue.

Additional info:

Comment 1 sgott 2022-09-01 12:42:13 UTC
*** Bug 2123207 has been marked as a duplicate of this bug. ***

Comment 4 Jordi Gil 2022-11-09 17:37:36 UTC
PR raised to address this issue.
https://github.com/kubevirt/kubevirt/pull/8750

Comment 5 Jordi Gil 2022-12-02 14:43:51 UTC
PR has been merged.

Comment 9 Denys Shchedrivyi 2023-03-27 22:39:48 UTC
Verified on v4.13.0.rhel9-1834: a VM with an RT kernel runs successfully.

Steps:
1) Set the worker-rt label on one of the worker nodes.
2) Create an MCP that points to the worker-rt node.
3) Create a PerformanceProfile that enables the RT kernel.
4) Wait for the MCP to complete the update.

> $ oc get mcp
> NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
> worker-rt   rendered-worker-rt-a560b74067dfdce8a670390145f51439   True      False      False      1              1                   1                     0                      129m


The node switched to the RT kernel:
> $ oc get node -o wide
> NAME                                STATUS   ROLES                  AGE     VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                         CONTAINER-RUNTIME
> virt-den-413-l7hvr-worker-0-9r8gw   Ready    worker,worker-rt       6h37m   v1.26.2+dc93b13   192.168.2.82    <none>        Red Hat Enterprise Linux CoreOS 413.92.202303221220-0 (Plow)   5.14.0-284.1.1.rt14.286.el9_2.x86_64   cri-o://1.26.1-10.rhaos4.13.gitcb86088.el9

5) Create a VM with a real-time kernel (following the doc from comment #7):
> $ oc get vm -A
> NAMESPACE   NAME              AGE   STATUS    READY
> test-rt     fedora-realtime   59m   Running   True

> $ oc get pod
> NAME                                  READY   STATUS    RESTARTS   AGE
> virt-launcher-fedora-realtime-75lfr   2/2     Running   0          59m

> [fedora@fedora-realtime ~]$ uname -r
> 5.6.19-300.rt10.2.fc32.ccrma.x86_64+rt
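
The linked upstream fix concerns changing the vCPU scheduler and priority, so a complementary check would be to confirm which scheduling policy a given thread actually runs under. The sketch below is an illustration only and was not part of the verification above. It assumes Linux, assumes that real-time vCPU threads are expected to use a real-time policy such as SCHED_FIFO or SCHED_RR, and uses hypothetical helper names. It relies on Python's `os.sched_getscheduler`, which on Linux also accepts a thread ID.

```python
import os

# Readable names for the Linux scheduling policies exposed by the os module.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def scheduling_policy(tid: int = 0) -> str:
    """Return the scheduling policy name for a PID/TID (0 means the caller)."""
    # Strip the optional SCHED_RESET_ON_FORK flag before the lookup.
    policy = os.sched_getscheduler(tid) & ~os.SCHED_RESET_ON_FORK
    return POLICY_NAMES.get(policy, f"unknown({policy})")


def is_realtime(tid: int = 0) -> bool:
    """Return True if the PID/TID runs under SCHED_FIFO or SCHED_RR."""
    return scheduling_policy(tid) in ("SCHED_FIFO", "SCHED_RR")
```

For example, `is_realtime(tid)` could be called with the thread ID of a vCPU thread as seen on the node; how to locate that thread ID is outside the scope of this sketch.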

Comment 12 errata-xmlrpc 2023-05-18 02:55:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205

