Bug 2203291

Summary: kubevirt should allow runtimeclass to be configured in a pod
Product: Container Native Virtualization (CNV) Reporter: Marcelo Tosatti <mtosatti>
Component: Installation Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: high    
Version: 4.12.0 CC: dbasunag, gveitmic, igarcia, jortialc, kbidarka, ngu, nunnatsa, phoracek, sgott, stirabos, vromanso
Target Milestone: ---   
Target Release: 4.14.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.14.0.rhel9-863 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2217910 Environment:
Last Closed: 2023-11-08 14:05:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2217910, 2217913    

Description Marcelo Tosatti 2023-05-11 19:09:55 UTC
Description of problem:

For DPDK-type applications, the vCPU should not be interrupted or throttled
by cgroup CPU quota limits. By default, Kubernetes sets the CPU quota to a positive
integer value, which throttles the vCPU.

To disable the CPU quota for a pod, it is necessary to annotate the pod with

     cpu-quota.crio.io: "disable"

and to set runtimeClassName to the performance profile's runtimeClassName (as described
in the "Disabling CPU CFS quota" section of https://docs.openshift.com/container-platform/4.12/scalability_and_performance/cnf-low-latency-tuning.html).
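A minimal sketch of the two pod-level settings described above, expressed in Python as a plain Kubernetes Pod manifest built from a dict. The annotation and the runtimeClassName field come from this report; the pod name, image, runtime class value ("performance-example") and the Guaranteed-QoS resource block are hypothetical placeholders, not taken from the report or from the KubeVirt API.

```python
import json

# Annotation named in this report; CRI-O reads it to disable the CFS quota.
CPU_QUOTA_ANNOTATION = "cpu-quota.crio.io"


def low_latency_pod(name: str, image: str, runtime_class: str) -> dict:
    """Return a Pod manifest (as a dict) with the CPU CFS quota disabled.

    runtime_class is assumed to be the runtime class created for the
    performance profile; the value used below is a hypothetical placeholder.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {CPU_QUOTA_ANNOTATION: "disable"},
        },
        "spec": {
            # The runtime class ties the pod to the performance profile.
            "runtimeClassName": runtime_class,
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # Assumption of this sketch: requests == limits (Guaranteed QoS).
                    "resources": {
                        "requests": {"cpu": "2", "memory": "1Gi"},
                        "limits": {"cpu": "2", "memory": "1Gi"},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    pod = low_latency_pod("dpdk-app", "example.com/dpdk-app:latest", "performance-example")
    print(json.dumps(pod, indent=2))
```

For a KubeVirt VM, the pod in question is the virt-launcher pod that KubeVirt generates, so neither field can be set this way by the VM owner.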

However, KubeVirt does not support setting runtimeClassName.

In a discussion with Vladik, it appeared that an acceptable way to let VM pods
set runtimeClassName would be to create a scheduling policy for VMs,
similar to migration policies.


Version-Release number of selected component (if applicable):

4.12

How reproducible:

Always

Steps to Reproduce:
1. Start a KubeVirt VM with the cpu-quota.crio.io: "disable" annotation and runtimeClassName set (per the CNF low-latency tuning document above).

Actual results:

The cpu.cfs_quota_us value in the pod cgroup is not -1.

Expected results:

The cpu.cfs_quota_us value in the pod cgroup is -1.
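A small helper for checking the expected result, assuming Python is available on the node and that the caller supplies the path to the pod's cgroup directory (the path layout is not given in this report). On cgroup v1 the quota file is cpu.cfs_quota_us, where -1 means no quota is enforced; the cgroup v2 branch (cpu.max, where a first field of "max" means no quota) is an addition beyond this report.

```python
import sys
from pathlib import Path
from typing import Optional


def quota_is_unlimited_v1(cfs_quota_us: str) -> bool:
    """cgroup v1: cpu.cfs_quota_us contains -1 when no CFS quota is enforced."""
    return int(cfs_quota_us.strip()) == -1


def quota_is_unlimited_v2(cpu_max: str) -> bool:
    """cgroup v2: cpu.max contains "<quota> <period>"; a quota of "max" means unlimited."""
    return cpu_max.split()[0] == "max"


def pod_quota_unlimited(cgroup_dir: str) -> Optional[bool]:
    """Return True/False for the given cgroup dir, or None if no quota file exists."""
    d = Path(cgroup_dir)
    v1_file = d / "cpu.cfs_quota_us"
    v2_file = d / "cpu.max"
    if v1_file.is_file():
        return quota_is_unlimited_v1(v1_file.read_text())
    if v2_file.is_file():
        return quota_is_unlimited_v2(v2_file.read_text())
    return None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_quota.py <pod-cgroup-dir>")
    else:
        print(pod_quota_unlimited(sys.argv[1]))
```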


Additional info:

Comment 2 Marcelo Tosatti 2023-05-12 12:25:02 UTC
*** Bug 2192636 has been marked as a duplicate of this bug. ***

Comment 6 Kedar Bidarkar 2023-05-31 12:14:05 UTC
As per Petr in comment 5, it appears this needs an update in HCO first.

Simone, could you please take a look?

Comment 10 Simone Tiraboschi 2023-06-05 09:43:27 UTC
(In reply to Kedar Bidarkar from comment #6)
> As per Petr from comment5 it appears it needs update in HCO first.
> 
> Simone, could you please take a look?

Sure, a few questions (for the sake of documenting the new configuration option inline):
1. Can the value of defaultRuntimeClass be amended as a day-two operation when there are existing VMIs?
2. If so, what's the impact on existing VMIs?
3. Is it going to affect live migration, with the target pod getting configured with the new value of defaultRuntimeClass?
4. Is 4.14 enough, or should we backport this down to 4.13?

Comment 11 Kedar Bidarkar 2023-06-06 10:05:25 UTC
Petr, do you think you could help answer Simone's questions from comment 10?

Comment 15 Petr Horáček 2023-06-15 12:21:08 UTC
*** Bug 2185411 has been marked as a duplicate of this bug. ***

Comment 16 Ivan 2023-06-20 07:45:57 UTC
@stirabos, regarding your 4th question in comment #10:

4. is 4.14 enough or should we backport this down to 4.13? --> The end partner needs this fix backported to 4.12, as that is the version they will go live with in September.

Can you please let me know if you need me to file it or you can duplicate this one for 4.12?

Thanks in advance!

Comment 17 Simone Tiraboschi 2023-06-26 09:58:10 UTC
(In reply to Ivan from comment #16)
> Can you please let me know if you need me to file it or you can duplicate
> this one for 4.12?

OK, thanks.
We will handle the BZ and the backport process on our side.

Comment 18 SATHEESARAN 2023-07-10 12:03:49 UTC
Verified with CNV v4.14 interim build (HCO bundle: v4.14.0.rhel9-1154)

tl;dr:
A new config option, defaultRuntimeClass, has been introduced, and it gets propagated to KubeVirt and to the VMI.

Validated with the following test cases:
1. The API descriptions of hco.spec.defaultRuntimeClass and kubevirt.spec.defaultRuntimeClass
give helpful information about the new option 'defaultRuntimeClass'.

2. hco.spec.defaultRuntimeClass validates its input, which must be a string.
Boolean or numeric values were rejected, as expected.

3. When hco.spec.defaultRuntimeClass is set, the value propagates as expected to KubeVirt and the VMI.
For the value to propagate to the VMI, the performance profile must be created first, and its
runtime class name must then be set in hco.spec.defaultRuntimeClass.

4. When hco.spec.defaultRuntimeClass is set, it affects only newly created, restarted, and
migrated VMs. Running VMs are not affected by the new option.
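The propagation behaviour verified in points 2 to 4 can be sketched as follows. This is a hypothetical model, not HCO or KubeVirt code: `hco_runtime_class_patch` only builds a JSON merge patch for spec.defaultRuntimeClass and rejects non-string values (mirroring point 2), and `runtime_class_after` encodes the observed rule that only a newly created virt-launcher pod (VM start, restart, or migration target) picks up the configured default. The function and event names are illustrative.

```python
import json
from typing import Optional

# Events that create a new virt-launcher pod (hypothetical event names).
NEW_POD_EVENTS = {"create", "restart", "migrate"}


def hco_runtime_class_patch(runtime_class: str) -> str:
    """Return a JSON merge patch setting spec.defaultRuntimeClass on the HCO CR."""
    # Only strings are accepted; bool is rejected too, since it is not a str.
    if not isinstance(runtime_class, str):
        raise TypeError("defaultRuntimeClass must be a string")
    if not runtime_class:
        raise ValueError("defaultRuntimeClass must not be empty")
    return json.dumps({"spec": {"defaultRuntimeClass": runtime_class}})


def runtime_class_after(event: str, pod_class: Optional[str], configured: str) -> Optional[str]:
    """Runtime class a VM's launcher pod has after the given event."""
    if event in NEW_POD_EVENTS:
        # A fresh pod is built from the current configuration.
        return configured
    # An already running pod keeps the runtime class it was created with.
    return pod_class


if __name__ == "__main__":
    print(hco_runtime_class_patch("performance-example"))
    print(runtime_class_after("running", None, "performance-example"))
    print(runtime_class_after("migrate", None, "performance-example"))
```

The resulting patch could be applied with something like `oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge -p "$PATCH"`; the resource name and namespace are assumptions based on a default OpenShift Virtualization install.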

Based on the above observations, marking this bug as VERIFIED.

Comment 20 errata-xmlrpc 2023-11-08 14:05:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817