2196459 – [DPDK checkup] Pods are scheduled on reserved instead of isolated CPUs

Summary: [DPDK checkup] Pods are scheduled on reserved instead of isolated CPUs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.13.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.14.0
Assignee:	Orel Misan
QA Contact:	Yossi Segev
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2196224

Reported:	2023-05-09 09:09 UTC by Yossi Segev
Modified:	2023-11-08 14:06 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-11-08 14:05:46 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
DPDK checkup resources manifests (4.08 KB, application/zip) 2023-05-09 09:09 UTC, Yossi Segev	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	CNV-28721	0	None	None	None	2023-05-09 09:12:16 UTC
Red Hat Product Errata	RHSA-2023:6817	0	None	None	None	2023-11-08 14:06:11 UTC

Description Yossi Segev 2023-05-09 09:09:31 UTC

Created attachment 1963483 [details]
DPDK checkup resources manifests

Description of problem:
When configuring a DPDK checkup job, the user sets (in the PerformanceProfile resource) the isolated CPUs on which the job's pods should be scheduled.
In practice, the pods are scheduled on the reserved CPUs, which are supposed to remain untouched and left for the OS to use.


Version-Release number of selected component (if applicable):
CNV 4.13.0
container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-37


How reproducible:
100%


Steps to Reproduce:
1. Make sure the kubelet CPU manager is enabled (follow https://docs.openshift.com/container-platform/4.12/scalability_and_performance/using-cpu-manager.html#seting_up_cpu_manager_using-cpu-manager-and-topology_manager if necessary).

2. Create a namespace for the job, and switch context to the new namespace.
$ oc create ns dpdk-checkup-ns
$ oc project dpdk-checkup-ns

3. Label the worker nodes with the "worker-dpdk" label.

4. Apply the resources manifests in the attached file in their numeric order:
$ oc apply -f 1-dpdk-checkup-resources.yaml
$ oc apply -f 2-dpdk-checkup-scc.yaml
...
Adjust the resources to match your cluster.

Please note:
Due to https://bugzilla.redhat.com/show_bug.cgi?id=2193235, you cannot set which nodes will be used for scheduling the VM and the traffic generator.
Therefore, you must work around it by either cordoning 2 workers so that only one remains schedulable, or removing the "worker-dpdk" label from 2 nodes and keeping it on only one node.

5. Follow the pods, and wait for the traffic generator and the VM virt-launcher pods to run:
$ oc get pods -w
NAME                                      READY   STATUS     RESTARTS   AGE
dpdk-checkup-zprz7                        1/1     Running    0          11s
kubevirt-dpdk-checkup-traffic-gen-h89m5   1/1     Running    0          7s
virt-launcher-dpdk-vmi-rg8nl-2fnjq        0/2     Init:0/2   0          7s
virt-launcher-dpdk-vmi-rg8nl-2fnjq        0/2     Init:1/2   0          9s
virt-launcher-dpdk-vmi-rg8nl-2fnjq        0/2     PodInitializing   0          15s
virt-launcher-dpdk-vmi-rg8nl-2fnjq        2/2     Running           0          21s
virt-launcher-dpdk-vmi-rg8nl-2fnjq        2/2     Running           0          21s

6. In each of these pods, check which CPUs are used for scheduling:
$ oc exec -it kubevirt-dpdk-checkup-traffic-gen-h89m5 -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
2,4,6,8,42,44,46,48
$ oc exec -it virt-launcher-dpdk-vmi-rg8nl-2fnjq -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
10,12,14,16,50,52,54,56


Actual results:
The CPUs used for scheduling each of these pods are those which are set as "reserved" in the PerformanceProfile resource:
$ oc get performanceprofile profile-1 -o jsonpath='{.spec.cpu}' | jq
{
  "isolated": "20,22,24,26,28,30,32,34,36,38,60,62,64,66,68,70,72,74,76,78",
  "reserved": "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,21,23,25,27,29,31,33,35,37,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,61,63,65,67,69,71,73,75,77,79"
}


Expected results:
The CPUs used for scheduling each of these pods should be from the "isolated" list.
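
The comparison between the pods' cpuset.cpus values and the PerformanceProfile lists can be checked mechanically. Below is a minimal sketch, not part of the checkup itself: the helper names parse_cpu_list and classify are hypothetical, and the range handling ("0-3") is an assumption about how cpuset.cpus may be formatted on other clusters. The input strings are copied from the outputs in this report.

```python
def parse_cpu_list(text):
    """Parse a cpuset-style list such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def classify(pod_cpus, isolated, reserved):
    """Split a pod's CPUs into isolated, reserved, and unlisted ones."""
    pod = parse_cpu_list(pod_cpus)
    iso = parse_cpu_list(isolated)
    res = parse_cpu_list(reserved)
    return {
        "isolated": sorted(pod & iso),
        "reserved": sorted(pod & res),
        "unlisted": sorted(pod - iso - res),
    }


# Values copied from the PerformanceProfile output in this report.
ISOLATED = "20,22,24,26,28,30,32,34,36,38,60,62,64,66,68,70,72,74,76,78"
RESERVED = (
    "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,"
    "21,23,25,27,29,31,33,35,37,39,"
    "40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,"
    "61,63,65,67,69,71,73,75,77,79"
)

# cpuset.cpus values copied from step 6.
PODS = {
    "kubevirt-dpdk-checkup-traffic-gen-h89m5": "2,4,6,8,42,44,46,48",
    "virt-launcher-dpdk-vmi-rg8nl-2fnjq": "10,12,14,16,50,52,54,56",
}

if __name__ == "__main__":
    for name, cpus in PODS.items():
        print(name, classify(cpus, ISOLATED, RESERVED))
```

With the values above, every CPU of both pods lands in the "reserved" bucket and none in "isolated", which matches the actual results. On a fixed build, the "reserved" bucket would be expected to be empty.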

Comment 1 Ram Lavi 2023-07-26 08:06:56 UTC

This should work with the workaround from https://issues.redhat.com/browse/OCPBUGS-15102.

Comment 2 Yossi Segev 2023-10-12 10:35:55 UTC

Verified by running the same scenario as in the bug description.

CNV 4.14.0
container-native-virtualization/kubevirt-dpdk-checkup-rhel9:v4.14.0-116

Comment 3 Yossi Segev 2023-10-12 10:36:41 UTC

Verified by running the same scenario as in the bug description.

CNV 4.14.0
container-native-virtualization/kubevirt-dpdk-checkup-rhel9:v4.14.0-116

Comment 5 errata-xmlrpc 2023-11-08 14:05:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817
