Bug 2193234 - [DPDK checkup] Node selection from ConfigMap not applied
Summary: [DPDK checkup] Node selection from ConfigMap not applied
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.14.0
Assignee: Petr Horáček
QA Contact: Nir Rozen
URL:
Whiteboard:
Duplicates: 2193235
Depends On:
Blocks:
 
Reported: 2023-05-04 19:24 UTC by Yossi Segev
Modified: 2023-11-08 14:05 UTC (History)
CC List: 1 user

Fixed In Version: v4.14.0.rhel9-1146
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:05:31 UTC
Target Upstream Version:
Embargoed:


Attachments
DPDK checkup manifests (4.08 KB, application/zip)
2023-05-04 19:24 UTC, Yossi Segev


Links
Github kiagnose kubevirt-dpdk-checkup pull 64: Merged - Add Node Affinity and Anti-Affinity logic (last updated 2023-07-09 15:07:51 UTC)
Red Hat Issue Tracker CNV-28582 (last updated 2023-05-04 19:26:10 UTC)
Red Hat Product Errata RHSA-2023:6817 (last updated 2023-11-08 14:05:41 UTC)

Description Yossi Segev 2023-05-04 19:24:14 UTC
Created attachment 1962335 [details]
DPDK checkup manifests

Description of problem:
When running a DPDK checkup job, the node selection for the VM under test and the traffic generator, as set in the checkup ConfigMap, is ignored.


Version-Release number of selected component (if applicable):
CNV 4.13.0
container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-37


How reproducible:
100%


Steps to Reproduce:
1. Create a namespace for the job, and switch the context to the new namespace.
$ oc create ns dpdk-checkup-ns
$ oc project dpdk-checkup-ns

2. Label the worker nodes with the "worker-dpdk" label.
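For example (the exact label key and value are cluster-specific; the form below is only an assumption for illustration):
$ oc label node <node-name> worker-dpdk=""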

3. Apply the resource manifests from the attached file in their numeric order:
$ oc apply -f 1-dpdk-checkup-resources.yaml
$ oc apply -f 2-dpdk-checkup-scc.yaml
...
Adjust the resources to match your cluster.
In the ConfigMap manifest, note these two parameters (a minimal sketch of the ConfigMap follows below):
  spec.param.trafficGeneratorNodeSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"
  spec.param.DPDKLabelSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"
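For reference, a minimal sketch of the relevant part of the setup ConfigMap, showing only the two node-selection parameters from this report (any other parameters the checkup requires are omitted):

apiVersion: v1
kind: ConfigMap
metadata:
  name: dpdk-checkup-config
  namespace: dpdk-checkup-ns
data:
  spec.param.trafficGeneratorNodeSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"
  spec.param.DPDKLabelSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"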

4. After applying the job itself (8-dpdk-checkup-job.yaml), follow the pods that are created and the nodes they are scheduled on (see the watch command after this step), or wait for the job to finish, as this information also appears in the result ConfigMap.
$ oc get cm dpdk-checkup-config -o yaml | grep "status.result" | grep Node
  status.result.DPDKVMNode: cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com
  status.result.trafficGeneratorNode: cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com
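To follow the pods and the nodes they land on while the job runs (the same watch command is used in the verification below):
$ oc get pods -o wide -w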

<BUG>
Although the setup ConfigMap sets cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com as the node for both the traffic generator and the VM, the node actually used is cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com.
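For context, the linked fix ("Add Node Affinity and Anti-Affinity logic") is expected to pin the workloads to the requested node roughly as in the following node-affinity sketch (an assumed illustration; the exact spec generated by the checkup may differ):

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com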


Additional info:
Checking the log of the checkup job pod shows that these fields remain blank and are not taken from the ConfigMap:
2023/05/04 14:05:11 "trafficGeneratorNodeLabelSelector": ""
2023/05/04 14:05:11 "trafficGeneratorPacketsPerSecond": "8m"
2023/05/04 14:05:11 "DPDKNodeLabelSelector": ""

Comment 1 Petr Horáček 2023-05-17 08:29:25 UTC
*** Bug 2193235 has been marked as a duplicate of this bug. ***

Comment 2 Yossi Segev 2023-10-12 11:16:39 UTC
Verified by running the same scenario as in the bug description.

CNV 4.14.0
container-native-virtualization/kubevirt-dpdk-checkup-rhel9:v4.14.0-116

Checking the ConfigMap after the job is done shows that the pods were scheduled on the target node I set:

$ oc get cm dpdk-checkup-config -o yaml
apiVersion: v1
data:
  ...
  spec.param.trafficGenTargetNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...
  spec.param.vmUnderTestTargetNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...
  status.result.trafficGenActualNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...
  status.result.vmUnderTestActualNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...


Following the pods during the job run shows they are scheduled on the selected node:
$ oc get pods -o wide -w
NAME                                         READY   STATUS     RESTARTS   AGE   IP             NODE                                      NOMINATED NODE   READINESS GATES
virt-launcher-dpdk-traffic-gen-6mtfv-vl955   2/2     Running           0          78s   10.130.0.124   cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com   <none>           1/1
virt-launcher-vmi-under-test-6mtfv-gz8b2     2/2     Running           0          79s   10.130.0.121   cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com   <none>           1/1

Comment 4 errata-xmlrpc 2023-11-08 14:05:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

