Bug 2193234 - [DPDK checkup] Node selection from ConfigMap not applied
Summary: [DPDK checkup] Node selection from ConfigMap not applied
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.14.0
Assignee: Petr Horáček
QA Contact: Nir Rozen
URL:
Whiteboard:
Duplicates: 2193235
Depends On:
Blocks:
 
Reported: 2023-05-04 19:24 UTC by Yossi Segev
Modified: 2023-11-08 14:05 UTC (History)
CC List: 1 user

Fixed In Version: v4.14.0.rhel9-1146
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:05:31 UTC
Target Upstream Version:
Embargoed:


Attachments
DPDK checkup manifests (4.08 KB, application/zip)
2023-05-04 19:24 UTC, Yossi Segev


Links
Github kiagnose kubevirt-dpdk-checkup pull 64: Merged - Add Node Affinity and Anti-Affinity logic (last updated 2023-07-09 15:07:51 UTC)
Red Hat Issue Tracker CNV-28582 (last updated 2023-05-04 19:26:10 UTC)
Red Hat Product Errata RHSA-2023:6817 (last updated 2023-11-08 14:05:41 UTC)

Description Yossi Segev 2023-05-04 19:24:14 UTC
Created attachment 1962335 [details]
DPDK checkup manifests

Description of problem:
When running a DPDK checkup job, the node selection for the VM under test and the traffic generator, as set in the checkup ConfigMap, is ignored.


Version-Release number of selected component (if applicable):
CNV 4.13.0
container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-37


How reproducible:
100%


Steps to Reproduce:
1. Create a namespace for the job, and switch the context to the new namespace.
$ oc create ns dpdk-checkup-ns
$ oc project dpdk-checkup-ns

2. Label the worker nodes with the "worker-dpdk" label.
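For example (the exact label key and value are cluster-specific; the form below is only an assumption for illustration):
$ oc label node <node-name> worker-dpdk=""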

3. Apply the resource manifests from the attached file in their numeric order:
$ oc apply -f 1-dpdk-checkup-resources.yaml
$ oc apply -f 2-dpdk-checkup-scc.yaml
...
Adjust the resources to match your cluster.
In the ConfigMap manifest, note these two parameters (a minimal sketch of the ConfigMap follows below):
  spec.param.trafficGeneratorNodeSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"
  spec.param.DPDKLabelSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"
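For reference, a minimal sketch of the relevant part of the setup ConfigMap, showing only the two node-selection parameters from this report (any other parameters the checkup requires are omitted):

apiVersion: v1
kind: ConfigMap
metadata:
  name: dpdk-checkup-config
  namespace: dpdk-checkup-ns
data:
  spec.param.trafficGeneratorNodeSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"
  spec.param.DPDKLabelSelector: "cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com"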

4. After applying the job itself (8-dpdk-checkup-job.yaml), follow the pods that are created and the nodes they are scheduled on (see the watch command after this step), or wait for the job to finish, as this information also appears in the result ConfigMap.
$ oc get cm dpdk-checkup-config -o yaml | grep "status.result" | grep Node
  status.result.DPDKVMNode: cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com
  status.result.trafficGeneratorNode: cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com
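To follow the pods and the nodes they land on while the job runs (the same watch command is used in the verification below):
$ oc get pods -o wide -w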

<BUG>
Although the setup ConfigMap sets cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com as the node for both the traffic generator and the VM, the node actually used is cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com.
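For context, the linked fix ("Add Node Affinity and Anti-Affinity logic") is expected to pin the workloads to the requested node roughly as in the following node-affinity sketch (an assumed illustration; the exact spec generated by the checkup may differ):

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com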


Additional info:
Checking the log of the checkup job pod shows that these fields remain blank and are not taken from the ConfigMap:
2023/05/04 14:05:11 "trafficGeneratorNodeLabelSelector": ""
2023/05/04 14:05:11 "trafficGeneratorPacketsPerSecond": "8m"
2023/05/04 14:05:11 "DPDKNodeLabelSelector": ""

Comment 1 Petr Horáček 2023-05-17 08:29:25 UTC
*** Bug 2193235 has been marked as a duplicate of this bug. ***

Comment 2 Yossi Segev 2023-10-12 11:16:39 UTC
Verified by running the same scenario as in the bug description.

CNV 4.14.0
container-native-virtualization/kubevirt-dpdk-checkup-rhel9:v4.14.0-116

Checking the ConfigMap after the job is done shows that the pods were scheduled on the target node I set:

$ oc get cm dpdk-checkup-config -o yaml
apiVersion: v1
data:
  ...
  spec.param.trafficGenTargetNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...
  spec.param.vmUnderTestTargetNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...
  status.result.trafficGenActualNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...
  status.result.vmUnderTestActualNodeName: cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com
  ...


Following the pods during the job run shows they are scheduled on the selected node:
$ oc get pods -o wide -w
NAME                                         READY   STATUS     RESTARTS   AGE   IP             NODE                                      NOMINATED NODE   READINESS GATES
virt-launcher-dpdk-traffic-gen-6mtfv-vl955   2/2     Running           0          78s   10.130.0.124   cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com   <none>           1/1
virt-launcher-vmi-under-test-6mtfv-gz8b2     2/2     Running           0          79s   10.130.0.121   cnv-qe-19.cnvqe.lab.eng.rdu2.redhat.com   <none>           1/1

Comment 4 errata-xmlrpc 2023-11-08 14:05:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

