Bug 2177668 - [DPDK latency checkup] Traffic generator cannot start due to multiple environment vars with PCIDEVICE_ prefix
Summary: [DPDK latency checkup] Traffic generator cannot start due to multiple environment vars with PCIDEVICE_ prefix
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.13.0
Assignee: Ram Lavi
QA Contact: Yossi Segev
URL:
Whiteboard:
Depends On: 2183205
Blocks:
 
Reported: 2023-03-13 10:48 UTC by Yossi Segev
Modified: 2023-05-18 02:58 UTC
CC List: 3 users

Fixed In Version: v4.13.0.rhel9-1886
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-18 02:58:18 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID / Status / Summary / Last Updated:
- GitHub: kiagnose kubevirt-dpdk-checkup pull 95 (Merged) - [release-0.1] traffic-gen: Get pci env var by name - 2023-03-16 06:42:34 UTC
- Red Hat Issue Tracker: CNV-26820 - 2023-03-13 10:49:27 UTC
- Red Hat Product Errata: RHSA-2023:3205 - 2023-05-18 02:58:25 UTC

Description Yossi Segev 2023-03-13 10:48:32 UTC
Description of problem:
When running the latency checkup job for testing DPDK, the traffic generator fails to start because it cannot uniquely identify the environment variable it is looking for.


Version-Release number of selected component (if applicable):
CNV 4.13.0
Latency checkup: registry.redhat.io/container-native-virtualization/vm-network-latency-checkup-rhel9


How reproducible:
Always


Steps to Reproduce:
1. On a cluster with SR-IOV support, create the following namespace:
$ oc create ns dpdk-checkup-ns
namespace/dpdk-checkup-ns created

2. Change the cluster context to be in the new namespace:
$ oc project dpdk-checkup-ns 
Now using project "dpdk-checkup-ns" on server "https://api.bm02-cnvqe2-rdu2.cnvqe2.lab.eng.rdu2.redhat.com:6443".

3. Apply the following resources in order to run the latency checkup job that tests DPDK (the resources are attached; an illustrative sketch of the checkup ConfigMap follows this step):
$ oc apply -f dpdk-latency-checkup-infra.yaml 
serviceaccount/dpdk-checkup-sa created
role.rbac.authorization.k8s.io/kiagnose-configmap-access created
rolebinding.rbac.authorization.k8s.io/kiagnose-configmap-access created
role.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
rolebinding.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
$ 
$ oc apply -f dpdk-latency-checkup-cm.yaml 
configmap/dpdk-checkup-config created
$ 
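
For illustration only (the attached dpdk-latency-checkup-cm.yaml is authoritative): a minimal sketch following the kiagnose ConfigMap convention of a spec.timeout key plus spec.param.* keys. The parameter name shown is an assumption, not the checkup's documented API:

apiVersion: v1
kind: ConfigMap
metadata:
  name: dpdk-checkup-config
  namespace: dpdk-checkup-ns
data:
  # Overall checkup timeout (kiagnose convention).
  spec.timeout: 10m
  # Hypothetical parameter: the SR-IOV/DPDK NetworkAttachmentDefinition to use.
  spec.param.networkAttachmentDefinitionName: <dpdk-network-name>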

4. Start the latency checkup job using the attached resource:
$ oc apply -f dpdk-latency-checkup-job.yaml 
job.batch/dpdk-checkup created

5. While the job runs, find the traffic-generator pod:
$ oc get pods -n dpdk-checkup-ns 
NAME                                      READY   STATUS             RESTARTS      AGE
dpdk-checkup-xzcvt                        1/1     Running            0             25s
kubevirt-dpdk-checkup-traffic-gen-tzb2h   0/1     CrashLoopBackOff   1 (12s ago)   22s
virt-launcher-dpdk-vmi-v6l69-jd52z        0/2     PodInitializing    0             22s

6. Check the log of the traffic generator pod:
$ oc logs kubevirt-dpdk-checkup-traffic-gen-tzb2h --follow
setting params to trex_cfg.yaml
+ set_pci_addresses
++ get_pci_device_env_var
+++ grep PCIDEVICE_
+++ env
++ local 'pci_device_env_with_value=PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK=0000:19:0a.1,0000:19:0a.0
PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO={"0000:19:0a.0":{"generic":{"deviceID":"0000:19:0a.0"},"vfio":{"dev-mount":"/dev/vfio/186","mount":"/dev/vfio/vfio"}},"0000:19:0a.1":{"generic":{"deviceID":"0000:19:0a.1"},"vfio":{"dev-mount":"/dev/vfio/187","mount":"/dev/vfio/vfio"}}}'
+++ wc -l
+++ echo 'PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK=0000:19:0a.1,0000:19:0a.0
PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO={"0000:19:0a.0":{"generic":{"deviceID":"0000:19:0a.0"},"vfio":{"dev-mount":"/dev/vfio/186","mount":"/dev/vfio/vfio"}},"0000:19:0a.1":{"generic":{"deviceID":"0000:19:0a.1"},"vfio":{"dev-mount":"/dev/vfio/187","mount":"/dev/vfio/vfio"}}}'
++ '[' 2 '!=' 1 ']'
++ echo 'error: could not find pci device env var'
++ exit 1
+ local 'pci_device_env_name=error: could not find pci device env var'
+ IFS=,
+ read -r -a nics_array
/opt/scripts/set_traffic_gen_cfg_file.sh: line 73: error: could not find pci device env var: invalid variable name

Checking the log shows that the flow looks for a single environment variable with a `PCIDEVICE_` prefix, but it finds two (the allocation variable plus its JSON `_INFO` companion), and because it cannot determine which one is relevant, it fails.
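
For clarity, here is a reconstruction of the failing logic in set_traffic_gen_cfg_file.sh, pieced together from the xtrace output above (a sketch, not the actual source):

get_pci_device_env_var() {
    # Grab every environment variable whose name contains PCIDEVICE_.
    local pci_device_env_with_value
    pci_device_env_with_value=$(env | grep PCIDEVICE_)
    # Exactly one match is expected, but the device plugin injects both
    # PCIDEVICE_<RESOURCE> and its JSON companion PCIDEVICE_<RESOURCE>_INFO,
    # so the count here is 2 and the function bails out.
    if [ "$(echo "$pci_device_env_with_value" | wc -l)" != 1 ]; then
        echo 'error: could not find pci device env var'
        exit 1
    fi
    # Return just the variable name (the part before the first '=').
    echo "${pci_device_env_with_value%%=*}"
}

set_pci_addresses() {
    # The exit 1 above runs in a command substitution subshell, so it does not
    # stop the caller; the error string is captured as if it were a name.
    local pci_device_env_name="$(get_pci_device_env_var)"
    # Indirect expansion of that "name" is what produces the final
    # "invalid variable name" failure reported at line 73.
    IFS=, read -r -a nics_array <<< "${!pci_device_env_name}"
}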


Actual results:
<BUG> Traffic generator fails.

Expected results:
The traffic generator should start successfully and generate traffic.

Additional info:
By checking the log of the traffic generator pod (pasted above), we can see that the source of this issue is that the flow looks for a single environment variable with a `PCIDEVICE_` prefix but finds two, and fails because it cannot determine which one is relevant.
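
The merged fix (GitHub PR 95, "traffic-gen: Get pci env var by name", linked above) resolves this by looking the variable up by its exact name instead of grepping the whole environment. A minimal sketch of that approach, assuming the script knows the device-plugin resource name (RESOURCE_NAME below is an illustrative placeholder, not necessarily what the fix uses):

# Derive the exact variable name from the resource name, e.g.
# "openshift.io/intel_nics_dpdk" -> "PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK".
pci_env_name="PCIDEVICE_$(echo "${RESOURCE_NAME}" | tr 'a-z./' 'A-Z__')"
# Indirect expansion fetches only that variable, so the companion
# PCIDEVICE_..._INFO variable can no longer cause an ambiguous match.
IFS=, read -r -a nics_array <<< "${!pci_env_name}"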

Comment 6 Yossi Segev 2023-05-03 16:36:53 UTC
Verified with latest DPDK checkup related images:
brew.registry.redhat.io/rh-osbs/container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0
quay.io/kiagnose/kubevirt-dpdk-checkup-traffic-gen:v0.1.1
quay.io/kiagnose/kubevirt-dpdk-checkup-vm:v0.1.1

Comment 7 Orel Misan 2023-05-03 16:45:58 UTC
@ysegev could you please state the full build tag of the checkup's image? (v4.13.0-XX)

Comment 8 Yossi Segev 2023-05-03 18:52:43 UTC
Re-verified, this time with this DPDK checkup image:
registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-38

Comment 9 errata-xmlrpc 2023-05-18 02:58:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205

