Bug 2178629

Summary: [DPDK latency checkup] Traffic generator cannot start due to error in scapy server
Product: Container Native Virtualization (CNV)
Component: Networking
Version: 4.13.0
Target Release: 4.13.0
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: urgent
Status: CLOSED ERRATA
Reporter: Ram Lavi <ralavi>
Assignee: Petr Horáček <phoracek>
QA Contact: Yossi Segev <ysegev>
CC: omisan, ysegev
Fixed In Version: v4.13.0.rhel9-1886
Type: Bug
Bug Depends On: 2183205
Last Closed: 2023-05-18 02:58:23 UTC

Description Ram Lavi 2023-03-15 13:28:58 UTC
Description of problem:
When running the latency checkup job for testing DPDK, the traffic generator fails to start due to an error in the TRex Scapy server (see the traffic-generator pod log under step 6 below).

Version-Release number of selected component (if applicable):
CNV 4.13.0
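
The installed CNV version can be confirmed from the operator's ClusterServiceVersion (a generic sketch, assuming the default openshift-cnv install namespace; not taken from the bug):

# List the installed CSVs; the CNV operator's version appears in the output:
$ oc get csv -n openshift-cnv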

How reproducible:
Always

Steps to Reproduce:
1. On a cluster with SR-IOV support, create the following namespace:
$ oc create ns dpdk-checkup-ns
namespace/dpdk-checkup-ns created

2. Switch the current project to the new namespace:
$ oc project dpdk-checkup-ns 
Now using project "dpdk-checkup-ns" on server "https://api.bm02-cnvqe2-rdu2.cnvqe2.lab.eng.rdu2.redhat.com:6443".

3. Apply the following resources, which are required to run the DPDK latency checkup job (the resources are attached to this bug):
$ oc apply -f dpdk-latency-checkup-infra.yaml 
serviceaccount/dpdk-checkup-sa created
role.rbac.authorization.k8s.io/kiagnose-configmap-access created
rolebinding.rbac.authorization.k8s.io/kiagnose-configmap-access created
role.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
rolebinding.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
$ 
$ oc apply -f dpdk-latency-checkup-cm.yaml 
configmap/dpdk-checkup-config created
$ 

4. Start the latency checkup job using the attached resource:
$ oc apply -f dpdk-latency-checkup-job.yaml 
job.batch/dpdk-checkup created

5. While the job runs, find the traffic-generator pod:
$ oc get pod
NAME                                      READY   STATUS     RESTARTS     AGE
dpdk-checkup-pbtmm                        1/1     Running    0            19s
kubevirt-dpdk-checkup-traffic-gen-gnlmv   0/1     Error      1 (7s ago)   15s
virt-launcher-dpdk-vmi-cllzn-zsl7m        0/2     Init:1/2   0            15s

6. Check the log of the traffic-generator pod (full log attached):
$ oc logs kubevirt-dpdk-checkup-traffic-gen-gnlmv
...
Starting Scapy server..... Scapy server failed to run
Output:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/trex/automation/trex_control_plane/interactive/trex/scapy_server/scapy_zmq_server.py", line 198, in <module>
    sys.exit(main(args,port))
  File "/opt/trex/automation/trex_control_plane/interactive/trex/scapy_server/scapy_zmq_server.py", line 185, in main
    s = Scapy_server(args,port)
  File "/opt/trex/automation/trex_control_plane/interactive/trex/scapy_server/scapy_zmq_server.py", line 102, in __init__
    self.scapy_wrapper = Scapy_wrapper()
  File "/opt/trex/automation/trex_control_plane/interactive/trex/scapy_server/scapy_zmq_server.py", line 26, in __init__
    self.scapy_master = Scapy_service()
  File "/opt/trex/automation/trex_control_plane/interactive/trex/scapy_server/scapy_service.py", line 467, in __init__
    self.server_v_hashed = self._generate_version_hash(self.version_major,self.version_minor)
  File "/opt/trex/automation/trex_control_plane/interactive/trex/scapy_server/scapy_service.py", line 820, in _generate_version_hash
    m = hashlib.md5()
ValueError: [digital envelope routines] unsupported
WARNING: tried to configure 2048 hugepages for socket 0, but result is: 0
WARNING: tried to configure 2048 hugepages for socket 1, but result is: 0
Could not start scapy daemon server, which is needed by GUI to create packets.
If you don't need it, use --no-scapy-server flag.
ERROR encountered while configuring TRex system
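
The ValueError at the end of the traceback is the root-cause hint: hashlib.md5() raises "[digital envelope routines] unsupported" when the underlying OpenSSL build refuses MD5, which is what OpenSSL 3 on RHEL 9 does under FIPS mode or a restrictive crypto policy. This is consistent with the rhel9-based Fixed In Version above and with the bug not reproducing on CNV 4.12 (see Additional info). Below is a minimal illustration of the failure and the usual code-level workaround, assuming Python 3.9+, where hashlib constructors accept the usedforsecurity keyword; it is a sketch, not the actual fix shipped in the checkup image:

# On a FIPS-enabled RHEL 9 system, a plain MD5 constructor is rejected:
$ python3 -c 'import hashlib; hashlib.md5()'
...
ValueError: [digital envelope routines] unsupported

# Marking the digest as non-security use (here it only hashes a version string)
# makes OpenSSL accept it:
$ python3 -c 'import hashlib; print(hashlib.md5(usedforsecurity=False).hexdigest())'
d41d8cd98f00b204e9800998ecf8427e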

Actual results:
The traffic-generator pod fails in a crash loop.

Expected results:
The traffic-generator pod should start successfully and not enter a crash loop.
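
For completeness, the crash loop can be confirmed with standard oc commands (a generic sketch; the pod name is the one found in step 5):

# Watch the pod status cycle between Error and CrashLoopBackOff:
$ oc get pod -n dpdk-checkup-ns -w

# Inspect the restart count and the container's last terminated state:
$ oc describe pod -n dpdk-checkup-ns kubevirt-dpdk-checkup-traffic-gen-gnlmv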

Additional info:
This did not happen on a CNV 4.12 cluster.

Comment 5 Yossi Segev 2023-05-03 16:36:31 UTC
Verified with the latest DPDK checkup-related images:
brew.registry.redhat.io/rh-osbs/container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0
quay.io/kiagnose/kubevirt-dpdk-checkup-traffic-gen:v0.1.1
quay.io/kiagnose/kubevirt-dpdk-checkup-vm:v0.1.1

Comment 6 Orel Misan 2023-05-03 16:42:47 UTC
@y

Comment 7 Orel Misan 2023-05-03 16:44:18 UTC
@ysegev could you please state the full build tag of the checkup's image? (v4.13.0-XX)
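
The full image reference, including the build tag, can be read off the running checkup pod's spec; a generic sketch using the job pod name from step 5 above:

# Print the image reference (registry/repository:tag) of the checkup container:
$ oc get pod -n dpdk-checkup-ns dpdk-checkup-pbtmm -o jsonpath='{.spec.containers[0].image}'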

Comment 9 Yossi Segev 2023-05-03 18:53:10 UTC
Re-verified, this time with this DPDK checkup image:
registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-38

Comment 10 errata-xmlrpc 2023-05-18 02:58:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205