Bug 2183205 - [DPDK latency checkup] Traffic generator cannot start due to missing dedicated ServiceAccount
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.13.0
Assignee: Orel Misan
QA Contact: Yossi Segev
URL:
Whiteboard:
Depends On:
Blocks: 2177668 2178629
Reported: 2023-03-30 15:01 UTC by Yossi Segev
Modified: 2023-05-18 02:58 UTC (History)

Fixed In Version: CNV bundle v4.13.0.rhel9-2091
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-18 02:58:32 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kiagnose kubevirt-dpdk-checkup pull 102 | 0 | None | Merged | Add a dedicated ServiceAccount for the traffic-gen Pod | 2023-04-18 06:22:47 UTC
Github | kiagnose kubevirt-dpdk-checkup pull 104 | 0 | None | Merged | [Release 0.1] Add a dedicated ServiceAccount for the traffic-gen Pod | 2023-04-27 16:00:15 UTC
Github | kiagnose kubevirt-dpdk-checkup pull 106 | 0 | None | Merged | README: Remove `volumes` field from SCC | 2023-04-27 16:00:45 UTC
Red Hat Issue Tracker | CNV-27614 | 0 | None | None | None | 2023-03-30 15:03:34 UTC
Red Hat Product Errata | RHSA-2023:3205 | 0 | None | None | None | 2023-05-18 02:58:39 UTC

Description Yossi Segev 2023-03-30 15:01:17 UTC
Description of problem:
When running the latency checkup job that tests DPDK, the traffic generator fails to start because there is no dedicated ServiceAccount that grants the traffic generator pod the capabilities it needs.


Version-Release number of selected component (if applicable):
CNV 4.13.0
DPDK checkup: registry.redhat.io/container-native-virtualization/kubevirt-dpdk-checkup-rhel9:v4.13.0-32


How reproducible:
Always


Steps to Reproduce:
1. On a cluster that supports SR-IOV, create the following namespace:
$ oc create ns dpdk-checkup-ns
namespace/dpdk-checkup-ns created

2. Add the following security labels to the new namespace (under metadata.labels):
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: v1.24
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.24
    security.openshift.io/scc.podSecurityLabelSync: "false"

(run `oc edit ns dpdk-checkup-ns` to add the labels)
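
The same labels could also be set non-interactively with `oc label`. The following is a minimal sketch, assuming `oc` is logged in to the cluster; it only builds and prints the command so that it can be reviewed before it is run:

```shell
#!/bin/sh
# Namespace and labels are the ones listed in step 2.
ns="dpdk-checkup-ns"

# One key=value pair per line, in the form `oc label` expects.
labels="pod-security.kubernetes.io/enforce=privileged
pod-security.kubernetes.io/enforce-version=v1.24
pod-security.kubernetes.io/warn=restricted
pod-security.kubernetes.io/warn-version=v1.24
security.openshift.io/scc.podSecurityLabelSync=false"

# Join the pairs onto one line and build the command.
# --overwrite keeps the command from failing if a label is already set.
label_cmd="oc label namespace $ns --overwrite $(printf '%s' "$labels" | tr '\n' ' ')"

# Print the command instead of running it.
echo "$label_cmd"
```

To apply the labels, run the printed command as is.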

3. Apply the attached SecurityContextConstraints manifests (scc.yaml and scc2.yaml)

4. Switch the current project to the new namespace:
$ oc project dpdk-checkup-ns 
Now using project "dpdk-checkup-ns" on server "https://api.bm02-cnvqe2-rdu2.cnvqe2.lab.eng.rdu2.redhat.com:6443".

5. Apply the following resources to set up the latency checkup job that tests DPDK (the resources are attached):
$ oc apply -f dpdk-latency-checkup-infra.yaml 
serviceaccount/dpdk-checkup-sa created
role.rbac.authorization.k8s.io/kiagnose-configmap-access created
rolebinding.rbac.authorization.k8s.io/kiagnose-configmap-access created
role.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
rolebinding.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
$ oc apply -f dpdk-latency-checkup-cm.yaml 
configmap/dpdk-checkup-config created

6. Start the latency checkup job using the attached resource:
$ oc apply -f dpdk-latency-checkup-job.yaml 
job.batch/dpdk-checkup created

7. Check the pods in the dpdk-checkup-ns namespace:
$ oc get pods -n dpdk-checkup-ns 
NAME                                 READY   STATUS    RESTARTS   AGE
dpdk-checkup-92dh9                   0/1     Error     0          4h5m
virt-launcher-dpdk-vmi-v679r-cfzwg   2/2     Running   0          4h5m
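
For repeated runs, the failing pod can be picked out of this table automatically. The sketch below is an illustration only: `failed_pods` is a hypothetical helper, and it assumes the default `oc get pods` column layout shown above, with STATUS in the third column.

```shell
#!/bin/sh
# Print the names of pods whose STATUS is neither Running nor Completed.
# Reads `oc get pods` table output on stdin and skips the header row.
failed_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Sample input copied from the output in step 7.
sample='NAME                                 READY   STATUS    RESTARTS   AGE
dpdk-checkup-92dh9                   0/1     Error     0          4h5m
virt-launcher-dpdk-vmi-v679r-cfzwg   2/2     Running   0          4h5m'

printf '%s\n' "$sample" | failed_pods
```

On a live cluster the same helper would be used as `oc get pods -n dpdk-checkup-ns | failed_pods`.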


Actual results:
The checkup job pod ends up in the Error state. Its log shows that it fails to create the traffic generator pod:

$ oc logs dpdk-checkup-92dh9
2023/03/30 10:50:22 kubevirt-dpdk-checkup starting...
2023/03/30 10:50:22 Using the following config:
2023/03/30 10:50:22 "networkAttachmentDefinitionName": "dpdk-sriovnetwork"
2023/03/30 10:50:22 "trafficGeneratorRuntimeClassName": "performance-profile-1"
2023/03/30 10:50:22 "portBandwidthGB": "10"
2023/03/30 10:50:22 "trafficGeneratorNodeLabelSelector": ""
2023/03/30 10:50:22 "trafficGeneratorPacketsPerSecond": "14m"
2023/03/30 10:50:22 "DPDKNodeLabelSelector": ""
2023/03/30 10:50:22 "trafficGeneratorEastMacAddress": "50:34:e8:67:18:01"
2023/03/30 10:50:22 "trafficGeneratorWestMacAddress": "50:32:1b:21:f7:02"
2023/03/30 10:50:22 "DPDKEastMacAddress": "60:3d:c4:4d:78:01"
2023/03/30 10:50:22 "DPDKWestMacAddress": "60:73:c9:c1:f5:02"
2023/03/30 10:50:22 "trafficGeneratorImage": "quay.io/kiagnose/kubevirt-dpdk-checkup-traffic-gen:main"
2023/03/30 10:50:22 "vmContainerDiskImage": "quay.io/kiagnose/kubevirt-dpdk-checkup-vm:main"
2023/03/30 10:50:22 "testDuration": "5m0s"
2023/03/30 10:50:22 "verbose": true
2023/03/30 10:50:22 Creating VMI "dpdk-checkup-ns/dpdk-vmi-v679r"...
2023/03/30 10:50:22 envVars: map[DST_EAST_MAC_ADDRESS:60:3d:c4:4d:78:01 DST_WEST_MAC_ADDRESS:60:73:c9:c1:f5:02 NUM_OF_CPUS:8 NUM_OF_TRAFFIC_CPUS:6 PCI_DEVICES_VAR_NAME:PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK PORT_BANDWIDTH_GB:10 SET_VERBOSE:TRUE SRC_EAST_MAC_ADDRESS:50:34:e8:67:18:01 SRC_WEST_MAC_ADDRESS:50:32:1b:21:f7:02]
2023/03/30 10:50:22 Creating traffic generator Pod dpdk-checkup-ns/kubevirt-dpdk-checkup-traffic-gen-d4n86..
2023/03/30 10:50:22 kubevirt-dpdk-checkup failed: setup: pods "kubevirt-dpdk-checkup-traffic-gen-d4n86" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider "pipelines-scc": Forbidden: not usable by user or serviceaccount, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000780000, 1000789999], spec.containers[0].securityContext.capabilities.add: Invalid value: "IPC_LOCK": capability may not be added, spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_ADMIN": capability may not be added, spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_RAW": capability may not be added, spec.containers[0].securityContext.capabilities.add: Invalid value: "SYS_RESOURCE": capability may not be added, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "containerized-data-importer": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "kubevirt-controller": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "bridge-marker": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "nfd-worker": Forbidden: not usable by user or serviceaccount, provider "hostpath-provisioner-csi": Forbidden: not usable by user or serviceaccount, provider "linux-bridge": Forbidden: not usable by user or serviceaccount, provider "kubevirt-handler": Forbidden: not usable by user or serviceaccount, provider "rook-ceph": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "rook-ceph-csi": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
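
The rejected capabilities are easy to miss in an error of this length. As a rough aid, the sketch below filters them out of the log text. `rejected_caps` is a hypothetical helper, not part of the checkup, and it assumes the message wording shown above.

```shell
#!/bin/sh
# Print each capability that the SCC admission check refused to add.
# Reads log text on stdin and prints one capability per line, sorted.
rejected_caps() {
    grep -o 'Invalid value: "[A-Z_]*": capability may not be added' |
        sed 's/^Invalid value: "\([A-Z_]*\)".*/\1/' |
        LC_ALL=C sort -u
}

# Excerpt of the error above, shortened to the relevant fragments.
excerpt='spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.capabilities.add: Invalid value: "IPC_LOCK": capability may not be added, spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_ADMIN": capability may not be added, spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_RAW": capability may not be added, spec.containers[0].securityContext.capabilities.add: Invalid value: "SYS_RESOURCE": capability may not be added'

printf '%s\n' "$excerpt" | rejected_caps
```

Against the live pod this would be `oc logs dpdk-checkup-92dh9 | rejected_caps`. The hostPath and runAsUser rejections are deliberately not matched, because they are not capabilities.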


Expected results:
All pods run successfully, including the checkup job and the traffic generator.
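
For illustration, a dedicated ServiceAccount for the traffic generator could look roughly like the sketch below. The ServiceAccount name and the SCC name are hypothetical and are not taken from this report or from the attached manifests. The sketch only prints the manifest and the `oc adm policy` command that would let the account use the SCC; it does not change the cluster.

```shell
#!/bin/sh
ns="dpdk-checkup-ns"
sa="dpdk-checkup-traffic-gen-sa"   # hypothetical ServiceAccount name
scc="dpdk-checkup-traffic-gen"     # hypothetical name of the SCC applied in step 3

# ServiceAccount manifest for the traffic generator pod.
manifest=$(cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: $sa
  namespace: $ns
EOF
)

# Print the manifest; pipe it to "oc apply -f -" to create the account.
printf '%s\n' "$manifest"

# Print the command that allows the account to use the SCC.
echo "oc adm policy add-scc-to-user $scc -z $sa -n $ns"
```

The traffic generator pod would also have to reference the account through `spec.serviceAccountName`. That pod is created by the checkup itself, so this part depends on the checkup and is not shown in this report.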

Comment 8 Yossi Segev 2023-04-20 09:38:47 UTC
Verifying this bug is blocked until https://issues.redhat.com/browse/OCPNODE-1538 is fixed.
Because of that issue, starting the traffic generator currently fails with:
  Warning  Failed          8s (x4 over 53s)   kubelet            Error: failed to run pre-start hook for container "kubevirt-dpdk-checkup-traffic-gen": set CPU load balancing: timed out waiting for the condition

Comment 9 Yossi Segev 2023-05-03 16:34:46 UTC
Verified with the latest DPDK checkup-related images:
brew.registry.redhat.io/rh-osbs/container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0
quay.io/kiagnose/kubevirt-dpdk-checkup-traffic-gen:v0.1.1
quay.io/kiagnose/kubevirt-dpdk-checkup-vm:v0.1.1

Comment 10 Orel Misan 2023-05-03 16:46:32 UTC
@ysegev could you please state the full build tag of the checkup's image? (v4.13.0-XX)

Comment 12 Yossi Segev 2023-05-03 18:53:18 UTC
Re-verified, this time with this DPDK checkup image:
registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-38

Comment 13 errata-xmlrpc 2023-05-18 02:58:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205

