Description of problem:
When a checkup encounters a setup failure, the components created by the job are not deleted.

Version-Release number of selected component (if applicable):
kubevirt-hyperconverged-operator.v4.12.0   OpenShift Virtualization   4.12.0   kubevirt-hyperconverged-operator.v4.11.1   Succeeded

Client Version: 4.12.0-rc.6
Kustomize Version: v4.5.7
Server Version: 4.12.0-rc.6
Kubernetes Version: v1.25.4+77bec7a

How reproducible:
Create a checkup ConfigMap with a nonexistent node specified as the source node. The first virt-launcher pod stays in Pending, never reaches Running, and the actual checkup never starts.

Steps to Reproduce:
1. Create a namespace:

oc new-project test-latency

2. Create a bridge with this YAML:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br10
spec:
  desiredState:
    interfaces:
      - bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens9
        ipv4:
          auto-dns: true
          dhcp: false
          enabled: false
        ipv6:
          auto-dns: true
          autoconf: false
          dhcp: false
          enabled: false
        name: br10
        state: up
        type: linux-bridge
  nodeSelector:
    node-role.kubernetes.io/worker: ''

3. Create a NetworkAttachmentDefinition (NAD) with this YAML:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network-nad
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "br10",
      "plugins": [
        {
          "type": "cnv-bridge",
          "bridge": "br10"
        }
      ]
    }

4. Create a service account, Role, and RoleBinding:

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vm-latency-checkup-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-vm-latency-checker
rules:
  - apiGroups: ["kubevirt.io"]
    resources: ["virtualmachineinstances"]
    verbs: ["get", "create", "delete"]
  - apiGroups: ["subresources.kubevirt.io"]
    resources: ["virtualmachineinstances/console"]
    verbs: ["get"]
  - apiGroups: ["k8s.cni.cncf.io"]
    resources: ["network-attachment-definitions"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-vm-latency-checker
subjects:
  - kind: ServiceAccount
    name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kubevirt-vm-latency-checker
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiagnose-configmap-access
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiagnose-configmap-access
subjects:
  - kind: ServiceAccount
    name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kiagnose-configmap-access
  apiGroup: rbac.authorization.k8s.io
EOF

5. Create the ConfigMap with "spec.param.source_node" set to a nonexistent node (and the "spec.param.max_desired_latency_milliseconds" field set to "0"):

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: "manual-latency-check"
  spec.param.network_attachment_definition_name: "bridge-network-nad"
  spec.param.max_desired_latency_milliseconds: "0"
  spec.param.sample_duration_seconds: "5"
  spec.param.source_node: non-existent-node
  spec.param.target_node: cnv-qe-14.cnvqe.lab.eng.rdu2.redhat.com
EOF
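Before creating the job in step 6, it can help to confirm that the objects from steps 1-5 exist. A minimal sanity check (these commands are my suggestion, not part of the original reproduction; resource names are taken from the YAML above, and I assume everything was created in the test-latency project from step 1):

oc get nncp br10
oc get net-attach-def bridge-network-nad -n test-latency
oc get sa,role,rolebinding -n test-latency
oc get configmap kubevirt-vm-latency-checkup-config -n test-latency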
6. Create a job:

cat <<EOF | kubectl apply -f -
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
        - name: vm-latency-checkup
          image: brew.registry.redhat.io/rh-osbs/container-native-virtualization-vm-network-latency-checkup:v4.12.0
          securityContext:
            runAsUser: 1000
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsNonRoot: true
            seccompProfile:
              type: "RuntimeDefault"
          env:
            - name: CONFIGMAP_NAMESPACE
              value: test-latency
            - name: CONFIGMAP_NAME
              value: kubevirt-vm-latency-checkup-config
EOF

Actual results:
When the job is deleted, the pods and VMIs are not deleted:

oc get all
NAME                                           READY   STATUS    RESTARTS   AGE
pod/latency-nonexistent-node-job-qt4wk         0/1     Error     0          74m
pod/virt-launcher-latency-check-source-4fqgk   0/2     Pending   0          74m
pod/virt-launcher-latency-check-target-smj9r   2/2     Running   0          74m

NAME                                     COMPLETIONS   DURATION   AGE
job.batch/latency-nonexistent-node-job   0/1           74m        74m

NAME                                                      AGE   PHASE        IP               NODENAME                                  READY
virtualmachineinstance.kubevirt.io/latency-check-source   74m   Scheduling                                                              False
virtualmachineinstance.kubevirt.io/latency-check-target   74m   Running      192.168.100.20   cnv-qe-14.cnvqe.lab.eng.rdu2.redhat.com   True

Expected results:
All the resources created by the job are deleted when the job is deleted.
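Until the fix is available, the leftover resources can be removed by hand. A manual cleanup sketch, assuming the resource names shown in the output above:

# Deleting the VMIs also removes their virt-launcher pods, which are owned by the VMIs.
oc delete vmi latency-check-source latency-check-target -n test-latency
# Deleting the failed checkup job removes its pod as well.
oc delete job latency-nonexistent-node-job -n test-latency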
Verified on a PSI cluster:

$ oc get csv -A | grep virt
...
openshift-cnv   kubevirt-hyperconverged-operator.v4.13.0   OpenShift Virtualization   4.13.0   kubevirt-hyperconverged-operator.v4.11.1   Succeeded

OpenShift version: 4.12.0
CNV version: 4.13.0
HCO image: brew.registry.redhat.io/rh-osbs/iib:418191
OCS version: 4.12.0
CNI type: OVNKubernetes
Workers type: virtual
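The verification followed the reproduction steps above. As a sketch of the expected check (my wording, not quoted from the verification run): once the checkup job fails on the nonexistent source node, no leftover VMIs or virt-launcher pods should remain in the checkup namespace:

oc get vmi,pods -n test-latency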
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205