+++ This bug was initially created as a clone of Bug #2156902 +++

Target release: v4.12.1

Description of problem:
When a checkup encounters a setup failure, the components created by the job are not deleted.

Version-Release number of selected component (if applicable):
kubevirt-hyperconverged-operator.v4.12.0   OpenShift Virtualization   4.12.0   kubevirt-hyperconverged-operator.v4.11.1   Succeeded
Client Version: 4.12.0-rc.6
Kustomize Version: v4.5.7
Server Version: 4.12.0-rc.6
Kubernetes Version: v1.25.4+77bec7a

How reproducible:
Create a checkup ConfigMap with a nonexistent node specified as the source node. The first virt-launcher pod stays in Pending, never reaches Running, and the actual checkup never starts.

Steps to Reproduce:
1. Create a namespace:

oc new-project test-latency

2. Create a bridge with this YAML:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br10
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens9
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: br10
      state: up
      type: linux-bridge
  nodeSelector:
    node-role.kubernetes.io/worker: ''

3. Create a NAD with this YAML:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network-nad
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "br10",
      "plugins": [
        {
          "type": "cnv-bridge",
          "bridge": "br10"
        }
      ]
    }

4. Create a service account, role, and role binding:

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vm-latency-checkup-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-vm-latency-checker
rules:
- apiGroups: ["kubevirt.io"]
  resources: ["virtualmachineinstances"]
  verbs: ["get", "create", "delete"]
- apiGroups: ["subresources.kubevirt.io"]
  resources: ["virtualmachineinstances/console"]
  verbs: ["get"]
- apiGroups: ["k8s.cni.cncf.io"]
  resources: ["network-attachment-definitions"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-vm-latency-checker
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kubevirt-vm-latency-checker
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiagnose-configmap-access
rules:
- apiGroups: [ "" ]
  resources: [ "configmaps" ]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiagnose-configmap-access
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kiagnose-configmap-access
  apiGroup: rbac.authorization.k8s.io
EOF

5. Create the ConfigMap with "spec.param.source_node" set to a nonexistent node and the "spec.param.max_desired_latency_milliseconds" field set to 0:

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: "manual-latency-check"
  spec.param.network_attachment_definition_name: "bridge-network-nad"
  spec.param.max_desired_latency_milliseconds: "0"
  spec.param.sample_duration_seconds: "5"
  spec.param.source_node: non-existent-node
  spec.param.target_node: cnv-qe-14.cnvqe.lab.eng.rdu2.redhat.com
EOF
6. Create a Job:

cat <<EOF | kubectl apply -f -
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
      - name: vm-latency-checkup
        image: brew.registry.redhat.io/rh-osbs/container-native-virtualization-vm-network-latency-checkup:v4.12.0
        securityContext:
          runAsUser: 1000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
          seccompProfile:
            type: "RuntimeDefault"
        env:
        - name: CONFIGMAP_NAMESPACE
          value: test-latency
        - name: CONFIGMAP_NAME
          value: kubevirt-vm-latency-checkup-config
EOF

Actual results:
When the job is deleted, the pods and VMIs are not deleted:

oc get all
NAME                                           READY   STATUS    RESTARTS   AGE
pod/latency-nonexistent-node-job-qt4wk         0/1     Error     0          74m
pod/virt-launcher-latency-check-source-4fqgk   0/2     Pending   0          74m
pod/virt-launcher-latency-check-target-smj9r   2/2     Running   0          74m

NAME                                     COMPLETIONS   DURATION   AGE
job.batch/latency-nonexistent-node-job   0/1           74m        74m

NAME                                                      AGE   PHASE        IP               NODENAME                                  READY
virtualmachineinstance.kubevirt.io/latency-check-source   74m   Scheduling                                                              False
virtualmachineinstance.kubevirt.io/latency-check-target   74m   Running      192.168.100.20   cnv-qe-14.cnvqe.lab.eng.rdu2.redhat.com   True

Expected results:
All the resources created by the Job are deleted as the job gets deleted.
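Until the fix is available, the leftover workloads can be removed manually. A minimal cleanup sketch, assuming the namespace and resource names from the steps and output above (note that the Job name differs per run; the output above shows a run named latency-nonexistent-node-job):

# delete the checkup Job together with its checkup pod
oc delete job kubevirt-vm-latency-checkup -n test-latency

# delete the VMIs the checkup left behind; their virt-launcher pods
# are removed by KubeVirt once the VMIs are deleted
oc delete vmi latency-check-source latency-check-target -n test-latency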
Verified on an IBM BM cluster:

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE    STATUS
version   4.12.0-rc.8   True        False         4d16h    Cluster version is 4.12.0-rc.8

$ oc get csv -n openshift-cnv
kubevirt-hyperconverged-operator.v4.12.1   OpenShift Virtualization   4.12.1   kubevirt-hyperconverged-operator.v4.11.1   Succeeded
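The cleanup behavior can be re-checked on the fixed build by re-applying the ConfigMap and Job from steps 5 and 6 of the reproduction, letting the checkup fail on the nonexistent source node, and then deleting the Job. A minimal sketch, assuming the same namespace and resource names as above:

$ oc delete job kubevirt-vm-latency-checkup -n test-latency
$ oc get pods,vmi -n test-latency
# expected on 4.12.1: no leftover virt-launcher pods and no latency-check-source / latency-check-target VMIs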
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 4.12.1 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:1023