Description of problem:
The "spec.param.max_desired_latency_milliseconds" field is effectively ignored: a checkup that measures sub-millisecond latency succeeds even when the field is set to 0.

Version-Release number of selected component (if applicable):

How reproducible:
Create a ConfigMap with the "spec.param.max_desired_latency_milliseconds" field set to 0. The Job finishes successfully instead of failing.

Steps to Reproduce:
1. Create a Namespace:

oc new-project test-latency

2. Create a bridge with this YAML:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br10
spec:
  desiredState:
    interfaces:
      - bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens9
        ipv4:
          auto-dns: true
          dhcp: false
          enabled: false
        ipv6:
          auto-dns: true
          autoconf: false
          dhcp: false
          enabled: false
        name: br10
        state: up
        type: linux-bridge
  nodeSelector:
    node-role.kubernetes.io/worker: ''

3. Create a NAD with this YAML:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network-nad
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "br10",
      "plugins": [
        {
          "type": "cnv-bridge",
          "bridge": "br10"
        }
      ]
    }

4. Create a ServiceAccount and the required Roles and RoleBindings:

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vm-latency-checkup-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-vm-latency-checker
rules:
- apiGroups: ["kubevirt.io"]
  resources: ["virtualmachineinstances"]
  verbs: ["get", "create", "delete"]
- apiGroups: ["subresources.kubevirt.io"]
  resources: ["virtualmachineinstances/console"]
  verbs: ["get"]
- apiGroups: ["k8s.cni.cncf.io"]
  resources: ["network-attachment-definitions"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-vm-latency-checker
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kubevirt-vm-latency-checker
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiagnose-configmap-access
rules:
- apiGroups: [ "" ]
  resources: [ "configmaps" ]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiagnose-configmap-access
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kiagnose-configmap-access
  apiGroup: rbac.authorization.k8s.io
EOF

5. Create the ConfigMap with the "spec.param.max_desired_latency_milliseconds" field set to 0:

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: "manual-latency-check"
  spec.param.network_attachment_definition_name: "bridge-network-nad"
  spec.param.max_desired_latency_milliseconds: "0"
  spec.param.sample_duration_seconds: "5"
EOF

6. Create a Job:

cat <<EOF | kubectl apply -n <target-namespace> -f -
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
        - name: vm-latency-checkup
          image: quay.io/kiagnose/kubevirt-vm-latency:main
          securityContext:
            runAsUser: 1000
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsNonRoot: true
            seccompProfile:
              type: "RuntimeDefault"
          env:
            - name: CONFIGMAP_NAMESPACE
              value: test-latency
            - name: CONFIGMAP_NAME
              value: kubevirt-vm-latency-checkup-config
EOF

Actual results:
The Job finishes successfully.
The ConfigMap data contains all the parameters of a successful run:

oc get cm latency-configmap -oyaml
apiVersion: v1
data:
  spec.param.max_desired_latency_milliseconds: "0"
  spec.param.network_attachment_definition_name: checkup-nad
  spec.param.network_attachment_definition_namespace: test-checkup-framework
  spec.param.sample_duration_seconds: "5"
  spec.timeout: 300m
  status.completionTimestamp: "2022-12-25T14:21:01Z"
  status.failureReason: ""
  status.result.avgLatencyNanoSec: "491000"
  status.result.maxLatencyNanoSec: "652000"
  status.result.measurementDurationSec: "5"
  status.result.minLatencyNanoSec: "336000"
  status.result.sourceNode: master2
  status.result.targetNode: master1
  status.startTimestamp: "2022-12-25T14:20:09Z"
  status.succeeded: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2022-12-25T14:20:06Z"
  labels:
    created-by-dynamic-class-creator: "Yes"
  name: latency-configmap
  namespace: test-checkup-framework
  resourceVersion: "7798869"
  uid: dadebd03-c8b5-456b-9995-b201eb52415f

Expected results:
The Job should fail.
The "spec.param.max_desired_latency_milliseconds" field is respected. The measured latency is treated as an integer, with units of milliseconds. When the measured latency is less than 1 [ms], it is considered as 0 [ms], thus the condition `actualMaxLatency > maxLatencyDesired` (which fails the checkup if true) is false and the checkup succeeds. https://github.com/kiagnose/kiagnose/blob/95d8c7995fabbb7be11f7cde0eca57dc01e222bb/checkups/kubevirt-vm-latency/vmlatency/internal/checkup/checkup.go#L157
Tested on a PSI cluster:

$ oc get csv -A | grep virt
...
openshift-cnv   kubevirt-hyperconverged-operator.v4.13.0   OpenShift Virtualization   4.13.0   kubevirt-hyperconverged-operator.v4.11.1   Succeeded

Openshift version: 4.12.0
CNV version: 4.13.0
HCO image: brew.registry.redhat.io/rh-osbs/iib:418191
OCS version: 4.12.0
CNI type: OVNKubernetes
Workers type: virtual

It still doesn't seem to respect the max_desired_latency_milliseconds field - I would expect such a checkup to fail because it didn't meet the user's requirements. What actually happens is that the VMIs are not created; the virt-launcher pods are stuck in init status:

$ oc logs virt-launcher-latency-check-source-vz5rd-kk5tl
Error from server (BadRequest): container "compute" in pod "virt-launcher-latency-check-source-vz5rd-kk5tl" is waiting to start: PodInitializing

After about 10 minutes the checkup job fails with:

checkup failed: setup: failed to wait for VMI 'test-checkup-framework/latency-check-target-rlpmb' IP address to appear on status: timed out waiting for the condition
Verified on a fresh installation of PSI:
kubevirt-hyperconverged-operator.v4.13.0
IIB: 425854
Version: v4.13.0.rhel9-1385
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205