Bug 2156392
| Summary: | In the VM latency checkup, the max_desired_latency_milliseconds field has no meaning when the measured latency is less than 1 [ms] | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | awax |
| Component: | Networking | Assignee: | Orel Misan <omisan> |
| Status: | CLOSED ERRATA | QA Contact: | awax |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.12.0 | CC: | omisan, ralavi |
| Target Milestone: | --- | | |
| Target Release: | 4.13.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-18 02:56:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
awax
2022-12-26 18:41:10 UTC
Description of problem:
The max_desired_latency_milliseconds field has no effect when the measured latency is less than 1 [ms].
Version-Release number of selected component (if applicable):
How reproducible:
Create a ConfigMap with the "spec.param.max_desired_latency_milliseconds" field set to 0. The Job will finish successfully instead of failing.
Steps to Reproduce:
1. Create a namespace:
oc new-project test-latency
2. Create a bridge with this YAML:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br10
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens9
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: br10
      state: up
      type: linux-bridge
  nodeSelector:
    node-role.kubernetes.io/worker: ''
3. Create a NetworkAttachmentDefinition (NAD) with this YAML:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network-nad
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "br10",
      "plugins": [
        {
          "type": "cnv-bridge",
          "bridge": "br10"
        }
      ]
    }
4. Create a ServiceAccount, Roles, and RoleBindings:
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vm-latency-checkup-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-vm-latency-checker
rules:
- apiGroups: ["kubevirt.io"]
  resources: ["virtualmachineinstances"]
  verbs: ["get", "create", "delete"]
- apiGroups: ["subresources.kubevirt.io"]
  resources: ["virtualmachineinstances/console"]
  verbs: ["get"]
- apiGroups: ["k8s.cni.cncf.io"]
  resources: ["network-attachment-definitions"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-vm-latency-checker
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kubevirt-vm-latency-checker
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiagnose-configmap-access
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiagnose-configmap-access
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kiagnose-configmap-access
  apiGroup: rbac.authorization.k8s.io
EOF
5. Create the ConfigMap with the "spec.param.max_desired_latency_milliseconds" field set to 0:
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: "manual-latency-check"
  spec.param.network_attachment_definition_name: "bridge-network-nad"
  spec.param.max_desired_latency_milliseconds: "0"
  spec.param.sample_duration_seconds: "5"
EOF
6. Create a job:
cat <<EOF | kubectl apply -n <target-namespace> -f -
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
      - name: vm-latency-checkup
        image: quay.io/kiagnose/kubevirt-vm-latency:main
        securityContext:
          runAsUser: 1000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
          seccompProfile:
            type: "RuntimeDefault"
        env:
        - name: CONFIGMAP_NAMESPACE
          value: test-latency
        - name: CONFIGMAP_NAME
          value: kubevirt-vm-latency-checkup-config
EOF
Actual results:
Checking the Job status shows that the Job finishes successfully (instead of failing).
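One way to check the Job status (commands shown for illustration only; they assume the Job was created in the test-latency namespace as in the steps above):
$ oc get job kubevirt-vm-latency-checkup -n test-latency
$ oc wait job/kubevirt-vm-latency-checkup -n test-latency --for=condition=complete --timeout=6m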
Checking the ConfigMap shows the results reported as a successful run:
oc get cm latency-configmap -oyaml
apiVersion: v1
data:
  spec.param.max_desired_latency_milliseconds: "0"
  spec.param.network_attachment_definition_name: checkup-nad
  spec.param.network_attachment_definition_namespace: test-checkup-framework
  spec.param.sample_duration_seconds: "5"
  spec.timeout: 300m
  status.completionTimestamp: "2022-12-25T14:21:01Z"
  status.failureReason: ""
  status.result.avgLatencyNanoSec: "491000"
  status.result.maxLatencyNanoSec: "652000"
  status.result.measurementDurationSec: "5"
  status.result.minLatencyNanoSec: "336000"
  status.result.sourceNode: master2
  status.result.targetNode: master1
  status.startTimestamp: "2022-12-25T14:20:09Z"
  status.succeeded: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2022-12-25T14:20:06Z"
  labels:
    created-by-dynamic-class-creator: "Yes"
  name: latency-configmap
  namespace: test-checkup-framework
  resourceVersion: "7798869"
  uid: dadebd03-c8b5-456b-9995-b201eb52415f
Expected results:
The job should fail.
The "spec.param.max_desired_latency_milliseconds" field is respected. The measured latency is treated as an integer, with units of milliseconds. When the measured latency is less than 1 [ms], it is considered as 0 [ms], thus the condition `actualMaxLatency > maxLatencyDesired` (which fails the checkup if true) is false and the checkup succeeds. https://github.com/kiagnose/kiagnose/blob/95d8c7995fabbb7be11f7cde0eca57dc01e222bb/checkups/kubevirt-vm-latency/vmlatency/internal/checkup/checkup.go#L157 Tested on a PSI cluster: $ oc get csv -A | grep virt ... openshift-cnv kubevirt-hyperconverged-operator.v4.13.0 OpenShift Virtualization 4.13.0 kubevirt-hyperconverged-operator.v4.11.1 Succeeded Openshift version: 4.12.0 CNV version: 4.13.0 HCO image: brew.registry.redhat.io/rh-osbs/iib:418191 OCS version: 4.12.0 CNI type: OVNKubernetes Workers type: virtual It still doesn't seem to respect the max_desired_latency_milliseconds field - I would expect such a checkup to fail because it didn't meet the requirements of the user. What actually happen is that the VMI's are not created, the virt-launcher pods are stuck in init status: $ oc logs virt-launcher-latency-check-source-vz5rd-kk5tl Error from server (BadRequest): container "compute" in pod "virt-launcher-latency-check-source-vz5rd-kk5tl" is waiting to start: PodInitializing After about 10 minutes the checkup job failes with: checkup failed: setup: failed to wait for VMI 'test-checkup-framework/latency-check-target-rlpmb' IP address to appear on status: timed out waiting for the condition Verified on a fresh installation of PSI: kubevirt-hyperconverged-operator.v4.13.0 IIB - 425854 - version v4.13.0.rhel9-1385 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3205 |