Description of problem:
Starting a CNV VM creates a VMI, which is tracked by its "state" field. The VMI state is expected to progress from "Pending" to "Scheduling" to "Scheduled" and finally to "Running". However, on creation, VMIs do not have a state defined, which is shown to the user as a blank state. When starting only one VMI, the state is blank for just a fraction of a second, but this becomes a real problem when starting large numbers of VMIs at once: VMIs can then show a blank state for up to a full minute.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Start a VM/VMI
2. Very quickly look at the VMI state
3.

Actual results:
Blank state

Expected results:
Pending state

Additional info:
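A minimal way to catch the blank state from the CLI (a sketch; the VM name "my-vm" is a placeholder, and with a single VM the blank window may be too short to observe):

  virtctl start my-vm
  oc get vmi my-vm -o jsonpath='{.status.phase}{"\n"}'   # prints an empty line while the phase is still unset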
Description of problem:
I ran a load test that starts 1K VMs in a batch: I ran the command "virtctl start {VM}" 1K times. 283 VMIs have no status:

[root]# oc get vmi | grep -c Run
106
[root]# oc get vmi | grep -c Failed
330
[root]# oc get vmi | grep -c Scheduling
137
[root]# oc get vmi | grep -c Pending
144

Thus we have 283 VMIs that have no status.

Version-Release number of selected component (if applicable):
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-rc.4   True        False         21d     Cluster version is 4.6.0-rc.4

NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.5.0   OpenShift Virtualization   2.5.0     kubevirt-hyperconverged-operator.v2.4.3   Succeeded

How reproducible:
Always

Steps to Reproduce:
1. Create 1K PVCs
2. Create 1K VMs
3. Run "virtctl start VM_NAME" 1K times

Actual results:
Some VMIs have no status

Expected results:
All VMIs have a status

Additional info:
A VMI with no status looks like vm-109 below:

oc get vmi | more
NAME      AGE     PHASE     IP             NODENAME
vm-1      6d2h    Running   10.131.3.93    f25-h03-000-r730xd.rdu2.scalelab.redhat.com
vm-10     6d2h    Running   10.128.5.241   f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-100    6d2h    Running   10.130.2.164   f25-h09-000-r730xd.rdu2.scalelab.redhat.com
vm-1000   12m     Pending
vm-101    6d2h    Running   10.128.5.253   f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-102    6d2h    Running   10.128.5.254   f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-103    6d2h    Running   10.129.2.7     f19-h27-000-r620.rdu2.scalelab.redhat.com
vm-104    6d2h    Running   10.130.4.193   f25-h07-000-r730xd.rdu2.scalelab.redhat.com
vm-105    6d2h    Running   10.128.4.10    f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-106    6d2h    Running   10.129.4.23    f25-h05-000-r730xd.rdu2.scalelab.redhat.com
vm-107    19m     Failed
vm-108    19m     Failed
vm-109    5m40s
vm-11     6d2h    Running   10.130.4.179   f25-h07-000-r730xd.rdu2.scalelab.redhat.com
vm-110    5m17s
vm-111    9m52s   Pending
vm-112    22m     Failed
vm-113    23m     Failed
vm-114    5m19s
vm-115    4m40s
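Instead of deriving the 283 by subtracting the grep counts, the blank-phase VMIs can be counted directly (a sketch, assuming the default "oc get vmi" column layout shown above, where rows with an empty PHASE have only the NAME and AGE fields):

  oc get vmi --no-headers | awk 'NF < 3' | wc -l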
With these VMIs that don't have a status, are you able to interact with them at all? E.g. stop the VMI, or connect to it via console? My impression, just looking at the timestamps, is that there was a point in time where VMIs (days earlier) were fine, a point where they were failing, a point where they are still pending, and a point where we are just not getting a status at all. With that, I'm of the impression that these VMIs never ran at all.
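For reference, the kind of interaction meant here would be something like the following (a sketch, using vm-109 from the listing above as an example):

  virtctl console vm-109   # try to attach to the serial console of the VMI
  virtctl stop vm-109      # try to stop the VM that owns the VMI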
This PR might also be related to this issue: https://github.com/kubevirt/kubevirt/pull/4772
@kbidarka the PR you mentioned does reduce the amount of time a VMI can be in a blank state, but https://github.com/kubevirt/kubevirt/pull/4705 actually fixes the issue by creating VMIs as Pending right away.
@Guy, while populating multiple VMs during scale testing, were you able to reproduce this bug with 4.8.0?
The bug fix was not merged into the version I tested. Once I deploy a version that includes the fix, I will verify it.
Verified with: virt-operator-container-v4.8.0-63

Created 1000 VMs and checked the VM status.

Script:
----
#!/bin/bash
for number in {501..1500}
do
  echo "vm: $number"
  cp vm-cirros.yaml vm-cirros-new.yaml
  sed -i 's|vm-cirros|vm-cirros-'"${number}"'|g' vm-cirros-new.yaml
  oc apply -f vm-cirros-new.yaml -n bug-status
done
exit 0
----

VMs are in the statuses: Running, Scheduling, Pending, Scheduled. Did not see any VM with no status.
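A quick sanity check after the script finishes could look like this (a sketch, assuming the bug-status namespace used above; it should print nothing if every VMI has a phase):

  oc get vmi -n bug-status -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' | awk 'NF < 2'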
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920