Bug 1903667
| Summary: | VMIs get created with an undefined state field, which is confusing to the user | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | guy chen <guchen> |
| Component: | Virtualization | Assignee: | Jed Lejosne <jlejosne> |
| Status: | CLOSED ERRATA | QA Contact: | guy chen <guchen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.5.0 | CC: | cnv-qe-bugs, fdeutsch, ipinto, jlejosne, kbidarka, sgott |
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | hco-bundle-registry-container-v4.8.0-347 virt-operator-container-v4.8.0-58 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 14:21:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
guy chen
2020-12-02 15:39:44 UTC
Description of problem:
I ran a load test that starts a batch of 1K VMs on the system, i.e. I ran the command virtctl start {VMS} 1K times.
There are 283 VMIs that have no status:
[root]# oc get vmi | grep -c Run
106
[root]# oc get vmi | grep -c Failed
330
[root]# oc get vmi | grep -c Scheduling
137
[root]# oc get vmi | grep -c Pending
144
Thus 283 of the 1000 VMIs (1000 - 106 - 330 - 137 - 144) have no status.
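For reference, a more direct way to count VMIs whose phase is unset (a sketch only, assuming jq is available; .status.phase is the field shown in the PHASE column):

oc get vmi -o json | jq '[.items[] | select((.status.phase // "") == "")] | length'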
Version-Release number of selected component (if applicable):
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-rc.4 True False 21d Cluster version is 4.6.0-rc.4
NAME DISPLAY VERSION REPLACES PHASE
kubevirt-hyperconverged-operator.v2.5.0 OpenShift Virtualization 2.5.0 kubevirt-hyperconverged-operator.v2.4.3 Succeeded
How reproducible:
Always
Steps to Reproduce:
1. Create 1K PVCs
2. Create 1K VMs
3. Run virtctl start VM_NAME 1K times (a minimal loop sketch follows below)
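A minimal sketch of step 3, assuming the 1K VMs were created with names vm-1 .. vm-1000 (the exact names and namespace are assumptions):

#!/bin/bash
# Start each of the 1000 pre-created VMs one by one.
for number in {1..1000}
do
    virtctl start "vm-${number}"
done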
Actual results:
Some VMIs have no status.
Expected results:
All VMIs have a status.
Additional info:
A VMI with no status looks like vm-109 in the listing below:
oc get vmi | more
NAME AGE PHASE IP NODENAME
vm-1 6d2h Running 10.131.3.93 f25-h03-000-r730xd.rdu2.scalelab.redhat.com
vm-10 6d2h Running 10.128.5.241 f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-100 6d2h Running 10.130.2.164 f25-h09-000-r730xd.rdu2.scalelab.redhat.com
vm-1000 12m Pending
vm-101 6d2h Running 10.128.5.253 f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-102 6d2h Running 10.128.5.254 f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-103 6d2h Running 10.129.2.7 f19-h27-000-r620.rdu2.scalelab.redhat.com
vm-104 6d2h Running 10.130.4.193 f25-h07-000-r730xd.rdu2.scalelab.redhat.com
vm-105 6d2h Running 10.128.4.10 f25-h11-000-r730xd.rdu2.scalelab.redhat.com
vm-106 6d2h Running 10.129.4.23 f25-h05-000-r730xd.rdu2.scalelab.redhat.com
vm-107 19m Failed
vm-108 19m Failed
vm-109 5m40s
vm-11 6d2h Running 10.130.4.179 f25-h07-000-r730xd.rdu2.scalelab.redhat.com
vm-110 5m17s
vm-111 9m52s Pending
vm-112 22m Failed
vm-113 23m Failed
vm-114 5m19s
vm-115 4m40s
With these VMIs that don't have a status, are you able to interact with them at all, e.g. stop the VMI or connect to it via console? My impression, just looking at the timestamps, is that there was a point in time where VMIs (days earlier) were fine, a point where they were failing, a point where they were still pending, and a point where we were just not getting any status at all. With that, I'm of the impression that these VMIs never ran at all. This PR might also be related to this issue: https://github.com/kubevirt/kubevirt/pull/4772

@kbidarka the PR you mentioned does reduce the amount of time a VMI can be in a blank state, but https://github.com/kubevirt/kubevirt/pull/4705 actually fixes the issue by creating VMIs as Pending right away.

@Guy, while populating multiple VMs during scale testing, were you able to reproduce this bug with 4.8.0?

The bug fix was not merged into the version I tested. Once I deploy a version with the fix in it, I will verify it.

Verify with: virt-operator-container-v4.8.0-63
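As a quick single-VM spot check of the fix described above (the VMI should report a phase, starting with Pending, as soon as it is created), something like the following could be used. This is a sketch only; the vm-cirros manifest and the bug-status namespace are assumptions borrowed from the bulk-creation script below.

oc apply -f vm-cirros.yaml -n bug-status
virtctl start vm-cirros -n bug-status
# The phase should never be empty; Pending (or a later phase) is expected:
oc get vmi vm-cirros -n bug-status -o jsonpath='{.status.phase}{"\n"}'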
Create 1000 VMs and check the VM status.
script:
----
#!/bin/bash
# Create 1000 VMs (vm-cirros-501 .. vm-cirros-1500) in the bug-status namespace
# by templating vm-cirros.yaml with a unique name for each VM.
for number in {501..1500}
do
    echo "vm: $number"
    cp vm-cirros.yaml vm-cirros-new.yaml
    sed -i 's|vm-cirros|vm-cirros-'"${number}"'|g' vm-cirros-new.yaml
    oc apply -f vm-cirros-new.yaml -n bug-status
done
exit 0
VMs are in the statuses Running, Scheduling, Pending, or Scheduled.
Did not see any VM with no status (a quick listing check is sketched below).
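One way to confirm this, assuming the default oc get vmi column layout (NAME AGE PHASE IP NODENAME): list any VMI whose PHASE column is empty; the command should print nothing when every VMI has a phase.

oc get vmi -n bug-status --no-headers | awk 'NF < 3 { print $1 }'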
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920