Bug 2171395
| Summary: | virt-controller crashes because of out-of-bound slice access in evacuation controller | |||
|---|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Igor Bezukh <ibezukh> | |
| Component: | Virtualization | Assignee: | Igor Bezukh <ibezukh> | |
| Status: | CLOSED ERRATA | QA Contact: | zhe peng <zpeng> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.13.0 | CC: | acardace, kbidarka | |
| Target Milestone: | --- | |||
| Target Release: | 4.13.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | hco-bundle-registry-container-v4.13.0.rhel9-1689 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| Clones: | 2185068 | Environment: | |
| Last Closed: | 2023-05-18 02:57:49 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2185068 | |||
Description
Igor Bezukh
2023-02-20 09:46:27 UTC
Tested with build: CNV-v4.13.0.rhel9-1884
Steps:
1. Check that control-plane components are not running on the worker nodes:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
c01-zpeng-413-dff6b-master-0 Ready control-plane,master 43h v1.26.2+dc93b13
c01-zpeng-413-dff6b-master-1 Ready control-plane,master 43h v1.26.2+dc93b13
c01-zpeng-413-dff6b-master-2 Ready control-plane,master 43h v1.26.2+dc93b13
c01-zpeng-413-dff6b-worker-0-fdmgv Ready worker 43h v1.26.2+dc93b13
c01-zpeng-413-dff6b-worker-0-j6bj6 Ready worker 43h v1.26.2+dc93b13
c01-zpeng-413-dff6b-worker-0-jfjgb Ready worker 43h v1.26.2+dc93b13
2. Set the migration configuration in the KubeVirt CR:
migrations:
  allowAutoConverge: false
  allowPostCopy: false
  completionTimeoutPerGiB: 800
  parallelMigrationsPerCluster: 200
  parallelOutboundMigrationsPerNode: 100
  progressTimeout: 150
3. Label one of the worker nodes:
$ oc label node c01-zpeng-413-dff6b-worker-0-fdmgv type=worker001
node/c01-zpeng-413-dff6b-worker-0-fdmgv labeled
4. Create 5 VMs with a nodeSelector that pins them to the labeled node:
spec:
  nodeSelector:
    type: worker001
$ oc get vmi
NAME AGE PHASE IP NODENAME READY
vm-fedora1 17m Running 10.131.0.231 c01-zpeng-413-dff6b-worker-0-fdmgv True
vm-fedora2 15m Running 10.131.0.232 c01-zpeng-413-dff6b-worker-0-fdmgv True
vm-fedora3 11m Running 10.131.0.234 c01-zpeng-413-dff6b-worker-0-fdmgv True
vm-fedora4 8m13s Running 10.131.0.235 c01-zpeng-413-dff6b-worker-0-fdmgv True
vm-fedora5 4m37s Running 10.131.0.236 c01-zpeng-413-dff6b-worker-0-fdmgv True
5. Cordon and drain the labeled worker node:
$ oc adm cordon c01-zpeng-413-dff6b-worker-0-fdmgv
node/c01-zpeng-413-dff6b-worker-0-fdmgv cordoned
$ oc adm drain c01-zpeng-413-dff6b-worker-0-fdmgv --ignore-daemonsets=true --delete-emptydir-data=true
6. Confirm that 5 migrations are pending:
$ oc get vmim
NAME PHASE VMI
kubevirt-evacuation-2hvw6 Scheduling vm-fedora1
kubevirt-evacuation-5tfgc Scheduling vm-fedora2
kubevirt-evacuation-6zkst Scheduling vm-fedora5
kubevirt-evacuation-gzwbx Scheduling vm-fedora4
kubevirt-evacuation-h2tlv Scheduling vm-fedora3
Wait more than 10 minutes, then check the status of the virt-controller pods:
$ oc get pods -n openshift-cnv | grep virt-controller
virt-controller-5cc6f78f8f-nvd59 1/1 Running 0 14m
virt-controller-5cc6f78f8f-s2wdb 1/1 Running 0 43h
No panic occurred; neither virt-controller pod has restarted (RESTARTS is 0 for both).
Moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3205