Bug 2212198 - both virt-controllers are crashing due to panic
Summary: both virt-controllers are crashing due to panic
Keywords:
Status: CLOSED DUPLICATE of bug 2185068
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.12.3
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.12.4
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-04 16:33 UTC by Boaz
Modified: 2023-07-03 10:23 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-06 09:42:46 UTC
Target Upstream Version:
Embargoed:




Links:
System ID: Red Hat Issue Tracker CNV-29383 (Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2023-06-04 16:37:25 UTC)

Description Boaz 2023-06-04 16:33:14 UTC
I'm running a scale regression setup on:
=========================================
OCP 4.12.3
OpenShift Virtualization 4.12.3

This is a large-scale setup with 130 nodes running 6000 VMs, using an external RHCS cluster as storage.

During mass VM migration testing, in which I initiated the migration of 2000 VMs, both virt-controllers started crashing in a loop due to a panic. I am now unable to initiate any actions and cannot recover the cluster.
================================================================================
virt-controller-7887c7c647-8v4t4                       0/1     CrashLoopBackOff   40 (3m57s ago)   10d
virt-controller-7887c7c647-pnjpq                       0/1     CrashLoopBackOff   40 (2m59s ago)   10d
================================================================================
{"component":"virt-controller","level":"info","msg":"Starting disruption budget controller.","pos":"disruptionbudget.go:316","timestamp":"2023-06-04T16:14:40.853832Z"}
{"component":"virt-controller","level":"info","msg":"Starting snapshot controller.","pos":"snapshot_base.go:199","timestamp":"2023-06-04T16:14:40.853820Z"}
{"component":"virt-controller","level":"info","msg":"Starting clone controller","pos":"clone_base.go:149","timestamp":"2023-06-04T16:14:40.853885Z"}
{"component":"virt-controller","level":"info","msg":"Starting vmi controller.","pos":"vmi.go:229","timestamp":"2023-06-04T16:14:40.853842Z"}
{"component":"virt-controller","level":"info","msg":"Starting export controller.","pos":"export.go:290","timestamp":"2023-06-04T16:14:40.854063Z"}
{"component":"virt-controller","level":"info","msg":"TSC Freqency node update status: 0 updated, 129 skipped, 0 errors","pos":"nodetopologyupdater.go:44","timestamp":"2023-06-04T16:14:41.166980Z"}
{"component":"virt-controller","level":"info","msg":"certificate with common name 'virt-controller.openshift-cnv.pod.cluster.local' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537128Z"}
{"component":"virt-controller","level":"info","msg":"certificate with common name 'virt-controller.openshift-cnv.pod.cluster.local' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537270Z"}
{"component":"virt-controller","level":"info","msg":"certificate with common name 'export.kubevirt.io@1685870363' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537273Z"}
{"component":"virt-controller","level":"info","msg":"certificate with common name 'export.kubevirt.io@1685870363' retrieved.","pos":"cert-manager.go:198","timestamp":"2023-06-04T16:14:43.537395Z"}
E0604 16:14:43.755257       1 runtime.go:78] Observed a panic: runtime.boundsError{x:-2, y:0, signed:true, code:0x2} (runtime error: slice bounds out of range [:-2])
goroutine 1279 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1bcac20?, 0xc02b374e10})
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x86
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00096e260?})
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x1bcac20, 0xc02b374e10})
	/usr/lib/golang/src/runtime/panic.go:884 +0x212
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).sync(0xc003787880, 0xc003ead4b0, {0xc02b6635e0?, 0x4, 0x4}, {0xc00234e200?, 0x16, 0x20})
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:415 +0x997
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).execute(0xc003787880, {0xc003d12bb0, 0x9})
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:335 +0x176
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Execute(0xc003787880)
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:296 +0x108
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).runWorker(0xc003333ea0?)
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:286 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e69c0?, {0x212aa80, 0xc02b981da0}, 0x1, 0xc003cf8b40)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x1f776b8?, 0xc003333f88?)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Run
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:278 +0x275
panic: runtime error: slice bounds out of range [:-2] [recovered]
	panic: runtime error: slice bounds out of range [:-2]

goroutine 1279 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00096e260?})
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd7
panic({0x1bcac20, 0xc02b374e10})
	/usr/lib/golang/src/runtime/panic.go:884 +0x212
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).sync(0xc003787880, 0xc003ead4b0, {0xc02b6635e0?, 0x4, 0x4}, {0xc00234e200?, 0x16, 0x20})
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:415 +0x997
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).execute(0xc003787880, {0xc003d12bb0, 0x9})
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:335 +0x176
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Execute(0xc003787880)
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:296 +0x108
kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).runWorker(0xc003333ea0?)
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:286 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e69c0?, {0x212aa80, 0xc02b981da0}, 0x1, 0xc003cf8b40)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x1f776b8?, 0xc003333f88?)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by kubevirt.io/kubevirt/pkg/virt-controller/watch/drain/evacuation.(*EvacuationController).Run
	/remote-source/app/pkg/virt-controller/watch/drain/evacuation/evacuation.go:278 +0x275
================================================================================
logs:

http://perf148h.perf.lab.eng.bos.redhat.com/share/BZ_logs/virt_controller_panic_during_migration.gz
================================================================================
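The panic itself is a negative slice bound in the evacuation controller's candidate selection (evacuation.go:415). As a minimal illustration of that failure mode, assuming the controller slices its candidate list by the number of remaining migration slots, the sketch below reproduces the same "slice bounds out of range [:-2]" error; the names selectCandidates, maxParallelMigrations, and activeMigrations are illustrative assumptions, not the actual KubeVirt code:
================================================================================
package main

import "fmt"

// Sketch of the failure mode: if the number of in-flight migrations can
// exceed the configured parallel-migration limit (easy to reach during a
// 2000-VM mass migration), the computed slice bound goes negative and the
// worker goroutine panics.
func selectCandidates(candidates []string, maxParallelMigrations, activeMigrations int) []string {
	remaining := maxParallelMigrations - activeMigrations // may go negative
	return candidates[:remaining]                         // panics when remaining < 0
}

func main() {
	vmis := []string{"vmi-a", "vmi-b", "vmi-c", "vmi-d"}
	// 5 allowed - 7 in flight = -2 -> runtime error: slice bounds out of range [:-2]
	fmt.Println(selectCandidates(vmis, 5, 7))
}
================================================================================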

Comment 1 Itamar Holder 2023-06-05 07:04:12 UTC
This is fixed by https://github.com/kubevirt/kubevirt/commit/5d1b049a5154c72e1b888da5ca392a9b97858995, which was merged to v58 about a week ago.
It should be included in the next z-stream release.
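For reference, the defensive pattern such a fix typically applies is to clamp the remaining-slot count before slicing. Continuing the illustrative sketch above (again, an assumption, not the literal patch):
================================================================================
// Clamped variant of selectCandidates: never slice with a negative or
// oversized bound.
func selectCandidatesSafe(candidates []string, maxParallelMigrations, activeMigrations int) []string {
	remaining := maxParallelMigrations - activeMigrations
	if remaining <= 0 {
		return nil // no free migration slots; retry on the next sync
	}
	if remaining > len(candidates) {
		remaining = len(candidates)
	}
	return candidates[:remaining]
}
================================================================================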

Comment 2 Fabian Deutsch 2023-06-05 08:34:39 UTC
@iholder are you aware of, or can you think of, any procedure to get back to a stable cluster once this bug is hit?

Comment 3 Kedar Bidarkar 2023-06-06 09:42:46 UTC

*** This bug has been marked as a duplicate of bug 2185068 ***

