
Bug 1951413

Summary: [vsphere] cluster not scaling down after load is removed [panic in logs]

Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: Cluster Autoscaler
Version: 4.8
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED DUPLICATE
Whiteboard:
Reporter: Milind Yadav <miyadav>
Assignee: Joel Speed <jspeed>
QA Contact: sunzhaohua <zhsun>
Docs Contact:
CC: aos-bugs
Type: Bug
Last Closed: 2021-04-20 09:25:51 UTC

Description Milind Yadav 2021-04-20 05:56:25 UTC
Cluster version: 4.8.0-0.nightly-2021-04-19-225513

Step 1: Create a MachineSet with 0 replicas
[miyadav@miyadav ~]$ oc create -f ManualRun/ms_zero.yaml
machineset.machine.openshift.io/miyadav-20-chhnh-worker-zero created
[miyadav@miyadav ~]$ oc get machineset
NAME                           DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-20-chhnh-worker        2         2         2       2           45m
miyadav-20-chhnh-worker-zero   0         0                             10s
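
For reference, a minimal sketch of the kind of MachineSet manifest used in ms_zero.yaml (the actual file is not attached here; the labels follow the usual worker MachineSet pattern, and the vSphere providerSpec is omitted because it is environment-specific):

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: miyadav-20-chhnh-worker-zero
  namespace: openshift-machine-api
  labels:
    machine.openshift.io/cluster-api-cluster: miyadav-20-chhnh
spec:
  replicas: 0                  # start at zero so only the autoscaler creates machines
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: miyadav-20-chhnh
      machine.openshift.io/cluster-api-machineset: miyadav-20-chhnh-worker-zero
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: miyadav-20-chhnh
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: miyadav-20-chhnh-worker-zero
    spec:
      providerSpec:
        value: {}              # vSphere provider details omitted; copy from an existing worker MachineSet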

Step 2: Create a ClusterAutoscaler
[miyadav@miyadav ~]$ oc create -f ManualRun/cas.yaml
clusterautoscaler.autoscaling.openshift.io/default created
[miyadav@miyadav ~]$ oc get clusterautoscaler
NAME AGE
default 8s
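
For reference, cas.yaml is roughly a ClusterAutoscaler of this shape (the actual file is not attached; the scale-down settings shown are example values used for testing, not necessarily the ones from this run):

apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default               # the operator only reconciles a ClusterAutoscaler named "default"
spec:
  scaleDown:
    enabled: true             # scale-down must be enabled for Step 6 to remove machines
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    unneededTime: 10s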

Step 3: Create a MachineAutoscaler referencing the MachineSet created earlier
[miyadav@miyadav ~]$ vi MAS.yaml
[miyadav@miyadav ~]$ oc create -f MAS.yaml
machineautoscaler.autoscaling.openshift.io/mas1 created
[miyadav@miyadav ~]$ oc get machineautoscaler
NAME   REF KIND     REF NAME                       MIN   MAX   AGE
mas1   MachineSet   miyadav-20-chhnh-worker-zero   1     4     8s
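
For reference, MAS.yaml corresponds to a MachineAutoscaler like the one below (reconstructed from the oc get machineautoscaler output above; only the name, target, and min/max values come from this bug):

apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: mas1
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: miyadav-20-chhnh-worker-zero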

Step 4: Create a workload to scale up the cluster
[miyadav@miyadav ~]$ vi ManualRun/createLoad.yaml
[miyadav@miyadav ~]$ oc create -f ManualRun/createLoad.yaml
job.batch/work-queue-29mr8 created
The cluster autoscaled successfully.
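
For reference, createLoad.yaml is roughly a Job of this shape (the actual file is not attached; the image, parallelism, and resource requests are placeholders, sized so that the pending pods exceed the capacity of the existing workers and force a scale-up):

apiVersion: batch/v1
kind: Job
metadata:
  generateName: work-queue-   # matches the job.batch/work-queue-29mr8 name generated above
  namespace: default
spec:
  parallelism: 50
  completions: 50
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: work
        image: busybox
        command: ["sleep", "300"]
        resources:
          requests:
            cpu: 500m          # per-pod request; this many pending pods cannot fit on the existing nodes
            memory: 500Mi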

Step 5: Verify that the cluster scaled up successfully
[miyadav@miyadav ~]$ oc get machines
NAME                                 PHASE     TYPE   REGION   ZONE   AGE
miyadav-20-chhnh-master-0            Running                          62m
miyadav-20-chhnh-master-1            Running                          62m
miyadav-20-chhnh-master-2            Running                          62m
miyadav-20-chhnh-worker-b5bsb        Running                          57m
miyadav-20-chhnh-worker-zero-9rbvs   Running                          10m
miyadav-20-chhnh-worker-zero-gldcz   Running                          10m
miyadav-20-chhnh-worker-zqbrm        Running                          57m

Step 6: Delete the workload
Expected: The cluster scales back down after the load is removed.
Actual:
After I deleted the workload, the cluster could not scale down; the machine-controller logs show the panic below:
I0420 05:34:47.360995    1 controller.go:218] miyadav-20-chhnh-worker-zero-9rbvs: reconciling machine triggers delete
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xbf5bfa]

goroutine 564 [running]:
golang.org/x/time/rate.(*Limiter).WaitN(0xc000471e00, 0x0, 0x0, 0x1, 0x0, 0x0)
    /go/src/github.com/openshift/machine-api-operator/vendor/golang.org/x/time/rate/rate.go:237 +0xba
golang.org/x/time/rate.(*Limiter).Wait(...)
    /go/src/github.com/openshift/machine-api-operator/vendor/golang.org/x/time/rate/rate.go:219
k8s.io/client-go/util/flowcontrol.(*tokenBucketRateLimiter).Wait(0xc00002cea0, 0x0, 0x0, 0xc0006588f0, 0xc0000daf30)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/util/flowcontrol/throttle.go:106 +0x4b
k8s.io/client-go/rest.(*Request).tryThrottleWithInfo(0xc000e38870, 0x0, 0x0, 0x0, 0x0, 0x42, 0x40)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:587 +0xa5
k8s.io/client-go/rest.(*Request).tryThrottle(...)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:613
k8s.io/client-go/rest.(*Request).request(0xc000e38870, 0x0, 0x0, 0xc0000db520, 0x0, 0x0)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:873 +0x2fc
k8s.io/client-go/rest.(*Request).Do(0xc000e38870, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:980 +0xf1
k8s.io/client-go/kubernetes/typed/core/v1.(*nodes).Patch(0xc0003b9ab0, 0x0, 0x0, 0xc00062dc80, 0x22, 0x2189da0, 0x26, 0xc001495ba0, 0x1f, 0x20, ...)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/kubernetes/typed/core/v1/node.go:186 +0x237
k8s.io/kubectl/pkg/drain.(*CordonHelper).PatchOrReplaceWithContext(0xc0000db908, 0x0, 0x0, 0x2466408, 0xc0003ce160, 0x20d7300, 0x0, 0xc0000db948, 0x175742f, 0xc0004b8b25)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/kubectl/pkg/drain/cordon.go:102 +0x416
k8s.io/kubectl/pkg/drain.RunCordonOrUncordon(0xc000a68340, 0xc0000bac00, 0xc000122301, 0xc00153df50, 0x22)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/kubectl/pkg/drain/default.go:60 +0xb3
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).drainNode(0xc000470460, 0xc00019e800, 0x219397a, 0x2a)
    /go/src/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:420 +0x49d
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc000470460, 0x24409b8, 0xc00042ff50, 0xc0006279b0, 0x15, 0xc00062d4a0, 0x22, 0xc00042ff50, 0xc000b46000, 0x1f7c660, ...)
    /go/src/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:225 +0x2dde
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003d6000, 0x2440910, 0xc00004c040, 0x1ea9da0, 0xc00082cbc0)
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003d6000, 0x2440910, 0xc00004c040, 0xc000770f00)
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc00086bfc0, 0xc0003d6000, 0x2440910, 0xc00004c040)
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:210 +0x425


Additional info:
Will attach must-gather shortly.

Comment 2 Joel Speed 2021-04-20 09:25:51 UTC
We are already aware of this issue.

This was introduced when we bumped the Kubernetes dependencies of the Machine API Operator (MAO) to 1.21.

Updates to the drain library mean that it needs to have a context passed in, which we weren't doing.
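
For illustration only, not the actual patch (that is tracked in the other bug), here is a minimal Go sketch of the call pattern involved: with the kubectl drain helper from Kubernetes 1.21, leaving Helper.Ctx nil sends a nil context into client-go's rate limiter, which is exactly the nil pointer dereference in the trace above; populating it avoids the crash. The ctx, client, and node values stand in for what the machine controller already has:

package drainexample

import (
    "context"
    "os"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/kubectl/pkg/drain"
)

// cordonAndDrain cordons the node and then drains it, the same sequence the
// machine controller runs before deleting a machine.
func cordonAndDrain(ctx context.Context, client kubernetes.Interface, node *corev1.Node) error {
    drainer := &drain.Helper{
        // Ctx is the field that now has to be set; a nil Ctx ends up in
        // rate.(*Limiter).WaitN and causes the nil pointer dereference above.
        Ctx:                 ctx,
        Client:              client,
        Force:               true,
        GracePeriodSeconds:  -1,
        IgnoreAllDaemonSets: true,
        Out:                 os.Stdout,
        ErrOut:              os.Stderr,
    }

    // Mark the node unschedulable (cordon) before evicting its pods.
    if err := drain.RunCordonOrUncordon(drainer, node, true); err != nil {
        return err
    }
    return drain.RunNodeDrain(drainer, node.Name)
}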

This is a duplicate of bug 1951029.

*** This bug has been marked as a duplicate of bug 1951029 ***