
Bug 1951413

Summary: [vsphere] cluster not scaling down after load is removed [panic in logs]

Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: Cluster Autoscaler
Version: 4.8
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED DUPLICATE
Whiteboard:
Reporter: Milind Yadav <miyadav>
Assignee: Joel Speed <jspeed>
QA Contact: sunzhaohua <zhsun>
Docs Contact:
CC: aos-bugs
Type: Bug
Last Closed: 2021-04-20 09:25:51 UTC

Description Milind Yadav 2021-04-20 05:56:25 UTC
Cluster version: 4.8.0-0.nightly-2021-04-19-225513

Step 1: Create a MachineSet with 0 replicas
[miyadav@miyadav ~]$ oc create -f ManualRun/ms_zero.yaml
machineset.machine.openshift.io/miyadav-20-chhnh-worker-zero created
[miyadav@miyadav ~]$ oc get machineset
NAME                           DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-20-chhnh-worker        2         2         2       2           45m
miyadav-20-chhnh-worker-zero   0         0                             10s
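
For reference, a minimal sketch of the kind of MachineSet manifest used in ms_zero.yaml (the actual file is not attached here; the labels follow the usual worker MachineSet pattern, and the vSphere providerSpec is omitted because it is environment-specific):

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: miyadav-20-chhnh-worker-zero
  namespace: openshift-machine-api
  labels:
    machine.openshift.io/cluster-api-cluster: miyadav-20-chhnh
spec:
  replicas: 0                  # start at zero so only the autoscaler creates machines
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: miyadav-20-chhnh
      machine.openshift.io/cluster-api-machineset: miyadav-20-chhnh-worker-zero
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: miyadav-20-chhnh
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: miyadav-20-chhnh-worker-zero
    spec:
      providerSpec:
        value: {}              # vSphere provider details omitted; copy from an existing worker MachineSet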

Step 2: Create a ClusterAutoscaler
[miyadav@miyadav ~]$ oc create -f ManualRun/cas.yaml
clusterautoscaler.autoscaling.openshift.io/default created
[miyadav@miyadav ~]$ oc get clusterautoscaler
NAME AGE
default 8s
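
For reference, cas.yaml is roughly a ClusterAutoscaler of this shape (the actual file is not attached; the scale-down settings shown are example values used for testing, not necessarily the ones from this run):

apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default               # the operator only reconciles a ClusterAutoscaler named "default"
spec:
  scaleDown:
    enabled: true             # scale-down must be enabled for Step 6 to remove machines
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    unneededTime: 10s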

Step 3: Create a MachineAutoscaler referencing the MachineSet created earlier
[miyadav@miyadav ~]$ vi MAS.yaml
[miyadav@miyadav ~]$ oc create -f MAS.yaml
machineautoscaler.autoscaling.openshift.io/mas1 created
[miyadav@miyadav ~]$ oc get machineautoscaler
NAME   REF KIND     REF NAME                       MIN   MAX   AGE
mas1   MachineSet   miyadav-20-chhnh-worker-zero   1     4     8s
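
For reference, MAS.yaml corresponds to a MachineAutoscaler like the one below (reconstructed from the oc get machineautoscaler output above; only the name, target, and min/max values come from this bug):

apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: mas1
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: miyadav-20-chhnh-worker-zero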

Step 4: Create a workload to scale up the cluster
[miyadav@miyadav ~]$ vi ManualRun/createLoad.yaml
[miyadav@miyadav ~]$ oc create -f ManualRun/createLoad.yaml
job.batch/work-queue-29mr8 created
The cluster autoscaled successfully.
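
For reference, createLoad.yaml is roughly a Job of this shape (the actual file is not attached; the image, parallelism, and resource requests are placeholders, sized so that the pending pods exceed the capacity of the existing workers and force a scale-up):

apiVersion: batch/v1
kind: Job
metadata:
  generateName: work-queue-   # matches the job.batch/work-queue-29mr8 name generated above
  namespace: default
spec:
  parallelism: 50
  completions: 50
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: work
        image: busybox
        command: ["sleep", "300"]
        resources:
          requests:
            cpu: 500m          # per-pod request; this many pending pods cannot fit on the existing nodes
            memory: 500Mi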

Step 5: Verify that the cluster scaled up successfully
[miyadav@miyadav ~]$ oc get machines
NAME                                 PHASE     TYPE   REGION   ZONE   AGE
miyadav-20-chhnh-master-0            Running                          62m
miyadav-20-chhnh-master-1            Running                          62m
miyadav-20-chhnh-master-2            Running                          62m
miyadav-20-chhnh-worker-b5bsb        Running                          57m
miyadav-20-chhnh-worker-zero-9rbvs   Running                          10m
miyadav-20-chhnh-worker-zero-gldcz   Running                          10m
miyadav-20-chhnh-worker-zqbrm        Running                          57m

Step 6: Delete the workload
Expected: The cluster scales back down after the load is removed.
Actual:
After I deleted the workload, the cluster could not scale down; the machine-controller logs show the panic below:
I0420 05:34:47.360995    1 controller.go:218] miyadav-20-chhnh-worker-zero-9rbvs: reconciling machine triggers delete
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xbf5bfa]

goroutine 564 [running]:
golang.org/x/time/rate.(*Limiter).WaitN(0xc000471e00, 0x0, 0x0, 0x1, 0x0, 0x0)
    /go/src/github.com/openshift/machine-api-operator/vendor/golang.org/x/time/rate/rate.go:237 +0xba
golang.org/x/time/rate.(*Limiter).Wait(...)
    /go/src/github.com/openshift/machine-api-operator/vendor/golang.org/x/time/rate/rate.go:219
k8s.io/client-go/util/flowcontrol.(*tokenBucketRateLimiter).Wait(0xc00002cea0, 0x0, 0x0, 0xc0006588f0, 0xc0000daf30)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/util/flowcontrol/throttle.go:106 +0x4b
k8s.io/client-go/rest.(*Request).tryThrottleWithInfo(0xc000e38870, 0x0, 0x0, 0x0, 0x0, 0x42, 0x40)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:587 +0xa5
k8s.io/client-go/rest.(*Request).tryThrottle(...)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:613
k8s.io/client-go/rest.(*Request).request(0xc000e38870, 0x0, 0x0, 0xc0000db520, 0x0, 0x0)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:873 +0x2fc
k8s.io/client-go/rest.(*Request).Do(0xc000e38870, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/rest/request.go:980 +0xf1
k8s.io/client-go/kubernetes/typed/core/v1.(*nodes).Patch(0xc0003b9ab0, 0x0, 0x0, 0xc00062dc80, 0x22, 0x2189da0, 0x26, 0xc001495ba0, 0x1f, 0x20, ...)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/client-go/kubernetes/typed/core/v1/node.go:186 +0x237
k8s.io/kubectl/pkg/drain.(*CordonHelper).PatchOrReplaceWithContext(0xc0000db908, 0x0, 0x0, 0x2466408, 0xc0003ce160, 0x20d7300, 0x0, 0xc0000db948, 0x175742f, 0xc0004b8b25)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/kubectl/pkg/drain/cordon.go:102 +0x416
k8s.io/kubectl/pkg/drain.RunCordonOrUncordon(0xc000a68340, 0xc0000bac00, 0xc000122301, 0xc00153df50, 0x22)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/kubectl/pkg/drain/default.go:60 +0xb3
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).drainNode(0xc000470460, 0xc00019e800, 0x219397a, 0x2a)
    /go/src/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:420 +0x49d
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc000470460, 0x24409b8, 0xc00042ff50, 0xc0006279b0, 0x15, 0xc00062d4a0, 0x22, 0xc00042ff50, 0xc000b46000, 0x1f7c660, ...)
    /go/src/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:225 +0x2dde
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003d6000, 0x2440910, 0xc00004c040, 0x1ea9da0, 0xc00082cbc0)
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003d6000, 0x2440910, 0xc00004c040, 0xc000770f00)
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc00086bfc0, 0xc0003d6000, 0x2440910, 0xc00004c040)
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:210 +0x425


Additional info:
Will attach must-gather shortly.

Comment 2 Joel Speed 2021-04-20 09:25:51 UTC
We are already aware of this issue.

This was introduced when we bumped the Kubernetes dependencies of the Machine API Operator (MAO) to 1.21.

Updates to the drain library mean that it needs to have a context passed in, which we weren't doing.
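
For illustration only, not the actual patch (that is tracked in the other bug), here is a minimal Go sketch of the call pattern involved: with the kubectl drain helper from Kubernetes 1.21, leaving Helper.Ctx nil sends a nil context into client-go's rate limiter, which is exactly the nil pointer dereference in the trace above; populating it avoids the crash. The ctx, client, and node values stand in for what the machine controller already has:

package drainexample

import (
    "context"
    "os"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/kubectl/pkg/drain"
)

// cordonAndDrain cordons the node and then drains it, the same sequence the
// machine controller runs before deleting a machine.
func cordonAndDrain(ctx context.Context, client kubernetes.Interface, node *corev1.Node) error {
    drainer := &drain.Helper{
        // Ctx is the field that now has to be set; a nil Ctx ends up in
        // rate.(*Limiter).WaitN and causes the nil pointer dereference above.
        Ctx:                 ctx,
        Client:              client,
        Force:               true,
        GracePeriodSeconds:  -1,
        IgnoreAllDaemonSets: true,
        Out:                 os.Stdout,
        ErrOut:              os.Stderr,
    }

    // Mark the node unschedulable (cordon) before evicting its pods.
    if err := drain.RunCordonOrUncordon(drainer, node, true); err != nil {
        return err
    }
    return drain.RunNodeDrain(drainer, node.Name)
}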

This is a duplicate of bug 1951029.

*** This bug has been marked as a duplicate of bug 1951029 ***