Bug 1803639

Summary: balanceSimilarNodeGroups doesn't work
Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: BareMetal Provider
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 4.4
Keywords: Regression
Reporter: sunzhaohua <zhsun>
Assignee: Alberto <agarcial>
QA Contact: sunzhaohua <zhsun>
CC: agarcial, stbenjam
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2020-08-27 22:35:22 UTC
Bug Blocks: 1804826

Description sunzhaohua 2020-02-17 05:31:13 UTC
Description of problem:
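With balanceSimilarNodeGroups: true set in the ClusterAutoscaler, a scale-up is not split across similar node groups; only one MachineSet is scaled up.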


Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-02-16-221315

How reproducible:
Always

Steps to Reproduce:
1. Create a ClusterAutoscaler with balanceSimilarNodeGroups enabled:
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  balanceSimilarNodeGroups: true
  scaleDown:
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    enabled: true
    unneededTime: 10s

2. Create a MachineAutoscaler for each worker MachineSet:
$ oc get machineautoscalers
NAME                  REF KIND     REF NAME                          MIN   MAX   AGE
machineautoscaler-a   MachineSet   zhsun44-c9lb2-worker-us-east-2a   1     12    122m
machineautoscaler-b   MachineSet   zhsun44-c9lb2-worker-us-east-2b   1     12    120m
machineautoscaler-c   MachineSet   zhsun44-c9lb2-worker-us-east-2c   1     12    32m
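
For reference, a MachineAutoscaler such as machineautoscaler-a in the listing above could be defined roughly as follows. This is a sketch reconstructed from the listing (name, target MachineSet, MIN 1, MAX 12), not the reporter's actual manifest; the namespace is assumed from the openshift-machine-api prefix in the autoscaler logs.

```yaml
# Sketch only: values taken from the `oc get machineautoscalers` output,
# namespace assumed from the log lines in this report.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: machineautoscaler-a
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: zhsun44-c9lb2-worker-us-east-2a
```

The other two autoscalers would differ only in metadata.name and scaleTargetRef.name.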

3. Add workload to force the cluster to scale up.
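
The report does not say which workload was used for this step. One hypothetical way to create enough pending pods is a Job with many parallel pods that each request a fixed amount of CPU and memory. Everything in this sketch (the name, namespace, image, pod count, and request sizes) is an assumption for illustration, not taken from the report.

```yaml
# Hypothetical workload, not the reporter's: creates many pods with fixed
# resource requests so that some stay Pending and trigger a scale-up.
apiVersion: batch/v1
kind: Job
metadata:
  name: scale-up-workload      # assumed name
  namespace: default           # assumed namespace
spec:
  parallelism: 50              # assumed; adjust to control how many nodes are needed
  completions: 50
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: work
        image: busybox         # placeholder image
        command: ["sleep", "300"]
        resources:
          requests:
            cpu: 500m          # assumed request sizes
            memory: 500Mi
```

Deleting the Job afterwards removes the pods, which lets the scaleDown settings from step 1 take effect.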


Actual results:
Only one node group was scaled up. The cluster-autoscaler log did not contain the "Splitting scale-up" message.

I0217 04:07:42.367422       1 scale_up.go:431] Best option to resize: openshift-machine-api/zhsun44-c9lb2-worker-us-east-2b
I0217 04:07:42.367449       1 scale_up.go:435] Estimated 6 nodes needed in openshift-machine-api/zhsun44-c9lb2-worker-us-east-2b
I0217 04:07:42.367549       1 scale_up.go:540] Final scale-up plan: [{openshift-machine-api/zhsun44-c9lb2-worker-us-east-2b 1->7 (max: 12)}]
I0217 04:07:42.367576       1 scale_up.go:701] Scale-up: setting group openshift-machine-api/zhsun44-c9lb2-worker-us-east-2b size to 7


$ oc get machineset
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsun44-c9lb2-worker-us-east-2a   1         1         1       1           3h12m
zhsun44-c9lb2-worker-us-east-2b   7         7         7       7           3h12m
zhsun44-c9lb2-worker-us-east-2c   1         1         1       1           3h12m


Expected results:
The scale-up is balanced across all 3 node groups.

Additional info:

Comment 3 sunzhaohua 2020-02-21 09:52:31 UTC
Verification failed.

clusterversion: 4.4.0-0.nightly-2020-02-21-045519

I0221 09:51:55.241296       1 scale_up.go:431] Best option to resize: openshift-machine-api/zhsun2-k9bts-w-b
I0221 09:51:55.241328       1 scale_up.go:435] Estimated 6 nodes needed in openshift-machine-api/zhsun2-k9bts-w-b
I0221 09:51:55.241474       1 scale_up.go:540] Final scale-up plan: [{openshift-machine-api/zhsun2-k9bts-w-b 1->7 (max: 12)}]
I0221 09:51:55.241497       1 scale_up.go:701] Scale-up: setting group openshift-machine-api/zhsun2-k9bts-w-b size to 7

Comment 4 Alberto 2020-02-21 10:00:01 UTC
>clusterversion: 4.4.0-0.nightly-2020-02-21-045519

Please verify against 4.5, the target release for this bug.

Comment 5 sunzhaohua 2020-02-25 03:49:41 UTC
Sorry.
Verified on 4.5.
clusterversion: 4.5.0-0.ci-2020-02-25-010652

I0225 03:41:01.976494       1 scale_up.go:431] Best option to resize: openshift-machine-api/zhsun45-tcth9-worker-us-east-2c
I0225 03:41:01.976518       1 scale_up.go:435] Estimated 23 nodes needed in openshift-machine-api/zhsun45-tcth9-worker-us-east-2c
I0225 03:41:01.976635       1 scale_up.go:532] Splitting scale-up between 3 similar node groups: {openshift-machine-api/zhsun45-tcth9-worker-us-east-2c, openshift-machine-api/zhsun45-tcth9-worker-us-east-2a, openshift-machine-api/zhsun45-tcth9-worker-us-east-2b}
I0225 03:41:01.976659       1 scale_up.go:540] Final scale-up plan: [{openshift-machine-api/zhsun45-tcth9-worker-us-east-2c 1->9 (max: 12)} {openshift-machine-api/zhsun45-tcth9-worker-us-east-2a 1->9 (max: 12)} {openshift-machine-api/zhsun45-tcth9-worker-us-east-2b 1->8 (max: 12)}]
I0225 03:41:01.976679       1 scale_up.go:701] Scale-up: setting group openshift-machine-api/zhsun45-tcth9-worker-us-east-2c size to 9
I0225 03:41:01.991680       1 scale_up.go:701] Scale-up: setting group openshift-machine-api/zhsun45-tcth9-worker-us-east-2a size to 9
I0225 03:41:02.006312       1 scale_up.go:701] Scale-up: setting group openshift-machine-api/zhsun45-tcth9-worker-us-east-2b size to 8

$ oc get machineset
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsun45-tcth9-worker-us-east-2a   9         9         9       9           33m
zhsun45-tcth9-worker-us-east-2b   8         8         8       8           33m
zhsun45-tcth9-worker-us-east-2c   9         9         9       9           33m

Comment 6 Luke Meyer 2020-08-27 22:35:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409