Bug 1633944

Summary: [CA] Using option "balance-similar-node-groups", cluster-autoscaler logs output incorrect
Product: OpenShift Container Platform Reporter: sunzhaohua <zhsun>
Component: Cloud Compute Assignee: Alberto <agarcial>
Cloud Compute sub component: Other Providers QA Contact: sunzhaohua <zhsun>
Status: CLOSED DEFERRED Docs Contact:
Severity: low    
Priority: low CC: bperkins, jhou, mgugino
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-19 00:16:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description sunzhaohua 2018-09-28 07:30:58 UTC
Description of problem:
With the "balance-similar-node-groups" option enabled, the cluster-autoscaler logs describe the scale-up incorrectly.

Version-Release number of selected component (if applicable):
[ec2-user@ip-172-18-15-194 ~]$ oc version
oc v3.11.16
kubernetes v1.11.0+d4cacc0


How reproducible:
Always

Steps to Reproduce:
1. Install cluster-autoscaler and pass the "--balance-similar-node-groups=true" argument to it.
2. Create some pods to scale up the cluster.
3. Check the cluster-autoscaler logs: $ oc logs -f cluster-autoscaler-7f55f47577-zr2px



Actual results:
Three node groups are resized, each gaining one node, but the first log lines report that only one group is resized and that all 3 nodes are needed in that group. The messages "Best option to resize: 186-ASG1" and "Estimated 3 nodes needed in 186-ASG1" are therefore incorrect.


$ oc logs -f cluster-autoscaler-7f55f47577-zr2px
....
I0928 05:58:34.445708       1 scale_up.go:199] Best option to resize: 186-ASG1
I0928 05:58:34.445743       1 scale_up.go:203] Estimated 3 nodes needed in 186-ASG1
I0928 05:58:34.445832       1 scale_up.go:284] Splitting scale-up between 3 similar node groups: {186-ASG1, 186-ASG, 186-ASG2}
I0928 05:58:34.828711       1 scale_up.go:292] Final scale-up plan: [{186-ASG1 0->1 (max: 3)} {186-ASG 0->1 (max: 2)} {186-ASG2 0->1 (max: 5)}]
I0928 05:58:34.828752       1 scale_up.go:344] Scale-up: setting group 186-ASG1 size to 1
I0928 05:58:34.972487       1 aws_manager.go:305] Setting asg 186-ASG1 size to 1
I0928 05:58:35.241784       1 scale_up.go:344] Scale-up: setting group 186-ASG size to 1
I0928 05:58:35.242586       1 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"cluster-autoscaler", Name:"cluster-autoscaler-status", UID:"f24d4666-c2c7-11e8-8056-0e44276ce1fe", APIVersion:"v1", ResourceVersion:"37834", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: group 186-ASG1 size set to 1
I0928 05:58:35.357646       1 aws_manager.go:305] Setting asg 186-ASG size to 1
I0928 05:58:35.589206       1 scale_up.go:344] Scale-up: setting group 186-ASG2 size to 1
I0928 05:58:35.589544       1 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"cluster-autoscaler", Name:"cluster-autoscaler-status", UID:"f24d4666-c2c7-11e8-8056-0e44276ce1fe", APIVersion:"v1", ResourceVersion:"37834", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: group 186-ASG size set to 1
I0928 05:58:35.697493       1 aws_manager.go:305] Setting asg 186-ASG2 size to 1
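
The mismatch in the log output above can be checked mechanically. The following is a minimal Python sketch, not part of cluster-autoscaler: it parses the "Estimated ... nodes needed" line and the "Final scale-up plan" line (copied from the log above, with timestamps and source locations omitted) and compares them. The helper names parse_estimate and parse_final_plan are hypothetical.

```python
import re

# Log messages copied from the output above (timestamps and source locations omitted).
LOG = """\
Best option to resize: 186-ASG1
Estimated 3 nodes needed in 186-ASG1
Splitting scale-up between 3 similar node groups: {186-ASG1, 186-ASG, 186-ASG2}
Final scale-up plan: [{186-ASG1 0->1 (max: 3)} {186-ASG 0->1 (max: 2)} {186-ASG2 0->1 (max: 5)}]
"""


def parse_estimate(log):
    """Return (group_name, node_count) from the 'Estimated N nodes needed in G' line."""
    m = re.search(r"Estimated (\d+) nodes needed in (\S+)", log)
    return m.group(2), int(m.group(1))


def parse_final_plan(log):
    """Return {group_name: nodes_added} from the 'Final scale-up plan' line."""
    plan_line = re.search(r"Final scale-up plan: \[(.*)\]", log).group(1)
    entries = re.findall(r"\{(\S+) (\d+)->(\d+) \(max: (\d+)\)\}", plan_line)
    return {name: int(new) - int(cur) for name, cur, new, _max in entries}


group, estimated = parse_estimate(LOG)
plan = parse_final_plan(LOG)

# The estimate names a single group, but the plan spreads the nodes over several.
print("estimate:", group, estimated)
print("plan:", plan)
print("estimate matches plan for", group, ":", plan.get(group) == estimated)
```

For this log, the total across the plan equals the estimate (3 nodes), but the group named in the estimate receives only 1 of them, which is the discrepancy this bug reports.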



Expected results:
The logs correctly report which node groups are resized and how many nodes are needed in each.
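
To illustrate the per-group numbers an accurate log would need to reflect, here is a minimal Python sketch of splitting a scale-up across similar node groups. It is an assumption for illustration only, not the cluster-autoscaler implementation: it adds nodes one at a time to the currently smallest group that is still below its maximum. The function name split_scale_up is hypothetical; the group names, current sizes, and maximums are taken from the "Final scale-up plan" log line in this report.

```python
def split_scale_up(groups, new_nodes):
    """Distribute new_nodes across groups, keeping sizes as even as possible.

    groups maps a group name to (current_size, max_size).
    Returns a dict mapping each group name to its target size.
    Illustrative only; not the cluster-autoscaler algorithm.
    """
    target = {name: cur for name, (cur, _max) in groups.items()}
    for _ in range(new_nodes):
        # Only groups that are still below their maximum can grow.
        candidates = [n for n in target if target[n] < groups[n][1]]
        if not candidates:
            break  # every group is full; remaining nodes cannot be placed
        smallest = min(candidates, key=lambda n: target[n])
        target[smallest] += 1
    return target


# Sizes and maximums as shown in the "Final scale-up plan" log line.
groups = {"186-ASG1": (0, 3), "186-ASG": (0, 2), "186-ASG2": (0, 5)}
target = split_scale_up(groups, 3)

for name, (cur, max_size) in groups.items():
    print(f"{name} {cur}->{target[name]} (max: {max_size})")
```

Under these assumptions each of the three groups grows from 0 to 1, so a summary naming only 186-ASG1 with 3 nodes does not describe the actions taken.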

Additional info:

Comment 2 Andrew McDermott 2019-08-27 10:43:19 UTC
On 3.11, this could be the same issue found here:
- https://bugzilla.redhat.com/show_bug.cgi?id=1731011
- https://bugzilla.redhat.com/show_bug.cgi?id=1733235

Comment 3 Michael Gugino 2020-05-19 00:16:14 UTC
This bug has been inactive for quite some time. There are no attached customer cases, so we are closing it for now. If you still require a fix, please feel free to update the BZ with additional information.