Bug 1801254 - Cluster Autoscaler ignoring machines unregistered as nodes
Summary: Cluster Autoscaler ignoring machines unregistered as nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Alberto
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-10 14:17 UTC by Alberto
Modified: 2020-05-15 16:08 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-15 16:08:17 UTC
Target Upstream Version:

Links
System ID: Github openshift kubernetes-autoscaler pull 122
Priority: None
Status: closed
Summary: Bug 1801254: UPSTREAM: <carry>: openshift: Let Nodes() return the list of all machines
Last Updated: 2020-09-09 12:40:01 UTC

Description Alberto 2020-02-10 14:17:52 UTC
Description of problem:

The autoscaler expects each provider's node group implementation of the Nodes() function to return the list of instances belonging to the group, regardless of whether they have become Kubernetes nodes or not.
This information is then used to detect unregistered nodes: https://github.com/kubernetes/autoscaler/blob/bf3a9fb52e3214dff0bea5ef2b97f17ad00a7702/cluster-autoscaler/clusterstate/clusterstate.go#L307-L311
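
As an illustration of why the autoscaler needs every instance, here is a minimal sketch of that comparison. It is not the upstream code: the `instance` type, the `unregisteredInstances` function, and the sample IDs are hypothetical. The idea is only that an instance reported by a node group whose ID matches no registered Kubernetes node is treated as unregistered.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for what a node group reports from Nodes().
type instance struct {
	ID string // provider ID of the cloud instance
}

// unregisteredInstances returns the reported instances whose ID does not match
// any registered Kubernetes node. If Nodes() only reports instances that
// already have a node, this result is always empty.
func unregisteredInstances(reported []instance, registeredIDs []string) []instance {
	registered := make(map[string]struct{}, len(registeredIDs))
	for _, id := range registeredIDs {
		registered[id] = struct{}{}
	}
	var out []instance
	for _, inst := range reported {
		if _, ok := registered[inst.ID]; !ok {
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	// Sample IDs are made up for this sketch.
	reported := []instance{{ID: "aws:///zone/i-1"}, {ID: "aws:///zone/i-2"}, {ID: "aws:///zone/i-3"}}
	registered := []string{"aws:///zone/i-1", "aws:///zone/i-2"}
	for _, inst := range unregisteredInstances(reported, registered) {
		fmt.Println("unregistered:", inst.ID)
	}
}
```

With a Nodes() implementation that drops instances lacking a node, `reported` would never contain anything outside `registered`, so the autoscaler could not notice machines that fail to join the cluster.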

Actual results:
The machine API autoscaler implementation of Nodes() filters instances, keeping only those that have an associated node.

Expected results:
The machine API autoscaler implementation of Nodes() returns a list of the machines belonging to the group, regardless of whether they have become nodes or not.
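
The difference between the actual and expected behaviour can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked pull request: the `machine` type and both functions are hypothetical, and falling back to a name-based placeholder when a machine has no provider ID is an assumption made only for this sketch.

```go
package main

import "fmt"

// machine is a hypothetical, simplified view of a Machine API object.
type machine struct {
	Name       string
	ProviderID string // empty until the cloud instance is known
	NodeName   string // empty while the machine has no associated node
}

// nodesWithNodeRef mirrors the behaviour described under "Actual results":
// machines without an associated node are dropped.
func nodesWithNodeRef(machines []machine) []string {
	var ids []string
	for _, m := range machines {
		if m.NodeName == "" {
			continue
		}
		ids = append(ids, m.ProviderID)
	}
	return ids
}

// allMachines mirrors "Expected results": every machine in the group is
// reported, whether or not it has become a node.
func allMachines(machines []machine) []string {
	ids := make([]string, 0, len(machines))
	for _, m := range machines {
		id := m.ProviderID
		if id == "" {
			// Assumption for this sketch: use a placeholder derived from the name.
			id = "machine/" + m.Name
		}
		ids = append(ids, id)
	}
	return ids
}

func main() {
	// Sample data is made up for this sketch.
	group := []machine{
		{Name: "worker-a", ProviderID: "aws:///zone/i-1", NodeName: "node-a"},
		{Name: "worker-b"}, // still provisioning: no provider ID, no node
	}
	fmt.Println("filtered:", len(nodesWithNodeRef(group)))
	fmt.Println("all:", len(allMachines(group)))
}
```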

Additional info:

Comment 2 sunzhaohua 2020-02-17 09:03:31 UTC
Verified

version
4.4.0-0.nightly-2020-02-17-030345

After increasing the log level and scaling the cluster up and down, the logs look like this:

I0217 08:57:33.711498       1 static_autoscaler.go:194] Starting main loop
I0217 08:57:33.711694       1 machineapi_controller.go:468] node "ip-10-0-164-120.us-east-2.compute.internal" is in nodegroup "zhsun4-8f8fk-worker-us-east-2c"
I0217 08:57:33.712126       1 machineapi_controller.go:468] node "ip-10-0-155-147.us-east-2.compute.internal" is in nodegroup "zhsun4-8f8fk-worker-us-east-2b"
I0217 08:57:33.713058       1 machineapi_controller.go:468] node "ip-10-0-164-251.us-east-2.compute.internal" is in nodegroup "zhsun4-8f8fk-worker-us-east-2c"
I0217 08:57:33.713137       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 5)
I0217 08:57:33.713157       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
I0217 08:57:33.713192       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
I0217 08:57:33.713202       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 5)
I0217 08:57:33.713238       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
I0217 08:57:33.713249       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 5)
I0217 08:57:33.713280       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 5)
I0217 08:57:33.713300       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
I0217 08:57:33.713402       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 5)
I0217 08:57:33.713443       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
W0217 08:57:33.713563       1 machineapi_controller.go:305] Machine "zhsun4-8f8fk-worker-us-east-2c-xzvjm" has no providerID
I0217 08:57:33.713634       1 machineapi_controller.go:314] Status.NodeRef of machine "zhsun4-8f8fk-worker-us-east-2c-xzvjm" is currently nil
W0217 08:57:33.713709       1 machineapi_controller.go:305] Machine "zhsun4-8f8fk-worker-us-east-2c-jw75v" has no providerID
I0217 08:57:33.713727       1 machineapi_controller.go:314] Status.NodeRef of machine "zhsun4-8f8fk-worker-us-east-2c-jw75v" is currently nil
W0217 08:57:33.713738       1 machineapi_controller.go:305] Machine "zhsun4-8f8fk-worker-us-east-2c-92n2f" has no providerID
I0217 08:57:33.713747       1 machineapi_controller.go:314] Status.NodeRef of machine "zhsun4-8f8fk-worker-us-east-2c-92n2f" is currently nil
W0217 08:57:33.713758       1 machineapi_controller.go:305] Machine "zhsun4-8f8fk-worker-us-east-2c-dd4kl" has no providerID
I0217 08:57:33.713767       1 machineapi_controller.go:314] Status.NodeRef of machine "zhsun4-8f8fk-worker-us-east-2c-dd4kl" is currently nil
I0217 08:57:33.713777       1 machineapi_controller.go:333] nodegroup zhsun4-8f8fk-worker-us-east-2c has nodes [aws:///us-east-2c/i-01e6048050f92ed41 aws:///us-east-2c/i-05354377c10c6214a aws:///us-east-2c/i-0ce155e9e13f0e7d8]


I0217 08:43:02.181867       1 static_autoscaler.go:452] Starting scale down
I0217 08:43:02.181952       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 3)
I0217 08:43:02.181975       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
I0217 08:43:02.181998       1 scale_down.go:776] No candidates for scale down
I0217 08:43:02.182060       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
I0217 08:43:02.182082       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 3)
I0217 08:43:11.490495       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0217 08:43:11.490587       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2c (min: 1, max: 12, replicas: 3)
I0217 08:43:11.490601       1 machineapi_provider.go:62] discovered node group: openshift-machine-api/zhsun4-8f8fk-worker-us-east-2b (min: 1, max: 12, replicas: 1)
I0217 08:43:11.490756       1 machineapi_controller.go:333] nodegroup zhsun4-8f8fk-worker-us-east-2c has nodes [aws:///us-east-2c/i-05354377c10c6214a aws:///us-east-2c/i-01e6048050f92ed41 aws:///us-east-2c/i-0ce155e9e13f0e7d8]
I0217 08:43:11.490895       1 machineapi_controller.go:333] nodegroup zhsun4-8f8fk-worker-us-east-2b has nodes [aws:///us-east-2b/i-05a62a7cc58eef452]
I0217 08:43:11.490956       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 428.11µs
I0217 08:43:12.193624       1 static_autoscaler.go:194] Starting main loop

