Bug 1846967

Summary:	Worker nodes have different amounts of memory
Product:	OpenShift Container Platform	Reporter:	OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component:	Cloud Compute	Assignee:	Joel Speed <jspeed>
Cloud Compute sub component:	Other Providers	QA Contact:	Milind Yadav <miyadav>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	medium	CC:	agarcial, fan-wxa, jspeed, mfuruta, rh-container
Version:	4.3.z
Target Milestone:	---
Target Release:	4.5.z
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The memory capacity of instances of the same type may not be exactly the same across different failure domains. Consequence: The autoscaler determines that the node groups are different and does not balance workloads across failure domains. Fix: Allow a 256MB tolerance on memory capacity across node groups/failure domains. Result: The autoscaler is more likely to balance workloads across failure domains when balanceSimilarNodeGroups is enabled.	Story Points:	---
Clone Of:
Clones:	1861732 (view as bug list)		Environment:
Last Closed:	2020-08-10 13:50:20 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1824215
Bug Blocks:	1861732

Comment 3 Milind Yadav 2020-07-30 09:35:07 UTC

Validated on:

[miyadav@miyadav debug]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-29-230326   True        False         61m     Cluster version is 4.5.0-0.nightly-2020-07-29-230326


Test steps:
1. Update each machineset to set "instanceType: m5.xlarge"
$ oc get machineset
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-b967-hd977-worker-us-east-2a   1         1         1       1           70m
miyadav-b967-hd977-worker-us-east-2b   1         1         1       1           70m
miyadav-b967-hd977-worker-us-east-2c   1         1         1       1           70m
[miyadav@miyadav debug]$ oc get machines
NAME                                         PHASE     TYPE        REGION      ZONE         AGE
miyadav-b967-hd977-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   76m
miyadav-b967-hd977-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   76m
miyadav-b967-hd977-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   76m
miyadav-b967-hd977-worker-us-east-2a-4m5db   Running   m5.xlarge   us-east-2   us-east-2a   4m40s
miyadav-b967-hd977-worker-us-east-2b-5467l   Running   m5.xlarge   us-east-2   us-east-2b   4m33s
miyadav-b967-hd977-worker-us-east-2c-6v7fh   Running   m5.xlarge   us-east-2   us-east-2c   7m31s

[miyadav@miyadav debug]$ oc get node | grep worker
ip-10-0-134-31.us-east-2.compute.internal    Ready    worker   106s    v1.18.3+012b3ec
ip-10-0-168-253.us-east-2.compute.internal   Ready    worker   83s     v1.18.3+012b3ec
ip-10-0-209-12.us-east-2.compute.internal    Ready    worker   5m27s   v1.18.3+012b3ec

[miyadav@miyadav debug]$ oc get node ip-10-0-134-31.us-east-2.compute.internal ip-10-0-168-253.us-east-2.compute.internal ip-10-0-209-12.us-east-2.compute.internal -o yaml | grep memory
       
      memory: 14793144Ki
      memory: 15944120Ki
      message: kubelet has sufficient memory available
   
      memory: 14793128Ki
      memory: 15944104Ki
      message: kubelet has sufficient memory available
      
      memory: 14965176Ki
      memory: 16116152Ki


Largest difference in memory capacity between nodes: 16116152Ki - 15944104Ki = 172048Ki (about 168Mi)
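
For reference, the same check can be written as a small script. This is only a sketch of the arithmetic, not the autoscaler's code. It assumes the larger "memory" value in each pair above is the node capacity, and that the 256MB tolerance from the Doc Text means 256 * 1024 Ki. The helper names are made up for illustration.

```python
# Capacity values (in Ki) copied from the `oc get node ... | grep memory` output above.
# Assumption: the larger value of each pair is status.capacity.memory.
capacities_ki = {
    "ip-10-0-134-31": 15944120,
    "ip-10-0-168-253": 15944104,
    "ip-10-0-209-12": 16116152,
}

# Assumption: "256MB" is read as 256 MiB, i.e. 256 * 1024 Ki.
TOLERANCE_KI = 256 * 1024


def max_memory_spread_ki(capacities):
    """Return the largest capacity difference between any two nodes, in Ki."""
    values = list(capacities.values())
    return max(values) - min(values)


def within_tolerance(capacities, tolerance_ki=TOLERANCE_KI):
    """Return True if the spread does not exceed the assumed tolerance."""
    return max_memory_spread_ki(capacities) <= tolerance_ki


spread = max_memory_spread_ki(capacities_ki)
print(f"spread: {spread}Ki ({spread / 1024:.1f}Mi)")
print(f"within tolerance: {within_tolerance(capacities_ki)}")
```

With the values from this cluster the spread stays under the assumed tolerance, so the three machinesets should be treated as similar.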

2. Create clusterautoscaler with "balanceSimilarNodeGroups: true"
---
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  balanceSimilarNodeGroups: true
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    unneededTime: 10s
3. Create 3 machineautoscalers, one for each machineset
---
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-a"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: miyadav-b967-hd977-worker-us-east-2a

NAME       REF KIND     REF NAME                               MIN   MAX   AGE
worker-a   MachineSet   miyadav-b967-hd977-worker-us-east-2a   1     10    9m35s
worker-b   MachineSet   miyadav-b967-hd977-worker-us-east-2b   1     10    9m44s
worker-c   MachineSet   miyadav-b967-hd977-worker-us-east-2c   1     10    9m53s
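
The three manifests differ only in the name and the target machineset, so they could also be generated. A minimal sketch, assuming the worker-a/b/c names and machineset names shown in the table above; the machine_autoscaler helper is hypothetical, and the JSON it prints is meant to be saved and applied with "oc apply -f".

```python
import json


def machine_autoscaler(name, machineset, min_replicas=1, max_replicas=10):
    """Build a MachineAutoscaler manifest (hypothetical helper) as a plain dict."""
    return {
        "apiVersion": "autoscaling.openshift.io/v1beta1",
        "kind": "MachineAutoscaler",
        "metadata": {"name": name, "namespace": "openshift-machine-api"},
        "spec": {
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "scaleTargetRef": {
                "apiVersion": "machine.openshift.io/v1beta1",
                "kind": "MachineSet",
                "name": machineset,
            },
        },
    }


# Machineset name prefix taken from the `oc get machineset` output in step 1.
PREFIX = "miyadav-b967-hd977-worker-us-east-2"

# One autoscaler per zone: worker-a -> ...-us-east-2a, and so on.
manifests = [machine_autoscaler(f"worker-{zone}", f"{PREFIX}{zone}") for zone in "abc"]

# Wrap the manifests in a List so they can be applied in one go.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```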

4. Create workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up
  labels:
    app: scale-up
spec:
  replicas: 40
  selector:
    matchLabels:
      app: scale-up
  template:
    metadata:
      labels:
        app: scale-up
    spec:
      containers:
      - name: busybox
        image: docker.io/library/busybox
        resources:
          requests:
            memory: 4Gi
        command:
        - /bin/sh
        - "-c"
        - "echo 'this should be in the logs' && sleep 86400"
      terminationGracePeriodSeconds: 0
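
Rough sizing of this workload, as a back-of-the-envelope sketch only. It assumes the smaller "memory" value from step 1 is the allocatable memory of a worker, and it ignores CPU, system pods, and the autoscaler's real bin-packing logic. Under those assumptions it happens to agree with the "Estimated 11 nodes needed" line in the logs in step 5.

```python
import math

# Assumption: smallest allocatable memory seen on the workers in step 1, in Ki.
ALLOCATABLE_KI = 14793128
# Memory request of each pod in the Deployment above: 4Gi expressed in Ki.
REQUEST_KI = 4 * 1024 * 1024
REPLICAS = 40
EXISTING_WORKERS = 3

# How many 4Gi pods fit on one worker when only memory is considered.
pods_per_node = ALLOCATABLE_KI // REQUEST_KI

# Workers needed to hold every replica, and how many of those are new.
total_nodes = math.ceil(REPLICAS / pods_per_node)
extra_nodes = max(0, total_nodes - EXISTING_WORKERS)

print(f"pods per node: {pods_per_node}")
print(f"total workers needed: {total_nodes}")
print(f"additional workers needed: {extra_nodes}")
```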
5. Check logs and machineset
I0730 09:23:25.482509       1 scale_up.go:452] Best option to resize: openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a
I0730 09:23:25.482623       1 scale_up.go:456] Estimated 11 nodes needed in openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a
I0730 09:23:26.068725       1 scale_up.go:562] Splitting scale-up between 3 similar node groups: {openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a, openshift-machine-api/miyadav-b967-hd977-worker-us-east-2b, openshift-machine-api/miyadav-b967-hd977-worker-us-east-2c}
I0730 09:23:26.669007       1 scale_up.go:570] Final scale-up plan: [{openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a 1->5 (max: 10)} {openshift-machine-api/miyadav-b967-hd977-worker-us-east-2b 1->5 (max: 10)} {openshift-machine-api/miyadav-b967-hd977-worker-us-east-2c 1->4 (max: 10)}]
I0730 09:23:26.669062       1 scale_up.go:659] Scale-up: setting group openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a size to 5
I0730 09:23:27.274015       1 scale_up.go:659] Scale-up: setting group openshift-machine-api/miyadav-b967-hd977-worker-us-east-2b size to 5
I0730 09:23:27.877056       1 scale_up.go:659] Scale-up: setting group openshift-machine-api/miyadav-b967-hd977-worker-us-east-2c size to 4

[miyadav@miyadav debug]$ oc get machineset
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-b967-hd977-worker-us-east-2a   5         5         1       1           86m
miyadav-b967-hd977-worker-us-east-2b   5         5         1       1           86m
miyadav-b967-hd977-worker-us-east-2c   4         4         1       1           86m
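
The 5/5/4 split in the final scale-up plan can be reproduced with a simple "always grow the smallest group" loop. This is a simplified illustration, not the cluster autoscaler's actual implementation; the split_scale_up function is hypothetical, and ties are assumed to be broken in the order the groups are listed.

```python
def split_scale_up(sizes, max_sizes, new_nodes):
    """Spread new_nodes over the groups, one node at a time, smallest group first.

    Hypothetical helper: sizes and max_sizes map group name -> node count.
    """
    plan = dict(sizes)
    for _ in range(new_nodes):
        # Only groups that are still below their maximum can grow.
        candidates = [group for group in plan if plan[group] < max_sizes[group]]
        if not candidates:
            break
        # min() returns the first smallest group, so earlier groups win ties.
        smallest = min(candidates, key=lambda group: plan[group])
        plan[smallest] += 1
    return plan


# Current sizes and limits taken from the machineautoscalers and logs above.
current = {"us-east-2a": 1, "us-east-2b": 1, "us-east-2c": 1}
limits = {"us-east-2a": 10, "us-east-2b": 10, "us-east-2c": 10}

plan = split_scale_up(current, limits, new_nodes=11)
for group, size in plan.items():
    print(f"{group}: {current[group]}->{size} (max: {limits[group]})")
```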

Expected and actual: with "balanceSimilarNodeGroups: true", the autoscaler splits the scale-up across all three machinesets even though the memory capacity of the nodes differs by up to 172048Ki.

Comment 5 errata-xmlrpc 2020-08-10 13:50:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3188