Bug 1846967 - Worker nodes have different amounts of memory
Summary: Worker nodes have different amounts of memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.3.z
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.z
Assignee: Joel Speed
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On: 1824215
Blocks: 1861732
 
Reported: 2020-06-15 09:44 UTC by OpenShift BugZilla Robot
Modified: 2020-08-10 13:50 UTC (History)
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The memory capacity of instances of the same type may not be exactly the same across different failure domains.
Consequence: The autoscaler determines that the node groups are different and does not balance workloads across failure domains.
Fix: Allow a 256MB tolerance on memory capacity when comparing node groups/failure domains.
Result: The autoscaler is more likely to balance workloads across failure domains when balanceSimilarNodeGroups is enabled.
Clone Of:
Clones: 1861732 (view as bug list)
Environment:
Last Closed: 2020-08-10 13:50:20 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes-autoscaler pull 157 0 None closed [release-4.5] BUG 1846967: Allow small tolerance on memory capacity when comparing nodegroups 2020-09-08 11:44:55 UTC
Red Hat Product Errata RHBA-2020:3188 0 None None None 2020-08-10 13:50:43 UTC

Comment 3 Milind Yadav 2020-07-30 09:35:07 UTC
VALIDATED ON - 

[miyadav@miyadav debug]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-29-230326   True        False         61m     Cluster version is 4.5.0-0.nightly-2020-07-29-230326


Test steps:
1. update machineset setting "instanceType: m5.xlarge"
$ oc get machineset
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-b967-hd977-worker-us-east-2a   1         1         1       1           70m
miyadav-b967-hd977-worker-us-east-2b   1         1         1       1           70m
miyadav-b967-hd977-worker-us-east-2c   1         1         1       1           70m
[miyadav@miyadav debug]$ oc get machines
NAME                                         PHASE     TYPE        REGION      ZONE         AGE
miyadav-b967-hd977-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   76m
miyadav-b967-hd977-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   76m
miyadav-b967-hd977-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   76m
miyadav-b967-hd977-worker-us-east-2a-4m5db   Running   m5.xlarge   us-east-2   us-east-2a   4m40s
miyadav-b967-hd977-worker-us-east-2b-5467l   Running   m5.xlarge   us-east-2   us-east-2b   4m33s
miyadav-b967-hd977-worker-us-east-2c-6v7fh   Running   m5.xlarge   us-east-2   us-east-2c   7m31s

[miyadav@miyadav debug]$ oc get node | grep worker
ip-10-0-134-31.us-east-2.compute.internal    Ready    worker   106s    v1.18.3+012b3ec
ip-10-0-168-253.us-east-2.compute.internal   Ready    worker   83s     v1.18.3+012b3ec
ip-10-0-209-12.us-east-2.compute.internal    Ready    worker   5m27s   v1.18.3+012b3ec

[miyadav@miyadav debug]$ oc get node ip-10-0-134-31.us-east-2.compute.internal ip-10-0-168-253.us-east-2.compute.internal ip-10-0-209-12.us-east-2.compute.internal -o yaml | grep memory
      memory: 14793144Ki
      memory: 15944120Ki
      message: kubelet has sufficient memory available
      memory: 14793128Ki
      memory: 15944104Ki
      message: kubelet has sufficient memory available
      memory: 14965176Ki
      memory: 16116152Ki


16116152Ki - 15944104Ki = 172048Ki (the largest capacity difference between the worker nodes, which is within the 256MB = 262144Ki tolerance)
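The fix merged via kubernetes-autoscaler pull 157 adds a small tolerance when comparing the memory capacity of node groups. A minimal Python sketch of that comparison, using the capacities observed above (the function and constant names here are illustrative, not the actual autoscaler identifiers):

```python
# Illustrative sketch of the tolerance-based capacity comparison described in
# the Doc Text; names are hypothetical, not the real autoscaler identifiers.

MAX_MEMORY_DIFFERENCE_KI = 256 * 1024  # 256MB tolerance, expressed in Ki

def similar_memory_capacity(capacity_a_ki: int, capacity_b_ki: int) -> bool:
    """Treat two node groups as similar if their memory capacity differs
    by no more than the tolerance."""
    return abs(capacity_a_ki - capacity_b_ki) <= MAX_MEMORY_DIFFERENCE_KI

# Capacities observed on the us-east-2b and us-east-2c workers above (in Ki):
zone_b, zone_c = 15944104, 16116152
print(zone_c - zone_b)                          # 172048
print(similar_memory_capacity(zone_b, zone_c))  # True: 172048 <= 262144
```

With the pre-fix exact comparison, the same 172048Ki difference would have made the groups look dissimilar and prevented balancing.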

2. Create clusterautoscaler with "balanceSimilarNodeGroups: true"
---
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  balanceSimilarNodeGroups: true
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    unneededTime: 10s
3. Create one MachineAutoscaler per machineset (three in total)
---
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "worker-a"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: miyadav-b967-hd977-worker-us-east-2a

NAME       REF KIND     REF NAME                               MIN   MAX   AGE
worker-a   MachineSet   miyadav-b967-hd977-worker-us-east-2a   1     10    9m35s
worker-b   MachineSet   miyadav-b967-hd977-worker-us-east-2b   1     10    9m44s
worker-c   MachineSet   miyadav-b967-hd977-worker-us-east-2c   1     10    9m53s
[miyadav@miyadav debug]$ 

4. Create workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up
  labels:
    app: scale-up
spec:
  replicas: 40
  selector:
    matchLabels:
      app: scale-up
  template:
    metadata:
      labels:
        app: scale-up
    spec:
      containers:
      - name: busybox
        image: docker.io/library/busybox
        resources:
          requests:
            memory: 4Gi
        command:
        - /bin/sh
        - "-c"
        - "echo 'this should be in the logs' && sleep 86400"
      terminationGracePeriodSeconds: 0
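The 40 replicas requesting 4Gi each roughly imply the node count the autoscaler estimates in the logs below. A back-of-the-envelope check in Python (it ignores daemonset pods and other reserved resources, so it is only an approximation):

```python
# Back-of-the-envelope estimate of nodes needed for the scale-up workload;
# ignores daemonsets and other reserved memory, so it is only approximate.
import math

replicas = 40
request_ki = 4 * 1024 * 1024   # 4Gi memory request per pod, in Ki
allocatable_ki = 14793144      # allocatable memory of a worker node (from above)

pods_per_node = allocatable_ki // request_ki        # 3 pods fit per node
nodes_needed = math.ceil(replicas / pods_per_node)  # 14 nodes total
new_nodes = nodes_needed - 3                        # 3 workers already exist
print(pods_per_node, nodes_needed, new_nodes)       # 3 14 11
```

The result of 11 new nodes matches the "Estimated 11 nodes needed" line in the autoscaler logs.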
5. Check logs and machineset
I0730 09:23:25.482509       1 scale_up.go:452] Best option to resize: openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a
I0730 09:23:25.482623       1 scale_up.go:456] Estimated 11 nodes needed in openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a
I0730 09:23:26.068725       1 scale_up.go:562] Splitting scale-up between 3 similar node groups: {openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a, openshift-machine-api/miyadav-b967-hd977-worker-us-east-2b, openshift-machine-api/miyadav-b967-hd977-worker-us-east-2c}
I0730 09:23:26.669007       1 scale_up.go:570] Final scale-up plan: [{openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a 1->5 (max: 10)} {openshift-machine-api/miyadav-b967-hd977-worker-us-east-2b 1->5 (max: 10)} {openshift-machine-api/miyadav-b967-hd977-worker-us-east-2c 1->4 (max: 10)}]
I0730 09:23:26.669062       1 scale_up.go:659] Scale-up: setting group openshift-machine-api/miyadav-b967-hd977-worker-us-east-2a size to 5
I0730 09:23:27.274015       1 scale_up.go:659] Scale-up: setting group openshift-machine-api/miyadav-b967-hd977-worker-us-east-2b size to 5
I0730 09:23:27.877056       1 scale_up.go:659] Scale-up: setting group openshift-machine-api/miyadav-b967-hd977-worker-us-east-2c size to 4
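The final plan splits the 11 new nodes nearly evenly across the three similar node groups (4 + 4 + 3, giving target sizes 5/5/4). A simple round-robin sketch in Python reproduces that split; this is an illustration of the balancing idea, not the actual algorithm in scale_up.go:

```python
# Illustrative round-robin split of new nodes across similar node groups;
# the real autoscaler logic lives in scale_up.go, this is just a sketch.

def split_scale_up(groups, current_size, new_nodes, max_size=10):
    sizes = {g: current_size for g in groups}
    while new_nodes > 0:
        # grow the currently smallest group that still has headroom
        g = min((g for g in groups if sizes[g] < max_size),
                key=lambda g: sizes[g])
        sizes[g] += 1
        new_nodes -= 1
    return sizes

plan = split_scale_up(["us-east-2a", "us-east-2b", "us-east-2c"], 1, 11)
print(plan)  # {'us-east-2a': 5, 'us-east-2b': 5, 'us-east-2c': 4}
```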

[miyadav@miyadav debug]$ oc get machineset
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-b967-hd977-worker-us-east-2a   5         5         1       1           86m
miyadav-b967-hd977-worker-us-east-2b   5         5         1       1           86m
miyadav-b967-hd977-worker-us-east-2c   4         4         1       1           86m

Expected and actual: the autoscaler balances the similar node groups correctly with balanceSimilarNodeGroups: true, even though the memory capacity differs across zones by roughly 172048Ki (within the 256MB tolerance).

Comment 5 errata-xmlrpc 2020-08-10 13:50:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3188

