Bug 1883978
| Summary: | [oVirt] autoscaler detects nodes as unregistered and tries to delete them | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gal Zaidman <gzaidman> |
| Component: | Cloud Compute | Assignee: | Gal Zaidman <gzaidman> |
| Cloud Compute sub component: | oVirt Provider | QA Contact: | Guilherme Santos <gdeolive> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | high | CC: | aoconnor, apjagtap |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:47:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Gal Zaidman
2020-09-30 16:27:45 UTC
*** Bug 1883979 has been marked as a duplicate of this bug. ***

*** Bug 1881051 has been marked as a duplicate of this bug. ***

*** Bug 1880136 has been marked as a duplicate of this bug. ***

Verified on: openshift-4.6.0-0.nightly-2020-10-02-065738

Steps:
1. Have OCP with 3 masters and 3 workers
2. # cat cluster_autoscaler.yaml
    apiVersion: "autoscaling.openshift.io/v1"
    kind: "ClusterAutoscaler"
    metadata:
      name: "default"
    spec:
      podPriorityThreshold: -10
      resourceLimits:
        maxNodesTotal: 9
        cores:
          min: 24
          max: 40
        memory:
          min: 96
          max: 256
      scaleDown:
        enabled: true
        delayAfterAdd: 10s
        delayAfterDelete: 10s
        delayAfterFailure: 30s
        unneededTime: 30s
3. # oc create -f cluster_autoscaler.yaml
4. # cat machine_autoscaler.yaml
    apiVersion: "autoscaling.openshift.io/v1beta1"
    kind: "MachineAutoscaler"
    metadata:
      name: "primary-jnzvt-worker-0"
      namespace: "openshift-machine-api"
    spec:
      minReplicas: 3
      maxReplicas: 6
      scaleTargetRef:
        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
        name: primary-jnzvt-worker-0
5. # oc create -f machine_autoscaler.yaml
6. # oc apply -f - <<EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      name: autoscaler-demo
    EOF
7. # cat scale-up.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: scale-up
      labels:
        app: scale-up
    spec:
      replicas: 18
      selector:
        matchLabels:
          app: scale-up
      template:
        metadata:
          labels:
            app: scale-up
        spec:
          containers:
          - name: origin-base
            image: openshift/origin-base
            resources:
              requests:
                memory: 2Gi
            command:
            - /bin/sh
            - "-c"
            - "echo 'this should be in the logs' && sleep 86400"
          terminationGracePeriodSeconds: 0
8. # oc apply -n autoscaler-demo -f scale-up.yaml
9.
Waited around 1 hr and checked the results:
    # oc get pods -n autoscaler-demo && oc get machine -n openshift-machine-api
    NAME                        READY   STATUS    RESTARTS   AGE
    scale-up-5fd5c67f64-24wk5   1/1     Running   0          78m
    scale-up-5fd5c67f64-2qd9c   1/1     Running   0          79m
    scale-up-5fd5c67f64-58tjg   1/1     Running   0          79m
    scale-up-5fd5c67f64-7fh9q   1/1     Running   0          78m
    scale-up-5fd5c67f64-7q5bc   1/1     Running   0          79m
    scale-up-5fd5c67f64-cdhv5   1/1     Running   0          79m
    scale-up-5fd5c67f64-cv4rl   1/1     Running   0          79m
    scale-up-5fd5c67f64-fkq9t   1/1     Running   0          79m
    scale-up-5fd5c67f64-grzl2   1/1     Running   0          79m
    scale-up-5fd5c67f64-jb57z   1/1     Running   0          79m
    scale-up-5fd5c67f64-jhnp9   1/1     Running   0          79m
    scale-up-5fd5c67f64-rtq82   1/1     Running   0          79m
    scale-up-5fd5c67f64-v5pnd   1/1     Running   0          79m
    scale-up-5fd5c67f64-vfq6r   1/1     Running   0          79m
    scale-up-5fd5c67f64-wv9ld   1/1     Running   0          79m
    scale-up-5fd5c67f64-xmgq6   1/1     Running   0          78m
    scale-up-5fd5c67f64-z4wtj   1/1     Running   0          79m
    scale-up-5fd5c67f64-zm4j9   1/1     Running   0          79m

    NAME                           PHASE     TYPE   REGION   ZONE   AGE
    primary-8hhkw-master-0         Running                          3h51m
    primary-8hhkw-master-1         Running                          3h51m
    primary-8hhkw-master-2         Running                          3h51m
    primary-8hhkw-worker-0-4xvnf   Running                          129m
    primary-8hhkw-worker-0-pb5kv   Running                          3h40m
    primary-8hhkw-worker-0-pcntp   Running                          129m
    primary-8hhkw-worker-0-sxkl2   Running                          78m
10.
Changed scale-up.yaml to scale to 24 containers, waited around 20 min, and checked the results:
    # oc get pods -n autoscaler-demo && oc get machine -n openshift-machine-api
    NAME                        READY   STATUS    RESTARTS   AGE
    scale-up-5fd5c67f64-24wk5   1/1     Running   0          97m
    scale-up-5fd5c67f64-2qd9c   1/1     Running   0          99m
    scale-up-5fd5c67f64-58tjg   1/1     Running   0          99m
    scale-up-5fd5c67f64-7fh9q   1/1     Running   0          97m
    scale-up-5fd5c67f64-7q5bc   1/1     Running   0          99m
    scale-up-5fd5c67f64-8tx9c   1/1     Running   0          28m
    scale-up-5fd5c67f64-b9zk9   1/1     Running   0          28m
    scale-up-5fd5c67f64-cdhv5   1/1     Running   0          99m
    scale-up-5fd5c67f64-cv4rl   1/1     Running   0          99m
    scale-up-5fd5c67f64-fkq9t   1/1     Running   0          99m
    scale-up-5fd5c67f64-fkxhf   1/1     Running   0          28m
    scale-up-5fd5c67f64-grzl2   1/1     Running   0          99m
    scale-up-5fd5c67f64-jb57z   1/1     Running   0          99m
    scale-up-5fd5c67f64-jhnp9   1/1     Running   0          99m
    scale-up-5fd5c67f64-qxnfk   1/1     Running   0          27m
    scale-up-5fd5c67f64-rtq82   1/1     Running   0          99m
    scale-up-5fd5c67f64-v5pnd   1/1     Running   0          99m
    scale-up-5fd5c67f64-vfq6r   1/1     Running   0          99m
    scale-up-5fd5c67f64-wv9ld   1/1     Running   0          99m
    scale-up-5fd5c67f64-xm72p   1/1     Running   0          27m
    scale-up-5fd5c67f64-xmgq6   1/1     Running   0          97m
    scale-up-5fd5c67f64-xn6cb   1/1     Running   0          28m
    scale-up-5fd5c67f64-z4wtj   1/1     Running   0          99m
    scale-up-5fd5c67f64-zm4j9   1/1     Running   0          99m

    NAME                           PHASE     TYPE   REGION   ZONE   AGE
    primary-8hhkw-master-0         Running                          4h10m
    primary-8hhkw-master-1         Running                          4h10m
    primary-8hhkw-master-2         Running                          4h10m
    primary-8hhkw-worker-0-4xvnf   Running                          148m
    primary-8hhkw-worker-0-pb5kv   Running                          4h
    primary-8hhkw-worker-0-pcntp   Running                          148m
    primary-8hhkw-worker-0-sxkl2   Running                          97m
    primary-8hhkw-worker-0-zjr6g   Running                          28m
11.
Checked the provider ID in the new machines:
    # oc describe {node,machine}/{primary-8hhkw-worker-0-sxkl2,primary-8hhkw-worker-0-zjr6g} -n openshift-machine-api | egrep "ProviderID|Provider ID"
    ProviderID: ovirt://c17bec45-2237-433c-9661-ccdce2aa3dbc
    ProviderID: ovirt://39bfbf06-d803-4bf6-ab1e-bb3a4324013a
    Provider ID: ovirt://c17bec45-2237-433c-9661-ccdce2aa3dbc
    Provider ID: ovirt://39bfbf06-d803-4bf6-ab1e-bb3a4324013a

Results:
Extra machines were created successfully, the provider IDs match, and the new machines and new containers are stable over time.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
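
For anyone repeating the provider ID check from step 11 on a larger cluster, the comparison can be scripted instead of read off `oc describe` output. The following is only a minimal sketch, not part of the original verification: the function name `provider_id_mismatches` is hypothetical, and it assumes the input is the JSON printed by `oc get nodes -o json` and `oc get machines -n openshift-machine-api -o json`, with each Machine pointing at its node through `status.nodeRef.name`.

```python
import json
from typing import List


def provider_id_mismatches(nodes_json: str, machines_json: str) -> List[str]:
    """Return names of Machines whose providerID is missing or differs
    from the providerID of the Node they reference."""
    # Map node name -> spec.providerID (None if the field is not set).
    nodes = {
        item["metadata"]["name"]: (item.get("spec") or {}).get("providerID")
        for item in json.loads(nodes_json)["items"]
    }

    mismatched = []
    for machine in json.loads(machines_json)["items"]:
        name = machine["metadata"]["name"]
        machine_id = (machine.get("spec") or {}).get("providerID")
        # Assumption: the Machine records its node in status.nodeRef.name.
        node_ref = (machine.get("status") or {}).get("nodeRef") or {}
        node_name = node_ref.get("name")

        # Flag machines with no node, no provider ID, or a differing ID.
        if node_name is None or machine_id is None or nodes.get(node_name) != machine_id:
            mismatched.append(name)
    return mismatched
```

An empty list corresponds to the "provider IDs match" result reported above. Machines that are still provisioning have no node yet and would also be listed, so the check is best run once all machines are in the Running phase.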