Bug 1804738
| Summary: | Machine Autoscaler does not remove nodes idempotently | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Joel Speed <jspeed> | |
| Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> | |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | unspecified | |||
| Priority: | unspecified | CC: | ademicev | |
| Version: | 4.4 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | When scaling down, in certain scenarios, the autoscaler would remove more than the intended number of nodes, removing required capacity from the cluster and resulting in a scale up being required and interruption to workloads. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1805153 1805160 | Environment: | ||
| Last Closed: | 2020-07-13 17:16:11 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1805153 | |||
| 
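The failure mode described in the Doc Text above can be illustrated with a small sketch. This is not the autoscaler's actual code: `MachineSet`, `naive_delete`, and `idempotent_delete` are hypothetical names, and the sketch assumes that scale-down works by marking a machine for deletion and lowering a replica count. It shows how a repeated removal request for the same node can shrink the group further than intended unless the already-marked case is detected.

```python
from dataclasses import dataclass, field


@dataclass
class MachineSet:
    """Hypothetical stand-in for a scalable group of machines."""
    replicas: int
    marked_for_deletion: set = field(default_factory=set)


def naive_delete(ms: MachineSet, machine: str) -> None:
    # Non-idempotent: every call lowers the replica count,
    # even if this machine was already scheduled for removal.
    ms.marked_for_deletion.add(machine)
    ms.replicas -= 1


def idempotent_delete(ms: MachineSet, machine: str) -> None:
    # Idempotent: a repeated request for the same machine is a no-op.
    if machine in ms.marked_for_deletion:
        return
    ms.marked_for_deletion.add(machine)
    ms.replicas -= 1


naive = MachineSet(replicas=4)
safe = MachineSet(replicas=4)
for _ in range(3):  # the same removal request arrives three times
    naive_delete(naive, "worker-a")
    idempotent_delete(safe, "worker-a")

print(naive.replicas, safe.replicas)
```

In this sketch, three requests for one machine shrink the naive group from 4 replicas to 1, while the idempotent version stops at 3.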
        
Description
Joel Speed 2020-02-19 14:27:19 UTC
Verified in 4.4.0-0.nightly-2020-02-21-045519
Only the machine associated with the unregistered node was deleted.

$ oc get machine
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsun2-k9bts-m-0         Running   n1-standard-4   us-central1   us-central1-a   142m
zhsun2-k9bts-m-1         Running   n1-standard-4   us-central1   us-central1-b   142m
zhsun2-k9bts-m-2         Running   n1-standard-4   us-central1   us-central1-c   142m
zhsun2-k9bts-w-a-45r2z   Failed    n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-a-dd9n2   Running   n1-standard-4   us-central1   us-central1-a   136m
zhsun2-k9bts-w-a-jc79x   Running   n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-a-z2c9x   Running   n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-b-hvkfh   Running   n1-standard-4   us-central1   us-central1-b   136m
zhsun2-k9bts-w-c-g7h6v   Running   n1-standard-4   us-central1   us-central1-c   136m

$ oc get machine
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsun2-k9bts-m-0         Running   n1-standard-4   us-central1   us-central1-a   158m
zhsun2-k9bts-m-1         Running   n1-standard-4   us-central1   us-central1-b   158m
zhsun2-k9bts-m-2         Running   n1-standard-4   us-central1   us-central1-c   158m
zhsun2-k9bts-w-a-dd9n2   Running   n1-standard-4   us-central1   us-central1-a   152m
zhsun2-k9bts-w-a-jc79x   Running   n1-standard-4   us-central1   us-central1-a   32m
zhsun2-k9bts-w-a-lh8jc   Running   n1-standard-4   us-central1   us-central1-a   8m26s
zhsun2-k9bts-w-a-z2c9x   Running   n1-standard-4   us-central1   us-central1-a   32m
zhsun2-k9bts-w-b-hvkfh   Running   n1-standard-4   us-central1   us-central1-b   152m
zhsun2-k9bts-w-c-g7h6v   Running   n1-standard-4   us-central1   us-central1-c   152m

Verified in 4.5
clusterversion: 4.5.0-0.ci-2020-02-25-010652
Only the machine associated with the unregistered node was deleted.

$ oc get machine
NAME                                    PHASE     TYPE        REGION      ZONE         AGE
zhsun45-tcth9-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   4h41m
zhsun45-tcth9-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   4h41m
zhsun45-tcth9-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   4h41m
zhsun45-tcth9-worker-us-east-2a-8s4pd   Failed    m4.large    us-east-2   us-east-2a   23m
zhsun45-tcth9-worker-us-east-2a-92mmd   Running   m4.large    us-east-2   us-east-2a   23m
zhsun45-tcth9-worker-us-east-2a-fstfv   Running   m4.large    us-east-2   us-east-2a   23m
zhsun45-tcth9-worker-us-east-2a-km77d   Running   m4.large    us-east-2   us-east-2a   4h35m
zhsun45-tcth9-worker-us-east-2a-qtkcd   Running   m4.large    us-east-2   us-east-2a   23m
zhsun45-tcth9-worker-us-east-2a-z6rk8   Running   m4.large    us-east-2   us-east-2a   23m
zhsun45-tcth9-worker-us-east-2a-znh2d   Running   m4.large    us-east-2   us-east-2a   23m
zhsun45-tcth9-worker-us-east-2b-v6mwk   Running   m4.large    us-east-2   us-east-2b   4h35m
zhsun45-tcth9-worker-us-east-2c-fs4r6   Running   m4.large    us-east-2   us-east-2c   4h35m

[szh@localhost installer]$ oc get machine
NAME                                    PHASE     TYPE        REGION      ZONE         AGE
zhsun45-tcth9-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   4h55m
zhsun45-tcth9-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   4h55m
zhsun45-tcth9-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   4h55m
zhsun45-tcth9-worker-us-east-2a-5679h   Running   m4.large    us-east-2   us-east-2a   11m
zhsun45-tcth9-worker-us-east-2a-92mmd   Running   m4.large    us-east-2   us-east-2a   37m
zhsun45-tcth9-worker-us-east-2a-fstfv   Running   m4.large    us-east-2   us-east-2a   37m
zhsun45-tcth9-worker-us-east-2a-km77d   Running   m4.large    us-east-2   us-east-2a   4h49m
zhsun45-tcth9-worker-us-east-2a-qtkcd   Running   m4.large    us-east-2   us-east-2a   37m
zhsun45-tcth9-worker-us-east-2a-z6rk8   Running   m4.large    us-east-2   us-east-2a   37m
zhsun45-tcth9-worker-us-east-2a-znh2d   Running   m4.large    us-east-2   us-east-2a   37m
zhsun45-tcth9-worker-us-east-2b-v6mwk   Running   m4.large    us-east-2   us-east-2b   4h49m
zhsun45-tcth9-worker-us-east-2c-fs4r6   Running   m4.large    us-east-2   us-east-2c   4h49m

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409 |
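The 4.4 verification above can also be checked mechanically by comparing machine names before and after the scale-down. The following is a minimal sketch and was not part of the original verification; `machine_names` is a hypothetical helper, and the listings are abbreviated to the worker machines in zone us-central1-a taken from the output above.

```python
def machine_names(listing: str) -> set:
    """Return the NAME column of `oc get machine` output, skipping the header."""
    rows = [line.split() for line in listing.strip().splitlines()]
    return {row[0] for row in rows[1:] if row}


before = """
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsun2-k9bts-w-a-45r2z   Failed    n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-a-dd9n2   Running   n1-standard-4   us-central1   us-central1-a   136m
zhsun2-k9bts-w-a-jc79x   Running   n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-a-z2c9x   Running   n1-standard-4   us-central1   us-central1-a   16m
"""

after = """
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsun2-k9bts-w-a-dd9n2   Running   n1-standard-4   us-central1   us-central1-a   152m
zhsun2-k9bts-w-a-jc79x   Running   n1-standard-4   us-central1   us-central1-a   32m
zhsun2-k9bts-w-a-lh8jc   Running   n1-standard-4   us-central1   us-central1-a   8m26s
zhsun2-k9bts-w-a-z2c9x   Running   n1-standard-4   us-central1   us-central1-a   32m
"""

# Machines present only in the first listing were removed;
# machines present only in the second were created as replacements.
removed = machine_names(before) - machine_names(after)
added = machine_names(after) - machine_names(before)
print("removed:", sorted(removed))
print("added:", sorted(added))
```

For these listings the only removed machine is the Failed one, zhsun2-k9bts-w-a-45r2z, and the only new machine is zhsun2-k9bts-w-a-lh8jc, which matches the statement that only the machine associated with the unregistered node was deleted.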