Bug 1884195

Summary: Possible to delete 2 masters simultaneously if kubelet unreachable
Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: Other Providers
Version: 4.6
Target Milestone: ---
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: medium
Reporter: Sam Batschelet <sbatsche>
Assignee: Michael Gugino <mgugino>
QA Contact: sunzhaohua <zhsun>
Docs Contact:
CC: agarcial, mgugino, mimccune, wking, zhsun
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1840358
Environment:
Last Closed: 2021-03-03 04:40:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1840358
Bug Blocks:

Comment 1 sunzhaohua 2020-11-09 05:37:02 UTC
This bug's PR is dev-approved but not yet merged, so, following DPTP-660, I am doing pre-merge verification by using cluster-bot to launch a cluster with the open PR.

clusterversion: 4.5.0-0.ci.test-2020-11-09-040048-ci-ln-t9lmsrb
1.  Stop the kubelet on 2/3 master nodes (one way to do this is sketched after these steps)
2.  Delete the first stopped master via machine-api
3.  Delete the second stopped master via machine-api; neither machine could be deleted.
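
A minimal sketch of step 1, assuming oc debug access to the nodes (SSH plus systemctl stop kubelet works just as well); the two node names are the masters that show NotReady below:

$ oc debug node/ip-10-0-154-169.us-west-2.compute.internal -- chroot /host systemctl stop kubelet
$ oc debug node/ip-10-0-177-138.us-west-2.compute.internal -- chroot /host systemctl stop kubelet
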
$ oc get node
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-154-169.us-west-2.compute.internal   NotReady   master   75m   v1.18.3+10e5708
ip-10-0-170-160.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708
ip-10-0-177-138.us-west-2.compute.internal   NotReady   master   75m   v1.18.3+10e5708
ip-10-0-184-104.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708
ip-10-0-204-210.us-west-2.compute.internal   Ready      master   75m   v1.18.3+10e5708
ip-10-0-209-124.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708

$ oc get machine
NAME                                                PHASE     TYPE        REGION      ZONE         AGE
ci-ln-t9lmsrb-d5d6b-2hj86-master-0                  Running   m5.xlarge   us-west-2   us-west-2a   77m
ci-ln-t9lmsrb-d5d6b-2hj86-master-1                  Running   m5.xlarge   us-west-2   us-west-2b   77m
ci-ln-t9lmsrb-d5d6b-2hj86-master-2                  Running   m5.xlarge   us-west-2   us-west-2a   77m
ci-ln-t9lmsrb-d5d6b-2hj86-worker-us-west-2a-9ds9f   Running   m4.xlarge   us-west-2   us-west-2a   68m
ci-ln-t9lmsrb-d5d6b-2hj86-worker-us-west-2a-lhbdd   Running   m4.xlarge   us-west-2   us-west-2a   68m
ci-ln-t9lmsrb-d5d6b-2hj86-worker-us-west-2b-krfnd   Running   m4.xlarge   us-west-2   us-west-2b   68m
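
To confirm which machine backs which node before deleting (steps 2 and 3), one option, shown here only as a sketch, is to read each machine's nodeRef:

$ oc get machine -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeRef.name}{"\n"}{end}'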

$ oc delete machine ci-ln-t9lmsrb-d5d6b-2hj86-master-0
machine.machine.openshift.io "ci-ln-t9lmsrb-d5d6b-2hj86-master-0" deleted
^C
$ oc delete machine ci-ln-t9lmsrb-d5d6b-2hj86-master-2
machine.machine.openshift.io "ci-ln-t9lmsrb-d5d6b-2hj86-master-2" deleted
^C
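
The ^C above is because oc delete waits for the object to actually be removed; since neither machine is deleted here, both commands hang until interrupted. As a sketch, the standard --wait=false flag returns immediately and the machine list can then be re-checked:

$ oc delete machine ci-ln-t9lmsrb-d5d6b-2hj86-master-0 --wait=false
$ oc get machine     # both master machines should still be listed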

So this bug is pre-merge verified. After the PR gets merged, the bug will be moved to VERIFIED by the bot automatically; if that does not happen, I will move it to VERIFIED manually.

Comment 2 Michael McCune 2020-12-04 21:34:56 UTC
The PR associated with this issue (#2133) has the necessary labels to merge, but it is waiting on CI and on a discussion happening in its comments.

Comment 7 errata-xmlrpc 2021-03-03 04:40:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.5.33 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0428