Bug 1884195 - Possible to delete 2 masters simultaneously if kubelet unreachable
Summary: Possible to delete 2 masters simultaneously if kubelet unreachable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.5.z
Assignee: Michael Gugino
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On: 1840358
Blocks:
Reported: 2020-10-01 10:29 UTC by Sam Batschelet
Modified: 2021-03-03 04:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1840358
Environment:
Last Closed: 2021-03-03 04:40:29 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2133 | 0 | None | closed | Bug 1884195: etcd-quorum-guard remove toleration timeouts | 2021-02-11 14:44:39 UTC
Github | openshift origin pull 25700 | 0 | None | closed | Bug 1884195: test/extended/operators: add etcd-quorum-guard to unevictable whitelist | 2021-02-11 14:44:39 UTC
Red Hat Product Errata | RHSA-2021:0428 | 0 | None | None | None | 2021-03-03 04:40:56 UTC

Comment 1 sunzhaohua 2020-11-09 05:37:02 UTC
This bug's PR is dev-approved and not yet merged, so I'm following DPTP-660 to do pre-merge verification by using cluster-bot to launch a cluster with the open PR.

clusterversion: 4.5.0-0.ci.test-2020-11-09-040048-ci-ln-t9lmsrb
1.  Stop the kubelet on 2/3 master nodes
2.  Delete first stopped master via machine-api
3.  Delete second stopped master via machine-api.
Result: neither master could be deleted.
$ oc get node
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-154-169.us-west-2.compute.internal   NotReady   master   75m   v1.18.3+10e5708
ip-10-0-170-160.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708
ip-10-0-177-138.us-west-2.compute.internal   NotReady   master   75m   v1.18.3+10e5708
ip-10-0-184-104.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708
ip-10-0-204-210.us-west-2.compute.internal   Ready      master   75m   v1.18.3+10e5708
ip-10-0-209-124.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708

$ oc get machine
NAME                                                PHASE     TYPE        REGION      ZONE         AGE
ci-ln-t9lmsrb-d5d6b-2hj86-master-0                  Running   m5.xlarge   us-west-2   us-west-2a   77m
ci-ln-t9lmsrb-d5d6b-2hj86-master-1                  Running   m5.xlarge   us-west-2   us-west-2b   77m
ci-ln-t9lmsrb-d5d6b-2hj86-master-2                  Running   m5.xlarge   us-west-2   us-west-2a   77m
ci-ln-t9lmsrb-d5d6b-2hj86-worker-us-west-2a-9ds9f   Running   m4.xlarge   us-west-2   us-west-2a   68m
ci-ln-t9lmsrb-d5d6b-2hj86-worker-us-west-2a-lhbdd   Running   m4.xlarge   us-west-2   us-west-2a   68m
ci-ln-t9lmsrb-d5d6b-2hj86-worker-us-west-2b-krfnd   Running   m4.xlarge   us-west-2   us-west-2b   68m

$ oc delete machine ci-ln-t9lmsrb-d5d6b-2hj86-master-0
machine.machine.openshift.io "ci-ln-t9lmsrb-d5d6b-2hj86-master-0" deleted
^C
$ oc delete machine ci-ln-t9lmsrb-d5d6b-2hj86-master-2
machine.machine.openshift.io "ci-ln-t9lmsrb-d5d6b-2hj86-master-2" deleted
^C

So this bug is pre-merge-verified. After the PR gets merged, the bot should move the bug to VERIFIED automatically; if that does not work, I will move it to VERIFIED manually.
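
The precondition for the steps above (two of three masters NotReady) can be checked with a small script. This is only a sketch and not part of the original verification: the function name `count_notready_masters` is hypothetical, and it assumes the default `oc get node` column layout shown above (STATUS in column 2, ROLES in column 3).

```shell
#!/bin/sh
# Hypothetical helper (assumption, not from the original report):
# count master nodes whose STATUS starts with "NotReady".
# Reads `oc get node` table output on stdin.
count_notready_masters() {
    # NR > 1 skips the header row; column 2 is STATUS, column 3 is ROLES.
    awk 'NR > 1 && $2 ~ /^NotReady/ && $3 ~ /(^|,)master(,|$)/ { n++ }
         END { print n + 0 }'
}

# Sample input copied from the `oc get node` output in this comment.
sample='NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-154-169.us-west-2.compute.internal   NotReady   master   75m   v1.18.3+10e5708
ip-10-0-170-160.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708
ip-10-0-177-138.us-west-2.compute.internal   NotReady   master   75m   v1.18.3+10e5708
ip-10-0-184-104.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708
ip-10-0-204-210.us-west-2.compute.internal   Ready      master   75m   v1.18.3+10e5708
ip-10-0-209-124.us-west-2.compute.internal   Ready      worker   64m   v1.18.3+10e5708'

printf '%s\n' "$sample" | count_notready_masters   # prints 2
```

On a live cluster the same function could be fed directly, for example `oc get node | count_notready_masters`.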

Comment 2 Michael McCune 2020-12-04 21:34:56 UTC
PR 2133, which is associated with this issue, has the necessary labels to merge but is waiting on CI and on a discussion in its comments.

Comment 7 errata-xmlrpc 2021-03-03 04:40:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.5.33 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0428
