Bug 2095703 - machinedeletionhooks doesn't work in vsphere cluster and BM cluster
Summary: machinedeletionhooks doesn't work in vsphere cluster and BM cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.11
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Thomas Jungblut
QA Contact: ge liu
URL:
Whiteboard:
Duplicates: 2094919
Depends On:
Blocks:
 
Reported: 2022-06-10 09:41 UTC by ge liu
Modified: 2022-08-10 11:17 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:17:24 UTC
Target Upstream Version:
Embargoed:
tjungblu: needinfo-




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-etcd-operator pull 858 0 None open Bug 2095703: fix dualstack member removal 2022-06-16 08:44:55 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:17:40 UTC

Comment 2 ge liu 2022-06-13 01:32:13 UTC
This feature seems to have been disabled on vSphere according to PR: https://github.com/openshift/origin/pull/27236

Comment 3 Sandeep 2022-06-13 16:03:53 UTC
Same issue is observed on BM platform as well. 
ocp version : 4.11.0-0.nightly-arm64-2022-06-09-060907



Steps followed:
oc delete machine adistefa-ipibm-vjwqv-master-0   (before adding a new machine)


The machine gets deleted. 

oc get machines
NAME                                        PHASE     TYPE   REGION   ZONE   AGE
adistefa-ipibm-vjwqv-master-1               Running                          20h
adistefa-ipibm-vjwqv-worker-0-qhqc6         Running                          20h
adistefa-ipibm-vjwqv-worker-0-xgd4j         Running                          5h46m
adistefa-ipibm-vjwqv-worker-0-zv9qh         Running                          20h
adistefa-ipibm-vjwqv-worker-miyadav-vsdkm   Running                          5h13m
master-03                                   Running                          4h57m


oc get nodes
NAME                                                          STATUS   ROLES    AGE     VERSION
master-01.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com   Ready    master   20h     v1.24.0+bb9c2f1
master-03.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com   Ready    master   4h49m   v1.24.0+bb9c2f1
worker-00.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com   Ready    worker   20h     v1.24.0+bb9c2f1
worker-01.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com   Ready    worker   20h     v1.24.0+bb9c2f1
worker-02.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com   Ready    worker   5h38m   v1.24.0+bb9c2f1
worker-03.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com   Ready    worker   5h5m    v1.24.0+bb9c2f1


master-0 gets deleted (the deletion hook should have prevented this).


vim openshift-machine-api/pods/machine-api-controllers-85b9c7f7d6-4jbfp/machine-controller/machine-controller/logs/current.log

2022-06-13T13:39:11.850370075Z I0613 13:39:11.850296       1 controller.go:709] evicting pod openshift-etcd/revision-pruner-16-master-00.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com
2022-06-13T13:39:11.873421269Z I0613 13:39:11.873374       1 controller.go:432] Evicted pod from Nodepodrevision-pruner-16-master-00.adistefa-ipibm.qeclusters.arm.eng.rdu2.redhat.com/openshift-etcd
2022-06-13T13:39:11.873462751Z I0613 13:39:11.873449       1 controller.go:460] drain successful for machine "adistefa-ipibm-vjwqv-master-0"
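
For context, a simplified, self-contained Go sketch of the mechanism that should have blocked this: the etcd operator places a preDrain lifecycle hook on control-plane Machines, and the machine controller is expected to hold the drain (and therefore the deletion logged above) until that hook is removed. The types and the hook name/owner strings below are illustrative stand-ins, not the operator's actual code.

// Simplified stand-in types modelling a Machine's preDrain lifecycle hooks.
package main

import "fmt"

type LifecycleHook struct {
	Name  string
	Owner string
}

type Machine struct {
	Name          string
	PreDrainHooks []LifecycleHook
}

// drainAllowed returns true only when no etcd quorum hook remains on the machine.
func drainAllowed(m Machine) bool {
	for _, h := range m.PreDrainHooks {
		if h.Owner == "clusteroperator/etcd" { // assumed owner string
			return false
		}
	}
	return true
}

func main() {
	m := Machine{
		Name: "adistefa-ipibm-vjwqv-master-0",
		PreDrainHooks: []LifecycleHook{
			{Name: "EtcdQuorumOperator", Owner: "clusteroperator/etcd"}, // assumed values
		},
	}
	// With the hook still present, the drain (and the deletion seen in the
	// machine-controller log above) should not have proceeded.
	fmt.Println("drain allowed:", drainAllowed(m))
}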

Comment 14 Thomas Jungblut 2022-06-16 14:19:25 UTC
*** Bug 2094919 has been marked as a duplicate of this bug. ***

Comment 18 Sandeep 2022-06-24 12:27:26 UTC
Checked on the BM platform, OCP version: 4.11.0-0.nightly-2022-06-23-153912


oc delete machine skundu-bm-ww96d-master-0
oc get machines
NAME                             PHASE      TYPE   REGION   ZONE   AGE
skundu-bm-ww96d-master-0         Deleting                          100m
skundu-bm-ww96d-master-1         Running                           101m
skundu-bm-ww96d-master-2         Running                           101m
skundu-bm-ww96d-worker-0-gkxrz   Running                           76m
skundu-bm-ww96d-worker-0-vsrg7   Running                           76m

The machine remains in the "Deleting" state.


oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
openshift-qe-013.lab.eng.rdu2.redhat.com   Ready    master   80m   v1.24.0+284d62a
openshift-qe-014.lab.eng.rdu2.redhat.com   Ready    master   80m   v1.24.0+284d62a
openshift-qe-015.lab.eng.rdu2.redhat.com   Ready    master   80m   v1.24.0+284d62a
openshift-qe-016.lab.eng.rdu2.redhat.com   Ready    worker   54m   v1.24.0+284d62a
openshift-qe-023.lab.eng.rdu2.redhat.com   Ready    worker   52m   v1.24.0+284d62a

As seen above, the node remains in the Ready state.


etcd operator logs:

I0624 12:04:02.561798       1 machinedeletionhooks.go:121] current members [ID:2648565165544474566 name:"openshift-qe-014.lab.eng.rdu2.redhat.com" peerURLs:"https://10.8.1.144:2380" clientURLs:"https://10.8.1.144:2379"  ID:16601722864613429937 name:"openshift-qe-013.lab.eng.rdu2.redhat.com" peerURLs:"https://10.8.1.143:2380" clientURLs:"https://10.8.1.143:2379"  ID:17820969981482329470 name:"openshift-qe-015.lab.eng.rdu2.redhat.com" peerURLs:"https://10.8.1.145:2380" clientURLs:"https://10.8.1.145:2379" ] with IPSet: map[10.8.1.143:{} 10.8.1.144:{} 10.8.1.145:{}]
I0624 12:04:02.561880       1 machinedeletionhooks.go:135] skip removing the deletion hook from machine skundu-bm-ww96d-master-0 since its member is still present with any of: [{InternalIP } {InternalIP } {InternalIP fe80::f602:70ff:feb8:d8f0%eno1.194} {InternalIP 10.8.1.143} {InternalIP 2620:52:0:800:f602:70ff:feb8:d8f0} {InternalIP } {InternalIP } {InternalIP } {Hostname openshift-qe-013.lab.eng.rdu2.redhat.com} {InternalDNS openshift-qe-013.lab.eng.rdu2.redhat.com}]


It works as expected on the BM platform.
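
A minimal Go sketch of the matching logic suggested by the machinedeletionhooks.go log above (not the actual implementation): build a set of host IPs from the etcd members' peer URLs, then keep the deletion hook as long as any of the machine's addresses, IPv4 or IPv6 on this dual-stack host, is still in that set. The URLs and addresses are copied from the log for illustration only.

package main

import (
	"fmt"
	"net/url"
)

// memberIPSet parses peer URLs such as https://10.8.1.143:2380 and collects the hosts.
func memberIPSet(peerURLs []string) map[string]struct{} {
	set := map[string]struct{}{}
	for _, raw := range peerURLs {
		u, err := url.Parse(raw)
		if err != nil {
			continue
		}
		set[u.Hostname()] = struct{}{}
	}
	return set
}

// memberStillPresent reports whether any machine address matches a member host IP.
func memberStillPresent(machineAddrs []string, set map[string]struct{}) bool {
	for _, a := range machineAddrs {
		if _, ok := set[a]; ok {
			return true
		}
	}
	return false
}

func main() {
	set := memberIPSet([]string{
		"https://10.8.1.143:2380",
		"https://10.8.1.144:2380",
		"https://10.8.1.145:2380",
	})
	// Machine addresses include both IPv4 and IPv6 entries on this dual-stack host.
	addrs := []string{"10.8.1.143", "2620:52:0:800:f602:70ff:feb8:d8f0"}
	if memberStillPresent(addrs, set) {
		fmt.Println("skip removing the deletion hook: member still present")
	}
}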

Comment 20 Sandeep 2022-06-24 17:17:38 UTC
The positive scenario of scaling up etcd works fine on the BM platform.

oc get machines
NAME                                 PHASE     TYPE   REGION   ZONE   AGE
adistefa-ipi2-tlxp7-master-1         Running                          4h12m
adistefa-ipi2-tlxp7-master-2         Running                          4h12m
adistefa-ipi2-tlxp7-master-new       Running                          131m

The new machine has successfully replaced the deleted machine.

oc get nodes
NAME                                                         STATUS   ROLES    AGE     VERSION
master-01.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com   Ready    master   3h52m   v1.24.0+284d62a
master-02.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com   Ready    master   3h52m   v1.24.0+284d62a
node-01.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com     Ready    master   121m    v1.24.0+284d62a

The new node has successfully replaced the deleted node.

Control plane pods are also replicated on the new node.

oc get po -n openshift-etcd
etcd-master-01.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com                 5/5     Running     0          117m
etcd-master-02.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com                 5/5     Running     0          116m
etcd-node-01.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com                   5/5     Running     0          114m


oc get po -n openshift-kube-apiserver
kube-apiserver-master-01.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com         5/5     Running     0          125m
kube-apiserver-master-02.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com         5/5     Running     0          128m
kube-apiserver-node-01.adistefa-ipi2.qeclusters.arm.eng.rdu2.redhat.com           5/5     Running     0          135m

Comment 21 ge liu 2022-06-27 09:01:29 UTC
Verified with 4.11.0-0.nightly-2022-06-25-081133 on IPI vSphere; the hook works well.

Comment 22 errata-xmlrpc 2022-08-10 11:17:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

