Bug 1832999

Summary: [OSP] Failed to drain node for machine
Product: OpenShift Container Platform
Component: Cloud Compute
Sub Component: OpenStack Provider
Reporter: egarcia
Assignee: egarcia
QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: adduarte, agarcial, danken, dsanzmor, egarcia, jhou, m.andre, mfedosin, mgugino, pprinett, zhsun
Version: 4.4
Target Release: 4.4.z
Keywords: UpcomingSprint
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1810400
Bug Depends On: 1810400, 1848755
Last Closed: 2020-07-14 01:43:52 UTC

Comment 6 sunzhaohua 2020-07-06 05:34:54 UTC
Verified
Clusterversion: 4.4.0-0.nightly-2020-07-04-120349
1. Stop kubelet on a worker node that runs a pod with local storage/data, such as alertmanager-main-1 in the openshift-monitoring namespace
2. Drain the node
$ oc adm drain zhsunosp76-xfctt-worker-pgvsn --ignore-daemonsets --delete-local-data
node/zhsunosp76-xfctt-worker-pgvsn cordoned
WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-zjplm, openshift-dns/dns-default-jhwkz, openshift-image-registry/node-ca-xn64c, openshift-machine-config-operator/machine-config-daemon-r4r4j, openshift-monitoring/node-exporter-5kckd, openshift-multus/multus-wm6bw, openshift-sdn/ovs-5qbsd, openshift-sdn/sdn-thnz9
evicting pod openshift-monitoring/alertmanager-main-1
evicting pod openshift-ingress/router-default-78c798c4c4-4q852
evicting pod openshift-monitoring/alertmanager-main-2
evicting pod openshift-monitoring/prometheus-k8s-0
evicting pod openshift-monitoring/thanos-querier-5b9dbdc464-mkc6r
pod/router-default-78c798c4c4-4q852 evicted
pod/alertmanager-main-2 evicted
pod/prometheus-k8s-0 evicted
pod/thanos-querier-5b9dbdc464-mkc6r evicted
pod/alertmanager-main-1 evicted
node/zhsunosp76-xfctt-worker-pgvsn evicted

3. Delete the machine associated with that node
$ oc delete machine zhsunosp76-xfctt-worker-pgvsn
machine.machine.openshift.io "zhsunosp76-xfctt-worker-pgvsn" deleted
4. Check the machine-controller logs

I0706 04:45:15.911697       1 controller.go:247] zhsunosp76-xfctt-worker-pgvsn: deleting node "zhsunosp76-xfctt-worker-pgvsn" for machine
I0706 04:45:15.912715       1 reflector.go:175] Starting reflector *v1.Node (10h32m54.332782437s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:224
I0706 04:45:15.912806       1 reflector.go:211] Listing and watching *v1.Node from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:224
I0706 04:45:16.053459       1 controller.go:261] zhsunosp76-xfctt-worker-pgvsn: machine deletion successful
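The drain-then-delete flow above can be spot-checked from the controller logs. The helper below is a minimal sketch: it only scans log text for the two success markers seen in the output above. The function name is illustrative, and the deployment/container names in the usage comment (`machine-api-controllers`, `machine-controller` in `openshift-machine-api`) are assumptions about the cluster layout rather than something stated in this bug.

```shell
#!/bin/sh
# Sketch: verify from machine-api controller log text that a machine was
# deleted cleanly. Matches the two log lines shown in Comment 6.

machine_deleted_ok() {
  # $1: machine name; reads controller log text on stdin.
  # Succeeds only if both the node-deletion line and the
  # deletion-success line for that machine are present.
  log=$(cat)
  printf '%s\n' "$log" | grep -q "$1: deleting node" &&
  printf '%s\n' "$log" | grep -q "$1: machine deletion successful"
}
```

Assumed usage against a live cluster (deployment and container names are guesses):
  oc logs -n openshift-machine-api deployment/machine-api-controllers -c machine-controller \
    | machine_deleted_ok zhsunosp76-xfctt-worker-pgvsn && echo "machine deleted cleanly"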

Comment 8 errata-xmlrpc 2020-07-14 01:43:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2871