Description of problem:
Control plane node migration cannot be performed due to a failure during the node drain operation.

Version-Release number of selected component (if applicable):
OCP 4.10.0-0.nightly-2022-01-25-023600
OSP 16.1.7

How reproducible:
always

Steps to Reproduce:
1. Install OCP 4.10
2. Follow the CP node migration procedure described here: https://github.com/openshift/installer/tree/master/docs/user/openstack#control-plane-node-migration

Actual results:
$ OS_CLOUD=overcloud ./cp_node_migration.sh ostest-kznkt-master-0
+ declare -r node_name=ostest-kznkt-master-0
+ declare server_id
++ openstack server list --all-projects -f value -c ID -c Name
++ grep ostest-kznkt-master-0
++ cut '-d ' -f1
+ server_id=6b5e7191-a2b8-41e7-9be0-d269ebc09e5c
+ readonly server_id
+ oc adm cordon ostest-kznkt-master-0
node/ostest-kznkt-master-0 cordoned
+ oc adm drain ostest-kznkt-master-0 --delete-emptydir-data --ignore-daemonsets
node/ostest-kznkt-master-0 already cordoned
error: unable to drain node "ostest-kznkt-master-0" due to error:cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-kube-apiserver/kube-apiserver-guard-ostest-kznkt-master-0, openshift-kube-controller-manager/kube-controller-manager-guard-ostest-kznkt-master-0, openshift-kube-scheduler/openshift-kube-scheduler-guard-ostest-kznkt-master-0, continuing command...
There are pending nodes to be drained:
 ostest-kznkt-master-0
cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-kube-apiserver/kube-apiserver-guard-ostest-kznkt-master-0, openshift-kube-controller-manager/kube-controller-manager-guard-ostest-kznkt-master-0, openshift-kube-scheduler/openshift-kube-scheduler-guard-ostest-kznkt-master-0

Expected results:
node successfully migrated

Additional info:
$ openstack server list --host compute-0.redhat.local --all
+--------------------------------------+-----------------------------+--------+-------------------------------------------------+--------------------+--------+
| ID                                   | Name                        | Status | Networks                                        | Image              | Flavor |
+--------------------------------------+-----------------------------+--------+-------------------------------------------------+--------------------+--------+
| 60749007-a8be-4897-bd05-afdf6728b347 | ostest-kznkt-worker-0-tjmws | ACTIVE | ostest-kznkt-openshift=10.196.0.205             | ostest-kznkt-rhcos |        |
| cd93d6a1-827f-4d3e-b6ff-df3f3e3e1ed0 | ostest-kznkt-bootstrap      | ACTIVE | ostest-kznkt-openshift=10.196.1.20, 10.46.23.49 | ostest-kznkt-rhcos |        |
| 1b287a04-40f3-4ed0-a7c8-25800dd7d537 | ostest-kznkt-master-1       | ACTIVE | ostest-kznkt-openshift=10.196.3.84              | ostest-kznkt-rhcos |        |
+--------------------------------------+-----------------------------+--------+-------------------------------------------------+--------------------+--------+

$ openstack server list --host compute-1.redhat.local --all
+--------------------------------------+-----------------------------+--------+-------------------------------------+--------------------+--------+
| ID                                   | Name                        | Status | Networks                            | Image              | Flavor |
+--------------------------------------+-----------------------------+--------+-------------------------------------+--------------------+--------+
| 18fd7016-2521-4b81-9e4f-84321e6edbfd | ostest-kznkt-worker-0-hlkp2 | ACTIVE | ostest-kznkt-openshift=10.196.3.245 | ostest-kznkt-rhcos |        |
| eaf32211-cb78-482a-a349-ec6c92ef7370 | ostest-kznkt-worker-0-7dqh2 | ACTIVE | ostest-kznkt-openshift=10.196.0.94  | ostest-kznkt-rhcos |        |
| 68e49ac2-fdbe-4e7e-a1fc-3c87005802c6 | ostest-kznkt-master-2       | ACTIVE | ostest-kznkt-openshift=10.196.2.51  | ostest-kznkt-rhcos |        |
| 6b5e7191-a2b8-41e7-9be0-d269ebc09e5c | ostest-kznkt-master-0       | ACTIVE | ostest-kznkt-openshift=10.196.1.253 | ostest-kznkt-rhcos |        |
+--------------------------------------+-----------------------------+--------+-------------------------------------+--------------------+--------+

$ oc get pods -o wide -A | grep master-0 | grep guard
openshift-etcd                      etcd-quorum-guard-6d5548d4c4-jgzkh                     1/1   Running   0   22h   10.196.1.253   ostest-kznkt-master-0   <none>   <none>
openshift-kube-apiserver            kube-apiserver-guard-ostest-kznkt-master-0             1/1   Running   0   21h   10.128.0.44    ostest-kznkt-master-0   <none>   <none>
openshift-kube-controller-manager   kube-controller-manager-guard-ostest-kznkt-master-0    1/1   Running   0   21h   10.128.0.42    ostest-kznkt-master-0   <none>   <none>
openshift-kube-scheduler            openshift-kube-scheduler-guard-ostest-kznkt-master-0   1/1   Running   0   22h   10.128.0.34    ostest-kznkt-master-0   <none>   <none>
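For triage, the pods that block the drain can be pulled out of the error text with a small filter. This is only a sketch: the function name unmanaged_pods_from_drain_error is hypothetical, and it assumes the message keeps the "(use --force to override): ns/pod, ns/pod" layout seen in the drain output in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the documented migration script).
# Reads `oc adm drain` output on stdin and prints one "namespace/pod"
# entry per line for every pod named after "(use --force to override):".
unmanaged_pods_from_drain_error() {
    # Keep only the comma-separated list that follows the override hint.
    sed -n 's/.*(use --force to override): //p' |
        # One list item per line, with surrounding blanks trimmed.
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//' |
        # Drop trailing chatter such as "continuing command...".
        grep -E '^[^ /]+/[^ /]+$' |
        # The same list is printed twice by the drain; de-duplicate it.
        LC_ALL=C sort -u
}
```

It could be fed from a captured run, for example `oc adm drain "$node_name" --delete-emptydir-data --ignore-daemonsets 2>&1 | unmanaged_pods_from_drain_error`. For the output in this report it would list the three guard pods in openshift-kube-apiserver, openshift-kube-controller-manager and openshift-kube-scheduler.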
This is probably worth checking again with a payload that contains the fix to Bug 2038481.
Setting blocker- because it's a potential bug in the docs.
Do you mind testing the proposed patch in your environment before merge?
(In reply to Pierre Prinetti from comment #5)
> Do you mind testing the proposed patch in your environment before merge?

Tested and looking good.
Verified in 4.11.0-0.nightly-2022-05-10-045003 on top of OSP 16.1.8. Control plane node migration completes correctly.

2022-05-11 13:20:23.559 | "vm_per_compute": {
2022-05-11 13:20:23.562 |     "computehci-0.redhat.local": [
2022-05-11 13:20:23.564 |         "ostest-6pp4w-worker-0-7dnxv",
2022-05-11 13:20:23.567 |         "ostest-6pp4w-master-2"
2022-05-11 13:20:23.570 |     ],
2022-05-11 13:20:23.572 |     "computehci-1.redhat.local": [
2022-05-11 13:20:23.575 |         "ostest-6pp4w-worker-0-twnxn",
2022-05-11 13:20:23.577 |         "ostest-6pp4w-master-0"
2022-05-11 13:20:23.580 |     ],
2022-05-11 13:20:23.583 |     "computehci-2.redhat.local": [
2022-05-11 13:20:23.585 |         "ostest-6pp4w-worker-0-4tkgd",
2022-05-11 13:20:23.588 |         "ostest-6pp4w-master-1"
2022-05-11 13:20:23.590 |     ]
2022-05-11 13:20:23.593 | }
...
2022-05-11 13:21:18.461 | Going to migrate 'ostest-6pp4w-master-0' OCP node from 'computehci-1.redhat.local' OSP compute
...
2022-05-11 13:27:41.527 | "vm_per_compute_after": {
2022-05-11 13:27:41.529 |     "computehci-0.redhat.local": [
2022-05-11 13:27:41.532 |         "ostest-6pp4w-worker-0-7dnxv",
2022-05-11 13:27:41.534 |         "ostest-6pp4w-master-2"
2022-05-11 13:27:41.537 |     ],
2022-05-11 13:27:41.539 |     "computehci-1.redhat.local": [
2022-05-11 13:27:41.541 |         "ostest-6pp4w-worker-0-twnxn"
2022-05-11 13:27:41.544 |     ],
2022-05-11 13:27:41.546 |     "computehci-2.redhat.local": [
2022-05-11 13:27:41.548 |         "ostest-6pp4w-worker-0-4tkgd",
2022-05-11 13:27:41.551 |         "ostest-6pp4w-master-1",
2022-05-11 13:27:41.553 |         "ostest-6pp4w-master-0"
2022-05-11 13:27:41.555 |     ]
2022-05-11 13:27:41.558 | }
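The before/after comparison in the verification log can also be checked mechanically. The following is a rough sketch, not the tooling that produced the log: host_of_vm is a hypothetical name, and it assumes one log entry per line with the "timestamp | " prefix and the "host": [ ... ] layout of the placement dump.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the compute host whose list contains the
# given VM name. Reads one placement dump (log lines) on stdin.
#   $1: VM name, e.g. ostest-6pp4w-master-0
host_of_vm() {
    # Strip the leading "YYYY-MM-DD HH:MM:SS.mmm | " timestamp prefix.
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+ \| //' |
        awk -v vm="$1" '
            # A line of the form  "host": [  opens a new host block.
            /^ *"[^"]+": \[ *$/ {
                host = $0
                sub(/^ *"/, "", host)
                sub(/": \[ *$/, "", host)
                next
            }
            # A quoted, exact VM name inside the block identifies its host.
            index($0, "\"" vm "\"") { print host }
        '
}
```

Running it once on the "vm_per_compute" section and once on the "vm_per_compute_after" section, and comparing the two results, would show whether the VM actually changed compute host.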
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069