Patch has been merged: https://github.com/openstack-k8s-operators/osp-director-operator/pull/483
Verified; tested with the fix on master: https://github.com/openstack-k8s-operators/osp-director-operator/pull/483

# Trigger a kernel crash on controller-2:
[root@controller-2 ~]# echo c > /proc/sysrq-trigger

# pcs status shows controller-2 as UNCLEAN and its resources as FAILED:
  * stonith-fence_kubevirt-525400351cf5 (stonith:fence_kubevirt): Started controller-1
  * ip-10.0.0.10 (ocf::heartbeat:IPaddr2): Started controller-1
  * ip-172.17.0.10 (ocf::heartbeat:IPaddr2): Started controller-0
  * ip-172.18.0.10 (ocf::heartbeat:IPaddr2): Started controller-2 (UNCLEAN)
  * ip-172.19.0.10 (ocf::heartbeat:IPaddr2): Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started controller-1
    * haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-2 (UNCLEAN)
    * haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started controller-0
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0 (ocf::heartbeat:galera): Master controller-1
    * galera-bundle-1 (ocf::heartbeat:galera): FAILED Master controller-2 (UNCLEAN)
    * galera-bundle-2 (ocf::heartbeat:galera): Master controller-0
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
    * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): FAILED controller-2 (UNCLEAN)
    * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0 (ocf::heartbeat:redis): Master controller-1
    * redis-bundle-1 (ocf::heartbeat:redis): FAILED controller-2 (UNCLEAN)
    * redis-bundle-2 (ocf::heartbeat:redis): Slave controller-0
  * stonith-fence_kubevirt-525400dfecbd (stonith:fence_kubevirt): Started controller-1
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): FAILED Master controller-2 (UNCLEAN)
    * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-0
    * ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started controller-0
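# For reference, a minimal sketch of this reproduction step, run from the node to be
# crashed and from a surviving controller (the grep pattern is illustrative only, and
# assumes sysrq is enabled on the host, as it is by default here):
[root@controller-2 ~]# echo 1 > /proc/sys/kernel/sysrq   # enable all sysrq functions if needed
[root@controller-2 ~]# echo c > /proc/sysrq-trigger      # force an immediate kernel crash
[root@controller-1 ~]# pcs status | grep -E 'UNCLEAN|FAILED'   # watch the crashed node from a survivor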
# Wait for recovery. In /var/log/pacemaker.log we see the resources logged as down:

Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (stage6) warning: Scheduling Node controller-2 for STONITH
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: ip-192.168.25.100_stop_0 is implicit after controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: stonith-fence_kubevirt-52540032e8e5_stop_0 is implicit because controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: ip-172.18.0.10_stop_0 is implicit after controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: haproxy-bundle-podman-1_stop_0 is implicit after controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) notice: Stop of failed resource galera-bundle-podman-1 is implicit after controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: galera-bundle-1_stop_0 is implicit because controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) notice: Stop of failed resource rabbitmq-bundle-podman-1 is implicit after controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: rabbitmq-bundle-1_stop_0 is implicit because controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) notice: Stop of failed resource redis-bundle-podman-1 is implicit after controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: redis-bundle-1_stop_0 is implicit because controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) notice: Stop of failed resource ovn-dbs-bundle-podman-0 is implicit after controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (native_stop_constraints) info: ovn-dbs-bundle-0_stop_0 is implicit because controller-2 is fenced
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (fence_guest) info: Implying guest node galera-bundle-1 is down (action 253) after controller-2 fencing
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (fence_guest) info: Implying guest node ovn-dbs-bundle-0 is down (action 254) after controller-2 fencing
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (fence_guest) info: Implying guest

# controller-2 is stonithed and rebooted:

Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-controld [45826] (te_fence_node) notice: Requesting fencing (reboot) of node controller-2 | action=1 timeout=60000
Jan 09 09:41:34 controller-1.osptest.test.metalkube.org pacemaker-fenced [45822] (handle_request) notice: Client pacemaker-controld.45826.abf35c6a wants to fence (reboot) 'controller-2' with device '(any)'
Jan 09 09:41:53 controller-1.osptest.test.metalkube.org pacemaker-fenced [45822] (process_remote_stonith_exec) notice: Action 'reboot' targeting controller-2 using stonith-fence_kubevirt-525400351cf5 on behalf of pacemaker-controld.45826@controller-1: OK | rc=0

# Fencing is complete:

Jan 09 09:41:53 controller-1.osptest.test.metalkube.org pacemaker-controld [45826] (cib_fencing_updated) info: Fencing update 620 for controller-2: complete
Jan 09 09:41:53 controller-1.osptest.test.metalkube.org pacemaker-controld [45826] (cib_fencing_updated) info: Fencing update 622 for controller-2: complete
Jan 09 09:43:14 controller-1.osptest.test.metalk
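# The fencing result can also be confirmed independently of pacemaker.log by querying
# the fencing history from any surviving node. This is a sketch rather than the exact
# commands used in this verification; "pcs stonith history" requires a recent pcs (0.10+):
[root@controller-1 ~]# stonith_admin --history controller-2 --verbose   # pacemaker's own fencing history
[root@controller-1 ~]# pcs stonith history show controller-2            # pcs 0.10+ equivalent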
# Resources come back online after the node has been fenced:

Jan 09 10:00:43 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (determine_online_status_fencing) info: Node controller-2 is active
Jan 09 10:00:43 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (determine_online_status) info: Node controller-2 is online
Jan 09 10:00:43 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (log_list_item) info: Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]: haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-2
Jan 09 10:00:43 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (log_list_item) info: Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]: galera-bundle-1 (ocf::heartbeat:galera): Master controller-2
Jan 09 10:00:43 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (log_list_item) info: Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]: rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
Jan 09 10:00:43 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (log_list_item) info: Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]: redis-bundle-1 (ocf::heartbeat:redis): Slave controller-2
Jan 09 10:00:43 controller-1.osptest.test.metalkube.org pacemaker-schedulerd[45825] (log_list_item) info: Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]: ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Slave controller-2
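# A one-shot view of the same recovery, without tailing pacemaker.log, can be taken
# with crm_mon from any cluster node; the grep patterns below are illustrative only:
[root@controller-1 ~]# crm_mon -1 | grep controller-2                        # everything placed on the recovered node
[root@controller-1 ~]# crm_mon -1 | grep -E 'galera|rabbitmq|redis|ovn-dbs'  # per-bundle recovery state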
# pcs status indicates all cluster resources are back to normal:

GuestOnline: [ ... rabbitmq-bundle-0@controller-1 rabbitmq-bundle-1@controller-2 rabbitmq-bundle-2@controller-0 redis-bundle-0@controller-1 redis-bundle-1@controller-2 redis-bundle-2@controller-0 ]

Active Resources:
  * ip-192.168.25.100 (ocf::heartbeat:IPaddr2): Started controller-0
  * stonith-fence_kubevirt-52540032e8e5 (stonith:fence_kubevirt): Started controller-0
  * stonith-fence_kubevirt-525400351cf5 (stonith:fence_kubevirt): Started controller-1
  * ip-10.0.0.10 (ocf::heartbeat:IPaddr2): Started controller-1
  * ip-172.17.0.10 (ocf::heartbeat:IPaddr2): Started controller-0
  * ip-172.18.0.10 (ocf::heartbeat:IPaddr2): Started controller-1
  * ip-172.19.0.10 (ocf::heartbeat:IPaddr2): Started controller-0
  * Container bundle set: haproxy-bundle [cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest]:
    * haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started controller-1
    * haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-2
    * haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started controller-0
  * Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]:
    * galera-bundle-0 (ocf::heartbeat:galera): Master controller-1
    * galera-bundle-1 (ocf::heartbeat:galera): Master controller-2
    * galera-bundle-2 (ocf::heartbeat:galera): Master controller-0
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
    * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
    * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
  * Container bundle set: redis-bundle [cluster.common.tag/rhosp16-openstack-redis:pcmklatest]:
    * redis-bundle-0 (ocf::heartbeat:redis): Master controller-1
    * redis-bundle-1 (ocf::heartbeat:redis): Slave controller-2
    * redis-bundle-2 (ocf::heartbeat:redis): Slave controller-0
  * stonith-fence_kubevirt-525400dfecbd (stonith:fence_kubevirt): Started controller-1
  * Container bundle set: ovn-dbs-bundle [cluster.common.tag/rhosp16-openstack-ovn-northd:pcmklatest]:
    * ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Slave controller-2
    * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Master controller-0
    * ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-1
  * Container bundle: openstack-cinder-volume [cluster.common.tag/rhosp16-openstack-cinder-volume:pcmklatest]:
    * openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started controller-0
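# For an unattended run, the same "back to normal" check can be polled instead of read
# by eye; a minimal sketch (not part of the original verification), assuming recovery
# is reached once no resource is reported UNCLEAN or FAILED:
[root@controller-1 ~]# while pcs status | grep -qE 'UNCLEAN|FAILED'; do sleep 30; done
[root@controller-1 ~]# echo "cluster recovered"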
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Release of containers for OSP 16.2 director operator tech preview), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0842