Description of problem:

During a 16.1 to 16.2 update we have:

Debug: Exec[rabbitmq-ready](provider=posix): Executing check 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Executing: 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Prefetching rabbitmqctl resources for rabbitmq_user
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
...
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Storing state
Info: Creating state file /var/lib/puppet/state/state.yaml
Debug: Pruned old state cache entries in 0.00 seconds
Debug: Stored state in 0.01 seconds

This happens during "Wait for containers to start for step 2 using paunch", i.e. during the common deploy step tasks.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I'm re-opening this because we need a dedicated knowledge base article and an update to the documentation for affected customers. Say you updated OSP 16.1 some time ago and ended up with the affected version of pacemaker. You then need to follow a special procedure to update to a non-affected pacemaker *before* the update, or your update will fail in strange ways. For 16.2 the new pacemaker is already there, but for 16.1 we may already have affected customers. The bz for 16.1 is https://bugzilla.redhat.com/show_bug.cgi?id=1972369. So even when the pacemaker package is updated, we will still need the special procedure, because the issue happens *before* yum upgrade is run (i.e. it is not the new version that is *stopping* the container).
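A minimal sketch of the pre-check described above, comparing an installed pacemaker build against the first fixed build with a version-aware sort. The fixed NVR used here is an assumption for illustration only; the erratum attached to bug 1972369 is authoritative, and on a real controller `installed` would come from `rpm -q pacemaker`.

```shell
# Hypothetical pre-update check (sketch): is the installed pacemaker
# older than the first fixed build?
installed="pacemaker-2.0.3-5.el8_2.1"   # on a controller: installed=$(rpm -q pacemaker)
fixed="pacemaker-2.0.3-5.el8_2.4"       # ASSUMED fixed NVR; confirm via the bug 1972369 erratum

# sort -V orders version strings numerically; if the installed build sorts
# strictly before the fixed one, the host still carries the affected package.
if [ "$(printf '%s\n' "$installed" "$fixed" | sort -V | head -n1)" = "$installed" ] \
   && [ "$installed" != "$fixed" ]; then
  verdict="affected"
else
  verdict="not affected"
fi
echo "$verdict: update pacemaker before running the OSP update if affected"
```

This only compares package strings; it does not replace checking the pacemaker logs for the actual escalation error.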
Just FYI, an easy way to check whether the issue happened is to run this:

grep 'error: Shutdown Escalation just popped in state' /var/log/pacemaker/pacemaker.log

Example:

zgrep 'Shutdown Escalation' controller-0/var/log/pacemaker/*
controller-0/var/log/pacemaker/pacemaker.log.gz:Jun 18 03:02:57 controller-0 pacemaker-controld [23552] (crm_timer_popped) error: Shutdown Escalation just popped in state S_NOT_DC! | input=I_STOP time=1200000ms
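The grep above can be wrapped in a small loop that also covers rotated, gzip-compressed logs. This is only a sketch: the demo lines that seed LOGDIR with sample data exist purely so the snippet is self-contained, and on a real controller you would drop them and set LOGDIR=/var/log/pacemaker.

```shell
# Sketch: detect the shutdown-escalation error in both plain and rotated
# (gzip-compressed) pacemaker logs.
LOGDIR="${LOGDIR:-$(mktemp -d)}"   # real controller: LOGDIR=/var/log/pacemaker

# Demo data for illustration only (remove on a real host): a current log
# plus a rotated, compressed copy containing the error line.
printf '%s\n' 'error: Shutdown Escalation just popped in state S_NOT_DC!' \
  > "$LOGDIR/pacemaker.log"
gzip -c "$LOGDIR/pacemaker.log" > "$LOGDIR/pacemaker.log.1.gz"

hit=0
for f in "$LOGDIR"/pacemaker.log*; do
  [ -e "$f" ] || continue
  # zgrep transparently reads both compressed and uncompressed files
  zgrep -q 'Shutdown Escalation just popped' "$f" && hit=1
done

if [ "$hit" -eq 1 ]; then
  echo "shutdown escalation found: this host hit the issue"
else
  echo "no shutdown escalation in $LOGDIR"
fi
```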
From the undercloud:

. ~/stackrc
plan=$(openstack stack list -f value -c 'Stack Name')
tripleo-ansible-inventory \
    --plan "${plan}" \
    --ansible_ssh_user heat-admin \
    --static-yaml-inventory \
    inventory.yaml
ansible -i inventory.yaml 'pacemaker[0]' -b -m shell \
    -a 'podman exec $(podman ps|awk "/bundle/{print \$NF;exit}") rpm -qa | awk "/^pacemaker-[0-9]/"'

controller-0 | CHANGED | rc=0 >>
pacemaker-2.0.3-5.el8_2.4.x86_64
Hi Sofer, I've (slightly) edited and published the Red Hat knowledge base solution: https://access.redhat.com/solutions/6175352 Can you please look over it to double-check that everything looks right? Many thanks, Vlada
The documentation changes are published on the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/keeping_red_hat_openstack_platform_updated/index#known-issues-that-might-block-an-update-keeping-updated Thank you
Hey Vlada, nothing to say on the review, it's perfect. Thanks a lot.
Since https://bugzilla.redhat.com/show_bug.cgi?id=1972369 is released, and that was the root cause of all of this, and we have docs for this as well, I am closing this one out.
*** Bug 2092520 has been marked as a duplicate of this bug. ***
*** Bug 2100193 has been marked as a duplicate of this bug. ***