Bug 1973660

Summary: [update] from 16.1 to 16.2 breaks trying to configure the rabbitmq service.
Product: Red Hat OpenStack
Reporter: Sofer Athlan-Guyot <sathlang>
Component: openstack-tripleo-heat-templates
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED CURRENTRELEASE
QA Contact: Jason Grosso <jgrosso>
Severity: urgent
Docs Contact: Vlada Grosu <vgrosu>
Priority: urgent
Version: 16.1 (Train)
CC: enothen, jeckersb, jpretori, kgilliga, mburns, michele, morazi, vgrosu
Target Milestone: rc
Keywords: Reopened, TestBlocker, TestOnly, Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-04 10:08:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sofer Athlan-Guyot 2021-06-18 12:14:46 UTC
Description of problem:

During a 16.1 to 16.2 update we have:

Debug: Exec[rabbitmq-ready](provider=posix): Executing check 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Executing: 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Prefetching rabbitmqctl resources for rabbitmq_user
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
...
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Storing state
Info: Creating state file /var/lib/puppet/state/state.yaml
Debug: Pruned old state cache entries in 0.00 seconds
Debug: Stored state in 0.01 seconds

This happens during "Wait for containers to start for step 2 using paunch", i.e. during the common deploy step tasks.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 6 Sofer Athlan-Guyot 2021-06-24 17:09:39 UTC
I'm re-opening this because we need a dedicated knowledge base article and an update to the documentation for affected customers.

Let's say you updated OSP 16.1 some time ago and ended up with the affected version of pacemaker.

Then you need to follow a special procedure to update to a non-affected pacemaker *before* the update,
or else your update will fail in strange ways.

For 16.2 the new pacemaker is already there, but for 16.1 we may already have affected customers.

The bz for the 16.1 is https://bugzilla.redhat.com/show_bug.cgi?id=1972369.

So even when the pacemaker package is updated, we will still need the special procedure, as the
issue happens *before* yum upgrade is run (i.e. the new version is not the one *stopping* the container).

Comment 7 Sofer Athlan-Guyot 2021-06-24 19:39:39 UTC
Just FYI, an easy way to check whether the issue happened is to run this:


 grep 'error: Shutdown Escalation just popped in state' /var/log/pacemaker/pacemaker.log

example:

zgrep 'Shutdown Escalation' controller-0/var/log/pacemaker/*
controller-0/var/log/pacemaker/pacemaker.log.gz:Jun 18 03:02:57 controller-0 pacemaker-controld  [23552] (crm_timer_popped)     error: Shutdown Escalation just popped in state S_NOT_DC! | input=I_STOP time=1200000ms
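A small wrapper around that grep (hypothetical, not part of this bug) can scan both the live log and any rotated .gz copies on a node, since zgrep reads plain and gzipped files alike:

```shell
# Hypothetical helper: report whether a node hit the shutdown escalation
# error. Point it at /var/log/pacemaker on the node, or at a copied
# sosreport directory. zgrep handles both pacemaker.log and pacemaker.log.gz.
check_affected() {
    local dir="$1"
    if zgrep -q 'Shutdown Escalation just popped' "$dir"/pacemaker.log* 2>/dev/null; then
        echo affected
    else
        echo clean
    fi
}
```

"affected" means the special procedure from comment 6 applies to that node.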

Comment 11 Sofer Athlan-Guyot 2021-07-09 12:07:28 UTC
From the undercloud:
. ~/stackrc
plan=$(openstack stack list  -f value -c 'Stack Name')

tripleo-ansible-inventory \
    --plan "${plan}" \
    --ansible_ssh_user heat-admin \
    --static-yaml-inventory \
    inventory.yaml

ansible -i inventory.yaml 'pacemaker[0]' -b -m shell -a 'podman exec  $(podman ps|awk "/bundle/{print \$NF;exit}" ) rpm -qa |awk "/^pacemaker-[0-9]/"'
controller-0 | CHANGED | rc=0 >>
pacemaker-2.0.3-5.el8_2.4.x86_64
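If you need to decide in a script whether the build reported above predates the fixed one, a plain version comparison with sort -V is enough. This is a sketch: the exact fixed build number comes from bug 1972369 and is not stated in this bug, so the comparison values in the test are illustrative only.

```shell
# Illustrative helper: true if version string $1 sorts strictly before $2.
# The affected build seen above is 2.0.3-5.el8_2.4; substitute the real
# fixed pacemaker build from bug 1972369 as the second argument.
version_lt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}
```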

Comment 12 Vlada Grosu 2021-07-14 12:11:43 UTC
Hi Sofer,

I've (slightly) edited and published the Red Hat knowledge base solution: https://access.redhat.com/solutions/6175352

Can you please look over it to double-check that everything looks right?

Many thanks,
Vlada

Comment 15 Sofer Athlan-Guyot 2021-07-28 16:15:37 UTC
Hey Vlada,
 
nothing to say on the review; it's perfect.

Thanks a lot.

Comment 16 Michele Baldessari 2021-08-04 10:08:44 UTC
Since https://bugzilla.redhat.com/show_bug.cgi?id=1972369, which was the root cause of all this, is released, and we have docs for it as well, I am closing this one out.

Comment 18 Luca Miccini 2022-06-23 10:41:18 UTC
*** Bug 2092520 has been marked as a duplicate of this bug. ***

Comment 19 Luca Miccini 2022-06-30 15:16:24 UTC
*** Bug 2100193 has been marked as a duplicate of this bug. ***