Bug 1973660 - [update] from 16.1 to 16.2 breaks trying to configure the rabbitmq service.
Summary: [update] from 16.1 to 16.2 breaks trying to configure the rabbitmq service.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Sofer Athlan-Guyot
QA Contact: Jason Grosso
Docs Contact: Vlada Grosu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-18 12:14 UTC by Sofer Athlan-Guyot
Modified: 2021-08-04 10:08 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-04 10:08:44 UTC
Target Upstream Version:




Links
Red Hat Knowledge Base (Solution) 6175352 (last updated 2021-07-08 12:11:39 UTC)

Description Sofer Athlan-Guyot 2021-06-18 12:14:46 UTC
Description of problem:

During a 16.1 to 16.2 update we see:

Debug: Exec[rabbitmq-ready](provider=posix): Executing check 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Executing: 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Prefetching rabbitmqctl resources for rabbitmq_user
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
...
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Storing state
Info: Creating state file /var/lib/puppet/state/state.yaml
Debug: Pruned old state cache entries in 0.00 seconds
Debug: Stored state in 0.01 seconds

This happens during "Wait for containers to start for step 2 using paunch", i.e. during the common deploy step tasks.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 6 Sofer Athlan-Guyot 2021-06-24 17:09:39 UTC
I'm re-opening this because we need a dedicated KB article and an update to the documentation for affected customers.

Say you updated OSP 16.1 some time ago and ended up with the affected version of pacemaker.

You then need to follow a special procedure to update to a non-affected pacemaker *before* the update, or else the update will fail in strange ways.

For 16.2 the new pacemaker is already there, but on 16.1 we may already have affected customers.

The bz for 16.1 is https://bugzilla.redhat.com/show_bug.cgi?id=1972369.

So even when the pacemaker package is updated, we will still need the special procedure, as the
issue happens *before* yum upgrade is run (i.e. the new version is not the one *stopping* the container).

Comment 7 Sofer Athlan-Guyot 2021-06-24 19:39:39 UTC
Just FYI, an easy way to check whether the issue happened is to run this:


 grep 'error: Shutdown Escalation just popped in state' /var/log/pacemaker/pacemaker.log

example:

zgrep 'Shutdown Escalation' controller-0/var/log/pacemaker/*
controller-0/var/log/pacemaker/pacemaker.log.gz:Jun 18 03:02:57 controller-0 pacemaker-controld  [23552] (crm_timer_popped)     error: Shutdown Escalation just popped in state S_NOT_DC! | input=I_STOP time=1200000ms
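The check above can be wrapped in a small helper that scans both the live log and the rotated (gzipped) logs, since the error may only be present in a rotated file, as in the example. This is a minimal sketch, not part of any shipped tooling; the log directory and the message text are taken from the commands above:

```shell
#!/bin/sh
# Report whether the pacemaker "Shutdown Escalation" error is present
# in any log under the given directory (default: /var/log/pacemaker).
check_shutdown_escalation() {
    dir="${1:-/var/log/pacemaker}"
    # zgrep transparently handles both plain and .gz rotated logs;
    # -l prints only the names of matching files.
    zgrep -l 'error: Shutdown Escalation just popped in state' \
        "$dir"/pacemaker.log* 2>/dev/null
}

if check_shutdown_escalation "$@" >/dev/null; then
    echo "affected: Shutdown Escalation found in pacemaker logs"
else
    echo "not affected"
fi
```

Run it on each controller (or point it at logs collected via sosreport) to decide whether the node needs the special procedure.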

Comment 11 Sofer Athlan-Guyot 2021-07-09 12:07:28 UTC
From the undercloud:
. ~/stackrc
plan=$(openstack stack list -f value -c 'Stack Name')

tripleo-ansible-inventory \
    --plan "${plan}" \
    --ansible_ssh_user heat-admin \
    --static-yaml-inventory \
    inventory.yaml

ansible -i inventory.yaml 'pacemaker[0]' -b -m shell -a 'podman exec  $(podman ps|awk "/bundle/{print \$NF;exit}" ) rpm -qa |awk "/^pacemaker-[0-9]/"'
controller-0 | CHANGED | rc=0 >>
pacemaker-2.0.3-5.el8_2.4.x86_64
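For scripting a comparison against a known-good build, the version-release string can be isolated from the `rpm -qa` output line. A small sketch; the regex only assumes the NVRA format shown above (e.g. `pacemaker-2.0.3-5.el8_2.4.x86_64`):

```shell
# Strip the "pacemaker-" prefix and the trailing ".<arch>" from an
# NVRA line, leaving just the version-release string.
pacemaker_vr() {
    sed -n 's/^pacemaker-\([0-9].*\)\.[^.]*$/\1/p'
}

echo 'pacemaker-2.0.3-5.el8_2.4.x86_64' | pacemaker_vr
# -> 2.0.3-5.el8_2.4
```

The result can then be compared with a tool such as `rpmdev-vercmp` (from rpmdevtools), if available, to decide whether a node still runs the affected build.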

Comment 12 Vlada Grosu 2021-07-14 12:11:43 UTC
Hi Sofer,

I've (slightly) edited and published the Red Hat knowledge base solution: https://access.redhat.com/solutions/6175352

Can you please look over it to double-check that everything looks right?

Many thanks,
Vlada

Comment 15 Sofer Athlan-Guyot 2021-07-28 16:15:37 UTC
Hey Vlada,
 
Nothing to say on the review; it's perfect.

Thanks a lot.

Comment 16 Michele Baldessari 2021-08-04 10:08:44 UTC
Since https://bugzilla.redhat.com/show_bug.cgi?id=1972369, which was the root cause of all this, is released, and we have docs for it as well, I am closing this one out.

