Description of problem:

During a 16.1 to 16.2 update we have:

Debug: Exec[rabbitmq-ready](provider=posix): Executing check 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Executing: 'rabbitmqctl eval \"rabbit_nodes:is_running(node(), rabbit).\" | grep -q true && rabbitmqctl eval \"rabbit_mnesia:is_clustered().\" | grep -q true'
Debug: Prefetching rabbitmqctl resources for rabbitmq_user
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
...
Debug: Executing: '/usr/sbin/rabbitmqctl -q status'
Debug: Command failed, retrying
Debug: Storing state
Info: Creating state file /var/lib/puppet/state/state.yaml
Debug: Pruned old state cache entries in 0.00 seconds
Debug: Stored state in 0.01 seconds

This happens during "Wait for containers to start for step 2 using paunch", i.e. during the common deploy step tasks.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I'm re-opening this because we need a dedicated knowledge base article and an update to the documentation for affected customers. Say you updated OSP 16.1 some time ago and ended up with the affected version of pacemaker. You then need to follow a special procedure to update to a non-affected pacemaker *before* the update, or your update will fail in strange ways. For 16.2 the new pacemaker is already there, but for 16.1 we may already have affected customers. The bz for 16.1 is https://bugzilla.redhat.com/show_bug.cgi?id=1972369. So even when the pacemaker package is updated, we will still need the special procedure, because the issue happens *before* yum upgrade is run (i.e. it is not the new version that is *stopping* the container).
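A minimal sketch of the pre-check described above, comparing an installed pacemaker build against the first fixed build with a version-aware sort. The fixed NVR used here is an assumption for illustration only; the erratum attached to bug 1972369 is authoritative, and on a real controller `installed` would come from `rpm -q pacemaker`.

```shell
# Hypothetical pre-update check (sketch): is the installed pacemaker
# older than the first fixed build?
installed="pacemaker-2.0.3-5.el8_2.1"   # on a controller: installed=$(rpm -q pacemaker)
fixed="pacemaker-2.0.3-5.el8_2.4"       # ASSUMED fixed NVR; confirm via the bug 1972369 erratum

# sort -V orders version strings numerically; if the installed build sorts
# strictly before the fixed one, the host still carries the affected package.
if [ "$(printf '%s\n' "$installed" "$fixed" | sort -V | head -n1)" = "$installed" ] \
   && [ "$installed" != "$fixed" ]; then
  verdict="affected"
else
  verdict="not affected"
fi
echo "$verdict: update pacemaker before running the OSP update if affected"
```

This only compares package strings; it does not replace checking the pacemaker logs for the actual escalation error.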
Just FYI, an easy way to check whether the issue happened is to run this:

grep 'error: Shutdown Escalation just popped in state' /var/log/pacemaker/pacemaker.log

Example:

zgrep 'Shutdown Escalation' controller-0/var/log/pacemaker/*
controller-0/var/log/pacemaker/pacemaker.log.gz:Jun 18 03:02:57 controller-0 pacemaker-controld [23552] (crm_timer_popped) error: Shutdown Escalation just popped in state S_NOT_DC! | input=I_STOP time=1200000ms
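The grep above can be wrapped in a small loop that also covers rotated, gzip-compressed logs. This is only a sketch: the demo lines that seed LOGDIR with sample data exist purely so the snippet is self-contained, and on a real controller you would drop them and set LOGDIR=/var/log/pacemaker.

```shell
# Sketch: detect the shutdown-escalation error in both plain and rotated
# (gzip-compressed) pacemaker logs.
LOGDIR="${LOGDIR:-$(mktemp -d)}"   # real controller: LOGDIR=/var/log/pacemaker

# Demo data for illustration only (remove on a real host): a current log
# plus a rotated, compressed copy containing the error line.
printf '%s\n' 'error: Shutdown Escalation just popped in state S_NOT_DC!' \
  > "$LOGDIR/pacemaker.log"
gzip -c "$LOGDIR/pacemaker.log" > "$LOGDIR/pacemaker.log.1.gz"

hit=0
for f in "$LOGDIR"/pacemaker.log*; do
  [ -e "$f" ] || continue
  # zgrep transparently reads both compressed and uncompressed files
  zgrep -q 'Shutdown Escalation just popped' "$f" && hit=1
done

if [ "$hit" -eq 1 ]; then
  echo "shutdown escalation found: this host hit the issue"
else
  echo "no shutdown escalation in $LOGDIR"
fi
```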
From the undercloud:

. ~/stackrc
plan=$(openstack stack list -f value -c 'Stack Name')
tripleo-ansible-inventory \
    --plan "${plan}" \
    --ansible_ssh_user heat-admin \
    --static-yaml-inventory \
    inventory.yaml
ansible -i inventory.yaml 'pacemaker[0]' -b -m shell \
    -a 'podman exec $(podman ps|awk "/bundle/{print \$NF;exit}") rpm -qa | awk "/^pacemaker-[0-9]/"'

controller-0 | CHANGED | rc=0 >>
pacemaker-2.0.3-5.el8_2.4.x86_64
Hi Sofer, I've (slightly) edited and published the Red Hat knowledge base solution: https://access.redhat.com/solutions/6175352 Can you please look over it to double-check that everything looks right? Many thanks, Vlada
The documentation changes are published on the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/keeping_red_hat_openstack_platform_updated/index#known-issues-that-might-block-an-update-keeping-updated Thank you
Hey Vlada, nothing to say on the review, it's perfect. Thanks a lot.
Since https://bugzilla.redhat.com/show_bug.cgi?id=1972369 is released, and that was the root cause of all of this, and we have docs for this as well, I am closing this one out.
*** Bug 2092520 has been marked as a duplicate of this bug. ***
*** Bug 2100193 has been marked as a duplicate of this bug. ***