Bug 1985067 - While performing a minor update, the process timedout and it looks like pacemaker can't determine address for bundles
Keywords:
Status: CLOSED DUPLICATE of bug 2015325
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-22 18:36 UTC by David Hill
Modified: 2021-11-30 07:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-30 07:55:26 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-6388 0 None None None 2021-11-15 13:03:45 UTC
Red Hat Knowledge Base (Solution) 6209712 0 None None None 2021-07-22 18:37:02 UTC

Description David Hill 2021-07-22 18:36:53 UTC
Description of problem:
While performing a minor update, the process timed out, and it looks like pacemaker can't determine the address for bundles:

Jul 19 18:01:58 overcloud-controller-2 crmd[313631]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jul 19 18:01:58 overcloud-controller-2 corosync[313564]: [TOTEM ] A new membership (10.10.10.10:7732) was formed. Members left: 1
Jul 19 18:01:58 overcloud-controller-2 corosync[313564]: [QUORUM] Members[2]: 2 3
Jul 19 18:01:58 overcloud-controller-2 corosync[313564]: [MAIN  ] Completed service synchronization, ready to provide service.
Jul 19 18:01:58 overcloud-controller-2 pacemakerd[313576]:  notice: Node overcloud-controller-0 state is now lost
Jul 19 18:01:59 overcloud-controller-2 dnsmasq[180147]: read /var/lib/neutron/dhcp/90292b43-3cd9-4c98-b008-013208e4d9e4/addn_hosts - 4 addresses
Jul 19 18:01:59 overcloud-controller-2 dnsmasq-dhcp[180147]: read /var/lib/neutron/dhcp/90292b43-3cd9-4c98-b008-013208e4d9e4/host
Jul 19 18:01:59 overcloud-controller-2 dnsmasq-dhcp[180147]: read /var/lib/neutron/dhcp/90292b43-3cd9-4c98-b008-013208e4d9e4/opts
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node galera-bundle-2 state is now member
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node redis-bundle-0 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node redis-bundle-0 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of redis-bundle-0 not matched
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node rabbitmq-bundle-1 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node rabbitmq-bundle-1 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of rabbitmq-bundle-1 not matched
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node galera-bundle-0 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node galera-bundle-0 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of galera-bundle-0 not matched
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node redis-bundle-2 state is now member
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node rabbitmq-bundle-0 state is now member
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node overcloud-controller-0 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node 1 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of overcloud-controller-0 not matched
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]: warning: Blind faith: not fencing unseen nodes
Jul 19 18:02:00 overcloud-controller-2 cib[313626]: warning: A-Sync reply to crmd failed: No message of desired type
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq-bundle-1     ( overcloud-controller-2 )   due to unrunnable rabbitmq-bundle-docker-1 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq:1            (  rabbitmq-bundle-1 )   due to unrunnable rabbitmq-bundle-docker-1 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq-bundle-2     ( overcloud-controller-2 )   due to unrunnable rabbitmq-bundle-docker-2 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq:2            (  rabbitmq-bundle-2 )   due to unrunnable rabbitmq-bundle-docker-2 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      galera-bundle-0       ( overcloud-controller-1 )   due to unrunnable galera-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      galera:0              (    galera-bundle-0 )   due to unrunnable galera-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      redis-bundle-0        ( overcloud-controller-2 )   due to unrunnable redis-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      redis:0               (     redis-bundle-0 )   due to unrunnable redis-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection rabbitmq-bundle-1
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection rabbitmq-bundle-2
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection galera-bundle-0
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection redis-bundle-0
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-2328.bz2
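When triaging a log like the excerpt above, it can help to pull out just the bundle connections that pengine could not resolve, along with the last membership state crmd reported for each node. The following is a minimal sketch, not part of the original report: the function name `summarize` and the sample list are made up for illustration, and the regular expressions assume the message wording shown in the excerpt.

```python
import re

# Assumes the pengine/crmd message wording seen in the excerpt above.
BUNDLE_ERR = re.compile(
    r"error: Could not determine address for bundle connection (\S+)"
)
NODE_STATE = re.compile(r"notice: Node (\S+) state is now (\w+)")


def summarize(lines):
    """Return (unresolved_bundles, last_state_per_node) for syslog lines."""
    unresolved = []  # bundle connections with no address, in first-seen order
    state = {}       # node name -> last reported state ("lost", "member", ...)
    for line in lines:
        m = BUNDLE_ERR.search(line)
        if m:
            if m.group(1) not in unresolved:
                unresolved.append(m.group(1))
            continue
        m = NODE_STATE.search(line)
        if m:
            state[m.group(1)] = m.group(2)
    return unresolved, state


# A few lines copied from the excerpt, used here only as sample input.
SAMPLE = [
    "Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node galera-bundle-2 state is now member",
    "Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node redis-bundle-0 state is now lost",
    "Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection rabbitmq-bundle-1",
    "Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection redis-bundle-0",
]

unresolved, state = summarize(SAMPLE)
for name in unresolved:
    print(name, "last state:", state.get(name, "not reported"))
```

To run it against a real log, pass an open file object (for example the controller's messages file) to `summarize` instead of `SAMPLE`.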



Version-Release number of selected component (if applicable):
pacemaker-libs-1.1.19-8.el7_6.2.x86_64

How reproducible:
This environment

Steps to Reproduce:
1. Minor update timed out
2. Bundles are not able to start due to "Could not determine address for bundle connection"

Actual results:
Minor update failure

Expected results:
No failures.

Additional info:

Comment 4 David Hill 2021-07-28 20:09:23 UTC
The resource was banned, and running "pcs resource clear rabbitmq-bundle" resolved that. The issue we have now is that this rabbitmq instance won't join the cluster, and I'm wondering at this stage whether simply restarting the minor update procedure would solve it.
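A ban like the one described above is normally stored in the CIB as a location constraint whose id starts with "cli-ban-", and that is what "pcs resource clear" removes. The sketch below is an illustration only, not taken from this bug: it scans CIB XML (for example as dumped by "pcs cluster cib") for such constraints. The function name `find_cli_bans` and the sample XML, including the node the ban is placed on, are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_cli_bans(cib_xml, resource=None):
    """List (id, rsc, node, score) for location constraints left by a ban.

    Assumes the usual "cli-ban-" id prefix; node and score may be None
    if the constraint is expressed as a rule instead of attributes.
    """
    root = ET.fromstring(cib_xml)
    bans = []
    for loc in root.iter("rsc_location"):
        cid = loc.get("id", "")
        if not cid.startswith("cli-ban-"):
            continue
        if resource is not None and loc.get("rsc") != resource:
            continue
        bans.append((cid, loc.get("rsc"), loc.get("node"), loc.get("score")))
    return bans


# Hypothetical constraints section, made up for illustration.
SAMPLE_CIB = """
<constraints>
  <rsc_location id="cli-ban-rabbitmq-bundle-on-overcloud-controller-2"
                rsc="rabbitmq-bundle" role="Started"
                node="overcloud-controller-2" score="-INFINITY"/>
  <rsc_location id="location-example" rsc="galera-bundle"
                node="overcloud-controller-1" score="100"/>
</constraints>
"""

bans = find_cli_bans(SAMPLE_CIB, resource="rabbitmq-bundle")
for cid, rsc, node, score in bans:
    print(f"{rsc} is banned from {node} ({cid}, score {score})")
```

An empty result for a resource suggests there is no leftover ban to clear, so a bundle that still fails to start is likely blocked for another reason.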

Comment 5 Brandon Sawyers 2021-07-28 20:14:52 UTC
Will the update run with pcs not being in a healthy state, though?

