Bug 1806248 - In OSP16 HA, one rabbitmq bundle is left stopped after disruption: simultaneous controller hard reset
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server
Version: 16.0 (Train)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Peter Lemenkov
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-23 09:28 UTC by pkomarov
Modified: 2020-07-19 21:21 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-19 21:21:44 UTC
Target Upstream Version:



Description pkomarov 2020-02-23 09:28:57 UTC
Description of problem:
In an OSP16 HA deployment, one rabbitmq bundle remains in the Stopped state after a disruption (a simultaneous hard reset of all controllers). The system does not recover on its own.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-16.0-RHEL-8-20200213.n.1

Steps to Reproduce:
1. Hard-reset all controllers simultaneously (echo b > /proc/sysrq-trigger).
2. Check the state of the rabbitmq resource in Pacemaker on all controllers.
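Step 2 can be scripted. The following is a minimal sketch, not part of the original report, that scans text captured from `pcs status` and lists rabbitmq bundle replicas that are not `Started`. The replica line format it matches (a name such as `rabbitmq-bundle-0`, the resource agent in parentheses, then the state) is an assumption based on typical Pacemaker bundle output and differs between Pacemaker versions; the helper name `stopped_rabbitmq_replicas` and the sample text are hypothetical and not taken from this bug's sosreports.

```python
import re
from typing import List

# Assumed shape of a bundle replica line in `pcs status` output, e.g.
#   rabbitmq-bundle-0  (ocf::heartbeat:rabbitmq-cluster):  Started controller-0
# The exact layout varies between Pacemaker versions; adjust as needed.
REPLICA_RE = re.compile(
    r"(?P<replica>rabbitmq-bundle-\d+)\s+\([^)]*\):\s+(?P<state>\w+)"
)


def stopped_rabbitmq_replicas(pcs_status_text: str) -> List[str]:
    """Return rabbitmq bundle replicas whose reported state is not 'Started'.

    Hypothetical helper: it parses already-captured text and does not call pcs.
    """
    not_started = []
    for line in pcs_status_text.splitlines():
        match = REPLICA_RE.search(line)
        if match and match.group("state") != "Started":
            not_started.append(match.group("replica"))
    return not_started


if __name__ == "__main__":
    # Illustrative sample only; the bundle header line is skipped because
    # "rabbitmq-bundle" there is not followed by "-<digits>".
    sample = """\
  * Container bundle set: rabbitmq-bundle [rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0\t(ocf::heartbeat:rabbitmq-cluster):\tStarted controller-0
    * rabbitmq-bundle-1\t(ocf::heartbeat:rabbitmq-cluster):\tStarted controller-1
    * rabbitmq-bundle-2\t(ocf::heartbeat:rabbitmq-cluster):\tStopped
"""
    print(stopped_rabbitmq_replicas(sample))  # ['rabbitmq-bundle-2']
```

Parsing captured text rather than invoking `pcs` directly keeps the sketch testable off-cluster; on a controller one would feed it the real `pcs status` output after the reset.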

Actual results:
One rabbitmq bundle resource remains stopped.

Expected results:
All rabbitmq resources in the cluster are up.

Additional info:

Comment 1 pkomarov 2020-02-23 10:02:20 UTC
sosreports and all overcloud /var/log contents are available at:
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ_1806248/

Comment 2 pkomarov 2020-02-23 10:04:31 UTC
This bug was discovered by the Tobiko test:
tobiko.tests.faults.ha.test_cloud_recovery.RebootNodesTest.test_reboot_controllers_recovery

Tobiko: https://tobiko.readthedocs.io/en/latest/

