Bug 1887606 - rabbitmq pcs resource stuck in stopped after a non-main-vip ip node restart, regression since puddle RHOS-16.1-RHEL-8-20201007.n.0
Summary: rabbitmq pcs resource stuck in stopped after a non-main-vip ip node restart,...
Keywords:
Status: CLOSED DUPLICATE of bug 1986998
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Peter Lemenkov
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-10-12 22:26 UTC by pkomarov
Modified: 2022-08-10 10:37 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-19 08:24:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-7209 0 None None None 2022-08-10 10:37:18 UTC

Description pkomarov 2020-10-12 22:26:38 UTC
Description of problem:
rabbitmq pcs resource stuck in stopped after a non-main-vip ip node restart, regression since puddle RHOS-16.1-RHEL-8-20201007.n.0

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201007.n.0


How reproducible:
100%

Steps to Reproduce:
Reproducer:
# Find the main-VIP node (the main VIP is the host part of the Keystone endpoint URL):
. /home/stack/overcloudrc && echo $OS_AUTH_URL | cut -d ':' -f2 | cut -d '/' -f3
# Hard-reset the non-main-VIP controllers (the Jinja expression below is resolved by the CI playbook to the VIP found above):
ip a | grep "{{ hostvars['main_vip_uc']['value'] }}" ||
  (sleep 5s && echo b > /proc/sysrq-trigger)
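For clarity, the two steps above can be sketched as a standalone script. This is a hedged sketch only: it assumes the standard overcloudrc on the undercloud, controllers reachable over ssh as heat-admin, and the hostnames controller-{0,1,2} are illustrative, not taken from this environment.

```shell
#!/bin/sh
# Sketch of the reproducer (assumptions noted above).

# The main VIP is the host component of the Keystone endpoint URL,
# e.g. http://10.0.0.5:5000/v3 -> 10.0.0.5
extract_vip() {
    echo "$1" | cut -d ':' -f2 | cut -d '/' -f3
}

if [ -f /home/stack/overcloudrc ]; then
    . /home/stack/overcloudrc
    main_vip=$(extract_vip "$OS_AUTH_URL")

    # Hard-reset every controller that does NOT hold the VIP.
    # 'echo b > /proc/sysrq-trigger' reboots the node immediately with
    # no clean shutdown, which is what exposes the rabbitmq recovery bug.
    for node in controller-0 controller-1 controller-2; do
        ssh heat-admin@"$node" \
            "ip a | grep -q '$main_vip' || { sleep 5; echo b | sudo tee /proc/sysrq-trigger; }"
    done
fi
```

After the reset, watch `pcs status` on a surviving controller to see whether the rabbitmq-bundle replicas recover.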


Actual results:
After the reboot, two rabbitmq pcs resources remain stopped.

Expected results:
After the reboot, all rabbitmq pcs resources should be started.

Additional info:
Test run logs:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/18/artifact/ansible_sts_results/04_HARD_RESET_CONTROLLER_NON_VIP.log
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/17/artifact/ansible_sts_results/04_HARD_RESET_CONTROLLER_NON_VIP.log

Logs and files are here:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/17/artifact/

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/18/artifact/

Comment 1 pkomarov 2020-10-12 22:30:23 UTC
After ~24 minutes the resources are still stuck in this state:

  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	Starting controller-0
    * rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	Stopped controller-1
    * rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	Stopped controller-2

