Bug 1887606
| Summary: | rabbitmq pcs resource stuck in stopped after a non-main-vip ip node restart, regression since puddle RHOS-16.1-RHEL-8-20201007.n.0 | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | pkomarov |
| Component: | rabbitmq-server | Assignee: | Peter Lemenkov <plemenko> |
| Status: | CLOSED DUPLICATE | QA Contact: | pkomarov |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.1 (Train) | CC: | apevec, jeckersb, lhh, lmiccini, michele |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-19 08:24:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
After ~24 min, the resource is stuck in this state:

* Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
  * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Starting controller-0
  * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped controller-1
  * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped controller-2
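The stuck state above can be spotted by scanning `pcs status` text for rabbitmq-bundle replicas that are not `Started`. The helper below is a minimal sketch, not part of pcs or of the CI job: the function name `rabbitmq_not_started` is hypothetical, and it assumes the replica line layout shown above (name, resource agent, state, node), with an optional leading `*` and optional ANSI color codes like those in the captured log.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of pcs or the CI job: reads `pcs status`
# text on stdin, prints every rabbitmq-bundle replica whose state is not
# "Started", and returns non-zero if it found any.
rabbitmq_not_started() {
  local esc
  esc=$(printf '\033')
  # Drop ANSI color sequences such as ESC[0;33m ... ESC[0m, then drop the
  # leading "* " list marker so that $1 is the replica name, $3 its state
  # and $4 its node.
  sed "s/${esc}\[[0-9;]*m//g" |
    awk '{ sub(/^[ \t]*\*[ \t]*/, "") }
         $1 ~ /^rabbitmq-bundle-[0-9]+$/ && $3 != "Started" {
           print $1, $3, $4
           bad = 1
         }
         END { exit (bad ? 1 : 0) }'
}
```

On a controller, `pcs status | rabbitmq_not_started` should print nothing and return 0 once all replicas are Started; for the state above it would list all three replicas (one Starting, two Stopped). The regex only matches `rabbitmq-bundle-N`, so container lines such as `rabbitmq-bundle-podman-N` are ignored.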
Description of problem:
The rabbitmq pcs resource is stuck in Stopped after restarting a node that does not hold the main VIP. This is a regression since puddle RHOS-16.1-RHEL-8-20201007.n.0.

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201007.n.0

How reproducible:
100%

Steps to Reproduce:
1. Find the main VIP address (the host part of `OS_AUTH_URL`):
   `. /home/stack/overcloudrc && echo $OS_AUTH_URL | cut -d ':' -f2 | cut -d '/' -f3`
2. Hard-reset the controllers that do not hold the main VIP (each controller triggers a sysrq reboot unless the VIP is configured on it):
   `ip a |grep "{{ hostvars['main_vip_uc']['value'] }}" || (sleep 5s && echo b > /proc/sysrq-trigger)`

Actual results:
After the reboot, two rabbitmq pcs resources are stopped.

Expected results:
After the reboot, all rabbitmq pcs resources should be started.

Additional info:
Test run logs:
- https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/18/artifact/ansible_sts_results/04_HARD_RESET_CONTROLLER_NON_VIP.log
- https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/17/artifact/ansible_sts_results/04_HARD_RESET_CONTROLLER_NON_VIP.log

Logs and files:
- https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/17/artifact/
- https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2016.1/view/PidOne/job/DFG-pidone-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-ansible-sts-sanity/18/artifact/
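The reproducer logic can be checked without rebooting anything. The sketch below is a dry run under stated assumptions: the function names are hypothetical, the VIP is extracted with the same `cut` pipeline as the reproducer (IPv4 or hostname URLs only), and the sysrq reboot is printed rather than executed. It also uses `grep -Fw` instead of the reproducer's plain `grep`, a deliberate change so that a VIP like 10.0.0.10 is not matched by 10.0.0.101.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the reproducer. Function names are hypothetical; the
# destructive sysrq reboot is only printed, never executed.

# Host part of an auth URL such as http://10.0.0.101:5000, using the same
# cut pipeline as the reproducer (works for IPv4 and hostnames, not IPv6).
vip_from_auth_url() {
  printf '%s\n' "$1" | cut -d ':' -f2 | cut -d '/' -f3
}

# Succeeds if the given `ip a` output contains the VIP as a whole word, so
# 10.0.0.10 does not match 10.0.0.101 (the reproducer's plain grep would).
holds_vip() {
  local vip="$1" ip_output="$2"
  grep -qFw -- "$vip" <<<"$ip_output"
}

# Prints what the hard-reset step would do on a controller with this output.
reset_decision() {
  local vip="$1" ip_output="$2"
  if holds_vip "$vip" "$ip_output"; then
    echo "skip: holds main VIP $vip"
  else
    echo "would run: sleep 5s && echo b > /proc/sysrq-trigger"
  fi
}
```

Typical use: after sourcing overcloudrc, `vip=$(vip_from_auth_url "$OS_AUTH_URL")`; then, on each controller, `reset_decision "$vip" "$(ip a)"` shows whether that node would be hard-reset.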