Bug 1715315

Summary: Rabbitmq broker crashed every couple of minutes in OSP14
Product: Red Hat OpenStack
Reporter: Chen <cchen>
Component: rabbitmq-server
Assignee: Peter Lemenkov <plemenko>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Docs Contact:
Priority: high
Version: 14.0 (Rocky)
CC: apevec, jeckersb, lhh, michele, plemenko
Target Milestone: z4
Keywords: Triaged, ZStream
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: rabbitmq-server-3.6.16-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-06 16:53:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1699993
Bug Blocks:
Attachments:
Description Flags
erl_crash.dump file none

Description Chen 2019-05-30 06:20:51 UTC
Created attachment 1575081
erl_crash.dump file

Description of problem:

The RabbitMQ broker crashes every couple of minutes in OSP14.

Version-Release number of selected component (if applicable):

OSP14
14.0-136:pcmklatest

How reproducible:

100% at the customer site, on all 3 controller nodes

Steps to Reproduce:
1.
2.
3.

Actual results:

The broker crashes and an erl_crash.dump file is generated.

Expected results:


Additional info:

Comment 1 Chen 2019-06-03 10:40:47 UTC
Hi,

Is there any update on this continuous crash issue?

Best Regards,
Chen

Comment 2 Peter Lemenkov 2019-06-04 07:43:22 UTC
Hello,
We have several other issues that are quite similar. So far our best bet has been to increase the ARP cache size (see bug 1653242 comment 15 for an example). We are still investigating this. Meanwhile, could you please ask the customer to increase the ARP cache size so that we can rule that cause out?
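
For reference, a minimal sketch of how the ARP (neighbor) cache limits could be raised on a controller. On Linux these limits are the net.ipv4.neigh.default.gc_thresh1/2/3 sysctls. The helper name write_arp_conf, the drop-in file name and the numeric values are assumptions for illustration only; they are not the values from bug 1653242 and should be sized to the actual environment.

```shell
#!/bin/sh
# Sketch only: threshold values used in the example below are illustrative
# assumptions, not values recommended in this bug or in bug 1653242.

# Write a sysctl drop-in that raises the IPv4 neighbor (ARP) table limits.
# Usage: write_arp_conf <file> <gc_thresh1> <gc_thresh2> <gc_thresh3>
write_arp_conf() {
    conf_file=$1
    cat > "$conf_file" <<EOF
net.ipv4.neigh.default.gc_thresh1 = $2
net.ipv4.neigh.default.gc_thresh2 = $3
net.ipv4.neigh.default.gc_thresh3 = $4
EOF
}

# Example (run as root on each controller; the file name is hypothetical):
#   sysctl net.ipv4.neigh.default.gc_thresh3   # current hard limit
#   ip -4 neigh show | wc -l                   # neighbor entries in use now
#   write_arp_conf /etc/sysctl.d/99-arp-cache.conf 1024 2048 4096
#   sysctl --system                            # reload all sysctl files
```

Comparing the entry count from "ip -4 neigh show" with gc_thresh3 before changing anything shows whether the table is actually close to its hard limit.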

Comment 3 Chen 2019-06-04 07:52:16 UTC
Hi Peter,

Thank you very much for your reply!

Sure, fully understood. I will ask the customer to try increasing the ARP cache size.

Best Regards,
Chen

Comment 5 Peter Lemenkov 2019-06-14 14:49:53 UTC
See bug 1699993 for the same issue.

Comment 9 Chen 2019-07-31 03:51:59 UTC
Hi Peter,

Thank you very much for your response.

Best Regards,
Chen

Comment 14 pkomarov 2019-10-23 21:55:19 UTC
Verified.

[stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.7):
  14
  ceph-3
  ceph-osd-3
  rhel-7.7
[stack@undercloud-0 ~]$ cat core_puddle_version 
2019-10-21.1

[root@controller-0 ~]# grep 'Oct 23 14:01:33' -A 99999 /var/log/cluster/corosync.log|grep 'ocf::heartbeat:rabbitmq-cluster'|grep rabbitmq-bundle|grep -v Started||echo 'no rabbit problems were found'
no rabbit problems were found

[root@controller-0 ~]# tail -n 1 /var/log/cluster/corosync.log
Oct 23 21:51:54 [26538] controller-0        cib:     info: cib_process_request:	Completed cib_modify operation for section nodes: OK (rc=0, origin=controller-2/crm_attribute/4, version=0.84.4)

[root@controller-0 ~]# docker exec -it  rabbitmq-bundle-docker-0 bash
()[root@controller-0 /]# rpm -q rabbitmq-server
rabbitmq-server-3.6.16-4.el7ost.noarch
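
The corosync.log check above can be sketched as a small reusable function. One caveat with the one-liner is that it also prints "no rabbit problems were found" when the start timestamp does not appear in the log at all; the sketch below reports that case separately. The function name check_rabbit_log and its return codes are hypothetical, chosen only for this illustration.

```shell
#!/bin/sh
# Sketch of the corosync.log check as a function (name is hypothetical).
# Usage: check_rabbit_log <log_file> <start_marker>
# Returns 0 if no problem lines were found, 1 if some were found,
# 2 if the start marker does not occur in the log.
check_rabbit_log() {
    log_file=$1
    since=$2

    # Without this guard a missing marker would look like a clean result.
    if ! grep -q -F -- "$since" "$log_file"; then
        echo "start marker '$since' not found in $log_file" >&2
        return 2
    fi

    # Print everything from the first line containing the marker onwards,
    # keep rabbitmq-cluster resource lines for the rabbitmq bundle,
    # and drop the lines that report the resource as Started.
    hits=$(awk -v since="$since" 'index($0, since) { seen = 1 } seen' "$log_file" \
        | grep -F 'ocf::heartbeat:rabbitmq-cluster' \
        | grep -F 'rabbitmq-bundle' \
        | grep -v 'Started' || true)

    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
    echo 'no rabbit problems were found'
}

# Example:
#   check_rabbit_log /var/log/cluster/corosync.log 'Oct 23 14:01:33'
```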

Comment 16 errata-xmlrpc 2019-11-06 16:53:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3747