Bug 1264688

Summary: rabbitmq-server failed to start after network upgrade
Product: Red Hat OpenStack
Version: 7.0 (Kilo)
Component: rabbitmq-server
Severity: urgent
Priority: unspecified
Status: CLOSED DUPLICATE
Keywords: ZStream
Reporter: bigswitch <rhosp-bugs-internal>
Assignee: John Eckersberg <jeckersb>
QA Contact: yeylon <yeylon>
CC: apevec, dblack, lhh, srevivo, yeylon
Target Milestone: ---
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-09-23 17:49:05 UTC

Description bigswitch 2015-09-20 18:43:05 UTC
Description of problem:
This is a Red Hat OpenStack deployment with three controller nodes and two compute nodes. I performed a software upgrade on the network, which temporarily interrupted network connectivity. After the upgrade, I noticed that rabbitmq-server was no longer running on any of the three controllers. Attempts to restart rabbitmq-server failed, and I had to reboot all controllers to recover.

[heat-admin@overcloud-controller-1 ~]$ sudo systemctl restart rabbitmq-server
Job for rabbitmq-server.service failed. See 'systemctl status rabbitmq-server.service' and 'journalctl -xn' for details.
[heat-admin@overcloud-controller-1 ~]$ sudo su
[root@overcloud-controller-1 heat-admin]# systemctl status rabbitmq-server
rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; disabled)
  Drop-In: /etc/systemd/system/rabbitmq-server.service.d
           limits.conf
   Active: failed (Result: exit-code) since Sun 2015-09-20 13:35:07 EDT; 14s ago
  Process: 1673 ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl stop (code=exited, status=2)
  Process: 1455 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server (code=exited, status=1/FAILURE)
 Main PID: 1455 (code=exited, status=1/FAILURE)

Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: attempted to contact: ['rabbit@overcloud-controller-1']
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: rabbit@overcloud-controller-1:
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: * connected to epmd (port 4369) on overcloud-controller-1
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: * epmd reports: node 'rabbit' not running at all
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: no other nodes on overcloud-controller-1
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: * suggestion: start the node
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: current node details:
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: - node name: 'rabbitmqctl1673@overcloud-controller-1'
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: - home dir: /var/lib/rabbitmq
Sep 20 13:35:07 overcloud-controller-1.localdomain rabbitmqctl[1673]: - cookie hash: Fj5dxyBiV5yRiwBxWLYhdQ==
Sep 20 13:35:07 overcloud-controller-1.localdomain systemd[1]: rabbitmq-server.service: control process exited, code=exited status=2
Sep 20 13:35:07 overcloud-controller-1.localdomain systemd[1]: Failed to start RabbitMQ broker.
Sep 20 13:35:07 overcloud-controller-1.localdomain systemd[1]: Unit rabbitmq-server.service entered failed state.
[root@overcloud-controller-1 heat-admin]#
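The journal output above shows epmd reachable on port 4369 while epmd reports the 'rabbit' node as not running at all, i.e. the broker process itself is dying during startup rather than the Erlang distribution layer being unreachable. Before rebooting a controller, a minimal diagnostic/recovery sketch along these lines is sometimes enough (assumptions: default log location, and that rabbitmq may be pacemaker-managed in this HA deployment; the resource name in the last command is a guess):

# Ask the Erlang port mapper which nodes it knows about
epmd -names

# Look for a stale beam.smp process still holding the node name or its ports
ps -ef | grep beam

# From a controller that still has a broker up (if any), check membership
rabbitmqctl cluster_status

# Retry the start while watching the broker's own log for the real error
systemctl start rabbitmq-server
tail -f /var/log/rabbitmq/rabbit@overcloud-controller-1.log

# If pacemaker manages the broker, clear its failure state instead of
# driving systemd directly (resource name is an assumption)
pcs resource cleanup rabbitmq-clone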
Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Perform a network upgrade, or otherwise interrupt connectivity between the controllers (one way to simulate this is sketched after this list)
2. Observe that rabbitmq-server has stopped running on the controllers
3. Attempt to restart rabbitmq-server; the restart fails
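To reproduce step 1 without an actual network upgrade, one option is to block RabbitMQ's inter-node traffic on a controller long enough for its peers to declare a partition. This is a minimal sketch, not the exact event from this report; it assumes the default ports (4369 for epmd, 25672 for Erlang distribution) and iptables filtering:

# On one controller, drop cluster traffic (run as root)
iptables -I INPUT -p tcp --dport 4369 -j DROP
iptables -I INPUT -p tcp --dport 25672 -j DROP

# Wait past the net_ticktime window (60s by default) so the other
# nodes consider this one partitioned
sleep 120

# Restore connectivity
iptables -D INPUT -p tcp --dport 4369 -j DROP
iptables -D INPUT -p tcp --dport 25672 -j DROP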

Actual results:
rabbitmq-server stopped on all three controllers, attempts to restart it failed, and all controllers had to be rebooted to recover.

Expected results:
rabbitmq-server recovers automatically, or can at least be restarted manually, once network connectivity is restored.

Additional info:
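Possibly relevant (an assumption, not something stated in this report): RabbitMQ's behavior after a partition is governed by cluster_partition_handling, which defaults to ignore and can leave nodes stuck much as described above; autoheal restarts the losing side once connectivity returns. A minimal sketch of the setting in the classic Erlang-terms config format:

# cat /etc/rabbitmq/rabbitmq.config
[
 {rabbit, [
   %% default is 'ignore'; 'autoheal' restarts the losing side of a
   %% partition automatically once connectivity returns
   {cluster_partition_handling, autoheal}
 ]}
].

Whether this would have avoided the failure here depends on the root cause tracked in the duplicate bug.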

Comment 1 bigswitch 2015-09-20 18:45:35 UTC
sosreports from all three controllers are at

https://bigswitch.box.com/s/05sukvk0jzi5g5rsroyilpq583uy64m6

Comment 3 John Eckersberg 2015-09-23 17:49:05 UTC

*** This bug has been marked as a duplicate of bug 1264083 ***