Bug 1557513

Summary: [OSP10 UPDATES] RMQ fails to start after minor update and reboot
Product: Red Hat OpenStack
Reporter: John Eckersberg <jeckersb>
Component: puppet-rabbitmq
Assignee: John Eckersberg <jeckersb>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Docs Contact:
Priority: urgent
Version: 10.0 (Newton)
CC: apevec, augol, chjones, jeckersb, jjoyce, jschluet, lhh, mbultel, michele, nlevinki, pkomarov, slinaber, srevivo, tvignaud, yprokule
Target Milestone: zstream
Keywords: Reopened, Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: puppet-rabbitmq-5.6.0-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1533406
: 1557519 (view as bug list)
Environment:
Last Closed: 2018-11-26 18:00:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1533406, 1536064
Bug Blocks: 1557519, 1557522, 1647474, 1647587, 1647593, 1654041, 1654042

Comment 1 Chris Jones 2018-06-08 12:36:51 UTC
OSP11 is now EOL. This was fixed in OSP12.

Comment 2 John Eckersberg 2018-11-06 19:25:07 UTC
Re-opening this; it should be tracked against OSP10, where it is still a problem.

Downstream review - https://code.engineering.redhat.com/gerrit/#/c/132928/

Comment 3 John Eckersberg 2018-11-14 15:16:25 UTC
Fixed in puppet-rabbitmq-5.6.0-3.el7ost - https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=800198
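
One quick way to confirm the fixed package actually landed on the overcloud nodes (a sketch, not part of the original comment; it assumes the same "controller" ansible group used in the verification below):

ansible controller -b -mshell -a'rpm -q puppet-rabbitmq'
# expected: puppet-rabbitmq-5.6.0-3.el7ost (or later) reported on every controller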

Comment 9 pkomarov 2018-11-18 16:10:10 UTC
Verified:

[stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 47aca0f1-a2cd-4592-86e6-3d6948dde115 | overcloud  | UPDATE_COMPLETE | 2018-11-18T10:29:09Z | 2018-11-18T13:29:34Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@undercloud-0 ~]$ cat core_puddle_version 
2018-11-15.2
[stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.6):
  10
  ceph-2
  ceph-osd-2
  rhel-7.6

# Test reproducer (a follow-up check is sketched after these commands):
ansible controller -mshell -b -a'pcs cluster stop --request-timeout=300'
ansible controller -mreboot
ansible controller -mshell -b -a'pcs cluster start --wait=300'
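
A minimal extra check (a sketch; it was not part of the recorded run and assumes rabbitmqctl is available on the controllers) to confirm that RabbitMQ rejoined the cluster on all three nodes after the restart:

ansible controller -b -mshell -a'rabbitmqctl cluster_status'
# expected: each node lists all three rabbit@controller-* nodes under running_nodes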

[stack@undercloud-0 ~]$ ansible controller -b -mshell -a'hostname -s;uptime;pcs status'
 [WARNING]: Found both group and host with same name: undercloud

controller-1 | SUCCESS | rc=0 >>
controller-1
 16:05:09 up 3 min,  1 user,  load average: 5.26, 3.80, 1.63
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:09 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-192.168.24.12	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.1.15	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-10.0.0.106	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.15	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-172.17.4.17	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.17	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

controller-0 | SUCCESS | rc=0 >>
controller-0
 16:05:09 up 3 min,  1 user,  load average: 5.16, 3.56, 1.53
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:09 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-192.168.24.12	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.1.15	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-10.0.0.106	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.15	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-172.17.4.17	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.17	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

controller-2 | SUCCESS | rc=0 >>
controller-2
 16:05:09 up 4 min,  1 user,  load average: 3.28, 2.69, 1.20
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:10 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-192.168.24.12	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.1.15	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-10.0.0.106	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.15	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-172.17.4.17	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.17	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


[stack@undercloud-0 ~]$ ansible controller -b -mshell -a'ss -anp | grep 4369'
 [WARNING]: Found both group and host with same name: undercloud

controller-0 | SUCCESS | rc=0 >>
tcp    LISTEN     0      128       *:4369                  *:*                   users:(("epmd",pid=7753,fd=3))
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:55935              
tcp    TIME-WAIT  0      0      127.0.0.1:50839              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:60795              127.0.0.1:4369               
tcp    ESTAB      0      0      127.0.0.1:4369               127.0.0.1:41355               users:(("epmd",pid=7753,fd=5))
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:52420              
tcp    TIME-WAIT  0      0      127.0.0.1:49803              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:33027              
tcp    TIME-WAIT  0      0      127.0.0.1:33871              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:39955              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:49085              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:45503              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:33533              
tcp    TIME-WAIT  0      0      127.0.0.1:34434              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:51121              127.0.0.1:4369               
tcp    ESTAB      0      0      127.0.0.1:41355              127.0.0.1:4369                users:(("beam.smp",pid=20843,fd=44))
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:52936              
tcp    TIME-WAIT  0      0      127.0.0.1:46813              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:52773              
tcp    TIME-WAIT  0      0      127.0.0.1:58945              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:46137              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:44826              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:60743              
tcp    TIME-WAIT  0      0      127.0.0.1:45234              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:50195              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:38399              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:59794              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:35890              
tcp    LISTEN     0      128      :::4369                 :::*                   users:(("epmd",pid=7753,fd=4))

controller-2 | SUCCESS | rc=0 >>
tcp    LISTEN     0      128       *:4369                  *:*                   users:(("epmd",pid=10762,fd=3))
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:33546              
tcp    TIME-WAIT  0      0      127.0.0.1:58205              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:58565              127.0.0.1:4369               
tcp    ESTAB      0      0      127.0.0.1:34438              127.0.0.1:4369                users:(("beam.smp",pid=19277,fd=44))
tcp    TIME-WAIT  0      0      127.0.0.1:59929              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:52958              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:35858              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:59041              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:55288              
tcp    TIME-WAIT  0      0      127.0.0.1:39314              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:37688              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:60427              
tcp    TIME-WAIT  0      0      127.0.0.1:42312              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:35171              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:43145              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:44075              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:33722              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:59552              
tcp    ESTAB      0      0      127.0.0.1:4369               127.0.0.1:34438               users:(("epmd",pid=10762,fd=5))
tcp    TIME-WAIT  0      0      127.0.0.1:35897              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:60203              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:52215              
tcp    TIME-WAIT  0      0      127.0.0.1:50348              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:33201              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:32860              
tcp    TIME-WAIT  0      0      127.0.0.1:38523              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:40896              
tcp    LISTEN     0      128      :::4369                 :::*                   users:(("epmd",pid=10762,fd=4))

controller-1 | SUCCESS | rc=0 >>
tcp    LISTEN     0      128       *:4369                  *:*                   users:(("epmd",pid=9725,fd=3))
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:48883              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:45415              
tcp    ESTAB      0      0      127.0.0.1:4369               127.0.0.1:41776               users:(("epmd",pid=9725,fd=5))
tcp    TIME-WAIT  0      0      127.0.0.1:39204              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:39352              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:51954              
tcp    TIME-WAIT  0      0      127.0.0.1:43407              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:47398              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:48606              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:56653              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:49594              
tcp    TIME-WAIT  0      0      127.0.0.1:53935              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:58469              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:41873              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:51399              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:54985              
tcp    TIME-WAIT  0      0      127.0.0.1:41007              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:60205              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:49194              
tcp    TIME-WAIT  0      0      127.0.0.1:54824              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:40072              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:50862              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:39001              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:56765              
tcp    ESTAB      0      0      127.0.0.1:41776              127.0.0.1:4369                users:(("beam.smp",pid=18278,fd=44))
tcp    TIME-WAIT  0      0      127.0.0.1:57691              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:45378              
tcp    LISTEN     0      128      :::4369                 :::*                   users:(("epmd",pid=9725,fd=4))

Comment 10 pkomarov 2018-11-19 17:11:12 UTC
More checks are needed.

Comment 13 errata-xmlrpc 2018-11-26 18:00:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3674